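This article, collected by 靠谱客 blogger 追寻小伙 during recent development work, shows how to clean data in Python with nltk and BeautifulSoup (removing HTML tags and converting HTML entities). It is shared here as a reference.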

Overview

# Python 2 code: relies on NLTK 2.x (clean_html) and BeautifulSoup 3 (BeautifulStoneSoup).
from nltk import clean_html
from BeautifulSoup import BeautifulStoneSoup

content = '''Is anyone else having troubles with Bluetooth on a Moto X?
\u00a0It connects fine to my car when I make a call, but the bluetooth drops in
and out, and the phone prompts me to ask whether I want to use the speakerphone,
 the headset, or the bluetooth - but a few seconds later, it connects back to bl
uetooth. \u00a0And oddly, it only happens some of the time. \u00a0And other uses
 of Bluetooth from the phone - for example, playing an audiobook or music - demo
nstrate no similar behavior.<br /><br />It's a disastrous bug. Making me thi
nk about switching to another phone, even though I love this one. \u00a0And it s
eems to have been introduced only in the past month or so, as the phone worked f
ine with the car before that.<br /><br />And yes, I've tried forgetting and
re-initiating the bluetooth connection.\ufeff'''

# clean_html strips the HTML tags;
# BeautifulStoneSoup then converts HTML entities to characters.
def cleanHtml(html):
    if html == "":
        return ""
    return BeautifulStoneSoup(clean_html(html),
        convertEntities=BeautifulStoneSoup.HTML_ENTITIES).contents[0]

print content
print
print cleanHtml(content)

Output:

Is anyone else having troubles with Bluetooth on a Moto X?
\u00a0It connects fine to my car when I make a call, but the bluetooth drops in
and out, and the phone prompts me to ask whether I want to use the speakerphone,
 the headset, or the bluetooth - but a few seconds later, it connects back to bl
uetooth. \u00a0And oddly, it only happens some of the time. \u00a0And other uses
 of Bluetooth from the phone - for example, playing an audiobook or music - demo
nstrate no similar behavior.<br /><br />It's a disastrous bug. Making me thi
nk about switching to another phone, even though I love this one. \u00a0And it s
eems to have been introduced only in the past month or so, as the phone worked f
ine with the car before that.<br /><br />And yes, I've tried forgetting and
re-initiating the bluetooth connection.\ufeff

Is anyone else having troubles with Bluetooth on a Moto X?
\u00a0It connects fine to my car when I make a call, but the bluetooth drops in
and out, and the phone prompts me to ask whether I want to use the speakerphone,
 the headset, or the bluetooth - but a few seconds later, it connects back to bl
uetooth. \u00a0And oddly, it only happens some of the time. \u00a0And other uses
 of Bluetooth from the phone - for example, playing an audiobook or music - demo
nstrate no similar behavior. It's a disastrous bug. Making me thi
nk about switching to another phone, even though I love this one. \u00a0And it s
eems to have been introduced only in the past month or so, as the phone worked f
ine with the car before that. And yes, I've tried forgetting and
re-initiating the bluetooth connection.\ufeff
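
The listing above only runs on Python 2 with the older NLTK 2 and BeautifulSoup 3 releases. In NLTK 3, `clean_html` raises `NotImplementedError` and points to BeautifulSoup's `get_text()` instead. For Python 3, the same two steps (drop the tags, decode the entities) can be roughly sketched with the standard library alone. This is not the original author's method. The names `clean_html_py3` and `_TextExtractor` are made up for this illustration, and collapsing whitespace runs into single spaces is an assumption that may not match the old output exactly.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: keeps text nodes and turns every tag into a space."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; and &#39;
        # before the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        self._chunks.append(" ")

    def handle_endtag(self, tag):
        self._chunks.append(" ")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br /> arrive here.
        self._chunks.append(" ")

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        # Assumption: collapse runs of whitespace (including real U+00A0
        # characters) into single spaces and trim both ends.
        return " ".join("".join(self._chunks).split())


def clean_html_py3(html):
    """Remove HTML tags and convert HTML entities (illustrative sketch)."""
    if html == "":
        return ""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


if __name__ == "__main__":
    sample = "It&#39;s a bug.<br /><br />Fixed &amp; done."
    print(clean_html_py3(sample))  # It's a bug. Fixed & done.
```

Unlike the Python 2 version, this sketch does not preserve the original line breaks. If they matter, `text()` could join the chunks without the whitespace collapse.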

