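I'm 追寻小伙, a blogger at 靠谱客. This post, which I collected during recent development work, shows how to clean data in Python with nltk and BeautifulSoup (removing HTML tags and converting HTML entities). I found it useful and am sharing it here for reference.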
Overview
from nltk import clean_html
from BeautifulSoup import BeautifulStoneSoup
content = '''Is anyone else having troubles with Bluetooth on a Moto X?
\u00a0It connects fine to my car when I make a call, but the bluetooth drops in
and out, and the phone prompts me to ask whether I want to use the speakerphone,
the headset, or the bluetooth - but a few seconds later, it connects back to
bluetooth. \u00a0And oddly, it only happens some of the time. \u00a0And other uses
of Bluetooth from the phone - for example, playing an audiobook or music -
demonstrate no similar behavior.<br /><br />It's a disastrous bug. Making me
think about switching to another phone, even though I love this one. \u00a0And it
seems to have been introduced only in the past month or so, as the phone worked
fine with the car before that.<br /><br />And yes, I've tried forgetting and
re-initiating the bluetooth connection.\ufeff'''
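# clean_html removes tags and
# BeautifulStoneSoup converts HTML entities.
# Note: this is Python 2 code. It needs NLTK 2.x (clean_html was removed in
# NLTK 3) and BeautifulSoup 3 (the "BeautifulSoup" package, not "bs4").
def cleanHtml(html):
    if html == "":
        return ""
    return BeautifulStoneSoup(clean_html(html),
            convertEntities=BeautifulStoneSoup.HTML_ENTITIES).contents[0]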
print content
print
print cleanHtml(content)
Is anyone else having troubles with Bluetooth on a Moto X?
\u00a0It connects fine to my car when I make a call, but the bluetooth drops in
and out, and the phone prompts me to ask whether I want to use the speakerphone,
the headset, or the bluetooth - but a few seconds later, it connects back to
bluetooth. \u00a0And oddly, it only happens some of the time. \u00a0And other uses
of Bluetooth from the phone - for example, playing an audiobook or music -
demonstrate no similar behavior.<br /><br />It's a disastrous bug. Making me
think about switching to another phone, even though I love this one. \u00a0And it
seems to have been introduced only in the past month or so, as the phone worked
fine with the car before that.<br /><br />And yes, I've tried forgetting and
re-initiating the bluetooth connection.\ufeff

Is anyone else having troubles with Bluetooth on a Moto X?
\u00a0It connects fine to my car when I make a call, but the bluetooth drops in
and out, and the phone prompts me to ask whether I want to use the speakerphone,
the headset, or the bluetooth - but a few seconds later, it connects back to
bluetooth. \u00a0And oddly, it only happens some of the time. \u00a0And other uses
of Bluetooth from the phone - for example, playing an audiobook or music -
demonstrate no similar behavior. It's a disastrous bug. Making me
think about switching to another phone, even though I love this one. \u00a0And it
seems to have been introduced only in the past month or so, as the phone worked
fine with the car before that. And yes, I've tried forgetting and
re-initiating the bluetooth connection.\ufeff
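The snippet above depends on Python 2, and nltk's clean_html is no longer available in NLTK 3. For readers on Python 3, the same two steps (dropping tags, then converting entities) can be roughly sketched with the standard library alone. This is only an illustration under stated assumptions, not the original author's method: the names _TextExtractor and clean_html_text are hypothetical, and the sketch treats every tag as a word boundary, which may not suit inline markup in the middle of a word.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: keeps text nodes and discards every tag."""

    def __init__(self):
        # convert_charrefs=True lets the parser decode entities such as
        # &#39; or &amp; before the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Assumption: a tag separates words, so emit a space in its place.
        self.parts.append(" ")

    def handle_endtag(self, tag):
        self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def clean_html_text(html):
    """Strip tags, decode entities, and collapse runs of whitespace."""
    if html == "":
        return ""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    text = "".join(parser.parts)
    # \s matches Unicode whitespace here, so non-breaking spaces are
    # collapsed into ordinary spaces as well.
    return re.sub(r"\s+", " ", text).strip()


sample = "no similar behavior.<br /><br />It&#39;s a disastrous bug."
print(clean_html_text(sample))
# prints: no similar behavior. It's a disastrous bug.
```

If a third-party dependency is acceptable, Beautiful Soup 4's get_text() is the replacement that NLTK itself points to for removing markup; the standard-library version is used here only to keep the sketch self-contained.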