Overview
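To strip unwanted or unsafe tags and attributes from input, I use the following code (taken almost entirely from http://djangosnippets.org/snippets/1655/):

    import re
    from bs4 import BeautifulSoup, Comment

    def html_filter(value, allowed_tags = 'p h1 h2 h3 div span a:href:title img:src:alt:title table:cellspacing:cellpadding th tr td:colspan:rowspan ol ul li br'):
        # Match "javascript" even when whitespace or hex character references separate the letters
        js_regex = re.compile(r'[\s]*(&#x.{1,7})?'.join(list('javascript')))
        # Turn 'a:href:title' into {'a': ['href', 'title']}
        allowed_tags = [tag.split(':') for tag in allowed_tags.split()]
        allowed_tags = dict((tag[0], tag[1:]) for tag in allowed_tags)
        soup = BeautifulSoup(value)
        # Remove HTML comments
        for comment in soup.findAll(text=lambda text: isinstance(text, Comment)):
            comment.extract()
        for tag in soup.findAll(True):
            if tag.name not in allowed_tags:
                # Hide the tag itself but keep rendering its contents
                tag.hidden = True
            else:
                # Keep only the whitelisted attributes, with "javascript" stripped from their values
                tag.attrs = [(attr, js_regex.sub('', val)) for attr, val in tag.attrs.items() if attr in allowed_tags[tag.name]]
        return soup.renderContents().decode('utf8')
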
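It works for unwanted and whitelisted tags, for attributes that are not whitelisted, and even for badly formed HTML. However, as soon as any whitelisted attribute is present, it raises

    AttributeError: 'list' object has no attribute 'items'

on the last line, which does not help me much. type(soup) is the same whether or not the error is raised, so I don't know what the message refers to.

Traceback: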
    [...]
    File "C:UsersMarkWebwwwfnwidjangosrcbasefunctionshtml_filter.py" in html_filter
      30. return soup.renderContents().decode('utf8')
    File "C:\Python27\lib\site-packages\bs4\element.py" in renderContents
      1098. indent_level=indentLevel, encoding=encoding)
    File "C:\Python27\lib\site-packages\bs4\element.py" in encode_contents
      1089. contents = self.decode_contents(indent_level, encoding, formatter)
    File "C:\Python27\lib\site-packages\bs4\element.py" in decode_contents
      1074. formatter))
    File "C:\Python27\lib\site-packages\bs4\element.py" in decode
      1021. indent_contents, eventual_encoding, formatter)
    File "C:\Python27\lib\site-packages\bs4\element.py" in decode_contents
      1074. formatter))
    File "C:\Python27\lib\site-packages\bs4\element.py" in decode
      1021. indent_contents, eventual_encoding, formatter)
    File "C:\Python27\lib\site-packages\bs4\element.py" in decode_contents
      1074. formatter))
    File "C:\Python27\lib\site-packages\bs4\element.py" in decode
      1021. indent_contents, eventual_encoding, formatter)
    File "C:\Python27\lib\site-packages\bs4\element.py" in decode_contents
      1074. formatter))
    File "C:\Python27\lib\site-packages\bs4\element.py" in decode
      983. for key, val in sorted(self.attrs.items()):

    Exception Type: AttributeError at /"nieuws"/article/3-test/
    Exception Value: 'list' object has no attribute 'items'
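
The last frame of the traceback calls self.attrs.items(), which suggests that bs4 expects tag.attrs to be a dict. The list comprehension in html_filter instead assigns a list of (name, value) tuples, which is the form BeautifulSoup 3 used. A minimal sketch of a possible fix, assuming Python 3 and a recent bs4, is to build a dict instead. The helper name filter_attrs, the shortened default whitelist and the choice of html.parser are illustrative assumptions and not part of the original snippet.

```python
import re

# Same pattern as in the question: "javascript" with optional whitespace
# or hex character references between the letters.
JS_REGEX = re.compile(r'[\s]*(&#x.{1,7})?'.join(list('javascript')))


def filter_attrs(attrs, allowed):
    """Return a new dict containing only the whitelisted attributes.

    Hypothetical helper: `attrs` is a tag's attribute dict and `allowed`
    is the list of attribute names permitted for that tag.
    """
    cleaned = {}
    for name, value in attrs.items():
        if name not in allowed:
            continue
        if isinstance(value, list):
            # bs4 stores multi-valued attributes such as class as lists
            value = [JS_REGEX.sub('', item) for item in value]
        else:
            value = JS_REGEX.sub('', value)
        cleaned[name] = value
    return cleaned


def html_filter(value, allowed_tags='p a:href:title img:src:alt:title br'):
    # Third-party dependency, imported here so that filter_attrs can be
    # used and tested without bs4 installed.
    from bs4 import BeautifulSoup, Comment

    # Turn 'a:href:title' into {'a': ['href', 'title']}
    allowed = {}
    for spec in allowed_tags.split():
        name, *attr_names = spec.split(':')
        allowed[name] = attr_names

    soup = BeautifulSoup(value, 'html.parser')
    # Remove HTML comments
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()
    for tag in soup.find_all(True):
        if tag.name not in allowed:
            # Hide the tag itself but keep rendering its contents
            tag.hidden = True
        else:
            # Assign a dict, not a list of tuples, so rendering can call .items()
            tag.attrs = filter_attrs(tag.attrs, allowed[tag.name])
    return soup.decode_contents()
```

For example, filter_attrs({'href': 'javascript:alert(1)', 'onclick': 'x()'}, ['href', 'title']) returns {'href': ':alert(1)'}: the non-whitelisted onclick is dropped and the "javascript" prefix is stripped from href.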