I am 靠谱客 blogger 壮观小鸽子. I came across this article recently while developing; it mainly covers "Python auto-increment page crawling: a Python 3.6 crawler keeps re-crawling the first page". I thought it was quite good and am sharing it here in the hope that it can serve as a reference.

Overview

The problem is as described in the title:

I changed the for loop to a while loop and tried many variations, but it made no difference. Hoping someone can point out what is wrong.
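For reference, here is a minimal sketch of what such a while-based rewrite might look like (hypothetical; the asker's actual while code is not shown, and the base URL is the same one used in the original script, which appears in full right after this sketch). It builds exactly the same page=1, page=2, ... sequence of URLs as the for i in range(1000) loop, so the choice of loop construct alone does not change which page is requested:

    # Hypothetical sketch of the "changed to while" variant; it generates the
    # same URLs as the original for loop.
    base = "http://www.xxxx.com/cooperative_merchants?searchText=&industry=100&provinceId=19&cityId=0&areaId=0&page="
    page = 1
    while page <= 1000:
        url = base + str(page)
        print(url)  # the page number visibly changes in the URL
        page += 1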

# coding:utf-8
#
from lxml import etree
import requests, lxml.html, os


class MyError(Exception):
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return repr(self.value)


def get_lawyers_info(url):
    r = requests.get(url)
    html = lxml.html.fromstring(r.content)
    # phones = html.xpath('//span[@class="law-tel"]')
    phones = html.xpath('//span[@class="phone pull-right"]')
    # names = html.xpath('//div[@class="fl"]/p/a')
    names = html.xpath('//h4[@class="text-center"]')
    if(len(phones) == len(names)):
        list(zip(names, phones))
        phone_infos = [(names[i].text, phones[i].text_content()) for i in range(len(names))]
    else:
        error = "Lawyers amount are not equal to the amount of phone_nums: " + url
        raise MyError(error)

    phone_infos_list = []
    for phone_info in phone_infos:
        if(phone_info[0] == ""):
            info = "没留姓名" + ": " + phone_info[1] + "\r\n"
        else:
            info = phone_info[0] + ": " + phone_info[1] + "\r\n"
        print(info)
        phone_infos_list.append(info)
    return phone_infos_list


dir_path = os.path.abspath(os.path.dirname(__file__))
print(dir_path)
file_path = os.path.join(dir_path, "lawyers_info.txt")
print(file_path)
if os.path.exists(file_path):
    os.remove(file_path)

with open("lawyers_info.txt", "ab") as file:
    for i in range(1000):
        url = "http://www.xxxx.com/cooperative_merchants?searchText=&industry=100&provinceId=19&cityId=0&areaId=0&page=" + str(i+1)
        # r = requests.get(url)
        # html = lxml.html.fromstring(r.content)
        # phones = html.xpath('//span[@class="phone pull-right"]')
        # names = html.xpath('//h4[@class="text-center"]')
        # if phones or names:
        info = get_lawyers_info(url)
        for each in info:
            file.write(each.encode("gbk"))
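One way to narrow the problem down is a minimal diagnostic sketch, under the assumption that the loop itself is correct and the server simply returns the same HTML no matter what page= value is requested (for example because the listing is paginated by JavaScript/AJAX rather than by the URL): fetch two different page numbers with requests and compare the raw responses directly.

    # Diagnostic sketch (assumption: the paging loop is fine and the server
    # ignores the page parameter). Identical digests for page 1 and page 2
    # would point at the request itself, not the loop.
    import hashlib
    import requests

    base = "http://www.xxxx.com/cooperative_merchants?searchText=&industry=100&provinceId=19&cityId=0&areaId=0&page="
    for page in (1, 2):
        r = requests.get(base + str(page))
        print(page, hashlib.md5(r.content).hexdigest(), len(r.content))

If the two digests match, the next things to check would be whether the list is loaded through a separate XHR endpoint (visible in the browser's network tab) or whether the request needs particular headers or cookies.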

Finally

That is the whole of "Python auto-increment page crawling: a Python 3.6 crawler keeps re-crawling the first page" as collected and organized by 壮观小鸽子; I hope the article helps you solve the development problem it describes.


The text and images in this post were provided by netizens or collected from the web for learning and reference purposes; copyright remains with the original authors.