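Overview
When writing a crawler, you will inevitably run into errors such as 4XX or 5XX responses. Some are caused by network problems, others by something else. In these cases we want the program to attempt the download again.
A crude solution is to nest try/except blocks: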
    try:
        -------
    except:
        try:
            --------
        except:
            try:
                ------
            except:
                try:
                    ------
                except:
                    try:
                        ------
                    except:
                        try:
                            ------
                        except:
                            ------
Is there a cleaner way to write this?
We can implement the retry with recursion.
The code is as follows:
    import requests

    # The original snippet used `session` without defining it.
    session = requests.Session()

    request_urls = [
        "https://www.baidu.com/",
        "https://www.baidu.com/",
        "https://www.baidu.com/",
        "https://www.ba111111idu.com/",
        "https://www.baidu.com/",
        "https://www.baidu.com/",
    ]


    def down_load(url, request_max=3):
        # request_max is the number of retries left after this attempt.
        print("Requesting URL: %s" % url)
        result_html = ""
        result_status_code = ""
        try:
            result = session.get(url=url)
            result_html = result.content
            result_status_code = result.status_code
            print(result_status_code)
        except Exception as e:
            print(e)
        # Retry on an exception or any non-200 status while retries remain.
        if request_max > 0 and result_status_code != 200:
            return down_load(url, request_max - 1)
        return result_html


    for url in request_urls:
        down_load(url=url, request_max=13)
Output (the invalid URL is requested 14 times: one initial attempt plus 13 retries):
    C:\Python27\python.exe C:/Users/xuchunlin/PycharmProjects/A9_25/auction/test.py
    Requesting URL: https://www.baidu.com/
    200
    Requesting URL: https://www.baidu.com/
    200
    Requesting URL: https://www.baidu.com/
    200
    Requesting URL: https://www.ba111111idu.com/
    HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6208>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
    Requesting URL: https://www.ba111111idu.com/
    HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6438>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
    Requesting URL: https://www.ba111111idu.com/
    HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA65F8>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
    Requesting URL: https://www.ba111111idu.com/
    HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6828>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
    Requesting URL: https://www.ba111111idu.com/
    HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6A90>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
    Requesting URL: https://www.ba111111idu.com/
    HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA62E8>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
    Requesting URL: https://www.ba111111idu.com/
    HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6D30>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
    Requesting URL: https://www.ba111111idu.com/
    HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6DD8>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
    Requesting URL: https://www.ba111111idu.com/
    HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B682B0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
    Requesting URL: https://www.ba111111idu.com/
    HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B68080>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
    Requesting URL: https://www.ba111111idu.com/
    HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B685C0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
    Requesting URL: https://www.ba111111idu.com/
    HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B687F0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
    Requesting URL: https://www.ba111111idu.com/
    HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B68A20>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
    Requesting URL: https://www.ba111111idu.com/
    HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B68C50>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
    Requesting URL: https://www.baidu.com/
    200
    Requesting URL: https://www.baidu.com/
    200

    Process finished with exit code 0
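The same retry logic can also be written as a loop instead of recursion. The following is only a minimal sketch, not part of the original post: the names download_with_retry, fetch, delay and flaky_fetch are hypothetical, and fetch stands in for a real HTTP call so that the example runs without network access. Like down_load, it makes one initial attempt plus request_max retries and returns the last body it received.

```python
import time


def download_with_retry(url, fetch, request_max=3, delay=0.0):
    """Call fetch(url) until it reports status 200 or retries run out.

    fetch is a hypothetical callable that returns (status_code, body)
    and may raise an exception on network errors.
    """
    body = ""
    # One initial attempt plus request_max retries, mirroring down_load.
    for attempt in range(request_max + 1):
        try:
            status, body = fetch(url)
            if status == 200:
                return body
        except Exception as e:
            print(e)
        if attempt < request_max and delay:
            time.sleep(delay)  # optional pause before the next attempt
    # Retries exhausted: return whatever body was last received (may be "").
    return body


# Stand-in fetcher for illustration only: fails twice, then succeeds.
calls = []


def flaky_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        raise IOError("simulated network error")
    return 200, "<html>ok</html>"


html = download_with_retry("https://www.baidu.com/", flaky_fetch, request_max=3)
```

In a real crawler, fetch could wrap session.get and return result.status_code together with result.content. A loop keeps the call stack flat no matter how large request_max is, and the optional delay avoids hammering a server that is already failing.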
Reposted from: https://www.cnblogs.com/xuchunlin/p/8565952.html