1. Import the urllib library.
2. Send the request.
3. Read the returned content.
4. Set the encoding (the b'' prefix marks binary bytes, which need to be decoded as UTF-8).
5. Print the result.
import urllib.request

response = urllib.request.urlopen("http://www.baidu.com")
html = response.read()
html = html.decode("utf-8")
print(html)
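Step 4 matters because read() returns raw bytes (printed with a b'' prefix), not text. A minimal offline sketch of that conversion, with sample bytes standing in for response.read():

```python
# response.read() would return bytes like these; decode() converts them to str.
raw = b'<html>\xe7\x99\xbe\xe5\xba\xa6</html>'  # sample bytes standing in for response.read()
print(type(raw).__name__)    # bytes
text = raw.decode('utf-8')   # interpret the bytes as UTF-8 text
print(type(text).__name__)   # str
print(text)                  # <html>百度</html>
```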
2. Download an image and save it locally
import urllib.request

# **** this is the first way ****
# response = urllib.request.urlopen("https://img6.bdstatic.com/img/image/smallpic/weiju112.jpg")

# **** this is the second way ****
req = urllib.request.Request("https://img6.bdstatic.com/img/image/smallpic/weiju112.jpg")
response = urllib.request.urlopen(req)
cat_img = response.read()
with open('aaaabbbbcccc.jpg', 'wb') as f:
    f.write(cat_img)

3. Youdao Translate
import urllib.request
import urllib.parse
import json

content = input("Please input the content that you will translate:")
url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=https://www.baidu.com/link'

data = {}
data['action'] = 'FY_BY_CLICKBUTTON'
data['doctype'] = 'json'
data['i'] = content
data['keyfrom'] = 'fanyi.web'
data['type'] = 'auto'
data['typoResult'] = 'true'
data['ue'] = 'UTF-8'
data['xmlVersion'] = '1.8'
data = urllib.parse.urlencode(data).encode("utf-8")

response = urllib.request.urlopen(url, data)
html = response.read().decode('utf-8')
res = json.loads(html)  # res is a dict
print("The result:%s" % (res['translateResult'][0][0]['tgt']))

4. Youdao Translate with extra header info (1) (create a header dict and pass it in through the headers parameter).
import urllib.request
import urllib.parse
import json

content = input("Please input the content that you will translate:")
url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=https://www.baidu.com/link'

head = {}  # header info that imitates the User-Agent of a browser visiting the site
head['User-Agent'] = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0"

data = {}
data['action'] = 'FY_BY_CLICKBUTTON'
data['doctype'] = 'json'
data['i'] = content
data['keyfrom'] = 'fanyi.web'
data['type'] = 'auto'
data['typoResult'] = 'true'
data['ue'] = 'UTF-8'
data['xmlVersion'] = '1.8'
data = urllib.parse.urlencode(data).encode("utf-8")

# response = urllib.request.urlopen(url, data)
req = urllib.request.Request(url, data, head)
response = urllib.request.urlopen(req)
html = response.read().decode('utf-8')
res = json.loads(html)  # res is a dict
print("The result:%s" % (res['translateResult'][0][0]['tgt']))
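The header handling can be checked without sending any request: urllib.request.Request stores the dict passed as its third argument (or via headers=), normalizing each header name with str.capitalize(). A small offline sketch:

```python
import urllib.request

head = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0'}
req = urllib.request.Request('http://fanyi.youdao.com/translate', headers=head)

# urllib normalizes 'User-Agent' to 'User-agent', so query with that form.
print(req.has_header('User-agent'))   # True
print(req.get_header('User-agent'))
```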
5. Youdao Translate with extra header info (2) (via Request.add_header()).

import urllib.request
import urllib.parse
import json

content = input("Please input the content that you will translate:")
url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=https://www.baidu.com/link'

data = {}
data['action'] = 'FY_BY_CLICKBUTTON'
data['doctype'] = 'json'
data['i'] = content
data['keyfrom'] = 'fanyi.web'
data['type'] = 'auto'
data['typoResult'] = 'true'
data['ue'] = 'UTF-8'
data['xmlVersion'] = '1.8'
data = urllib.parse.urlencode(data).encode("utf-8")

req = urllib.request.Request(url, data)
req.add_header('User-Agent', "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0")
response = urllib.request.urlopen(req)
html = response.read().decode('utf-8')
res = json.loads(html)  # res is a dict
print("The result:%s" % (res['translateResult'][0][0]['tgt']))

7. Using a proxy.
1. Create the parameter dict {'type': 'proxy_ip:port'}:
proxy_support = urllib.request.ProxyHandler({})
2. Build a customized opener:
opener = urllib.request.build_opener(proxy_support)
3. Install the opener:
urllib.request.install_opener(opener)
4. Call the opener:
opener.open(url)
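Steps 1–3 can be exercised without any network traffic; the proxy address below is a placeholder for illustration, not a working proxy:

```python
import urllib.request

# 1. The parameter dict maps scheme -> 'ip:port' (placeholder address).
proxy_support = urllib.request.ProxyHandler({'http': '127.0.0.1:8080'})
# 2. Build an opener that routes requests through the handler.
opener = urllib.request.build_opener(proxy_support)
# 3. Install it so a plain urlopen() call uses it too.
urllib.request.install_opener(opener)

# The handler is now part of the opener's handler chain:
print(any(isinstance(h, urllib.request.ProxyHandler) for h in opener.handlers))  # True
```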
The full code:

import urllib.request
import random
import time

while True:
    url = 'http://www.whatismyip.com.tw'  # a site that reports the visiting IP address
    # each entry must be in ip:port form
    iplist = ['171.39.32.171:9999', '112.245.170.47:9999', '111.76.129.119:808',
              '27.206.143.225:9999', '114.138.196.144:9999']
    # 1. Create the parameter dict {'type': 'proxy_ip:port'}
    proxy_support = urllib.request.ProxyHandler({'http': random.choice(iplist)})
    # proxy_support = urllib.request.ProxyHandler({'http': '123.163.219.132:81'})
    # 2. Build a customized opener
    opener = urllib.request.build_opener(proxy_support)
    opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0')]
    # 3. Install the opener
    urllib.request.install_opener(opener)
    res = urllib.request.urlopen(url)
    html = res.read().decode('utf-8')
    print(html)
    time.sleep(5)
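Free proxies like those in iplist die quickly, so in practice each urlopen call is worth wrapping in a try/except. This helper is an addition to the original code, using only the standard urllib.error module:

```python
import urllib.request
import urllib.error

def fetch(url, timeout=5):
    """Return the decoded response body, or None when the proxy/connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as res:
            return res.read().decode('utf-8')
    except urllib.error.URLError as err:
        print('request failed:', err.reason)
        return None
```

With this helper inside the loop, a dead proxy just prints a message and the loop moves on to the next iteration instead of crashing.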
Finally
That covers 精明冰淇淋's notes on the basics of Python crawlers; for more on the topic, see the other articles on 靠谱客.