Overview
Life is short, so I'm learning Python!
I've been thinking about looking at new opportunities lately, and a lot of the job descriptions I've seen ask for some Python and shell scripting, so I've been studying them in my spare time. I'm still just getting started, a total rookie, but I can already write a couple of small crawlers, hehe.
Let me recommend the site I used to teach myself: Liao Xuefeng's tutorial at https://www.liaoxuefeng.com/wiki/1016959663602400. It explains things very simply, and good things are meant to be shared. My first language is Java, and after learning this much Python I honestly believe the saying "Life is short, I use Python!" is spot on.
Most programmers are lazy, and Python will make you even lazier: so much is already packaged up that you can often just import a library and use it directly. So easy! This post shares a small image scraper I wrote myself. The code is rough and the naming differs a lot from Java conventions, so please go easy on me.
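The whole script below boils down to one pattern: fetch a page with requests, parse it with BeautifulSoup, match tags by class, and read attributes off them. Here is a minimal, self-contained sketch of that pattern; the HTML snippet is made up to mirror the site's list-page markup (the real pages may differ):

```python
from bs4 import BeautifulSoup

# Stand-in for one entry of the site's list page (invented for illustration).
html = '''
<div class="list">
  <a class="item-img" href="/a/123.html"><img alt="sample-title" src="/img/1.jpg"></a>
</div>
'''

soup = BeautifulSoup(html, "html.parser")
# Match every <a class="item-img">, then read the child <img>'s alt text
# and the link's href, just like the scraper does.
for link in soup.find_all("a", class_="item-img"):
    print(link.img.get("alt"), link.get("href"))
# prints: sample-title /a/123.html
```

In the real script the parsed document comes from `requests.get(...).text` instead of a literal string, but the BeautifulSoup calls are the same.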
# -*- coding: UTF-8 -*-
import os
import random
import time
from urllib.request import urlretrieve

import requests
from bs4 import BeautifulSoup

"""
Demo that scrapes images from http://www.shuaia.net/
"""

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36"
}
params = {"tagname": "美女"}


def get_pageurl(j, target_urls):
    """Collect "title=detail-page-url" strings from list page j."""
    url = "http://www.shuaia.net/e/tags/index.php?page=%d&line=25&tempid=3" % j
    response = requests.get(url=url, headers=headers, params=params)
    if response.status_code != 200:
        return None
    print(response.url)
    response.encoding = 'utf-8'
    soup = BeautifulSoup(response.text, 'lxml')
    for item in soup.find_all(class_='item-img'):
        target_urls.append(item.img.get('alt') + '=' + item.get('href'))
    return target_urls


if __name__ == '__main__':
    j = 0  # page counter; must live outside the loop, or every pass re-fetches page 0
    while True:
        target_urls = get_pageurl(j, [])
        if target_urls is None:
            break  # no more list pages
        print(target_urls)
        j = j + 1
        for item in target_urls:
            file_name, file_url = item.split("=", 1)
            print(file_name)
            img_file = file_name + ".jpg"
            os.makedirs(file_name, exist_ok=True)  # one directory per album
            print("Downloading ->>> " + file_name)
            response_img = requests.get(file_url, headers=headers)
            response_img.encoding = 'utf-8'
            img_soup = BeautifulSoup(response_img.text, 'lxml')
            content_div = img_soup.find('div', class_='wr-single-content-list')
            img_url = 'http://www.shuaia.net' + content_div.img.get('src')
            urlretrieve(url=img_url, filename=file_name + '/' + img_file)
            print(img_url)
            time.sleep(random.randint(0, 5))  # be polite between requests
            # The remaining pages of an album reuse the detail URL with a
            # "_2.html", "_3.html", ... suffix in place of ".html".
            base_url = file_url[:-len(".html")]
            i = 1
            while True:
                crl_file_url = base_url + '_' + str(i + 1) + '.html'
                crl_response = requests.get(crl_file_url, headers=headers)
                if crl_response.status_code != 200:
                    break  # no more pages in this album
                crl_response.encoding = 'utf-8'
                crl_soup = BeautifulSoup(crl_response.text, 'lxml')
                crl_div = crl_soup.find('div', class_='wr-single-content-list')
                crl_img_url = 'http://www.shuaia.net' + crl_div.img.get('src')
                urlretrieve(url=crl_img_url, filename=file_name + '/' + file_name + str(i + 1) + ".jpg")
                i = i + 1
                time.sleep(random.randint(0, 5))
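One weak spot in the script is `urlretrieve`: it can't send the `User-Agent` header and it raises on any HTTP error, which kills the whole run. A more robust alternative is to stream the image through requests. This is only a sketch under my own assumptions; the helper name `download_image` and its signature are mine, not part of the original script:

```python
import os

import requests

HEADERS = {"User-Agent": "Mozilla/5.0"}  # placeholder UA, swap in a real one


def download_image(img_url, dest_path, timeout=10):
    """Stream an image to dest_path; return True on success, False on any failure."""
    try:
        resp = requests.get(img_url, headers=HEADERS, stream=True, timeout=timeout)
        resp.raise_for_status()
    except requests.RequestException:
        return False  # DNS failure, timeout, 404, ... : skip instead of crashing
    os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
    with open(dest_path, "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)
    return True
```

Inside the loops you would then replace each `urlretrieve(...)` call with `download_image(crl_img_url, file_name + '/' + img_file)` and simply continue when it returns False.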
Finally
That's the whole of "Life is short, I use Python: scraping images", collected and organized by 文艺翅膀. I hope it helps you with the development problems the topic covers.