Accessing web pages headlessly with selenium + Python


I am 甜美豆芽, a blogger on 靠谱客 (kaopuke). This article covers accessing web pages headlessly with selenium + Python; I am sharing it here as a reference.

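While learning web scraping with Python recently, I found that when a page loads its content asynchronously, parsing libraries such as BeautifulSoup cannot reach that data: they only see the HTML returned by the initial request and do not execute JavaScript.

selenium is a browser automation tool, mainly used for testing, that drives a real browser to load a page, including data that is loaded asynchronously. Because it works with the rendered page, selenium makes it straightforward to scrape everything the browser displays.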
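To make the limitation concrete, here is a minimal sketch of what a parser sees when content is added by a script. It uses the standard-library `html.parser` as a stand-in for BeautifulSoup so that it runs without extra installs; `RAW_HTML`, `ListItemCounter`, and `count_list_items` are illustrative names, not part of any library.

```python
from html.parser import HTMLParser

# What the server sends: the list stays empty until the script runs in a browser.
RAW_HTML = """
<html><body>
  <ul id="items"></ul>
  <script>
    var li = document.createElement("li");
    li.textContent = "loaded later";
    document.getElementById("items").appendChild(li);
  </script>
</body></html>
"""


class ListItemCounter(HTMLParser):
    """Counts the <li> elements that exist in the markup itself."""

    def __init__(self):
        super().__init__()
        self.li_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.li_count += 1


def count_list_items(html: str) -> int:
    """Parse html and return how many <li> start tags it contains."""
    parser = ListItemCounter()
    parser.feed(html)
    parser.close()
    return parser.li_count


# The parser never executes the script, so it finds no list items.
print(count_list_items(RAW_HTML))
```

A browser driven by selenium would run the script first, so the same page would contain one list item by the time it is read.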

First, install the selenium library for Python:

pip install selenium
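Since the driver constructor arguments differ between Selenium 3 and Selenium 4, it can help to check which major version was installed. The following is a small sketch using the standard-library `importlib.metadata`; the helper name `installed_major_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_major_version(package: str) -> Optional[int]:
    """Return the major version of an installed package, or None if it is missing."""
    try:
        return int(version(package).split(".")[0])
    except PackageNotFoundError:
        return None


major = installed_major_version("selenium")
if major is None:
    print("selenium is not installed; run: pip install selenium")
else:
    print(f"selenium major version: {major}")
```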

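1. selenium + phantomjs (discontinued)

Official site: https://phantomjs.org/

phantomjs is a headless browser that exposes a powerful JavaScript API. However, current selenium releases no longer support it: the PhantomJS driver was deprecated in Selenium 3 and removed in Selenium 4.

Without selenium, phantomjs can still be used on its own to scrape data.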

2. selenium + Firefox

Install the Firefox browser.

Download geckodriver:

https://github.com/mozilla/geckodriver/releases

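from selenium import webdriver
from selenium.webdriver.firefox.service import Service

options = webdriver.FirefoxOptions()
options.add_argument("--headless")  # run Firefox in headless (no-UI) mode
# Selenium 4 style: the driver path goes through Service
# (Selenium 3 used executable_path=... and firefox_options=...).
# Raw strings (r"...") stop backslashes in Windows paths being read as escapes.
service = Service(executable_path=r"D:\开发相关\开发资料\geckodriver-v0.21.0-win64\geckodriver")
driver = webdriver.Firefox(options=options, service=service)
driver.get("https://s.taobao.com/search/?")
driver.get_screenshot_as_file(r"C:\Users\Administrator\Desktop\test.png")
driver.quit()  # close the browser and stop the geckodriver process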
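Windows paths in Python string literals are easy to get wrong: in an ordinary string, `\t` is a tab, and `"C:\Users..."` does not even compile in Python 3 because `\U` starts a Unicode escape. The sketch below shows the difference using a shortened, illustrative path, and one way to avoid hand-written backslashes with `pathlib`.

```python
from pathlib import PureWindowsPath

plain = "Desktop\test.png"   # "\t" is interpreted as a TAB character
raw = r"Desktop\test.png"    # raw string: the backslash is kept literally

print("\t" in plain)  # True
print("\t" in raw)    # False

# PureWindowsPath accepts forward slashes and renders Windows separators.
screenshot = PureWindowsPath("C:/Users/Administrator/Desktop") / "test.png"
print(screenshot)  # C:\Users\Administrator\Desktop\test.png
```

Passing `str(screenshot)` to `get_screenshot_as_file` would then give the same path as the raw-string form.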

Set Firefox to headless mode:

options.add_argument("--headless")

Pass the options to the Firefox driver (executable_path does not need to be specified here if geckodriver is on the system PATH):

driver = webdriver.Firefox(options=options)

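3. selenium + chrome

Install the Chrome browser.

Download chromedriver. The chromedriver version must match the Chrome version installed on your machine, otherwise the script fails at runtime.

http://chromedriver.storage.googleapis.com/index.html

Check the chromedriver/Chrome version compatibility table before downloading.

After downloading, unzip the file.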
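As a rough aid for the version check, the sketch below compares major version numbers. It assumes the rule used by recent chromedriver releases, where the driver shares its major version with Chrome; older 2.x chromedriver releases used their own numbering, which is why a compatibility table is needed for them. The function names and the version strings are illustrative only.

```python
def major_version(version: str) -> int:
    """Extract the leading major number from a dotted version string."""
    return int(version.strip().split(".")[0])


def versions_compatible(chrome_version: str, driver_version: str) -> bool:
    """Assumed rule: chromedriver and Chrome must share a major version."""
    return major_version(chrome_version) == major_version(driver_version)


# Example version strings, for illustration only.
print(versions_compatible("114.0.5735.199", "114.0.5735.90"))  # True
print(versions_compatible("114.0.5735.199", "2.46"))           # False
```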

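from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Equivalent: from selenium.webdriver.chrome.options import Options; chrome_options = Options()
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")  # run Chrome in headless (no-UI) mode
# chrome_options.add_argument("--disable-gpu")
# Selenium 4 style: the driver path goes through Service
# (Selenium 3 used executable_path=... and chrome_options=...).
service = Service(executable_path=r"D:\开发chromedriver_win32\chromedriver")
driver = webdriver.Chrome(options=chrome_options, service=service)
driver.get("https://s.taobao.com/search/?")
driver.get_screenshot_as_file(r"C:\Users\Administrator\Desktop\test.png")
driver.quit()  # close the browser and stop the chromedriver process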

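Set Chrome to headless mode:

chrome_options.add_argument("--headless")

Specify the chromedriver path:

service = Service(executable_path=r"D:\开发chromedriver_win32\chromedriver")  # Service is imported from selenium.webdriver.chrome.service
driver = webdriver.Chrome(options=chrome_options, service=service)

Of the two browsers, headless Chrome ran particularly slowly for me; I have not worked out why.

Running either script saves a screenshot of the page to the desktop.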
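The scripts above only take a screenshot. To read the asynchronously loaded data itself, one common approach is to wait until an element appears and then grab the rendered HTML. The following is a sketch under assumptions: `fetch_rendered_html` is a hypothetical helper, the CSS selector is whatever marks the data on your target page, and geckodriver is assumed to be on the system PATH.

```python
def fetch_rendered_html(url: str, css_selector: str, timeout: float = 10.0) -> str:
    """Load url in headless Firefox and return the HTML once css_selector appears."""
    # Imported inside the function so this module still loads where selenium is absent.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.FirefoxOptions()
    options.add_argument("--headless")
    driver = webdriver.Firefox(options=options)  # assumes geckodriver is on PATH
    try:
        driver.get(url)
        # Block until the element exists in the DOM; raises TimeoutException otherwise.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always release the browser, even if the wait times out
```

For example, `fetch_rendered_html("https://s.taobao.com/search/?", "body")` would return the rendered page, which could then be handed to BeautifulSoup for parsing.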


Category: python
Published: 2023-10-22 20:20:40
Article link: https://www.kaopuke.com/article/k-p-k_13_u_23_o_22_fz_14_j_18_1.html
