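I'm 贤惠萝莉, a blogger at 靠谱客. I recently came across this article during development. It covers extracting data from JavaScript with Python: getting variable data out of script tags, or content loaded by JS. I found it quite good and am sharing it here in the hope that it serves as a useful reference.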

Overview

I want to fetch data from another URL, for which I am using urllib and Beautiful Soup. My data is inside a table tag (which I figured out using the Firefox console). But when I tried to fetch the table by its id, the result was None, so I guess this table must be added dynamically by some JS code.
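One way to test that guess is to check whether an element with the table's id occurs at all in the raw HTML the server returns. If it does not, the table was most likely inserted later by JavaScript in the browser. The sketch below uses only the Python 3 standard library (`html.parser`); the helper name `has_element_with_id` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class _IdFinder(HTMLParser):
    """Records whether any start tag carries the wanted id attribute."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this tag
        if ("id", self.wanted_id) in attrs:
            self.found = True


def has_element_with_id(html_text, wanted_id):
    """Return True if the static (server-sent) HTML has an element with this id."""
    finder = _IdFinder(wanted_id)
    finder.feed(html_text)
    finder.close()
    return finder.found


# Hypothetical page: the table itself is absent, only the script that builds it
static_html = '<div id="quotes"></div><script>var table_body = [];</script>'
print(has_element_with_id(static_html, "quotes"))       # True
print(has_element_with_id(static_html, "quotesTable"))  # False
```

A `False` here while the browser console shows the element is a strong hint that the element is created client-side.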

I have tried both parsers, 'lxml' and 'html5lib', but I still can't get that table data.

I have also tried one more thing:

# Python 2 code (urllib.urlopen, print statement)
import urllib
from bs4 import BeautifulSoup

web = urllib.urlopen("my url")
html = web.read()
soup = BeautifulSoup(html, 'lxml')
js = soup.find("script")
ss = js.prettify()
print ss
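Note that `soup.find("script")` returns only the first `<script>` element on the page, while the data may live in a later one; BeautifulSoup's `soup.find_all("script")` returns all of them. As a standard-library sketch of the same idea (Python 3, hypothetical helper `collect_inline_scripts`, made-up sample page), you can gather the text of every script so that all of them can be searched:

```python
from html.parser import HTMLParser


class _ScriptCollector(HTMLParser):
    """Collects the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            self.scripts.append("".join(self._buffer))
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self._buffer.append(data)


def collect_inline_scripts(html_text):
    """Return the source text of each <script> tag, in page order ("" for src-only tags)."""
    collector = _ScriptCollector()
    collector.feed(html_text)
    collector.close()
    return collector.scripts


page = (
    "<script src='ads.js'></script>"
    "<script>myPage = 'ETFs';</script>"
    '<script>var table_body = [["ZION", 29.79]];</script>'
)
for index, body in enumerate(collect_inline_scripts(page)):
    print(index, repr(body))
```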

Result:

myPage = 'ETFs';
sectionId = 'liQuotes'; //section tab
breadCrumbId = 'qQuotes'; //page
is_dartSite = "quotes";
is_dartZone = "news";
propVar = "ETFs";

But now I don't know how to get the data out of these JS variables.
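For simple assignments like the ones printed above (a name, `=`, a single- or double-quoted string, `;`), a regular expression is enough to pull them into a Python dict. This is a rough Python 3 sketch under exactly that assumption: it does not parse JavaScript in general and ignores numbers, arrays, and strings containing escaped quotes. The helper name `js_string_vars` is hypothetical.

```python
import re

# name = 'value';  or  name = "value";  (the closing quote must match the opening one)
_ASSIGNMENT = re.compile(r"""([A-Za-z_$][\w$]*)\s*=\s*(['"])(.*?)\2\s*;""")


def js_string_vars(script_text):
    """Map each simple string-valued JS variable to its value."""
    return {name: value for name, _quote, value in _ASSIGNMENT.findall(script_text)}


script = """
myPage = 'ETFs';
sectionId = 'liQuotes'; //section tab
breadCrumbId = 'qQuotes'; //page
is_dartSite = "quotes";
is_dartZone = "news";
propVar = "ETFs";
"""
variables = js_string_vars(script)
print(variables["sectionId"])    # liQuotes
print(variables["is_dartSite"])  # quotes
```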

Now I have two options: either get the table content or get the JS variables. Either one would fulfil my task, but unfortunately I don't know how to get either of them, so please tell me how I can resolve one of these problems.

Thanks

Solution

EDIT

This will do the trick, using the re module to extract the data and then loading it as JSON:

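# Python 2 code
import urllib
import json
import re
from bs4 import BeautifulSoup

web = urllib.urlopen("http://www.nasdaq.com/quotes/nasdaq-financial-100-stocks.aspx")
soup = BeautifulSoup(web.read(), 'lxml')
# Hard-coded: the <script> at index 19 held the data when this was written
data = soup.find_all("script")[19].string
# search() scans the whole string (match() only tries the start);
# DOTALL lets the array span several lines
p = re.compile(r'var table_body = (.*?);', re.DOTALL)
m = p.search(data)
stocks = json.loads(m.group(1))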

>>> for stock in stocks:
...     print stock
...
[u'ASPS', u'Altisource Portfolio Solutions S.A.', 116.96, 2.2, 1.92, 86635, u'N', u'N']
[u'AGNC', u'American Capital Agency Corp.', 23.76, 0.13, 0.55, 3184303, u'N', u'N']
.
.
.
[u'ZION', u'Zions Bancorporation', 29.79, 0.46, 1.57, 2154017, u'N', u'N']

The problem with this is that the script tag offset is hard-coded and there is not a reliable way to locate it within the page. Changes to the page could break your code.
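If you keep this approach, one way to make it somewhat less brittle is to select the script by what it contains rather than by its position. The Python 3 sketch below assumes the variable is still named `table_body` and holds valid JSON; `extract_table_body` is a hypothetical helper. It still breaks if the page renames the variable or changes its format.

```python
import json
import re

# Assumption: the data is assigned as "var table_body = <JSON array>;"
_TABLE_BODY = re.compile(r"var\s+table_body\s*=\s*(\[.*?\])\s*;", re.DOTALL)


def extract_table_body(script_texts):
    """Return the parsed table_body array from the first script that defines it.

    script_texts: iterable of script source strings, for example
    [tag.string or "" for tag in soup.find_all("script")].
    """
    for text in script_texts:
        match = _TABLE_BODY.search(text or "")
        if match:
            return json.loads(match.group(1))
    raise ValueError("no script defines table_body")


scripts = [
    "myPage = 'ETFs';",
    'var table_body = [["ZION", "Zions Bancorporation", 29.79, 0.46, 1.57, 2154017, "N", "N"]];',
]
for row in extract_table_body(scripts):
    print(row[0], row[2])  # ZION 29.79
```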

ORIGINAL answer

Rather than trying to screen-scrape the data, you can download a CSV representation of the same data from http://www.nasdaq.com/quotes/nasdaq-100-stocks.aspx?render=download.

Then use the Python csv module to parse and process it. Not only is this more convenient, it is also a more resilient solution, because any change to the HTML could easily break your screen-scraping code.
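A minimal Python 3 sketch of the csv route, assuming the download is UTF-8 text whose first row is a header. The column names and rows in `sample` are made up for illustration (the real file's columns may differ), and the URL from the answer above may have changed since it was written, so `download_csv` is not called here.

```python
import csv
import io
import urllib.request


def download_csv(url):
    """Fetch the CSV as text (assumes UTF-8)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_csv(text):
    """Split CSV text into (header, rows); assumes the first row is a header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    return header, rows


# Hypothetical sample; quoted fields may contain commas
sample = (
    "Symbol,Name,LastSale\n"
    '"ADBE","Adobe Systems Incorporated",66.91\n'
    '"AKAM","Akamai Technologies, Inc.",57.47\n'
)
header, rows = parse_csv(sample)
print(header)   # ['Symbol', 'Name', 'LastSale']
print(rows[1])  # ['AKAM', 'Akamai Technologies, Inc.', '57.47']
```

Note that csv.reader returns every field as a string, so numeric columns still need float() or int().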

Otherwise, if you look at the actual HTML you will find that the data is available within the page in the following script tag:

["ADBE", "Adobe Systems Incorporated", 66.91, 1.44, 2.2, 3629837, .6, "N", "N"],
["AKAM", "Akamai Technologies, Inc.", 57.47, 1.57, 2.81, 2697834, .3, "N", "N"],
["ALXN", "Alexion Pharmaceuticals, Inc.", 170.2, 0.7, 0.41, 659817, .1, "N", "N"],
["ALTR", "Altera Corporation", 33.82, -0.06, -0.18, 1928706, .0, "N", "N"],
["AMZN", "Amazon.com, Inc.", 329.67, 6.1, 1.89, 5246300, 2.5, "N", "N"],
....
["YHOO", "Yahoo! Inc.", 35.92, 0.98, 2.8, 18705720, .9, "N", "N"]];
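This array is a JavaScript literal rather than strict JSON: numbers such as `.6` and `.0` have no leading zero, and `json.loads` rejects them. A hedged Python 3 sketch that adds the missing zero before parsing, assuming the literal otherwise uses JSON syntax (double-quoted strings, no trailing commas); `js_array_to_python` is a hypothetical helper, and its regex could misfire on a string value that itself contains something like " .5".

```python
import json
import re

# A "." right after "[", ",", whitespace or "-" and followed by a digit is a bare
# JavaScript decimal such as .6 or -.5; JSON needs the leading zero.
_BARE_DECIMAL = re.compile(r"(?<=[\[,\s-])\.(?=\d)")


def js_array_to_python(literal):
    """Parse a JS array literal whose only non-JSON feature is bare decimals."""
    return json.loads(_BARE_DECIMAL.sub("0.", literal))


literal = (
    '[["ADBE", "Adobe Systems Incorporated", 66.91, 1.44, 2.2, 3629837, .6, "N", "N"],\n'
    ' ["ALTR", "Altera Corporation", 33.82, -0.06, -0.18, 1928706, .0, "N", "N"]]'
)
rows = js_array_to_python(literal)
print(rows[0][6], rows[1][6])  # 0.6 0.0
```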

This content was contributed by users or collected from the web for learning and reference; copyright belongs to the original author.