Is scraping JSON with Python easy? How to scrape JSON files with Python in the Scrapy framework


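I am 靓丽钻石, a blogger on 靠谱客 (kaopuke.com). I collected this article during recent development work. It covers how to scrape JSON with Python in the Scrapy framework; I found it useful and am sharing it here for reference.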

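Overview

A Request for a JSON endpoint is created in the same way as one for an ordinary web page. After the Request is submitted, Scrapy downloads the resource and builds a Response; you then only need to parse response.body the way you would parse any JSON to extract the data. A code example follows (using JD.com as the example; parse_phone_price and parse_commnets extract their data from JSON, and part of the code is omitted):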

# -*- coding: utf-8 -*-
from datetime import datetime
import json
import logging
import re

from scrapy import Request
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import Spider, CrawlSpider, Rule

from jdcom.items import JdPhoneCommentItem, JdPhoneItem

logger = logging.getLogger(__name__)


class JdPhoneSpider(CrawlSpider):
    name = "jdPhoneSpider"
    start_urls = ["http://list.jd.com/list.html?cat=9987,653,655"]
    rules = (
        # Follow the paginated phone listing pages.
        Rule(
            LinkExtractor(allow=r"list\.html\?cat=9987,653,655&page=\d+&trans=1&JL=6_0_0"),
            callback="parse_phone_url",
            follow=True,
        ),
    )

    def parse_phone_url(self, response):
        hrefs = response.xpath("//div[@id='plist']/ul/li/div/div[@class='p-name']/a/@href").extract()
        phoneIDs = []
        for href in hrefs:
            # href looks like "//item.jd.com/<id>.html"; slice out the ID.
            phoneID = href[14:-5]
            phoneIDs.append(phoneID)
            commentsUrl = "http://sclub.jd.com/productpage/p-%s-s-0-t-3-p-0.html" % phoneID
            yield Request(commentsUrl, callback=self.parse_commnets)

    def parse_phone_price(self, response):
        phoneID = response.meta['phoneID']
        meta = response.meta
        # The body is JSON text: decode the bytes, then parse.
        priceStr = response.body.decode("gbk", "ignore")
        priceJson = json.loads(priceStr)
        price = float(priceJson[0]["p"])
        meta['price'] = price
        phoneUrl = "http://item.jd.com/%s.html" % phoneID
        yield Request(phoneUrl, callback=self.parse_phone_info, meta=meta)

    def parse_phone_info(self, response):
        pass

    def parse_commnets(self, response):
        commentsStr = response.body.decode("gbk", "ignore")
        commentsJson = json.loads(commentsStr)
        comments = commentsJson['comments']
        for comment in comments:
            # Create a fresh item per comment so yielded items are not shared.
            commentsItem = JdPhoneCommentItem()
            commentsItem['commentId'] = comment['id']
            commentsItem['guid'] = comment['guid']
            commentsItem['content'] = comment['content']
            commentsItem['referenceId'] = comment['referenceId']
            # Example timestamp: 2016-09-19 13:52:49
            commentsItem['referenceTime'] = datetime.strptime(comment['referenceTime'], "%Y-%m-%d %H:%M:%S")
            commentsItem['referenceName'] = comment['referenceName']
            commentsItem['userProvince'] = comment['userProvince']
            # These fields may be missing, so use .get().
            commentsItem['userRegisterTime'] = comment.get('userRegisterTime')
            commentsItem['nickname'] = comment['nickname']
            commentsItem['userLevelName'] = comment['userLevelName']
            commentsItem['userClientShow'] = comment['userClientShow']
            commentsItem['productColor'] = comment['productColor']
            commentsItem['productSize'] = comment.get("productSize")
            commentsItem['afterDays'] = int(comment['days'])
            images = comment.get("images")
            images_urls = ""
            if images:
                for image in images:
                    # Accumulate every image URL, separated by ";".
                    images_urls += image["imgUrl"] + ";"
            commentsItem['imagesUrl'] = images_urls
            yield commentsItem

        summary = commentsJson["productCommentSummary"]
        commentCount = summary["commentCount"]
        phoneID = summary["productId"]
        priceUrl = "http://p.3.cn/prices/mgets?skuIds=J_%s" % phoneID
        meta = {
            "phoneID": phoneID,
            "commentCount": commentCount,
            "goodCommentsCount": summary["goodCount"],
            "goodCommentsRate": summary["goodRate"],
            "generalCommentsCount": summary["generalCount"],
            "generalCommentsRate": summary["generalRate"],
            "poorCommentsCount": summary["poorCount"],
            "poorCommentsRate": summary["poorRate"],
        }
        yield Request(priceUrl, callback=self.parse_phone_price, meta=meta)

        # Integer division: range() needs an int (10 comments per page).
        pageNum = commentCount // 10 + 1
        for i in range(pageNum):
            commentsUrl = "http://sclub.jd.com/productpage/p-%s-s-0-t-3-p-%d.html" % (phoneID, i)
            yield Request(commentsUrl, callback=self.parse_commnets)
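
The JSON-handling steps in the spider can be tried without Scrapy or network access. The following is a minimal sketch using only the standard library. The helper names (decode_json_body, join_image_urls, page_count) and the sample payload are hypothetical and made up for illustration; only the field names (comments, images, imgUrl, productCommentSummary, commentCount) come from the spider above. Note that join_image_urls uses ";".join, which leaves no trailing separator.

```python
import json


def decode_json_body(body: bytes, encoding: str = "gbk") -> dict:
    """Decode raw response bytes and parse the text as JSON."""
    return json.loads(body.decode(encoding, "ignore"))


def join_image_urls(comment: dict) -> str:
    """Join all image URLs of one comment with ';' (empty string if none)."""
    images = comment.get("images") or []
    return ";".join(image["imgUrl"] for image in images)


def page_count(comment_count: int, page_size: int = 10) -> int:
    """Number of comment pages to request, mirroring count // 10 + 1."""
    return comment_count // page_size + 1


# Hypothetical payload standing in for response.body (GBK-encoded bytes).
sample_body = json.dumps(
    {
        "comments": [
            {
                "id": 1,
                "content": "很好",
                "images": [
                    {"imgUrl": "http://example.com/a.jpg"},
                    {"imgUrl": "http://example.com/b.jpg"},
                ],
            }
        ],
        "productCommentSummary": {"commentCount": 25},
    },
    ensure_ascii=False,
).encode("gbk")

data = decode_json_body(sample_body)
first_comment = data["comments"][0]
print(first_comment["content"])
print(join_image_urls(first_comment))
print(page_count(data["productCommentSummary"]["commentCount"]))
```

In a real spider callback, response.body would take the place of sample_body. If the endpoint declares its encoding correctly, decoding response.text instead of hard-coding "gbk" may be an option, but that depends on the server.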


Conclusion

That is everything 靓丽钻石 collected on scraping JSON files with Python in the Scrapy framework. Hopefully it helps with the development problems you run into on this topic.


This content was provided by users for study and reference, or collected from the web; copyright belongs to the original author.

Category: scraping JSON with Python
Published: 2024-07-23 05:35:02
Link: https://www.kaopuke.com/article/k-p-k_13_u_7_o_18_f0_14_z_10_y.html
