Scraping the Douban Top 250 Movies with Scrapy and Storing Them in MongoDB


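I am 超帅月亮, a blogger on 靠谱客. This article, collected during recent development work, shows how to use Scrapy to scrape the Douban Top 250 movie ranking and store the results in MongoDB. I found it useful and am sharing it here as a reference.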

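Overview

I. Run scrapy startproject <project name> and enter the project directory, then create the spider with scrapy genspider <spider name> <domain to crawl>. The code below uses the project name mongotest, the spider name test1, and the domain movie.douban.com.

II. Write the code in PyCharm

1. Write the items file: each movie needs a title, cast and crew information, a rating, and a short summary.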

import scrapy


class MongotestItem(scrapy.Item):
    # Fields collected for each movie
    title = scrapy.Field()    # movie title
    info = scrapy.Field()     # cast and crew information
    content = scrapy.Field()  # one-line summary
    scores = scrapy.Field()   # rating

2. Write the spider file

import scrapy
from mongotest.items import MongotestItem


class Test1Spider(scrapy.Spider):
    name = 'test1'
    allowed_domains = ['movie.douban.com']
    off_set = 0  # index of the first movie on the current page
    url = "https://movie.douban.com/top250?start="
    start_urls = [url + str(off_set) + "&filter="]

    def parse(self, response):
        # Each movie sits in its own <div class="info"> block
        info_list = response.xpath('//div[@class="info"]')
        for info_item in info_list:
            item = MongotestItem()
            title = info_item.xpath('.//span[@class="title"][1]/text()').extract()[0]
            info = info_item.xpath('.//div[@class="bd"]/p/text()').extract()
            info = "".join(info).strip()
            scores = info_item.xpath('.//span[@class="rating_num"]/text()').extract()[0]
            # Some movies have no one-line summary, so fall back to an empty string
            content = info_item.xpath('.//span[@class="inq"]/text()').extract()
            content = content[0].strip() if content else ""
            item['title'] = title
            item['info'] = info
            item['scores'] = scores
            item['content'] = content
            yield item

        # 25 movies per page; the last page starts at offset 225
        if self.off_set < 225:
            self.off_set += 25
            next_url = self.url + str(self.off_set) + "&filter="
            yield scrapy.Request(next_url, callback=self.parse)
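
The spider above walks the ranking by raising the start offset by 25 until it reaches 225. As a rough illustration of that arithmetic only, the sketch below builds the same ten page URLs up front with plain Python. The names BASE_URL, PAGE_SIZE, TOTAL and page_urls are hypothetical and do not appear in the original project.

```python
# Hypothetical helper, not part of the original spider: it only illustrates
# how the start offsets 0, 25, ..., 225 map to the ten result pages.
BASE_URL = "https://movie.douban.com/top250"
PAGE_SIZE = 25   # movies listed per page
TOTAL = 250      # movies in the whole ranking


def page_urls(base_url=BASE_URL, page_size=PAGE_SIZE, total=TOTAL):
    """Return the URL of every result page, in ranking order."""
    return [
        f"{base_url}?start={offset}&filter="
        for offset in range(0, total, page_size)
    ]


urls = page_urls()
print(len(urls))   # number of pages
print(urls[0])     # first page
print(urls[-1])    # last page
```

A list like this could presumably be assigned to start_urls in place of the chained requests, at the cost of no longer fetching the pages strictly one after another.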

3. Configure the settings file

Add the MongoDB connection settings to the settings file. The remaining settings are the same as those used when crawling other pages.

MONGO_HOST='127.0.0.1'
MONGO_PORT=27017
MONGO_DBNAME="douban"
MONGO_SHEETNAME="movieinfo"
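
The article does not list the other settings it relies on. One that Scrapy needs before any pipeline runs is ITEM_PIPELINES, so a fragment along the following lines is probably part of the same file. The module path assumes the pipeline class lives in the default pipelines.py of the mongotest project, and the priority 300 is simply the value from Scrapy's project template, not something the original states.

```python
# settings.py (fragment, assumed): register the pipeline so Scrapy calls it
# for every item the spider yields. Lower numbers run earlier (range 0-1000).
ITEM_PIPELINES = {
    "mongotest.pipelines.MongotestPipeline": 300,
}
```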

4. Write the pipeline

import pymongo
from mongotest.settings import MONGO_HOST, MONGO_PORT, MONGO_DBNAME, MONGO_SHEETNAME


class MongotestPipeline(object):
    def __init__(self):
        # Connect to MongoDB and select the target collection
        client = pymongo.MongoClient(host=MONGO_HOST, port=MONGO_PORT)
        mydb = client[MONGO_DBNAME]
        self.post = mydb[MONGO_SHEETNAME]

    def process_item(self, item, spider):
        # Collection.insert() was removed in PyMongo 4; insert_one() stores a single document
        self.post.insert_one(dict(item))
        return item
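
The pipeline above connects in its constructor and never closes the client. A possible variant, sketched here under the assumption that the same four MONGO_* settings are defined, reads them through Scrapy's from_crawler hook and ties the connection to the spider's lifetime with open_spider and close_spider. The class name MongoPipeline and the fallback defaults are illustrative choices, not taken from the original.

```python
class MongoPipeline:
    """Sketch: a pipeline whose MongoDB connection lives as long as the spider."""

    def __init__(self, host, port, dbname, sheetname):
        self.host = host
        self.port = port
        self.dbname = dbname
        self.sheetname = sheetname
        self.client = None
        self.collection = None

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook and passes the crawler, whose settings
        # object exposes get() and getint().
        settings = crawler.settings
        return cls(
            host=settings.get("MONGO_HOST", "127.0.0.1"),
            port=settings.getint("MONGO_PORT", 27017),
            dbname=settings.get("MONGO_DBNAME", "douban"),
            sheetname=settings.get("MONGO_SHEETNAME", "movieinfo"),
        )

    def open_spider(self, spider):
        # Imported here so the class can be defined without the driver installed.
        import pymongo

        self.client = pymongo.MongoClient(host=self.host, port=self.port)
        self.collection = self.client[self.dbname][self.sheetname]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        self.collection.insert_one(dict(item))
        return item
```

Reading the values from crawler.settings rather than importing the settings module directly means command-line overrides such as -s MONGO_DBNAME=... should also be picked up.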

5. Run the project

from scrapy import cmdline
cmdline.execute("scrapy crawl test1".split())

6. Check the database; the scraped data should now be stored there.
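
One way to perform that check from Python rather than a MongoDB client is sketched below. It assumes the connection values from the settings step (127.0.0.1, 27017, douban, movieinfo) and a running MongoDB server; the function names summarize and main are hypothetical.

```python
def summarize(collection):
    """Return the number of stored documents and the title of one sample."""
    count = collection.count_documents({})
    sample = collection.find_one({}, {"_id": 0, "title": 1})
    title = sample.get("title") if sample else None
    return count, title


def main():
    # Requires the pymongo package and a MongoDB server on 127.0.0.1:27017.
    import pymongo

    client = pymongo.MongoClient(host="127.0.0.1", port=27017)
    try:
        count, title = summarize(client["douban"]["movieinfo"])
        print(f"{count} documents stored; sample title: {title}")
    finally:
        client.close()
```

Calling main() after a complete crawl should report one document per movie, assuming the collection was empty before the run.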

Conclusion

That is everything 超帅月亮 collected on using Scrapy to scrape the Douban Top 250 movies and store them in MongoDB. Hopefully it helps with any problems you run into when building the same crawler.


Category: Web crawlers
Published: 2024-01-15 16:20:50
Article link: https://www.kaopuke.com/article/k-p-k_13_u_23_ogf3_12__7__18_4.html
