Scraping Douban Movie Information with Python

梁来福
2024-05-08 (last updated 2024-05-08)

Requirements

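Fetch five fields for each movie on the Douban movie chart ("Movie title", "Subtitle", "Score", "Review count", "Poster") and write them to an Excel spreadsheet.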
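To make the target concrete, here is a minimal sketch of mapping one API item to a spreadsheet row. The helper name `to_row`, the `COLUMNS` list, and the sample item are all hypothetical: the key names follow the fields the script reads, but the values are invented for illustration.

```python
# Column order for the spreadsheet, as listed in the requirement.
COLUMNS = ["Movie title", "Subtitle", "Score", "Review count", "Poster"]


def to_row(item):
    """Map one API item (a dict) to a list of values in COLUMNS order."""
    rating = item.get("rating") or {}  # tolerate a missing or null rating
    pic = item.get("pic") or {}        # tolerate a missing or null poster
    return [
        item.get("title"),
        item.get("card_subtitle"),
        rating.get("value"),
        rating.get("count"),
        pic.get("normal"),
    ]


# Hypothetical sample item; every value here is made up.
sample = {
    "type": "movie",
    "title": "Example Movie",
    "card_subtitle": "2024 / Crime",
    "rating": {"value": 8.5, "count": 12345},
    "pic": {"normal": "https://example.com/poster.jpg"},
}

print(dict(zip(COLUMNS, to_row(sample))))
```

Keeping the mapping in a small pure function makes it easy to check without sending a request to Douban.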

Implementation

import requests
import openpyxl

cookies = {
    'bid': 'i7S3gQS6ptw',
    'Hm_lvt_6d4a8cfea88fa457c3127e14fb5fabc2': '1713771942',
    'Hm_lpvt_6d4a8cfea88fa457c3127e14fb5fabc2': '1713771942',
    'll': '"108288"',
    '_ga': 'GA1.2.2080544340.1713771943',
    '_gid': 'GA1.2.435900186.1713771943',
    '_ga_Y4GN1R87RG': 'GS1.1.1713771942.1.0.1713771949.0.0.0',
    'ap_v': '0,6.0',
    '__utma': '30149280.2080544340.1713771943.1713771950.1713771950.1',
    '__utmb': '30149280.0.10.1713771950',
    '__utmc': '30149280',
    '__utmz': '30149280.1713771950.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)',
}

headers = {
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'en,zh-CN;q=0.9,zh;q=0.8',
    'Connection': 'keep-alive',
    # 'Cookie': 'bid=i7S3gQS6ptw; Hm_lvt_6d4a8cfea88fa457c3127e14fb5fabc2=1713771942; Hm_lpvt_6d4a8cfea88fa457c3127e14fb5fabc2=1713771942; ll="108288"; _ga=GA1.2.2080544340.1713771943; _gid=GA1.2.435900186.1713771943; _ga_Y4GN1R87RG=GS1.1.1713771942.1.0.1713771949.0.0.0; ap_v=0,6.0; __utma=30149280.2080544340.1713771943.1713771950.1713771950.1; __utmb=30149280.0.10.1713771950; __utmc=30149280; __utmz=30149280.1713771950.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)',
    'Origin': 'https://movie.douban.com',
    'Referer': 'https://movie.douban.com/explore',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-site',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
    'sec-ch-ua': '"Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"macOS"',
}

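response = requests.get(
    'https://m.douban.com/rexxar/api/v2/movie/recommend?refresh=0&start=0&count=20&selected_categories=%7B%22%E7%B1%BB%E5%9E%8B%22:%22%E7%8A%AF%E7%BD%AA%22%7D&uncollect=false&tags=%E7%8A%AF%E7%BD%AA',
    cookies=cookies,
    headers=headers,
    timeout=10,  # avoid hanging forever if the server does not respond
)
# Stop early on an HTTP error instead of trying to parse an error page
response.raise_for_status()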

# Movie data returned by the API; fall back to an empty list if "items" is missing
data = response.json().get("items") or []
movie_list = []

for item in data:
    if item.get("type") == "movie":
        # Title
        title = item.get("title")
        # Guard against a missing or null "rating" or "pic"
        rating = item.get("rating") or {}
        pic = item.get("pic") or {}
        # Review count
        count = rating.get("count")
        # Score
        value = rating.get("value")
        # Poster URL
        pic_normal = pic.get("normal")
        # Subtitle
        card_subtitle = item.get("card_subtitle")

        movie_list.append([title, card_subtitle, value, count, pic_normal])

# Create a new workbook
workbook = openpyxl.Workbook()
# Get the default worksheet
sheet = workbook.active

# Write the header row
sheet.append(["Movie title", "Subtitle", "Score", "Review count", "Poster"])
# Write the data rows
for movie in movie_list:
    sheet.append(movie)

# Save the file
workbook.save("/Users/darby/Desktop/douban.xlsx")
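The request URL in the script is percent-encoded, which hides what is being asked for. The sketch below decodes it with the standard library and rebuilds the same query from a plain dict. The helper `build_params` is hypothetical, and treating `start` as a paging offset is an assumption based on the parameter name, not something the script confirms.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# The URL used in the script, unchanged.
URL = 'https://m.douban.com/rexxar/api/v2/movie/recommend?refresh=0&start=0&count=20&selected_categories=%7B%22%E7%B1%BB%E5%9E%8B%22:%22%E7%8A%AF%E7%BD%AA%22%7D&uncollect=false&tags=%E7%8A%AF%E7%BD%AA'

# Decode the query string into a dict that maps each name to a list of values.
query = parse_qs(urlsplit(URL).query)
for name, values in query.items():
    print(name, "=", values[0])


def build_params(start=0, count=20, genre="犯罪"):
    """Rebuild the query as a dict (hypothetical helper).

    "犯罪" is the crime genre and "类型" is the category key seen in the URL.
    Assumption: `start` is an offset, so start=20 would request the next page.
    """
    return {
        "refresh": 0,
        "start": start,
        "count": count,
        "selected_categories": json.dumps(
            {"类型": genre}, ensure_ascii=False, separators=(",", ":")
        ),
        "uncollect": "false",
        "tags": genre,
    }


# Re-encoding the dict should carry the same parameters as the original URL.
print(parse_qs(urlencode(build_params())) == query)
```

With requests, a dict like this can be passed as `params=build_params(start=20)` together with the URL up to the `?`, which lets the library handle the encoding. If you loop over several values of `start`, pausing between requests is a sensible precaution.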
