requests - fetch data from an API-based website

Question

I want to get all the reviews from this site.

At first, I used this code:

import requests
from bs4 import BeautifulSoup

r = requests.get(
    "https://www.traveloka.com/hotel/singapore/mandarin-orchard-singapore-10602")

data = r.content
soup = BeautifulSoup(data, "html.parser")
reviews = soup.find_all("div", {"class": "reviewText"})

for review in reviews:
    print(review.get_text())

But this way, I can only get the reviews from the first page.

Some said I could use the API for this with the same requests module. I found the API endpoint, https://api.traveloka.com/v1/hotel/hotelReviewAggregate, but I can't work out its parameters because I don't know how to call an API that takes a request payload.

So I'm hoping for Python code that gets all the reviews, or for the API parameters needed to fetch a specific hotel's reviews from all pages or from a specific page.

Recommended answer

Look at the request payload in the Network tab. It contains skip:8 and top:8, and you will see those numbers increment by 8 when you click the right arrow to get the next page of reviews.

You can duplicate that request and scrape the results the same way.

Edit:

Open the page in Chrome and press F12. Go to the Network tab, then scroll to the bottom of the page, where you can advance to the next batch of reviews. As soon as you click the right arrow, the Network tab will fill with requests. Find the second hotelReviewAggregate request and click on it. Under the Headers tab you will find Request Payload. Open the data dict and find skip and top. Advance to the next batch of reviews and see how those numbers change. You can simulate this behavior to reach the other pages.

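Then prepare your payload, incrementing the values for each page, make the requests, and use the response objects to scrape the data with BeautifulSoup. Check the request method shown under the Headers tab first: a Request Payload normally means the browser sent a POST with a body rather than a GET with query parameters.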
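The paging described above could be sketched roughly as follows. This is only an illustration under several assumptions: that the payload keeps skip and top inside a data dict (as seen in DevTools), that the first page uses skip 0, and that the endpoint accepts a JSON body sent with POST and replies with JSON. The helper names paged_payloads and fetch_pages are hypothetical, and any other payload fields must be copied from the request your own browser sends.

```python
import copy

# Endpoint taken from the question.
API_URL = "https://api.traveloka.com/v1/hotel/hotelReviewAggregate"


def paged_payloads(base_payload, page_size=8, pages=3):
    """Yield copies of base_payload with data.skip advanced one page at a time.

    Assumption: page N (counting from 0) is requested with skip = N * page_size.
    """
    for page in range(pages):
        payload = copy.deepcopy(base_payload)  # leave the caller's dict untouched
        payload["data"]["skip"] = page * page_size
        payload["data"]["top"] = page_size
        yield payload


def fetch_pages(session, base_payload, page_size=8, pages=3):
    """Send one request per page and return the decoded responses.

    session is expected to behave like requests.Session.
    Assumption: the API takes a JSON body via POST and answers with JSON.
    """
    results = []
    for payload in paged_payloads(base_payload, page_size, pages):
        response = session.post(API_URL, json=payload, timeout=10)
        response.raise_for_status()  # stop on HTTP errors instead of parsing them
        results.append(response.json())
    return results
```

To try it, copy the full Request Payload from DevTools into a dict and call fetch_pages(requests.Session(), that_dict). If the response turns out to be HTML rather than JSON, replace response.json() with response.text and parse it with BeautifulSoup as in the question.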

See the Requests documentation.

Quick example from the tutorial:

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get('http://httpbin.org/get', params=payload)

I don't know why people downvoted my answer without an explanation. But oh well, if you find this useful and it answers your question, please accept it.