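I am trying to scrape the name, category, and number of sales of the top products from the Indonesian e-commerce site https://shopee.co.id/top_products, using Python with the requests and BeautifulSoup packages. But I am running into a lot of trouble. This is my first attempt: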
import requests
from bs4 import BeautifulSoup as bs
headers = {
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36',
'cookie': '_gcl_au=1.1.961206468.1594951946; _med=refer; _fbp=fb.2.1594951949275.1940955365; SPC_IA=-1; SPC_F=y1evilme0ImdfEmNWEc08bul3d8toc33; REC_T_ID=fab983c8-c7d2-11ea-a977-ccbbfe23657a; SPC_SI=uv1y64sfvhx3w6dir503ixw89ve2ixt4; _gid=GA1.3.413262278.1594951963; SPC_U=286107140; SPC_EC=GwoQmu7TiknULYXKODlEi5vEgjawyqNcpIWQjoxjQEW2yJ3H/jsB1Pw9iCgGRGYFfAkT/Ej00ruDcf7DHjg4eNGWbCG+0uXcKb7bqLDcn+A2hEl1XMtj1FCCIES7k17xoVdYW1tGg0qaXnSz0/Uf3iaEIIk7Q9rqsnT+COWVg8Y=; csrftoken=5MdKKnZH5boQXpaAza1kOVLRFBjx1eij; welcomePkgShown=true; _ga=GA1.1.1693450966.1594951955; _dc_gtm_UA-61904553-8=1; REC_MD_30_2002454304=1595153616; _ga_SW6D8G0HXK=GS1.1.1595152099.14.1.1595153019.0; REC_MD_41_1000044=1595153318_0_50_0_49; SPC_R_T_ID="Am9bCo3cc3Jno2mV5RDkLJIVsbIWEDTC6ezJknXdVVRfxlQRoGDcya57fIQsioFKZWhP8/9PAGhldR0L/efzcrKONe62GAzvsztkZHfAl0I="; SPC_T_IV="IETR5YkWloW3OcKf80c6RQ=="; SPC_R_T_IV="IETR5YkWloW3OcKf80c6RQ=="; SPC_T_ID="Am9bCo3cc3Jno2mV5RDkLJIVsbIWEDTC6ezJknXdVVRfxlQRoGDcya57fIQsioFKZWhP8/9PAGhldR0L/efzcrKONe62GAzvsztkZHfAl0I="'
}
shopee_url = 'https://shopee.co.id/top_products'
response = requests.get(shopee_url, headers=headers)
response.json()
But it raises a JSONDecodeError, which I think is because the content I get back looks like this: view-source:https://shopee.co.id/top_products. This is my second attempt:
import requests
from bs4 import BeautifulSoup as bs
headers = {
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36',
'cookie': '_gcl_au=1.1.961206468.1594951946; _med=refer; _fbp=fb.2.1594951949275.1940955365; SPC_IA=-1; SPC_F=y1evilme0ImdfEmNWEc08bul3d8toc33; REC_T_ID=fab983c8-c7d2-11ea-a977-ccbbfe23657a; SPC_SI=uv1y64sfvhx3w6dir503ixw89ve2ixt4; _gid=GA1.3.413262278.1594951963; SPC_U=286107140; SPC_EC=GwoQmu7TiknULYXKODlEi5vEgjawyqNcpIWQjoxjQEW2yJ3H/jsB1Pw9iCgGRGYFfAkT/Ej00ruDcf7DHjg4eNGWbCG+0uXcKb7bqLDcn+A2hEl1XMtj1FCCIES7k17xoVdYW1tGg0qaXnSz0/Uf3iaEIIk7Q9rqsnT+COWVg8Y=; csrftoken=5MdKKnZH5boQXpaAza1kOVLRFBjx1eij; welcomePkgShown=true; _ga=GA1.1.1693450966.1594951955; _dc_gtm_UA-61904553-8=1; REC_MD_30_2002454304=1595153616; _ga_SW6D8G0HXK=GS1.1.1595152099.14.1.1595153019.0; REC_MD_41_1000044=1595153318_0_50_0_49; SPC_R_T_ID="Am9bCo3cc3Jno2mV5RDkLJIVsbIWEDTC6ezJknXdVVRfxlQRoGDcya57fIQsioFKZWhP8/9PAGhldR0L/efzcrKONe62GAzvsztkZHfAl0I="; SPC_T_IV="IETR5YkWloW3OcKf80c6RQ=="; SPC_R_T_IV="IETR5YkWloW3OcKf80c6RQ=="; SPC_T_ID="Am9bCo3cc3Jno2mV5RDkLJIVsbIWEDTC6ezJknXdVVRfxlQRoGDcya57fIQsioFKZWhP8/9PAGhldR0L/efzcrKONe62GAzvsztkZHfAl0I="'
}
shopee_url = 'https://shopee.co.id/top_products'
response = requests.get(shopee_url, headers=headers)
soup = bs(response.text, "html.parser")
products = soup.select("._3S8sOC _2QfAXF")
print(type(products))
print(products)
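But this returns an empty list, and I don't understand why. Thanks for reading this far! I did not run into these problems in my earlier web scraping exercises.
Best answer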
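If you look at the network calls the site makes to load its content, you can see that the content is loaded by JavaScript calls, so it is not in the HTML that requests downloads. The following script returns the data for all the different tabs on the site, such as Kuota Data Internet, Hijab Instan, and so on.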
import json
import requests

# Request the JSON endpoint directly instead of the HTML page.
res = requests.get("https://shopee.co.id/api/v4/recommend/recommend?bundle=top_sold_product_microsite&limit=20&offset=0")
data_json = res.json()
with open("data.json", "w") as f:
    json.dump(data_json, f)
The script above saves the data to a JSON file. Sample output:
{"data": {"update_time": 1595183508, "version": "1595183688", "sections": [{"total": 20, "key": "tspmicrosite_sec", "index": [{"data_type": "top_product", "key": "ID_V2L0_65"}, {"data_type": "top_product", "key": "ID_V2L0_3693"}, {"data_type": "top_product", "key": "ID_V2L0_2"}, {"data_type": "top_product", "key": "ID_V2L0_19"}, {"data_type": "top_product", "key": "ID_V2L0_75"}, {"data_type": "top_product", "key": "ID_V2L0_4040"}, {"data_type": "top_product", "key": "ID_V2L0_877"}, {"data_type": "top_product", "key": "ID_V2L0_15"}, {"data_type": "top_product", "key": "ID_V2L0_10"}, {"data_type": "top_product", "key": "ID_V2L0_7"}, {"data_type": "top_product", "key": "ID_V2L0_722"}, {"data_type": "top_product", "key": "ID_V2L0_285"}, {"data_type": "top_product", "key": "ID_V2L0_20"}, {"data_type": "top_product", "key": "ID_V2L0_66"}, {"data_type": "top_product", "key": "ID_V2L0_5831"}, {"data_type": "top_product", "key": "ID_V2L0_18"}, {"data_type": "top_product", "key": "ID_V2L0_16"}, {"data_type": "top_product", "key": "ID_V2L0_13"}, {"data_type": "top_product", "key": "ID_V2L0_34"}, {"data_type": "top_product", "key": "ID_V2L0_1493"}], "data": {"item": null, "keyword": null, "ads": null, "top_product": [{"info": "QUE:PTCPB,SLT:tspmicrosite_slot_00,TFS:tspmicrosite_slot_00_ID,SEC:tspmicrosite_sec_00,BND:top_sold_product_microsite,EPT:top_sold_product_microsite", "count": 2296973, "data_type": "top_product", "name": "Kuota Data Internet", "label": "ID_V2L0_65", "key": "ID_V2L0_65", "images": ["5c2b241a45c93374c154f0ef47feeb32", "8c650894988ea89258dc57604938ba9b", "b0b48c0e010c0626f9cdcecef7ba33d5"], "list": {"total": 40, "key": "ID_V2L0_65", "index": [{"data_type": "item_lite", "key": "item::122997341:2405999610"}, {"data_type": "item_lite", "key": "item::157202162:2813092958"}, {"data_type": "item_lite", "key": "item::172223406:7301485432"}, {"data_type": "item_lite", "key": "item::172223406:5801486070"}, {"data_type": "item_lite", "key": 
"item::57561999:1771712886"}, {"data_type": "item_lite", "key": "item::12216119:2020087641"}, {"data_type": "item_lite", "key": "item::172223406:5101486536"}, {"data_type": "item_lite", "key": "item::172223406:3810851792"}, {"data_type": "item_lite", "key": "item::6343942:61264134"}, {"data_type": "item_lite", "key":
...
...
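To get from the saved JSON to the values asked about in the question, something like the following minimal sketch may help. It assumes the response keeps the shape visible in the sample output, where each entry under data -> sections -> data -> top_product carries a "name" and a "count" field. Reading "count" as the number of sales and "name" as the tab or category name is an assumption based on that sample, not something the API documents. The helper name extract_top_products is hypothetical, and the inline sample is a trimmed stand-in for data.json so the sketch runs without a network call.

```python
import json


def extract_top_products(payload):
    """Return (name, count) pairs from a payload shaped like the sample output.

    Missing or null keys are tolerated, because the sample shows several
    fields ("item", "keyword", "ads") set to null.
    """
    rows = []
    data = payload.get("data") or {}
    for section in data.get("sections") or []:
        section_data = section.get("data") or {}
        for product in section_data.get("top_product") or []:
            rows.append((product.get("name"), product.get("count")))
    return rows


# Trimmed stand-in for the contents of data.json (same nesting as the sample).
sample = json.loads("""
{"data": {"sections": [{"data": {"item": null, "top_product": [
  {"name": "Kuota Data Internet", "count": 2296973, "key": "ID_V2L0_65"}
]}}]}}
""")

for name, count in extract_top_products(sample):
    print(name, count)
```

With the real file, the same function could be called on the result of json.load applied to data.json. The per-item details sit in the part of the sample that is cut off here, so they are left out of this sketch.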
Regarding python - unable to scrape top-selling products from shopee.com, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/62984037/