
python - Scraping Flipkart.com products using Python

Reposted. Author: 行者123. Updated: 2023-12-01 05:45:28

The following Python module checks whether a given item exists on Flipkart.com:

import urllib
import urllib2

import bs4


def findItem(itemName):
    # str.replace returns a new string, so calling it without assigning the
    # result had no effect; quote_plus turns spaces into "+" and also escapes
    # other characters that would break the query string.
    query = urllib.quote_plus(itemName)
    link = 'http://www.flipkart.com/search/a/all?query={0}&vertical=all&dd=0&autosuggest[as]=off&autosuggest[as-submittype]=entered&autosuggest[as-grouprank]=0&autosuggest[as-overallrank]=0&autosuggest[orig-query]=&autosuggest[as-shown]=off&Search=%C2%A0&otracker=start&_r=YSWdYULYzr4VBYklfpZRbw--&_l=pMHn9vNCOBi05LKC_PwHFQ--&ref=a2c6fadc-2e24-4412-be6a-ce02c9707310&selmitem=All+Categories'.format(query)
    r = urllib2.Request(link, headers={"User-Agent": "Python-urlli~"})
    try:
        response = urllib2.urlopen(r)
    except urllib2.URLError:  # also covers HTTPError; a bare except hides real bugs
        print "Internet connection error"
        return
    thePage = response.read()
    soup = bs4.BeautifulSoup(thePage, "html.parser")
    # Look for the result block used on the search results page.
    firstBlockSoup = soup.find('div', attrs={'class': 'size1of4 fk-medium-atom unit'})
    if not firstBlockSoup:
        print "Item Not Found"
    else:
        print "Item found"

The module above works for some products on Flipkart.com, but not for all of them.

For example, it works for:

findItem("galaxy s advance")

But it does not work for:

findItem("Giordano Analog Watch")

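If you inspect the source of the result pages for the two searches above on Flipkart.com (ideally with "Inspect Element") and compare it with the code, the reason becomes obvious.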
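Presumably the two searches come back with differently structured markup, so the div class the code looks for is present on one page and missing on the other. The sketch below illustrates that class-based check under stated assumptions. It uses the standard library's html.parser instead of BeautifulSoup so it runs without extra packages. DivClassCollector and has_div_with_classes are hypothetical names, and the two HTML snippets are made-up stand-ins that reuse class names from this thread, not real Flipkart markup.

```python
from html.parser import HTMLParser


class DivClassCollector(HTMLParser):
    """Record the set of CSS classes on every <div> start tag."""

    def __init__(self):
        super().__init__()
        self.div_classes = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            # attrs is a list of (name, value) pairs; value is None for bare attributes.
            class_value = dict(attrs).get("class") or ""
            self.div_classes.append(set(class_value.split()))


def has_div_with_classes(html, wanted):
    """True if some <div> carries every class in the space-separated string `wanted`."""
    collector = DivClassCollector()
    collector.feed(html)
    collector.close()
    wanted_set = set(wanted.split())
    return any(wanted_set <= classes for classes in collector.div_classes)


# Hypothetical markup standing in for two differently structured result pages.
page_a = '<div class="size1of4 fk-medium-atom unit"><a>Galaxy S Advance</a></div>'
page_b = '<div class="product-unit"><a>Giordano Analog Watch</a></div>'

target = "size1of4 fk-medium-atom unit"
print(has_div_with_classes(page_a, target))  # True
print(has_div_with_classes(page_b, target))  # False
```

This helper is a little looser than matching the whole class string. It accepts the listed classes in any order and alongside extra ones.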

Can anyone suggest a simple way to do this?

Best answer

What if you split it into two checks:

import urllib
import urllib2

import bs4


def findItem(itemName):
    # quote_plus encodes spaces as "+" (a bare itemName.replace(...) call
    # discards its result, so the spaces were never replaced).
    query = urllib.quote_plus(itemName)
    link = 'http://www.flipkart.com/search/a/all?query={0}&vertical=all&dd=0&autosuggest[as]=off&autosuggest[as-submittype]=entered&autosuggest[as-grouprank]=0&autosuggest[as-overallrank]=0&autosuggest[orig-query]=&autosuggest[as-shown]=off&Search=%C2%A0&otracker=start&_r=YSWdYULYzr4VBYklfpZRbw--&_l=pMHn9vNCOBi05LKC_PwHFQ--&ref=a2c6fadc-2e24-4412-be6a-ce02c9707310&selmitem=All+Categories'.format(
        query)
    r = urllib2.Request(link, headers={"User-Agent": "Python-urlli~"})
    try:
        response = urllib2.urlopen(r)
    except urllib2.URLError:
        print "Internet connection error"
        return
    thePage = response.read()
    soup = bs4.BeautifulSoup(thePage, "html.parser")

    # First check: the "product-unit" block used on some result pages.
    firstBlockSoup = soup.find('div', attrs={'class': 'product-unit'})
    if not firstBlockSoup:
        # Second check: the block the original code looked for.
        firstBlockSoup = soup.find('div', attrs={'class': 'size1of4 fk-medium-atom unit'})
        if not firstBlockSoup:
            print "Item Not Found"
            return

    print "Item found"


findItem("galaxy s advance")
findItem("Giordano Analog Watch")
findItem("nosuchitemfound")

Prints:

Item found
Item found
Item Not Found
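The answer's two checks extend naturally to an ordered list of fallback class names, tried in turn until one matches. Here is a minimal sketch of that idea, again using the standard library's html.parser in place of BeautifulSoup. CANDIDATE_CLASSES and find_item_in_page are hypothetical names. The class strings are the two from the answer, and the sample pages are invented markup rather than real Flipkart responses.

```python
from html.parser import HTMLParser

# Hypothetical ordered fallbacks, using the two class strings from the answer.
CANDIDATE_CLASSES = ("product-unit", "size1of4 fk-medium-atom unit")


class DivClassCollector(HTMLParser):
    """Record the set of CSS classes on every <div> start tag."""

    def __init__(self):
        super().__init__()
        self.div_classes = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            class_value = dict(attrs).get("class") or ""
            self.div_classes.append(set(class_value.split()))


def find_item_in_page(html, candidates=CANDIDATE_CLASSES):
    """Return the first candidate class string some <div> matches, or None."""
    collector = DivClassCollector()
    collector.feed(html)
    collector.close()
    for candidate in candidates:
        wanted = set(candidate.split())
        if any(wanted <= classes for classes in collector.div_classes):
            return candidate
    return None


# Invented sample pages, one per search used in the answer.
sample_pages = [
    '<div class="size1of4 fk-medium-atom unit">Galaxy S Advance</div>',
    '<div class="product-unit">Giordano Analog Watch</div>',
    '<div class="no-results">Nothing here</div>',
]
for html in sample_pages:
    print("Item found" if find_item_in_page(html) else "Item Not Found")
```

Keeping the selectors in one tuple means a future layout change only requires adding another class string, not another nested if.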

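Another approach is to check whether the "no results" page is being shown, for example by simply checking whether soup.text contains "0 results found".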
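The sketch below shows that alternative. It gathers the page text with the standard library's html.parser (roughly what soup.text gives in BeautifulSoup) and then searches it for a marker phrase. The phrase "0 results found" is an assumption about the wording of Flipkart's no-results page, and TextExtractor and is_no_results_page are hypothetical names.

```python
import re
from html.parser import HTMLParser

# Assumed wording of the no-results message; check it against the real page.
NO_RESULTS_MARKER = "0 results found"


class TextExtractor(HTMLParser):
    """Collect every text node, roughly like BeautifulSoup's soup.text."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def is_no_results_page(html, marker=NO_RESULTS_MARKER):
    """Return True if the page text contains `marker` as a whole phrase."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    # Normalize whitespace so line breaks in the markup do not split the phrase.
    text = " ".join("".join(extractor.parts).split())
    # \b on both sides stops "10 results found" from matching "0 results found".
    pattern = r"\b" + re.escape(marker) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


print(is_no_results_page("<div><b>0</b> results found for 'xyz'</div>"))  # True
print(is_no_results_page("<div>10 results found</div>"))  # False
```

The word-boundary pattern matters here. A plain substring test would treat a page that says "10 results found" as an empty result.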

Regarding python - Scraping Flipkart.com products using Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/16264414/
