python - Error after my Python web-scraping code runs for a while

Reposted · Author: 太空宇宙 · Updated: 2023-11-03 18:43:55

I am trying to scrape all the mobile-phone data from the Flipkart website using Python 2.7 (via IDLE) and Beautiful Soup. My code is below: in the first part it collects the individual links for all the Samsung phones, and in the second part it scrapes all the phone specifications (the td elements) from those pages. But after a few phones I get the following error message:


Traceback (most recent call last):
File "E:\data base python\collectinghrefsamasungstack.py", line 16, in <module>
htmlfile = urllib.urlopen(url) #//.request is in 3.0x
File "C:\Python27\lib\urllib.py", line 87, in urlopen
return opener.open(url)
File "C:\Python27\lib\urllib.py", line 208, in open
return getattr(self, name)(url)
File "C:\Python27\lib\urllib.py", line 345, in open_http
h.endheaders(data)
File "C:\Python27\lib\httplib.py", line 969, in endheaders
self._send_output(message_body)
File "C:\Python27\lib\httplib.py", line 829, in _send_output
self.send(msg)
File "C:\Python27\lib\httplib.py", line 791, in send
self.connect()
File "C:\Python27\lib\httplib.py", line 772, in connect
self.timeout, self.source_address)
File "C:\Python27\lib\socket.py", line 571, in create_connection
raise err
IOError: [Errno socket error] [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

My code:

import urllib
import re
from bs4 import BeautifulSoup

# part 1: collect the links to the individual Samsung phone pages
url = "http://www.flipkart.com/mobiles/samsung~brand/pr?sid=tyy,4io"

regex = '<a class="fk-display-block" data-tracking-id="prd_title" href=(.+?)title'  # it will find the title
pattern = re.compile(regex)

htmlfile = urllib.urlopen(url)
htmltext = htmlfile.read()
docSoup = BeautifulSoup(htmltext)
abc = docSoup.findAll('a')
c = str(abc)

count = 0
# part 2: go to each link and gather the mobile specifications
title = re.findall(pattern, c)

temp = 1
file2 = open('c:/Python27/samsung.txt', 'w')

for i in title:
    print i
    file2.write(i)
    file2.write("\n")
    count = count + 1
    print "\n1\n"
    if temp > 0:
        mob_url = 'http://www.flipkart.com' + i[1:len(i) - 2]
        htmlfile = urllib.urlopen(mob_url)
        htmltext = htmlfile.read()
        docSoup = BeautifulSoup(htmltext)

        abc = docSoup.find_all('td')
        file = open('c:/Python27/prut2' + str(count) + '.txt', 'w')
        mod = 0
        count = count + 1
        pr = -1
        for j in abc:
            if j.text == 'Brand':
                pr = 3
            if mod == 1:
                file2.write(j.text)
                file2.write("\n")
                mod = 0
            if j.text == 'Model ID':
                mod = 1
            if pr > 0:
                file.write(j.text)
                file.write('\n')
        file.close()
    else:
        temp = temp + 1

print count
file2.close()
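As an aside, the regex over `str(abc)` can be avoided entirely: Beautiful Soup lets you read the `href` attribute directly from the matched tags. A minimal sketch, where the sample markup is illustrative (only the `fk-display-block` class name is taken from the regex above, not from a real Flipkart page):

```python
from bs4 import BeautifulSoup

# Illustrative HTML shaped like the anchors the regex targets.
html = ('<a class="fk-display-block" data-tracking-id="prd_title" '
        'href="/samsung-phone/p/item1" title="Samsung A">A</a>'
        '<a class="other" href="/skip">B</a>')

soup = BeautifulSoup(html, "html.parser")
# Read the href attribute directly instead of regexing the tag's string form.
links = [a["href"] for a in soup.find_all("a", class_="fk-display-block")]
print(links)  # ['/samsung-phone/p/item1']
```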

I have tried disabling my antivirus software, and the network connection I am using is very stable, but I still get the error. Is there any way to fix it?

Best Answer

Perhaps you are opening too many connections.

Add htmlfile.close() after htmltext = htmlfile.read().
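A sketch of that fix, generalized with a small retry helper so a transient timeout (like errno 10060) does not kill the whole run. The names `fetch_with_retries`, `FakeResponse`, and `flaky_opener` are illustrative, not from the answer; the fake opener stands in for `urllib.urlopen` so the sketch runs without touching the network:

```python
import time

def fetch_with_retries(opener, url, attempts=3, delay=1.0):
    """Fetch url, always closing the response, retrying with backoff."""
    last_err = None
    for attempt in range(attempts):
        try:
            resp = opener(url)
            try:
                return resp.read()
            finally:
                resp.close()  # always release the connection, even on errors
        except IOError as err:
            last_err = err
            time.sleep(delay * (attempt + 1))  # simple linear backoff
    raise last_err

# Stand-in for urllib.urlopen: fails twice, then succeeds.
class FakeResponse(object):
    def __init__(self, body):
        self.body = body
    def read(self):
        return self.body
    def close(self):
        pass

calls = {"n": 0}
def flaky_opener(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("[Errno 10060] connection timed out")
    return FakeResponse("<html>ok</html>")

print(fetch_with_retries(flaky_opener, "http://www.flipkart.com/", delay=0.01))
# <html>ok</html>
```

In the original loop you would call this helper with `urllib.urlopen` as the opener instead of calling `urllib.urlopen(mob_url)` directly.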

Regarding "python - Error after my Python web-scraping code runs for a while", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/19956097/
