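I ran a test with pyppeteer to crawl Taobao. Taobao has a verification check, a slider button, so I added some methods to my code to handle it. But an error occurred when the code ran. The error message is as follows: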
2018-11-30 18:15:32 [websockets.protocol] DEBUG: client ! failing WebSocket connection in the OPEN state: 1006 [no reason]
2018-11-30 18:15:32 [websockets.protocol] DEBUG: client - event = connection_lost(None)
2018-11-30 18:15:32 [websockets.protocol] DEBUG: client - state = CLOSED
2018-11-30 18:15:32 [websockets.protocol] DEBUG: client x code = 1006, reason = [no reason]
2018-11-30 18:15:32 [websockets.protocol] DEBUG: client - aborted pending ping: 7ac33fd3
[I:pyppeteer.connection] connection closed
Protocol Error (Runtime.callFunctionOn): Session closed. Most likely the page has been closed. ***********************************:slide login False
Protocol Error (Runtime.callFunctionOn): Session closed. Most likely the page has been closed. ***********************************:slide login False
Protocol Error (Runtime.callFunctionOn): Session closed. Most likely the page has been closed. ***********************************:slide login False
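...and the same message keeps repeating in an endless loop.
The error occurs after the slider button has been dragged a few times (the slider needs to be dragged more times than that), but the code should keep going until the operation succeeds, because I set up retries in the code. I would like to know why the connection gets closed.
The core code is as follows: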
**#middlewares.py**
from scrapy import signals
from scrapy.http import HtmlResponse
from logging import getLogger
import asyncio
import os
from pyppeteer.launcher import launch
from seleniumtest.moveslider import mouse_slide, input_time_random
from seleniumtest.jsflagsetter import js1, js3, js4, js5


class SeleniumMiddleware():
    def __init__(self, username=None, password=None, timeout=None):
        self.logger = getLogger(__name__)
        self.username = username
        self.password = password
        self.timeout = timeout
        print("Init downloaderMiddleware use pyppeteer.")
        os.environ['PYPPETEER_CHROMIUM_REVISION'] = '588429'
        # pyppeteer.DEBUG = False
        print(os.environ.get('PYPPETEER_CHROMIUM_REVISION'))
        # Launch the browser once, blocking until it is ready.
        loop = asyncio.get_event_loop()
        task = asyncio.ensure_future(self.getbrowser())
        loop.run_until_complete(task)

    async def getbrowser(self):
        self.browser = await launch({
            'headless': False,
            'userDataDir': 'tmp',
            'args': ['--no-sandbox'],
            'executablePath': "C:\\Users\\Edwin\\AppData\\Local\\Google\\Chrome\\Application\\chrome.exe",
            'dumpio': True
        })
        self.page = await self.browser.newPage()

    async def usePypuppeteer(self, current_page, url):
        await asyncio.sleep(0.3)
        await self.page.setUserAgent(
            'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
            '(KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36')
        await self.page.setViewport({'width': 1366, 'height': 768})
        response = await self.page.goto(url, options={'timeout': self.timeout * 1000})
        if response.status != 200:
            return None
        # evaluate with script
        await self.page.evaluate(js1)
        await self.page.evaluate(js3)
        await self.page.evaluate(js4)
        await self.page.evaluate(js5)
        if current_page == 1:
            try:
                login_text = await self.page.Jeval('.qrcode-login .login-title', 'node => node.textContent')
            except Exception as e:
                login_text = None
            if login_text:
                # The title means "Scan the QR code with your phone to log in securely":
                # switch from QR-code login to the password form.
                if login_text == '手机扫码,安全登录':
                    switch_btn = await self.page.querySelector('.login-switch #J_Quick2Static')
                    await self.page.evaluate('(element) => element.click()', switch_btn)
                user_edit = await self.page.querySelector('.login-text.J_UserName')
                await self.page.evaluate('(element) => element.value = ""', user_edit)
                await user_edit.type(self.username, {'delay': input_time_random()})
                await self.page.type('#J_StandardPwd #TPL_password_1', self.password, {'delay': input_time_random()})
                # Non-blocking pause (the original used time.sleep, which blocks the event loop).
                await asyncio.sleep(1)
                slider = await self.page.Jeval('#nocaptcha', 'node => node.style')
                if slider:
                    flag = await mouse_slide(page=self.page)
                    if flag:
                        try:
                            print('******************** get logging button')
                            login_btn = await self.page.querySelector('#J_SubmitStatic')
                            await self.page.evaluate('(element) => element.click()', login_btn)
                            await self.page.waitForSelector('#mainsrp-itemlist .m-itemlist')
                            await self.get_cookie(self.page)
                            content = await self.page.content()
                            return content
                        except Exception as e:
                            return None
                    else:
                        return None
                else:
                    try:
                        await self.page.keyboard.press('Enter')  # press enter
                        await self.page.waitFor(20)
                        await self.page.waitForSelector('#mainsrp-itemlist .m-itemlist')
                        content = await self.page.content()
                        return content
                    except Exception as e:
                        return None
        else:
            try:
                page_input = await self.page.querySelector('#mainsrp-pager div.form > input')
                submit = await self.page.querySelector('#mainsrp-pager div.form > span.btn.J_Submit')
                await self.page.evaluate('(element) => element.value = ""', page_input)
                await page_input.type(str(current_page))  # type() expects a string
                await submit.click()
                await self.page.waitForSelector('#mainsrp-itemlist .m-itemlist')
                current_page_text = await self.page.Jeval('#mainsrp-pager li.item.active > span', 'node => node.textContent')
                # Jeval needs a page function; querySelector is enough to check that an item exists.
                items = await self.page.querySelector('.m-itemlist .items .item')
                if current_page_text == str(current_page) and items:
                    content = await self.page.content()
                    return content
                else:
                    return None
            except Exception as e:
                return None

    def process_request(self, request, spider):
        self.logger.debug('Browser is Starting')
        current_page = request.meta.get('page', 1)
        # Run the coroutine to completion so Scrapy receives a finished response.
        loop = asyncio.get_event_loop()
        task = asyncio.ensure_future(self.usePypuppeteer(current_page, request.url))
        loop.run_until_complete(task)
        return HtmlResponse(url=request.url, body=task.result(), encoding="utf-8", request=request, status=200)

    @classmethod
    def from_crawler(cls, crawler):
        s = cls(username=crawler.settings.get('USERNAME'),
                password=crawler.settings.get('PASSWORD'),
                timeout=crawler.settings.get('TIMEOUT'))
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    async def get_cookie(self, page):
        # Join all cookies into a single "name=value;" string.
        cookies_list = await page.cookies()
        cookies = ''
        for cookie in cookies_list:
            str_cookie = '{0}={1};'
            str_cookie = str_cookie.format(cookie.get('name'), cookie.get('value'))
            cookies += str_cookie
        return cookies

    def process_response(self, request, response, spider):
        return response

    def process_exception(self, request, exception, spider):
        pass

    def spider_opened(self, spider):
        spider.logger.info('Spider opened: %s' % spider.name)
**#moveslider.py**
# -*- coding:utf-8 -*-
from retrying import retry
import time, asyncio, random


def retry_if_result_none(result):
    return result is None


def tries(func):
    def func_wrapper(f):
        async def wrapper(*args, **kwargs):
            while True:
                try:
                    if func(await f(*args, **kwargs)):
                        continue
                    else:
                        break
                except Exception as exc:
                    # Every exception is swallowed, so the loop never ends
                    # once the page session has been closed.
                    pass
            return True
        return wrapper
    return func_wrapper


@tries(retry_if_result_none)
async def mouse_slide(page=None):
    try:
        await page.hover('#nc_1_n1z')  # move to the slider button
        await page.mouse.down()  # press the mouse button
        await page.mouse.move(1700, 0, {'delay': random.randint(1000, 2000)})  # drag to the target position
        await page.mouse.up()  # release the mouse button
    except Exception as e:
        print(e, '***********************************:slide login False')
        # read the slider's error message
        slider_move_text = await page.Jeval('.errloading .nc-lang-cnt', 'node => node.textContent')
        print('**********************,slider_move_text=', slider_move_text)
        # The message means "Oops, something went wrong, click ...": refresh the slider.
        if "哎呀,出错了,点击" in slider_move_text:
            refresh_btn = await page.querySelector('.errloading .nc-lang-cnt a')
            await page.evaluate('(element) => element.click()', refresh_btn)
            await asyncio.sleep(3)
        return None
    else:
        await asyncio.sleep(3)
        slider_again = await page.Jeval('.nc-lang-cnt', 'node => node.textContent')
        # '验证通过' means "verification passed".
        if slider_again != '验证通过':
            return None
        else:
            await page.screenshot({'path': './headless-slide-result.png'})
            return 1


def input_time_random():
    return random.randint(100, 151)
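As a side note on the retry logic above: the `tries` wrapper catches every exception and loops again, which would explain why the "Session closed" message repeats forever once the page is gone. A minimal sketch of a bounded alternative follows. The names `retry_async`, `max_attempts`, `flaky`, and `always_closed` are hypothetical and are not part of the original code or of any library; this is one possible approach, not the asker's method.

```python
import asyncio


def retry_async(should_retry, max_attempts=5, delay=0.0):
    """Retry an async function a limited number of times (hypothetical helper)."""
    def decorator(coro_func):
        async def wrapper(*args, **kwargs):
            last_exc = None
            for attempt in range(1, max_attempts + 1):
                try:
                    result = await coro_func(*args, **kwargs)
                except Exception as exc:
                    # Remember the failure instead of silently discarding it.
                    last_exc = exc
                else:
                    if not should_retry(result):
                        return result
                    last_exc = None
                if attempt < max_attempts and delay:
                    await asyncio.sleep(delay)
            # Out of attempts: surface the last exception, if there was one.
            if last_exc is not None:
                raise last_exc
            return None
        return wrapper
    return decorator


calls = {'count': 0}


@retry_async(lambda result: result is None, max_attempts=4)
async def flaky():
    # Stand-in for mouse_slide: returns None twice, then succeeds.
    calls['count'] += 1
    return 1 if calls['count'] == 3 else None


attempts = {'closed': 0}


@retry_async(lambda result: result is None, max_attempts=3)
async def always_closed():
    # Stand-in for a page whose session has been closed for good.
    attempts['closed'] += 1
    raise RuntimeError('Session closed')


outcome = asyncio.run(flaky())

try:
    asyncio.run(always_closed())
except RuntimeError as exc:
    error_message = str(exc)
```

With a cap like this, a closed session ends the retries with a visible exception instead of an endless stream of identical log lines.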
**#taobao.py**
# -*- coding: utf-8 -*-
import scrapy
from scrapy import Request, Spider
from urllib.parse import quote
from seleniumtest.items import ProductItem
import json


class TaobaoSpider(scrapy.Spider):
    name = 'taobao'
    allowed_domains = ['www.taobao.com']
    base_url = 'https://s.taobao.com/search?q='

    def start_requests(self):
        # One request per keyword and page; the middleware reads meta['page'].
        for keyword in self.settings.get('KEYWORDS'):
            for page in range(1, self.settings.get('MAX_PAGE') + 1):
                url = self.base_url + quote(keyword)
                yield Request(url=url, callback=self.parse, meta={'page': page}, dont_filter=True)

    def parse(self, response):
        products = response.xpath('//div[@id="mainsrp-itemlist"]//div[@class="items"][1]//div[contains(@class,"item")]')
        for product in products:
            item = ProductItem()
            item['price'] = ''.join(product.xpath('.//div[contains(@class,"price")]//text()').extract()).strip()
            item['title'] = ''.join(product.xpath('.//div[contains(@class,"title")]//text()').extract()).strip()
            item['shop'] = ''.join(product.xpath('.//div[contains(@class,"shop")]//text()').extract()).strip()
            item['image'] = ''.join(product.xpath('.//div[@class="pic"]//img[contains(@class,"img")]/@data-src').extract()).strip()
            item['deal'] = product.xpath('.//div[contains(@class,"deal-cnt")]//text()').extract_first()
            item['location'] = product.xpath('.//div[@class="location"]//text()').extract_first()
            print(item['price'], item['title'], item['shop'], item['image'], item['deal'], item['location'])
            yield item
Best answer
For now, we have a workaround:
def patch_pyppeteer():
    import pyppeteer.connection
    original_method = pyppeteer.connection.websockets.client.connect

    def new_method(*args, **kwargs):
        # Always override the keepalive ping settings.
        kwargs['ping_interval'] = None
        kwargs['ping_timeout'] = None
        return original_method(*args, **kwargs)

    pyppeteer.connection.websockets.client.connect = new_method


patch_pyppeteer()
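Hopefully pull request #160 will be merged soon.
Regarding "python - Connection closed while code is running, using pyppeteer to crawl the web", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53556006/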
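The workaround wraps the `connect` function that pyppeteer calls so that `ping_interval` and `ping_timeout` are always `None`. As far as I understand, that switches off the keepalive pings that the websockets library sends by default, and the "aborted pending ping" line in the question's log is consistent with a ping being involved when the connection dropped. The patch presumably has to run before `launch()` opens the connection. Below is a minimal stand-alone sketch of the same wrapping technique. `force_kwargs` and `fake_connect` are hypothetical names used only for illustration, the URI is made up, and the default values in `fake_connect` are an assumption about the library's defaults; the real library is not called.

```python
import functools


def force_kwargs(func, **forced):
    """Wrap func so the given keyword arguments always override the caller's."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        kwargs.update(forced)
        return func(*args, **kwargs)
    return wrapper


def fake_connect(uri, ping_interval=20, ping_timeout=20, **kwargs):
    # Hypothetical stand-in for the real connect function: it only reports
    # which settings it received (the defaults here are an assumption).
    return {'uri': uri, 'ping_interval': ping_interval, 'ping_timeout': ping_timeout}


patched_connect = force_kwargs(fake_connect, ping_interval=None, ping_timeout=None)

# Even if the caller passes its own ping settings, the forced values win.
result = patched_connect('ws://127.0.0.1:9222/devtools/browser/example', ping_interval=5)
```

The design choice is the same as in the answer: rather than editing library code, the function object is replaced at runtime, so the change can be dropped once an upstream fix is released.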