I am trying to scrape data from this site, https://quickfs.net/company/BABA:US, using pyppeteer, without the site knowing that I am scraping.
So my first question is:
import asyncio
import pyppeteer

async def main():
    # Launch a Chromium browser (Chrome can be used instead of Chromium).
    browser = await pyppeteer.launch(headless=False)
    # Open a blank page.
    page = await browser.newPage()
    # Navigate to the requested page and run the dynamic code on the site.
    await page.goto("https://api.quickfs.net/stocks/BABA:US/ovr/Annual/")
    # Get the HTML content of the page.
    cont = await page.content()
    # Close the browser so the Chromium process does not linger.
    await browser.close()
    return cont

# Run the coroutine once, keep the result, and print the HTML.
ovr = asyncio.get_event_loop().run_until_complete(main())
print(ovr)
Thanks in advance.
Best answer
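Question 1: Is it correct that if I scrape with pyppeteer, the site will not notice that I am scraping?
Simple answer: Yes. This site uses JavaScript, so you need something like pyppeteer to render the page. Using pyppeteer also makes you look like an ordinary user, so there is less chance of being detected.
Technical answer: This takes a bit more web-scraping experience, but if you look at the requests the page makes, you will see that the site uses an API to render its data. It is therefore more efficient to request the API directly, with the appropriate method and headers, to avoid being detected.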
GET https://api.quickfs.net/stocks/BABA:US/ovr/Annual/
{"datasets":{"metadata":{"_id":{},"qfs_symbol":"NYSE:BABA","currency":"USD","fsCat":"normal","name":"Alibaba Group Holding Limited","gs3_version_at_metadata_update":20191106,"exchange":"NYSE","industry":"Retailing","symbol":"BABA","country":"US","price":215.7,"p_pretax_inc":"24.9","ps":"8.1","ev_ebit":"42.5","ev_fcf":"21.7","ev_s":"7.7","ev_ebitda":"37.2","pb":"4.7","mkt_cap":588375,"pe":"27.7","ev_pretax_inc":"23.6","ev":558430,"qfs_symbol_v2":"BABA:US","description":"","avg_vol_50d":19498671,"beta":1.8212,"betaLastUpdated":20200419,"share_turnover":"180","sector":"Consumer Discretionary","template_version":4,"gics":"25502020","template_type":"normal"},"ks":"\n\t\t <div class=\"ksTblBg\">\n\t\t <table class=\"ksTbl\">\n\t\t <thead>\n\t\t <tr>\n\t\t <th colspan=\"6\" style=\"text-align:center\">Key Statistics<\/th>\n\t\t <\/tr>\n\t\t <\/thead>\n\t\t <tbody>\n\t\t \n\t\t <tr>\n\t\t <td class=\"ksSectHead\" colspan=\"2\">Valuation Ratios<\/td>\n\t\t <td class=\"ksSectHead\" colspan=\"2\">10-Yr Median Returns<\/td>\n\t\t <td class=\"ksSectHead\" colspan=\"2\">10-Yr Median Margins<\/td>\n\t\t <\/tr>\n\t\t <tr>\n\t\t <td class='lt'>P\/E<\/td><td class='rt' id='ks-pe'><\/td>\n\t\t <td class='lt'>ROA<\/td><td class='rt'>13.0%<\/td>\n\t\t <td class='lt'>Gross Profit<\/td><td class='rt'>66.7%<\/td>\n\t\t <\/tr>\n\t\t <tr>\n\t\t <td class='lt'>P\/B<\/td><td class='rt' id='ks-pb'><\/td>\n\t\t <td class='lt'>ROE<\/td><td class='rt'>22.3%<\/td>\n\t\t <td class='lt'>EBIT<\/td><td class='rt'>28.9%<\/td>\n\t\t <\/tr>\n\t\t <tr>\n\t\t <td class='lt'>P\/S<\/td><td class='rt' id='ks-ps'><\/td>\n\t\t <td class='lt'>ROIC<\/td><td class='rt'>30.4%<\/td>\n\t\t <td class='lt'>Pre-Tax Income<\/td><td class='rt'>35.6%<\/td>\n\t\t <\/tr>\n\t\t <tr>\n\t\t <td class='lt'>EV\/S<\/td><td class='rt' id='ks-ev_s'><\/td>\n\t\t <td class='ksSectHead' colspan='2'>10-Year CAGR<\/td>\n\t\t <td class='lt'>FCF<\/td><td class='rt'>40.8%<\/td>\n\t\t <\/tr>\n\t\t <tr>\n\t\t <td 
class='lt'>EV\/EBITDA<\/td><td class='rt' id='ks-ev_ebitda'><\/td>\n\t\t <td class='lt'>Revenue<\/td><td class='rt'>56.3%<\/td>\n\t\t <td class='ksSectHead' colspan='2'>Capital Structure<\/td>\n\t\t <\/tr>\n\t\t <tr>\n\t\t <td class='lt'>EV\/EBIT<\/td><td class='rt' id='ks-ev_ebit'><\/td>\n\t\t <td class='lt'>Assets<\/td><td class='rt'>58.2%<\/td>\n\t\t <td class='lt'>Assets \/ Equity<\/td><td class='rt'>1.6<\/td>\n\t\t <\/tr>\n\t\t <tr>\n\t\t <td class='lt'>EV\/Pretax<\/td><td class='rt' id='ks-ev_pretax_income'><\/td>\n\t\t <td class='lt'>FCF<\/td><td class='rt'>51.1%<\/td>\n\t\t <td class='lt'>Debt \/ Equity<\/td><td class='rt'>0.3<\/td>\n\t\t <\/tr>\n\t\t <tr>\n\t\t <td class='lt'>EV\/FCF<\/td><td class='rt' id='ks-ev_fcf'><\/td>\n\t\t <td class='lt'>EPS<\/td><td class='rt'>68.6%<\/td>\n\t\t <td class='lt'>Debt \/ Assets<\/td><td class='rt'>0.2<\/td>\n\t\t <\/tr>\n\t\t \n\t\t <\/tbody>\n\t\t <\/table>\n\t\t <\/div>","ovr":"<table class='fs-table' id='ovr-table'>\n <tbody>\n <tr class='thead'><td><\/td><td>2011<\/td><td>2012<\/td><td>2013<\/td><td>2014<\/td><td>2015<\/td><td>2016<\/td><td>2017<\/td><td>2018<\/td><td>2019<\/td><td>2020<\/td><\/tr><tr class=' '><td class='labelCell'>Revenue<\/td><td class='dataCell' data-type='normal' data-value='1010821000'>1,011<\/td><td class='dataCell' data-type='normal' data-value='3172277000'>3,172<\/td><td class='dataCell' data-type='normal' data-value='5553464000'>5,553<\/td><td class='dataCell' data-type='normal' data-value='8505565000'>8,506<\/td><td class='dataCell' data-type='normal' data-value='12214920000'>12,215<\/td><td class='dataCell' data-type='normal' data-value='15554001000'>15,554<\/td><td class='dataCell' data-type='normal' data-value='22958079000'>22,958<\/td><td class='dataCell' data-type='normal' data-value='39615348000'>39,615<\/td><td class='dataCell' data-type='normal' data-value='56145652000'>56,146<\/td><td class='dataCell' data-type='normal' data-value='72603233000'>72,603<\/td><\/tr><tr class=' 
'><td class='labelCell italic indent'>Revenue Growth<\/td><td class='dataCell italic' data-type='percentage' data-value='0.20945600737049'>20.9%<\/td><td class='dataCell italic' data-type='percentage' data-value='2.1383172688339'>213.8%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.75062392092494'>75.1%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.53157830860162'>53.2%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.43610918263513'>43.6%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.27336085705023'>27.3%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.47602401465706'>47.6%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.72555151500263'>72.6%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.41727019537983'>41.7%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.29312298305842'>29.3%<\/td><\/tr><tr class=' '><td class='labelCell'>Gross Profit<\/td><td class='dataCell' data-type='normal' data-value='812343000'>812<\/td><td class='dataCell' data-type='normal' data-value='2134020000'>2,134<\/td><td class='dataCell' data-type='normal' data-value='3989767000'>3,990<\/td><td class='dataCell' data-type='normal' data-value='6339808000'>6,340<\/td><td class='dataCell' data-type='normal' data-value='8394512000'>8,395<\/td><td class='dataCell' data-type='normal' data-value='10270811000'>10,271<\/td><td class='dataCell' data-type='normal' data-value='14329852000'>14,330<\/td><td class='dataCell' data-type='normal' data-value='22671036000'>22,671<\/td><td class='dataCell' data-type='normal' data-value='25315484000'>25,315<\/td><td class='dataCell' data-type='normal' data-value='32382879000'>32,383<\/td><\/tr><tr class=' '><td class='labelCell italic indent'>Gross Margin %<\/td><td class='dataCell italic' data-type='percentage' data-value='0.80364673864116'>80.4%<\/td><td class='dataCell italic' 
data-type='percentage' data-value='0.67270922432057'>67.3%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.71842853397447'>71.8%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.74537176542652'>74.5%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.68723430034744'>68.7%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.66033241221985'>66.0%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.62417469684637'>62.4%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.57227910758224'>57.2%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.45088948294696'>45.1%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.44602530303299'>44.6%<\/td><\/tr><tr class=' '><td class='labelCell'>Operating Profit<\/td><td class='dataCell' data-type='normal' data-value='266271000'>266<\/td><td class='dataCell' data-type='normal' data-value='847525000'>848<\/td><td class='dataCell' data-type='normal' data-value='1820317000'>1,820<\/td><td class='dataCell' data-type='normal' data-value='4084952000'>4,085<\/td><td class='dataCell' data-type='normal' data-value='3736415000'>3,736<\/td><td class='dataCell' data-type='normal' data-value='4607009000'>4,607<\/td><td class='dataCell' data-type='normal' data-value='7035973000'>7,036<\/td><td class='dataCell' data-type='normal' data-value='11137968000'>11,138<\/td><td class='dataCell' data-type='normal' data-value='8604121000'>8,604<\/td><td class='dataCell' data-type='normal' data-value='13105334000'>13,105<\/td><\/tr><tr class=' '><td class='labelCell italic indent'>Operating Margin %<\/td><td class='dataCell italic' data-type='percentage' data-value='0.26342052648293'>26.3%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.267166139653'>26.7%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.32778046278863'>32.8%<\/td><td class='dataCell italic' 
data-type='percentage' data-value='0.48026815384986'>48.0%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.30588943685264'>30.6%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.29619446469111'>29.6%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.30647045861285'>30.6%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.28115285015293'>28.1%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.15324643482633'>15.3%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.18050620417964'>18.1%<\/td><\/tr><tr class=' '><td class='labelCell'>Earnings Per Share<\/td><td class='dataCell' data-type='eps' data-value='0.053'>$0.05<\/td><td class='dataCell' data-type='eps' data-value='0.287'>$0.29<\/td><td class='dataCell' data-type='eps' data-value='0.574'>$0.57<\/td><td class='dataCell' data-type='eps' data-value='1.62'>$1.62<\/td><td class='dataCell' data-type='eps' data-value='1.555'>$1.56<\/td><td class='dataCell' data-type='eps' data-value='4.289'>$4.29<\/td><td class='dataCell' data-type='eps' data-value='2.462'>$2.46<\/td><td class='dataCell' data-type='eps' data-value='3.88'>$3.88<\/td><td class='dataCell' data-type='eps' data-value='4.973'>$4.97<\/td><td class='dataCell' data-type='eps' data-value='7.965'>$7.97<\/td><\/tr><tr class=' '><td class='labelCell italic indent'>EPS Growth<\/td><td class='dataCell italic' data-type='percentage' data-value='0.23255813953488'>23.3%<\/td><td class='dataCell italic' data-type='percentage' data-value='4.4150943396226'>441.5%<\/td><td class='dataCell italic' data-type='percentage' data-value='1'>100.0%<\/td><td class='dataCell italic' data-type='percentage' data-value='1.8222996515679'>182.2%<\/td><td class='dataCell italic' data-type='percentage' data-value='-0.040123456790124'>-4.0%<\/td><td class='dataCell italic' data-type='percentage' data-value='1.7581993569132'>175.8%<\/td><td class='dataCell italic' 
data-type='percentage' data-value='-0.42597342037771'>-42.6%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.57595450852965'>57.6%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.28170103092784'>28.2%<\/td><td class='dataCell italic' data-type='percentage' data-value='0.60164890408204'>60.2%<\/td><\/tr><tr class=' '><td class='labelCell'>Return on Assets<\/td><td class='dataCell' data-type='percentage' data-value='0.12490081137912'>12.5%<\/td><td class='dataCell' data-type='percentage' data-value='0.13547055438638'>13.5%<\/td><td class='dataCell' data-type='percentage' data-value='0.15474766176667'>15.5%<\/td><td class='dataCell' data-type='percentage' data-value='0.26661125490522'>26.7%<\/td><td class='dataCell' data-type='percentage' data-value='0.13179228026259'>13.2%<\/td><td class='dataCell' data-type='percentage' data-value='0.22667998521059'>22.7%<\/td><td class='dataCell' data-type='percentage' data-value='0.097819038194011'>9.8%<\/td><td class='dataCell' data-type='percentage' data-value='0.10848994208585'>10.8%<\/td><td class='dataCell' data-type='percentage' data-value='0.10177987302833'>10.2%<\/td><td class='dataCell' data-type='percentage' data-value='0.12868657986734'>12.9%<\/td><\/tr><tr class=' '><td class='labelCell'>Return on Equity<\/td><td class='dataCell' data-type='percentage' data-value='0.26227533616942'>26.2%<\/td><td class='dataCell' data-type='percentage' data-value='0.20200237445123'>20.2%<\/td><td class='dataCell' data-type='percentage' data-value='0.38077278024081'>38.1%<\/td><td class='dataCell' data-type='percentage' data-value='0.90392646328004'>90.4%<\/td><td class='dataCell' data-type='percentage' data-value='0.24438521190488'>24.4%<\/td><td class='dataCell' data-type='percentage' data-value='0.34553804941983'>34.6%<\/td><td class='dataCell' data-type='percentage' data-value='0.14914185483796'>14.9%<\/td><td class='dataCell' data-type='percentage' 
data-value='0.17542701201745'>17.5%<\/td><td class='dataCell' data-type='percentage' data-value='0.16392435911507'>16.4%<\/td><td class='dataCell' data-type='percentage' data-value='0.19830371476362'>19.8%<\/td><\/tr><tr class=' '><td class='labelCell'>Return on Invested Capital<\/td><td class='dataCell' data-type='percentage' data-value='0.41743100812616'>41.7%<\/td><td class='dataCell' data-type='percentage' data-value='0.31146385668929'>31.1%<\/td><td class='dataCell' data-type='percentage' data-value='0.56166392937543'>56.2%<\/td><td class='dataCell' data-type='percentage' data-value='0.79357545168436'>79.4%<\/td><td class='dataCell' data-type='percentage' data-value='0.29563665163366'>29.6%<\/td><td class='dataCell' data-type='percentage' data-value='0.40666624726852'>40.7%<\/td><td class='dataCell' data-type='percentage' data-value='0.15645567128128'>15.6%<\/td><td class='dataCell' data-type='percentage' data-value='0.17835726067885'>17.8%<\/td><td class='dataCell' data-type='percentage' data-value='0.15560704472355'>15.6%<\/td><td class='dataCell' data-type='percentage' data-value='0.20185124127701'>20.2%<\/td><\/tr><\/tbody><\/table>","chart":[["2006-12",0],["2007-12",-2.7333985391131],["2008-12",1.4806594382205],["2009-12",0.44823138109063],["2010-12",0.57515717254689],["2011-12",0.41743100812616],["2012-03",0.31146385668929],["2013-03",0.56166392937543],["2014-03",0.79357545168436],["2015-03",0.29563665163366],["2016-03",0.40666624726852],["2017-03",0.15645567128128],["2018-03",0.17835726067885],["2019-03",0.15560704472355],["2020-03",0.20185124127701]]},"errors":[],"code":0,"qfs_symbol_v2":"BABA:US","statementPeriod":"Annual"}
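The "ovr" field in the response above is an HTML table whose data cells carry the unrounded figure in a data-value attribute. A minimal sketch of turning that table into Python lists, using only the standard library, might look like the following. The class and function names (OvrTableParser, parse_overview) are made up for illustration, and the sample is a shortened copy of the Revenue row shown above.

```python
from html.parser import HTMLParser


class OvrTableParser(HTMLParser):
    """Collect the rows of the 'ovr' table as lists of cell values."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being read
        self._cell = None   # text fragments of the current cell
        self._raw = None    # data-value attribute of the current cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []
            self._raw = dict(attrs).get("data-value")

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            text = "".join(self._cell).strip()
            # Prefer the unrounded number in data-value over the display text;
            # fall back to the text if the attribute is missing or not numeric.
            try:
                value = float(self._raw)
            except (TypeError, ValueError):
                value = text
            self._row.append(value)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_overview(payload):
    """Return (years, {label: [values]}) from a decoded API response."""
    parser = OvrTableParser()
    parser.feed(payload["datasets"]["ovr"])
    header, *body = parser.rows
    return header[1:], {row[0]: row[1:] for row in body}


# Shortened sample with the same structure as the response above.
sample = {"datasets": {"ovr": (
    "<table class='fs-table' id='ovr-table'><tbody>"
    "<tr class='thead'><td></td><td>2019</td><td>2020</td></tr>"
    "<tr class=' '><td class='labelCell'>Revenue</td>"
    "<td class='dataCell' data-type='normal' data-value='56145652000'>56,146</td>"
    "<td class='dataCell' data-type='normal' data-value='72603233000'>72,603</td></tr>"
    "</tbody></table>"
)}}

years, table = parse_overview(sample)
print(years, table["Revenue"])
```

With a real response, the payload would be the decoded JSON body. The layout of the table is taken from the single response shown above, so other tickers or endpoints may differ.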
Question 2: Should I somehow simulate selecting Key Ratios with pyppeteer?
# Click the button for Key Ratios (page.select() only works on <select> elements).
await page.click('body > app-root > app-company > div > div > div.pageHead > div > div:nth-child(3) > div.col-xs-offset-3.col-xs-2 > select-fs-dropdown > div > button > div')
You should be able to read the documentation for pyppeteer to get a better idea of how to actually do this.
# get data for Alibaba
https://api.quickfs.net/stocks/BABA:US/ovr/Annual/
# get data for Tesla
https://api.quickfs.net/stocks/TSLA:US/ovr/Annual/
# get data for Apple
https://api.quickfs.net/stocks/AAPL:US/ovr/Annual/
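Then you can simply call the API from Python using requests:
import requests
resp = requests.get("https://api.quickfs.net/stocks/AAPL:US/ovr/Annual/")
# json is a method, so it has to be called to get the decoded body.
data = resp.json()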
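Since the three URLs differ only in the ticker, the call can be wrapped in a small helper that also sets the request headers explicitly. The sketch below uses the standard library's urllib instead of requests so that it has no third-party dependency. The function names and the header values are illustrative assumptions, not something the site documents, and "Annual" is the only period that appears in the example above.

```python
import json
from urllib.request import Request, urlopen

API_ROOT = "https://api.quickfs.net/stocks"

# Hypothetical header values; copy the real ones from the browser's network tab.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "application/json",
}


def overview_request(symbol, period="Annual", headers=None):
    """Build (but do not send) a GET request for one ticker's overview data."""
    url = f"{API_ROOT}/{symbol}/ovr/{period}/"
    return Request(url, headers=headers or DEFAULT_HEADERS, method="GET")


def fetch_overview(symbol, timeout=10):
    """Send the request and decode the JSON body."""
    with urlopen(overview_request(symbol), timeout=timeout) as resp:
        return json.load(resp)


# Building a request does not touch the network, so it is safe to inspect.
req = overview_request("AAPL:US")
print(req.get_method(), req.full_url)
```

Calling fetch_overview("BABA:US"), fetch_overview("TSLA:US") and so on would then return the decoded response for each ticker, assuming the API still answers unauthenticated requests in the same way.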
Regarding "python - scraping data with pyppeteer", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/62429010/