gpt4 book ai didi

excel - 如何从网站中提取表值

转载 作者:行者123 更新时间:2023-12-02 15:52:40 47 4
gpt4 key购买 nike

https://www.morningstar.com/stocks/xnas/ATVI/price-fair-value

我正在尝试提取每年的总返回数据

28.59   13.32   0.36    -12.34  69.68   14.13   93.25   -6.04   76.18   -25.92  3.99

来自上述网站。我需要有关下一步如何进行的帮助。

Option Explicit

Sub genOP()

Dim i As Long, fI As Long
Dim tickeR As String
Dim urlStr As String
Dim ie As New InternetExplorer

With INP
fI = .Range("A" & .Rows.Count).End(xlUp).Row

For i = 2 To fI
tickeR = Trim(.Range("A" & i))
Debug.Print tickeR
'https://www.morningstar.com/stocks/xnas/abmd/price-fair-value'
urlStr = "https://www.morningstar.com/stocks/xnas/" & tickeR & "/price-fair-value"
Debug.Print urlStr
With ie
.Visible = True
.navigate urlStr

While .readyState <> 4
DoEvents
Wend
Stop
.document.getElementsByClassName("annual-data-row ng-scope").innerText

End With
Next i

End With

End Sub

最佳答案

如果您观察网络流量,您将看到该页面对该图表信息执行 API xhr 请求,并以 json 形式返回。

enter image description here

检查发送的请求,我们看到以下内容:

enter image description here

请求 URL 本身,例如

https://api-global.morningstar.com/sal-service/v1/stock/priceFairValue/v2/0P00000053/data?secExchangeList=

字符串0P00000053是该股票代码的唯一标识符;我稍后将其称为 share_id (占位符)和 shareId 变量。它是从对实际股票页面的请求中获取的。

查询字符串参数没有值,可以忽略。重要的标题已突出显示。请参阅 key 以了解重要性的解释。

两个红色框 header 唯一标识特定的股票 URI。

两个蓝色需要 js 文件中提供的访问 key 。

您可以提出初始请求来获取这些访问 key ;获取所有股票列表的附加请求 - 我使用纳斯达克 100 作为来源;或提供您自己的股票代码列表(每个示例都给出了示例 - 根据需要注释行)。

自己的列表使用:

tickers = Array("ATVI") ''etc....extend

纳斯达克 100 指数使用:

tickers = GetNasdaqTickers(xhr, html)  

必须向每个股票代码页面发出请求,以检索唯一标识符(“X-SAL-ContentType” header 的 contentId 和 API url 的 shareId)以及 header 相应更新:

tickerName = tickers(ticker)
url = Replace$("https://www.morningstar.com/stocks/xnas/{ticker}/price-fair-value", "{ticker}", tickerName)
headersDict("Referer") = url
Set ids = GetContentIdShareId(xhr, url, re) 'Set up correct ids for each indiv request
headersDict("X-SAL-ContentType") = ids("contentId")

API 调用在股票循环期间更新,并使用 json 解析器解析出图表信息。我会使用 jsonconverter.bas来解析 json。将来自该链接的代码安装到名为 JsonConverter 的标准模块中。所有必需的项目引用都显示在代码顶部。

For ticker = LBound(tickers) To UBound(tickers)
'other code
nasdaqDict.Add tickerName, GetChartData(xhr, ids("shareId"), headersDict)
Next

函数GetChartData返回一个字典,其中图表日期作为键,图表值作为值。对于给定的代码,每个返回的字典都会添加到父字典 nasdaqDict 中。 nasdaqDict 将股票名称作为键,将关联的图表字典作为值。

最后,这个父字典被循环,所有值都由 WriteOutDict 写入工作表。

您可以探索字典中的字典,nasdaqDicthere .

<小时/>

VBA 代码:

Option Explicit

'VBE > Tools > References:
' Microsoft HTML Object Library
' Microsoft XML ,vn.0 e.g. Microsoft XML ,v6.0
' Microsoft VBScript Regular Expressions n.n e.g. Microsoft VBScript Regular Expressions 5.5
' Microsoft Scripting Runtime

Public Sub GetNasdaq100ChartValues()

Dim re As VBScript_RegExp_55.RegExp, html As mshtml.HTMLDocument, xhr As MSXML2.XMLHTTP60
Dim nasdaqDict As Scripting.Dictionary

Set re = New VBScript_RegExp_55.RegExp
Set html = New mshtml.HTMLDocument
Set xhr = New MSXML2.XMLHTTP60

'##Set-up **************************************************************************************************

Dim headersDict As Scripting.Dictionary

Set headersDict = New Scripting.Dictionary
headersDict.Add "User-Agent", "Mozilla/5.0"
headersDict.Add "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
headersDict.Add "Referer", vbNullString
headersDict.Add "ApiKey", vbNullString
headersDict.Add "X-API-REALTIME-E", vbNullString
headersDict.Add "X-SAL-ContentType", vbNullString

Set headersDict = UpdateHeaders(xhr, re, headersDict)
Set nasdaqDict = New Scripting.Dictionary 'This will be a dictionary of dictionaries with keys as ticker names _
and values as dictionaries containing the associated chart dates as keys and values as values.

Dim ids As Scripting.Dictionary, tickerName As String, tickers(), ticker As Long, url As String

'## This gets all nasdaq tickers (from https://www.cnbc.com/nasdaq-100/) and populates tickers with these.
'## You could instead replace this with a manually supplied list of desired tickers e.g.
tickers = Array("ATVI") ''etc....extend
'tickers = GetNasdaqTickers(xhr, html) ''comment this line out if passing hardcoded ticker values

'##Get info ************************************************************************************************

For ticker = LBound(tickers) To UBound(tickers)
tickerName = tickers(ticker)
url = Replace$("https://www.morningstar.com/stocks/xnas/{ticker}/price-fair-value", "{ticker}", tickerName)
headersDict("Referer") = url
Set ids = GetContentIdShareId(xhr, url, re) 'Set up correct ids for each indiv request
headersDict("X-SAL-ContentType") = ids("contentId")
nasdaqDict.Add tickerName, GetChartData(xhr, ids("shareId"), headersDict) 'make indiv API call for current ticker
Next

WriteOutDict nasdaqDict
End Sub

Public Function UpdateHeaders(ByVal xhr As MSXML2.XMLHTTP60, ByVal re As VBScript_RegExp_55.RegExp, ByVal headersDict As Scripting.Dictionary) As Scripting.Dictionary
Dim s As String, accessKeys As VBScript_RegExp_55.MatchCollection
Dim apiKey As String, apiRealtimeKey As String

With xhr 'Make request to get keys from js file
.Open "GET", "https://www.morningstar.com/assets/quotes/1.3.0/js/sal-components-wrapper.js", False
.send
s = .responseText
End With
With re
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = "apigee:""(.*?)""|realtime:""(.*?)""" 'regex pattern to return both api key values
Set accessKeys = .Execute(s)
End With

apiKey = accessKeys.item(0).SubMatches(0)
apiRealtimeKey = accessKeys.item(1).SubMatches(1)
headersDict("ApiKey") = apiKey
headersDict("X-API-REALTIME-E") = apiRealtimeKey

Set UpdateHeaders = headersDict
End Function

Public Function GetNasdaqTickers(ByVal xhr As MSXML2.XMLHTTP60, ByVal html As HTMLDocument) As Variant
Dim tickers As Object, results(), i As Long

With xhr
.Open "GET", "https://www.cnbc.com/nasdaq-100/", False
.send
html.body.innerHTML = .responseText
End With

Set tickers = html.querySelectorAll(".quoteTable a")

ReDim results(0 To tickers.Length - 1)

For i = 0 To tickers.Length - 1
results(i) = tickers.item(i).innerText
Next
GetNasdaqTickers = results
End Function

Public Function GetContentIdShareId(ByVal xhr As MSXML2.XMLHTTP60, ByVal url As String, ByVal re As VBScript_RegExp_55.RegExp) As Scripting.Dictionary
Dim ids As Scripting.Dictionary, s As String

Set ids = New Scripting.Dictionary

With xhr 'Make request to get keys from js file
.Open "GET", url, False
.send
s = .responseText
End With
With re
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = "contentType:""(.*?)"",profile" 'regex pattern to get contentId
ids("contentId") = .Execute(s)(0).SubMatches(0)
.Pattern = "byId:{""(.*?)""" 'regex pattern to get shareId
ids("shareId") = .Execute(s)(0).SubMatches(0)
End With
Set GetContentIdShareId = ids
End Function

Public Function GetChartData(ByVal xhr As MSXML2.XMLHTTP60, ByVal shareId As String, ByVal headersDict As Scripting.Dictionary) As Scripting.Dictionary
Dim key As Variant, chartValues As Scripting.Dictionary, i As Long, json As Object
Set chartValues = New Scripting.Dictionary
With xhr
.Open "GET", Replace$("https://api-global.morningstar.com/sal-service/v1/stock/priceFairValue/v2/{share_id}/data", "{share_id}", shareId), False
For Each key In headersDict.keys
.setRequestHeader key, headersDict(key)
Next
.send
Set json = JsonConverter.ParseJson(.responseText)
End With
For i = 1 To json("columnDefs").Count 'collection
chartValues(json("columnDefs")(i)) = json("table")("rows")(2)("datum")(i)
Next
Set GetChartData = chartValues
End Function

Public Sub WriteOutDict(ByVal nasdaqDict As Scripting.Dictionary)
Dim key As Variant, row(), r As Long, headers()

Application.ScreenUpdating = False

headers = nasdaqDict(nasdaqDict.keys(0)).keys 'assumption that charts show for same time period for all tickers

r = 2

With ThisWorkbook.Worksheets("Sheet1")
.Cells(1, 1) = "Ticker"
.Cells(1, 2).Resize(1, UBound(headers) + 1) = headers
For Each key In nasdaqDict.keys
row = nasdaqDict(key).items
.Cells(r, 1) = key
.Cells(r, 2).Resize(1, UBound(row) + 1) = row
r = r + 1
Next
End With
Application.ScreenUpdating = True
End Sub
<小时/>

结果示例:

enter image description here

<小时/>

Python:

我最初是用 python 编写的,以防万一感兴趣:

import requests, re
from bs4 import BeautifulSoup as bs

def get_chart_data(share_id):
r = s.get(f'https://api-global.morningstar.com/sal-service/v1/stock/priceFairValue/v2/{share_id}/data', headers=headers).json()
chart_values = dict(zip(r['columnDefs'], r['table']['rows'][1]['datum']))
return chart_values

headers = {
'User-Agent': 'Mozilla/5.0',
'Referer': '',
'ApiKey': '',
'X-API-REALTIME-E': '',
'X-SAL-ContentType': '',
}

p = re.compile(r'apigee:"(.*?)"|realtime:"(.*?)"')
p1 = re.compile(r'contentType:"(.*?)",profile')
p2 = re.compile(r'byId:{"(.*?)"')

with requests.Session() as s:
#set-up
###########################################################################
## This gets all nasdaq tickers and populates tickers with these.
r = s.get('https://www.cnbc.com/nasdaq-100/')
soup = bs(r.content, 'lxml')
tickers = [i.text for i in soup.select('.quoteTable a')]
## you could instead replace the above with a manually supplied list of desired tickers
# tickers = ['tickerA','tickerB']
##########################################################################
r = s.get('https://www.morningstar.com/assets/quotes/1.3.0/js/sal-components-wrapper.js')
access_keys = p.findall(r.text)
api_key = access_keys[0][0]
api_realtime_key = access_keys[1][1]
headers['ApiKey'] = api_key
headers['X-API-REALTIME-E'] = api_realtime_key

results = {}
#specific
for ticker in tickers:
url = f'https://www.morningstar.com/stocks/xnas/{ticker}/price-fair-value'
headers['Referer'] = url
r = s.get(url)
content_id = p1.findall(r.text)[0]
share_id = p2.findall(r.text)[0]
headers['X-SAL-ContentType'] = content_id
results[ticker] = get_chart_data(share_id)

关于excel - 如何从网站中提取表值,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/57447128/

47 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com