
python - Create a DataFrame from a list of records

Reposted. Author: 行者123. Updated: 2023-11-30 21:52:22

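The purpose of this code is to open a web page that contains a table spread over multiple pages. The script has to scrape the whole table and then convert it into a pandas DataFrame.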

Everything works fine until the DataFrame part.

When I print the data before converting it to a DataFrame, each row comes out as a list, like this:

['Release Date', 'Time', 'Actual', 'Forecast', 'Previous', '']
['Jan 27, 2020', '00:30', ' ', ' ', '47.8%', '']
['Jan 20, 2020', '00:30', '47.8%', ' ', '43.0%', '']
['Jan 13, 2020', '00:30', '43.0%', ' ', '31.5%', '']
['Jan 07, 2020', '00:30', '31.5%', ' ', '29.9%', '']

When I convert it to a DataFrame, I get this instead:

      0     1     2     3     4     5     6     7     8     9    10    11
0     A     p     r           0     6     ,           2     0     1     4
1     0     5     :     0     0  None  None  None  None  None  None  None
2     4     0     .     3     %  None  None  None  None  None  None  None
3        None  None  None  None  None  None  None  None  None  None  None
4        None  None  None  None  None  None  None  None  None  None  None
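The output above is what pandas produces when `DataFrame.from_records` receives a flat list of strings instead of a list of rows. A minimal offline sketch, assuming a hypothetical `flat` list built from the three values visible in that output:

```python
import pandas as pd

# Hypothetical flat list: the cell texts of a single row, taken from the output above.
flat = ['Apr 06, 2014', '05:00', '40.3%']

# from_records expects a sequence of records (rows). Each string here is treated
# as one record, so each character lands in its own column, and shorter strings
# are padded with missing values.
df = pd.DataFrame.from_records(flat)
print(df.shape)  # (3, 12)

# Wrapping the row in an outer list gives one record with three fields instead.
print(pd.DataFrame.from_records([flat]).shape)  # (1, 3)
```

The 12 columns come from the longest string, `'Apr 06, 2014'`, which has 12 characters.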

Here is the code:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
import pandas as pd
url = 'https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155'

driver = webdriver.Chrome(r"D:\Projects\Driver\chromedriver.exe")
driver.get(url)
wait = WebDriverWait(driver, 10)

while True:
    try:
        item = wait.until(ec.visibility_of_element_located((By.XPATH, '//*[contains(@id,"showMoreHistory")]/a')))
        driver.execute_script("arguments[0].click();", item)
    except TimeoutException:
        break
for table in wait.until(
        ec.visibility_of_all_elements_located((By.XPATH, '//*[contains(@id,"eventHistoryTable")]//tr'))):
    data = [item.text for item in table.find_elements_by_xpath(".//*[self::td or self::th]")]
df = pd.DataFrame.from_records(data)
print(df.head())

driver.quit()
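With Selenium stripped out, the data handling in the loop above reduces to the sketch below. `rows` is a hypothetical stand-in for the `<tr>` elements, filled with values from the printed lists. Because `data` is rebound on every pass, only the cells of the last row survive the loop.

```python
# Hypothetical stand-in for the <tr> elements: each inner list holds the cell
# texts of one row (values copied from the lists printed in the question).
rows = [
    ['Release Date', 'Time', 'Actual', 'Forecast', 'Previous', ''],
    ['Jan 27, 2020', '00:30', ' ', ' ', '47.8%', ''],
    ['Jan 07, 2020', '00:30', '31.5%', ' ', '29.9%', ''],
]

# Same shape as the question's loop: the name is rebound on every pass,
# so after the loop it refers only to the cells of the last row.
for row in rows:
    data = [cell for cell in row]

print(data)  # ['Jan 07, 2020', '00:30', '31.5%', ' ', '29.9%', '']
```

The result is a flat list of strings, which is exactly the input that makes `from_records` split text into characters.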

Best answer

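You aren't collecting the rows. Inside the loop, `data` is rebound to a new list on every iteration, so when the loop ends it holds only the cell texts of the last row. `from_records` then treats each of those strings as a record, which is why every character ends up in its own column. Your code only needs a small change: append each row's list to an outer list.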

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
import pandas as pd

url = 'https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155'

driver = webdriver.Chrome()
driver.get(url)
wait = WebDriverWait(driver, 10)

# Keep clicking "Show more" until the link is no longer visible within 10 s.
while True:
    try:
        item = wait.until(ec.visibility_of_element_located((By.XPATH, '//*[contains(@id,"showMoreHistory")]/a')))
        driver.execute_script("arguments[0].click();", item)
    except TimeoutException:
        break

# Collect one list of cell texts per table row into an outer list.
# find_elements(By.XPATH, ...) replaces find_elements_by_xpath, which was removed in Selenium 4.3.
data = []
for row in wait.until(
        ec.visibility_of_all_elements_located((By.XPATH, '//*[contains(@id,"eventHistoryTable")]//tr'))):
    line = [cell.text for cell in row.find_elements(By.XPATH, ".//*[self::td or self::th]")]
    data.append(line)

# A list of lists: each inner list becomes one DataFrame row.
df = pd.DataFrame.from_records(data)
print(df.head())

driver.quit()

Output:

0  Release Date   Time  Actual  Forecast  Previous
1  Jan 27, 2020  00:30                       47.8%
2  Jan 20, 2020  00:30   47.8%               43.0%
3  Jan 13, 2020  00:30   43.0%               31.5%
4  Jan 07, 2020  00:30   31.5%               29.9%
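If you would rather use the header row as column labels and get numeric columns, the collected `data` can be post-processed. This is a sketch, not part of the original answer. It assumes the layout matches the lists printed in the question: a trailing empty column, whitespace-only cells for missing values, percentages stored as text, and English month abbreviations. It runs offline on those values rather than on the live page.

```python
import pandas as pd

# Rows as the answer's loop would collect them; the values are copied from the
# lists printed in the question, not fetched from the live page.
data = [
    ['Release Date', 'Time', 'Actual', 'Forecast', 'Previous', ''],
    ['Jan 27, 2020', '00:30', ' ', ' ', '47.8%', ''],
    ['Jan 20, 2020', '00:30', '47.8%', ' ', '43.0%', ''],
    ['Jan 13, 2020', '00:30', '43.0%', ' ', '31.5%', ''],
    ['Jan 07, 2020', '00:30', '31.5%', ' ', '29.9%', ''],
]

# Use the first row (the <th> cells) as column labels rather than as data.
df = pd.DataFrame(data[1:], columns=data[0])

# Drop the trailing column whose label is an empty string.
df = df.drop(columns='')

# Parse dates such as 'Jan 27, 2020' (assumes English month abbreviations).
df['Release Date'] = pd.to_datetime(df['Release Date'], format='%b %d, %Y')

# Strip whitespace and the '%' sign; whitespace-only cells become NaN via errors='coerce'.
for col in ['Actual', 'Forecast', 'Previous']:
    df[col] = pd.to_numeric(df[col].str.strip().str.rstrip('%'), errors='coerce')

print(df)
```

Using `errors='coerce'` keeps blank cells as missing values instead of raising, which suits the empty `Actual` and `Forecast` entries in this table.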

A similar question, "python - Create a DataFrame from a list of records", can be found on Stack Overflow: https://stackoverflow.com/questions/59903046/
