
python - selenium: Extracting data into a dataframe by form name using selenium

Reposted. Author: 行者123. Updated: 2023-12-01 07:17:58

I want to extract the information from this site into a pandas dataframe. This code:

from selenium import webdriver
import os
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import pandas as pd
import time
import sys
import re
import requests

options = Options()
options.binary_location=r'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe'
options.add_experimental_option('excludeSwitches', ['enable-logging'])
#options.add_argument("--headless")
driver = webdriver.Chrome(options=options,executable_path='/mnt/c/Users/kela/Desktop/selenium/chromedriver.exe')

driver.get('http://imed.med.ucm.es/epimhc/')
driver.find_element_by_css_selector('[value="mhc"]').click()
driver.find_element_by_css_selector('[value="seq"]').click()
driver.find_element_by_css_selector('[value="mhc_source"]').click()
driver.find_element_by_css_selector('[value="class"]').click()
driver.find_element_by_css_selector('[value="length"]').click()
driver.find_element_by_css_selector('[value="peptide_source"]').click()
driver.find_element_by_css_selector('[value="bind_level"]').click()
driver.find_element_by_css_selector('[value="epitope"]').click()
driver.find_element_by_css_selector('[value="epitope_level"]').click()
driver.find_element_by_css_selector('[value="reference"]').click()
driver.find_element_by_css_selector('[value="protein_name"]').click()
driver.find_element_by_css_selector('[value="protein_source"]').click()
driver.find_element_by_css_selector('[value=Search]').click()

takes me to the table that I want to convert into a pandas dataframe.


My question is how to convert the table shown on this page into a pandas dataframe.

I can see that the table I want sits in a form with name = pepList, so I am trying something like:

element = driver.find_elements_by_css_selector('[name = pepList]') 

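and similar options. Even if I could identify the table accurately, the rows in the table are in an unusual format compared with what I am used to.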

I would appreciate it if someone could demonstrate how to extract the table on this page into a pandas dataframe.

Best answer

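After clicking the Search button, take driver.page_source and pass it to pandas read_html(), which returns a list of dataframes, one per table found on the page: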

df=pd.read_html(driver.page_source)
print(df[1])
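
The index in df[1] works because read_html() returns one dataframe per table in the markup. The following is only a sketch of that behaviour and of one way to select tables by the enclosing form name instead of by position. The HTML string is an invented stand-in for driver.page_source (the real page layout may differ), the helper tables_in_form is hypothetical, and it assumes the table is nested inside the form element and that lxml is installed.

```python
from io import StringIO

import lxml.html
import pandas as pd

# Invented stand-in for driver.page_source: a layout table followed by
# a results table inside a form named pepList. Values are placeholders.
html = """
<html><body>
<table><tr><th>menu</th></tr><tr><td>home</td></tr></table>
<form name="pepList">
<table>
<tr><th>mhc</th><th>seq</th><th>length</th></tr>
<tr><td>allele-1</td><td>AAAAAAAAA</td><td>9</td></tr>
<tr><td>allele-2</td><td>CCCCCCCCCC</td><td>10</td></tr>
</table>
</form>
</body></html>
"""

# read_html returns a list with one dataframe per <table> element.
tables = pd.read_html(StringIO(html))
results = tables[1]  # second table on the page, as in the answer


def tables_in_form(page_html, form_name):
    """Return dataframes only for tables nested in the named form."""
    root = lxml.html.fromstring(page_html)
    frames = []
    for table in root.xpath(f'//form[@name="{form_name}"]//table'):
        # Serialize the single <table> element and let pandas parse it.
        table_html = lxml.html.tostring(table, encoding="unicode")
        frames.extend(pd.read_html(StringIO(table_html)))
    return frames


form_tables = tables_in_form(html, "pepList")
print(results)
```

Wrapping the markup in StringIO avoids the deprecation warning that recent pandas versions emit when a literal HTML string is passed to read_html(). Selecting by form name is less brittle than a fixed index if the page gains or loses a layout table.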

Your whole code would then look like this.

from selenium import webdriver
import os
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import pandas as pd
import time
import sys
import re
import requests

options = Options()
options.binary_location=r'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe'
options.add_experimental_option('excludeSwitches', ['enable-logging'])
#options.add_argument("--headless")
driver = webdriver.Chrome(options=options,executable_path='/mnt/c/Users/kela/Desktop/selenium/chromedriver.exe')


driver.get('http://imed.med.ucm.es/epimhc/')
driver.find_element_by_css_selector('[value="mhc"]').click()
driver.find_element_by_css_selector('[value="seq"]').click()
driver.find_element_by_css_selector('[value="mhc_source"]').click()
driver.find_element_by_css_selector('[value="class"]').click()
driver.find_element_by_css_selector('[value="length"]').click()
driver.find_element_by_css_selector('[value="peptide_source"]').click()
driver.find_element_by_css_selector('[value="bind_level"]').click()
driver.find_element_by_css_selector('[value="epitope"]').click()
driver.find_element_by_css_selector('[value="epitope_level"]').click()
driver.find_element_by_css_selector('[value="reference"]').click()
driver.find_element_by_css_selector('[value="protein_name"]').click()
driver.find_element_by_css_selector('[value="protein_source"]').click()
driver.find_element_by_css_selector('[value=Search]').click()
df=pd.read_html(driver.page_source)
print(df[1])

For "python - selenium: Extracting data into a dataframe by form name using selenium", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57854844/
