gpt4 book ai didi

python - 剥离数据框单元格然后创建列

转载 作者:行者123 更新时间:2023-12-01 01:33:11 25 4
gpt4 key购买 nike

我正在尝试从数据帧中获取信息并将其分解为具有以下标题名称的列。信息全部塞进 1 个单元格中。

Python 新手,所以要温柔。

感谢您的帮助

我的代码:

r=requests.get('https://nclbgc.org/search/licenseDetails?licenseNumber=80479')

page_data = soup(r.text, 'html.parser')
company_info = [' '.join(' '.join(info.get_text(", ", strip=True).split()) for info in page_data.find_all('tr'))]
df = pd.DataFrame(company_info, columns = ['ic_number, status, renewal_date, company_name, address, county, telephon, limitation, residential_qualifiers'])


print(df)

我得到的结果:

['License Number, 80479 Status, Valid Renewal Date, n/a  Name, DLR Construction, LLC Address, 3217 Vagabond Dr Monroe, N
C 28110 County, Union Telephone, (980) 245-0867 Limitation, Limited Classifications, Residential Qualifiers, Arteaga, Vi
cky Rodriguez']

最佳答案

您可以使用read_html进行一些后期处理:

url = 'https://nclbgc.org/search/licenseDetails?licenseNumber=80479'

#select first table form list of tables, remove only NaNs rows
df = pd.read_html(url)[0].dropna(how='all')
#forward fill NaNs in first column
df[0] = df[0].ffill()
#merge values in second column
df = df.groupby(0)[1].apply(lambda x: ' '.join(x.dropna())).to_frame().rename_axis(None).T

print (df)
Address Classifications County License Number \
1 3217 Vagabond Dr Monroe, NC 28110 Residential Union 80479

Limitation Name Qualifiers Renewal Date \
1 Limited DLR Construction, LLC Arteaga, Vicky Rodriguez

Status Telephone
1 Valid (980) 245-0867

关于python - 剥离数据框单元格然后创建列,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/52624286/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com