python - List of lists of dictionaries to pandas DataFrame

Reposted. Author: 行者123. Updated: 2023-11-28 22:14:07

I am trying to load this data:

[['Manufacturer: Hyundai',
'Model: Tucson',
'Mileage: 258000 km',
'Registered: 07/2019'],
['Manufacturer: Mazda',
'Model: 6',
'Year: 2014',
'Registered: 07/2019']]

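into a pandas DataFrame.

Not every label appears in every record: for example, some records have "Mileage" while others do not, and vice versa. There are 26 features in total, and few items have all of them.

I want to build a pandas DataFrame that holds the features as columns; where a feature is absent, the value should be "NaN".

I have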

colnames=['Manufacturer', 'Model', 'Mileage', 'Registered', 'Year'...(all 26 features here)] 
df = pd.read_csv("./data/output.csv", sep=",", names=colnames, header=None)

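The first few mandatory columns give the expected output, but once optional features are involved, a missing feature does not produce missing data; instead, the features that follow it end up under the wrong columns. A record maps correctly only when all features are present.

I forgot to mention that some features with a missing value also lack the ":" but are still present in the list. So in both of these cases:

  • "Mileage" (the value is missing, and so is the ":")
  • "Mileage" is absent from the record altogether

the value should be "NaN".

Best answer

Use a nested list comprehension to build a list of dictionaries and pass it to the DataFrame constructor; keys missing from a record are filled with NaN: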

import pandas as pd

L = [['Manufacturer: Hyundai',
'Model: Tucson',
'Mileage: 258000 km',
'Registered: 07/2019'],
['Manufacturer: Mazda',
'Model: 6',
'Year: 2014',
'Registered: 07/2019']]

# Split on the first ': ' only, so the values carry no leading space
df = pd.DataFrame([dict(y.split(': ', 1) for y in x) for x in L])
print(df)
   Manufacturer    Mileage   Model  Registered  Year
0       Hyundai  258000 km  Tucson     07/2019   NaN
1         Mazda        NaN       6     07/2019  2014
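
The question also mentions entries such as a bare "Mileage" with neither a value nor a ":". Splitting such an entry yields a single element, so dict() raises a ValueError. The following is a minimal sketch of one way to handle that case, not part of the original answer: it uses str.partition so that an entry without a separator still yields a key, and it reindexes on the expected column list so that features absent from every record still appear as all-NaN columns. The sample records, the helper parse_item, and the shortened colnames list are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the bare 'Mileage' entry has neither a value nor a colon.
records = [
    ['Manufacturer: Hyundai', 'Model: Tucson', 'Mileage: 258000 km', 'Registered: 07/2019'],
    ['Manufacturer: Mazda', 'Model: 6', 'Mileage', 'Year: 2014', 'Registered: 07/2019'],
]

# Shortened stand-in for the 26 column names from the question.
colnames = ['Manufacturer', 'Model', 'Mileage', 'Registered', 'Year']


def parse_item(item):
    """Split 'Key: value' on the first colon; use NaN when the value is missing."""
    key, _, value = item.partition(':')
    value = value.strip()
    return key.strip(), (value if value else np.nan)


rows = [dict(parse_item(item) for item in record) for record in records]

# reindex fixes the column order and adds all-NaN columns for absent features.
df = pd.DataFrame(rows).reindex(columns=colnames)
print(df)
```

Because partition splits on the first colon only, a value that itself contains a colon would be kept whole.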

Edit: you can use .split(maxsplit=1) to split on the first whitespace:

L = [['Manufacturer Hyundai',
'Model Tucson',
'Mileage 258000 km',
'Registered 07/2019'],
['Manufacturer Mazda',
'Model 6',
'Year 2014',
'Registered 07/2019']]


df = pd.DataFrame([dict(y.split(maxsplit=1) for y in x) for x in L])
print (df)

   Manufacturer    Mileage   Model  Registered  Year
0       Hyundai  258000 km  Tucson     07/2019   NaN
1         Mazda        NaN       6     07/2019  2014

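Edit: for keys made of two words, such as "Additional equipment", list them in words2 and join the two words with an underscore: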

L = [['Manufacturer  Hyundai',
'Model Tucson',
'Mileage 258000 km',
'Registered 07/2019'],
['Manufacturer Mazda',
'Model 6',
'Year 2014',
'Registered 07/2019',
'Additional equipment aaa']]

words2 = ['Additional equipment']

L1 = []
for x in L:
    di = {}
    for y in x:
        for word in words2:
            # Both words of the two-word key occur in this entry
            if set(word.split(maxsplit=2)[:2]) < set(y.split()):
                i, j, k = y.split(maxsplit=2)
                di['_'.join([i, j])] = k
            else:
                # Single-word key: split on the first whitespace
                i, j = y.split(maxsplit=1)
                di[i] = j
    L1.append(di)

df = pd.DataFrame(L1)
print (df)
   Additional_equipment  Manufacturer    Mileage   Model  Registered  Year
0                   NaN       Hyundai  258000 km  Tucson     07/2019   NaN
1                   aaa         Mazda        NaN       6     07/2019  2014
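
The loop above runs its else branch once for every entry of words2 that does not match. With more than one multi-word key, an entry such as 'Additional equipment aaa' would therefore also be stored under the key 'Additional'. The following is a hedged sketch of an alternative, not the answer's own method: it tries each known multi-word key as a prefix and falls back to a whitespace split only when none matches. The helper split_item and the second key 'Fuel type' are hypothetical, and every entry is assumed to be a non-empty string.

```python
import pandas as pd

records = [
    ['Manufacturer  Hyundai', 'Model Tucson', 'Mileage 258000 km', 'Registered 07/2019'],
    ['Manufacturer Mazda', 'Model 6', 'Year 2014', 'Registered 07/2019',
     'Additional equipment aaa'],
]

# Keys made of several words; 'Fuel type' is a hypothetical second entry.
multiword_keys = ['Additional equipment', 'Fuel type']


def split_item(item, multiword_keys):
    """Return (key, value), trying the known multi-word keys first."""
    for key in multiword_keys:
        if item == key or item.startswith(key + ' '):
            value = item[len(key):].strip()
            return key.replace(' ', '_'), (value or None)
    # Fall back to splitting on the first run of whitespace.
    parts = item.split(maxsplit=1)
    return parts[0], (parts[1] if len(parts) == 2 else None)


rows = [dict(split_item(item, multiword_keys) for item in record) for record in records]
df = pd.DataFrame(rows)
print(df)
```

An entry that consists of a key alone gets None as its value, which pandas treats as missing.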

About python - List of lists of dictionaries to pandas DataFrame: a similar question was found on Stack Overflow: https://stackoverflow.com/questions/53555487/
