
python - Trying to pull zip codes from one dataframe into another address dataframe

Reposted. Author: 行者123. Updated: 2023-11-28 20:55:19

I have a dataframe of addresses without zip codes:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'address1':['1 o\'toole st','2 main st','3 high street','5 foo street','10 foo street'],
                    'address2':['town1',np.nan,np.nan,'Bartown',np.nan],
                    'address3':[np.nan,'village','city','county2','county3']})
df1['zipcode']=''
df1

        address1  address2  address3  zipcode
0   1 o'toole st     town1       NaN
1      2 main st       NaN   village
2  3 high street       NaN      city
3   5 foo street   Bartown   county2
4  10 foo street       NaN   county3

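I also have a second dataframe containing addresses and zip codes. Note that here it is in the same order as df1, but that is not the case in the real data I'm working with: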

df2 = pd.DataFrame({'address1':['1 o\'toole st','2 main st','7 mill street','5 foo street','10 foo street'],
'address2':['town1','village','city','Bartown','county3'],
'address3':[np.nan,np.nan,np.nan,'county2','USA'],
'zipcode': ['er45','qw23','rt67','yu89','yu83']})
df2

        address1  address2  address3  zipcode
0   1 o'toole st     town1       NaN     er45
1      2 main st   village       NaN     qw23
2  7 mill street      city       NaN     rt67
3   5 foo street   Bartown   county2     yu89
4  10 foo street   county3       USA     yu83

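I want to check whether each address in df1 is in df2 and, if so, pull its zip code into df1.

This is where I'm running into some trouble, and I'm not sure this is the best way to approach it.

What I've done so far is create a key for both dataframes from the first two address lines, address1 and address2, by stripping all whitespace and non-alphanumeric characters and converting to lowercase: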

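# regex=True is needed on pandas >= 2.0, where str.replace treats the pattern as a literal string by default
df1['key'] = (df1['address1'] + df1['address2']).str.lower().str.replace(' ', '').str.replace(r'\W', '', regex=True)


df2['key'] = (df2['address1'] + df2['address2']).str.lower().str.replace(' ', '').str.replace(r'\W', '', regex=True)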
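The normalization could also live in one small helper, so that both dataframes are guaranteed to go through identical rules. This is only a sketch: `make_key` is a hypothetical name, and the example inputs are made up to show that case and punctuation differences disappear from the key.

```python
import pandas as pd

def make_key(first: pd.Series, second: pd.Series) -> pd.Series:
    r"""Hypothetical helper: join two address columns into a normalized key.

    Lowercases the text and deletes every non-word character; \W matches
    spaces and punctuation such as apostrophes, so a separate space
    replacement is not needed.
    """
    return (first + second).str.lower().str.replace(r"\W", "", regex=True)

# Two spellings of the same address that differ only in case and punctuation.
left = make_key(pd.Series(["1 O'Toole St."]), pd.Series(["Town1"]))
right = make_key(pd.Series(["1 o'toole st"]), pd.Series(["town1"]))
print(left.tolist(), right.tolist())  # ['1otoolesttown1'] ['1otoolesttown1']
```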


print(df1)

        address1  address2  address3  zipcode                key
0   1 o'toole st     town1       NaN              1otoolesttown1
1      2 main st       NaN   village                         NaN
2  3 high street       NaN      city                         NaN
3   5 foo street   Bartown   county2           5foostreetbartown
4  10 foo street       NaN   county3                         NaN

print(df2)

        address1  address2  address3  zipcode                 key
0   1 o'toole st     town1       NaN     er45      1otoolesttown1
1      2 main st   village       NaN     qw23      2mainstvillage
2  7 mill street      city       NaN     rt67     7millstreetcity
3   5 foo street   Bartown   county2     yu89   5foostreetbartown
4  10 foo street   county3       USA     yu83  10foostreetcounty3

Now I use np.where to pull the zip codes into the empty zipcode column of df1, returning no_match when no matching address is found:

df1['zipcode'] = np.where(df1['key'].isin(df2['key']), df2['zipcode'], 'no_match')

print(df1)

        address1  address2  address3   zipcode                key
0   1 o'toole st     town1       NaN      er45     1otoolesttown1
1      2 main st       NaN   village  no_match                NaN
2  3 high street       NaN      city  no_match                NaN
3   5 foo street   Bartown   county2      yu89  5foostreetbartown
4  10 foo street       NaN   county3  no_match                NaN
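One caveat with the np.where line above: df2['zipcode'] is taken row by row by position, not looked up by key, so it only gives the right zip codes here because both frames happen to share the same row order, which is not true of the real data. A key-based lookup avoids that. The sketch below uses small made-up frames where df2 is deliberately reordered, and assumes each key occurs at most once in df2 (Series.map needs a unique index on the lookup series, hence the drop_duplicates call).

```python
import numpy as np
import pandas as pd

# Toy frames that already carry a normalized 'key' column; df2 is in a
# different row order from df1.
df1 = pd.DataFrame({"key": ["1otoolesttown1", "5foostreetbartown", np.nan]})
df2 = pd.DataFrame({
    "key": ["5foostreetbartown", "2mainstvillage", "1otoolesttown1"],
    "zipcode": ["yu89", "qw23", "er45"],
})

# Positional version: takes df2's zipcode from the same row number, not the same key.
positional = np.where(df1["key"].isin(df2["key"]), df2["zipcode"], "no_match")

# Key-based version: build a key -> zipcode lookup (keeping the first zipcode
# if a key is duplicated), then map every df1 key through it.
lookup = df2.drop_duplicates("key").set_index("key")["zipcode"]
df1["zipcode"] = df1["key"].map(lookup).fillna("no_match")

print(list(positional))         # ['yu89', 'qw23', 'no_match']  (wrong zip codes)
print(df1["zipcode"].tolist())  # ['er45', 'yu89', 'no_match']
```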

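My problem is the key created for df1. As you can see, some of the keys are NaN. This is because the addresses are formatted differently from those in df2, which is simply how the dataset I'm working with looks.

I tried to get around this by skipping any NaN and appending the next address line instead, but I get a ValueError: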
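The NaN keys come from how pandas concatenates strings with +: when either operand is missing for a row, the result for that row is NaN rather than the other string. A minimal demonstration with made-up values; Series.str.cat with na_rep is shown as one way to substitute a placeholder for missing values instead.

```python
import numpy as np
import pandas as pd

address1 = pd.Series(["2 main st", "5 foo street"])
address2 = pd.Series([np.nan, "Bartown"])

# Element-wise string concatenation; a missing value on either side gives NaN.
combined = address1 + address2
print(combined.isna().tolist())  # [True, False]

# str.cat with na_rep treats missing values as the given string instead.
combined_rep = address1.str.cat(address2, na_rep="")
print(combined_rep.tolist())  # ['2 main st', '5 foo streetBartown']
```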

# add address1 + address2 if it's not null, otherwise use address3

df1['key'] = (df1['address1'] + (df1['address2'] if pd.notnull(df1['address2']) else df1['address3']))

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
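The error arises because Python's conditional expression `x if cond else y` needs a single True/False, while pd.notnull(df1['address2']) returns a whole boolean Series, and pandas refuses to collapse that to one value. The choice has to be made row by row instead. A sketch on a two-row made-up frame, using np.where for the element-wise choice (Series.where would work similarly):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "address1": ["1 o'toole st", "2 main st"],
    "address2": ["town1", np.nan],
    "address3": [np.nan, "village"],
})

try:
    # Python needs one bool here, but notnull() returns one bool per row.
    _ = df["address1"] + (df["address2"] if pd.notnull(df["address2"]) else df["address3"])
except ValueError as err:
    print(type(err).__name__)  # ValueError

# Row-by-row choice: take address2 where present, otherwise address3.
second = np.where(df["address2"].notna(), df["address2"], df["address3"])
df["key"] = df["address1"] + second
print(df["key"].tolist())  # ["1 o'toole sttown1", '2 main stvillage']
```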

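Any feedback or suggestions on how to solve this would be much appreciated. If there is a simpler way to do this, I'd love to know.

Best answer

Use Series.fillna to replace the missing values in df1['address2'] with df1['address3']: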

df1['key'] = df1['address1'] + df1['address2'].fillna(df1['address3'])

Instead of:

df1['key'] = (df1['address1'] + (df1['address2'] if pd.notnull(df1['address2']) else df1['address3']))

For more information about your error, see "Using if/truth statements with pandas".
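Putting the answer together with the key normalization from the question, one possible end-to-end sketch: it builds the df1 key with fillna as suggested, applies the same lowercasing and \W stripping to both frames, and then uses a left merge on the key instead of np.where, so the row order of df2 no longer matters. The helper name `normalize` is hypothetical, and the sketch assumes each key occurs only once in df2 (otherwise the merge would duplicate df1 rows). df1 is built here without the empty zipcode column, so the merge does not produce zipcode_x/zipcode_y columns.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'address1': ["1 o'toole st", '2 main st', '3 high street', '5 foo street', '10 foo street'],
                    'address2': ['town1', np.nan, np.nan, 'Bartown', np.nan],
                    'address3': [np.nan, 'village', 'city', 'county2', 'county3']})
df2 = pd.DataFrame({'address1': ["1 o'toole st", '2 main st', '7 mill street', '5 foo street', '10 foo street'],
                    'address2': ['town1', 'village', 'city', 'Bartown', 'county3'],
                    'address3': [np.nan, np.nan, np.nan, 'county2', 'USA'],
                    'zipcode': ['er45', 'qw23', 'rt67', 'yu89', 'yu83']})

def normalize(s: pd.Series) -> pd.Series:
    """Hypothetical helper: lowercase and drop every non-word character."""
    return s.str.lower().str.replace(r'\W', '', regex=True)

# df1: fall back to address3 wherever address2 is missing (the fillna answer).
df1['key'] = normalize(df1['address1'] + df1['address2'].fillna(df1['address3']))
df2['key'] = normalize(df2['address1'] + df2['address2'])

# Left merge keeps every df1 row in order; unmatched keys get NaN -> 'no_match'.
result = df1.merge(df2[['key', 'zipcode']], on='key', how='left')
result['zipcode'] = result['zipcode'].fillna('no_match')
print(result[['address1', 'zipcode']])
```

If df2 can contain duplicate keys, deduplicating it first (for example with df2.drop_duplicates('key')) or passing validate='many_to_one' to merge, so that pandas raises instead of silently adding rows, keeps the result one row per df1 address.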

A similar question about "python - Trying to pull zip codes from one dataframe into another address dataframe" on Stack Overflow: https://stackoverflow.com/questions/57072834/
