
python - Split one column into several columns on commas

Reposted. Author: 行者123. Updated: 2023-12-04 15:31:37

I want to split an address column into specific columns, such as city and province.

I have a DataFrame that looks like this:

df:
+----------------------------------------------------------------------------------------------------------+
|location
+----------------------------------------------------------------------------------------------------------+
| Jl. Raya Pasir Putih No.6, RT.1/RW.6, Pasir Putih, Kec. Sawangan, Kota Depok, Jawa Barat 16519, Indonesia
| Jl. Legenda Wisata, Wanaherang, Kec. Gn. Putri, Bogor, Jawa Barat 16965, Indonesia
| Jl. Blk. C7 No.17, Rangkapan Jaya Baru, Kec. Pancoran Mas, Kota Depok, Jawa Barat 16434, Indonesia
| Jl. Cibuntu Sayuran No.12, Wr. Muncang, Kec. Bandung Kulon, Kota Bandung, Jawa Barat 40211, Indonesia
| 1 KOMP, Jl. Tirtawening No.10, Cisurupan, Kec. Cibiru, Kota Bandung, Jawa Barat 40614, Indonesia
+----------------------------------------------------------------------------------------------------------+

I want to extract the city and province into two new columns named City and Province.

The output might look like this:

df:

+-------------+-------------------+------------+
| location    | Cities            | province   |
+-------------+-------------------+------------+
| .....       | Kota Depok        | Jawa Barat |
| .....       | Bogor             | Jawa Barat |
| .....       | Kota Depok        | Jawa Barat |
| .....       | Kota Bandung      | Jawa Barat |
| .....       | Kota Bandung      | Jawa Barat |
+-------------+-------------------+------------+

I tried this approach:

def extract_city_state(a):
    asplit = a.split(",")
    city = asplit[-3].split()
    state = asplit[-2].split()[0:1]
    return city, state

df.join(
    df['location'].apply(
        lambda x: pd.Series(extract_city_state(x), index=["City", "State"])
    )
)

But it returns:

---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-29-64a945be5d02> in <module>
1 df.join(
2 df['location'].apply(
----> 3 lambda x: pd.Series(extract_city_state(x), index=["City", "State"])
4 )
5 )

~\anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
4043 else:
4044 values = self.astype(object).values
-> 4045 mapped = lib.map_infer(values, f, convert=convert_dtype)
4046
4047 if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-29-64a945be5d02> in <lambda>(x)
1 df.join(
2 df['location'].apply(
----> 3 lambda x: pd.Series(extract_city_state(x), index=["City", "State"])
4 )
5 )

<ipython-input-22-f1d63ccd82dc> in extract_city_state(a)
1 def extract_city_state(a):
2 asplit = a.split(",")
----> 3 city = asplit[-3].split()
4 state = asplit[-2].split()[0:1]
5 return city, state

IndexError: list index out of range

How can I fix this?

Best answer

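Use only pandas str methods to avoid the error: when str[] indexing finds no element at the requested position, it returns NaN instead of raising. First, Series.str.split creates a Series of lists. Then Series.str.rsplit with n=1 splits only on the last whitespace, which separates the province name from the postal code: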

s = df['location'].str.split(',')

df['city'] = s.str[-3]
df['province'] = s.str[-2].str.rsplit(n=1).str[0]
print(df)
location city \
0 Jl. Raya Pasir Putih No.6, RT.1/RW.6, Pasir Pu... Kota Depok
1 Jl. Legenda Wisata, Wanaherang, Kec. Gn. Putri... Bogor
2 Jl. Blk. C7 No.17, Rangkapan Jaya Baru, Kec. P... Kota Depok
3 Jl. Cibuntu Sayuran No.12, Wr. Muncang, Kec. B... Kota Bandung
4 1 KOMP, Jl. Tirtawening No.10, Cisurupan, Kec.... Kota Bandung

province
0 Jawa Barat
1 Jawa Barat
2 Jawa Barat
3 Jawa Barat
4 Jawa Barat
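
To make the NaN behaviour concrete, here is a minimal, self-contained sketch of the same approach. Two addresses are taken from the question; the third row ("Indonesia") is an assumption, added only to stand in for the kind of short value that would make list indexing fail. The .str.strip() calls are also an addition, not part of the answer above: splitting on "," leaves a leading space on each piece, and stripping removes it.

```python
import pandas as pd

# Two rows from the question, plus one hypothetical short row (an assumption)
# with fewer than three comma-separated parts.
df = pd.DataFrame({
    "location": [
        "Jl. Legenda Wisata, Wanaherang, Kec. Gn. Putri, Bogor, Jawa Barat 16965, Indonesia",
        "Jl. Cibuntu Sayuran No.12, Wr. Muncang, Kec. Bandung Kulon, Kota Bandung, Jawa Barat 40211, Indonesia",
        "Indonesia",
    ]
})

# Series of lists, one list of comma-separated pieces per row.
parts = df["location"].str.split(",")

# .str[i] gives NaN when a list is too short, rather than raising IndexError.
df["city"] = parts.str[-3].str.strip()

# rsplit(n=1) splits once from the right, dropping the postal code.
df["province"] = parts.str[-2].str.rsplit(n=1).str[0].str.strip()

print(df[["city", "province"]])
```

The short row ends up with NaN in both new columns, so such rows can be located afterwards with df["city"].isna() and inspected by hand.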

This question, python - Split one column into several columns on commas, is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/61138725/
