
python - Pandas DataFrame: split the data in the last column, but keep the other columns

Reposted. Author: 太空宇宙. Updated: 2023-11-03 12:56:51

I'm very new to pandas, so any explanation that comes with a solution is welcome.

I have a DataFrame such as:

    Company                             Zip State City
1   *CBRE                               San Diego, CA 92101
4   1908 Brands                         Boulder, CO 80301
7   1st Infantry Division Headquarters  Fort Riley, KS
10  21st Century Healthcare, Inc.       Tempe 85282
15  AAA                                 Jefferson City, MO 65101-9564

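I want to split the Zip State City column in my data into 3 separate columns. Using the answer from the post "Pandas DataFrame, how do i split a column into two", I could accomplish this if I didn't have the first column. Writing a regex that captures every company just ends up capturing everything in the data.

I also tried: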

foo = lambda x: pandas.Series([i for i in reversed(x.split())])
data_pretty = data['Zip State City'].apply(foo)

But this makes me lose the Company column, and it splits city names of more than one word into separate columns.
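The behavior described above can be reproduced with a minimal sketch. The two-row frame here is a cut-down copy of the sample data, and `pd` is just the usual alias for `pandas`; neither comes from the original post.

```python
import pandas as pd

# Cut-down copy of the sample data (two rows only, for illustration).
data = pd.DataFrame(
    {
        "Company": ["*CBRE", "AAA"],
        "Zip State City": ["San Diego, CA 92101", "Jefferson City, MO 65101-9564"],
    },
    index=[1, 15],
)

# Same idea as the attempt above: split on whitespace and reverse the tokens.
foo = lambda x: pd.Series(list(reversed(x.split())))
data_pretty = data["Zip State City"].apply(foo)

# apply() only sees the one selected column, so the result has no Company
# column, and "San Diego," is split into two tokens ("Diego," and "San").
print(data_pretty)
```

Splitting on whitespace cannot tell a two-word city from two separate fields, which is why a pattern-based approach is a better fit here.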

How can I split the last column while keeping the data in the Company column?

Best answer

You can use the extract() method:

In [110]: df
Out[110]:
    Company                             Zip State City
1   *CBRE                               San Diego, CA 92101
4   1908 Brands                         Boulder, CO 80301
7   1st Infantry Division Headquarters  Fort Riley, KS
10  21st Century Healthcare, Inc.       Tempe 85282
15  AAA                                 Jefferson City, MO 65101-9564

In [112]: df[['City','State','ZIP']] = df['Zip State City'].str.extract(r'([^,\d]+)?[,]*\s*([A-Z]{2})?\s*([\d\-]{4,11})?', expand=True)

In [113]: df
Out[113]:
    Company                             Zip State City                 City            State  ZIP
1   *CBRE                               San Diego, CA 92101            San Diego       CA     92101
4   1908 Brands                         Boulder, CO 80301              Boulder         CO     80301
7   1st Infantry Division Headquarters  Fort Riley, KS                 Fort Riley      KS     NaN
10  21st Century Healthcare, Inc.       Tempe 85282                    Tempe           NaN    85282
15  AAA                                 Jefferson City, MO 65101-9564  Jefferson City  MO     65101-9564
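For anyone who wants to run this outside an IPython session, here is a self-contained sketch of the same idea. It rebuilds the sample frame by hand and uses the same regular expression, with three additions that are my own assumptions rather than part of the answer: named groups (so the column names come from the pattern), a `str.strip()` on the city (when no comma follows the city, as in "Tempe 85282", the first group also captures the trailing space), and `DataFrame.join` to attach the new columns.

```python
import pandas as pd

# Sample data rebuilt by hand from the question.
df = pd.DataFrame(
    {
        "Company": [
            "*CBRE",
            "1908 Brands",
            "1st Infantry Division Headquarters",
            "21st Century Healthcare, Inc.",
            "AAA",
        ],
        "Zip State City": [
            "San Diego, CA 92101",
            "Boulder, CO 80301",
            "Fort Riley, KS",
            "Tempe 85282",
            "Jefferson City, MO 65101-9564",
        ],
    },
    index=[1, 4, 7, 10, 15],
)

# Same pattern as in the answer, with named groups (an addition):
#   City  : anything that is not a comma or a digit
#   State : two uppercase letters, optional
#   ZIP   : 4 to 11 digits or hyphens, optional
pattern = (
    r"(?P<City>[^,\d]+)?,*\s*"
    r"(?P<State>[A-Z]{2})?\s*"
    r"(?P<ZIP>[\d\-]{4,11})?"
)

parts = df["Zip State City"].str.extract(pattern, expand=True)

# Without a comma after the city, the City group keeps a trailing space.
parts["City"] = parts["City"].str.strip()

# join() aligns on the index, so Company and the original column are kept.
result = df.join(parts)
print(result)
```

Because only the one column is passed to `str.extract`, the pattern never has to deal with the company names at all, which sidesteps the problem described in the question.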

From the docs:

Series.str.extract(pat, flags=0, expand=None)

For each subject string in the Series, extract groups from the first match of regular expression pat.

New in version 0.13.0.

Parameters:

pat : string

Regular expression pattern with capturing groups

flags : int, default 0 (no flags)

re module flags, e.g. re.IGNORECASE .. versionadded:: 0.18.0

expand : bool, default False

If True, return DataFrame.

If False, return Series/Index/DataFrame.

Returns: DataFrame with one row for each subject string, and one column for each group. Any capture group names in regular expression pat will be used for column names; otherwise capture group numbers will be used. The dtype of each result column is always object, even when no match is found. If expand=True and pat has only one capture group, then return a Series (if subject is a Series) or Index (if subject is an Index).
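The effect of `expand`, `flags`, and named groups can be checked with a small sketch. The two-element Series and the state-only patterns are illustrative assumptions. The behavior shown is what I would expect from recent pandas releases, where `expand` defaults to True rather than False as in the older version quoted above, and where a single capture group with `expand=False` gives back a Series.

```python
import re

import pandas as pd

s = pd.Series(["San Diego, CA 92101", "Fort Riley, KS"])

# One unnamed group, expand=True: a DataFrame whose single column is named 0.
as_frame = s.str.extract(r",\s*([A-Z]{2})", expand=True)

# Same pattern, expand=False: a Series, because there is only one group.
as_series = s.str.extract(r",\s*([A-Z]{2})", expand=False)

# A named group supplies the column name.
named = s.str.extract(r",\s*(?P<State>[A-Z]{2})", expand=True)

# flags takes re module flags; with IGNORECASE, [a-z] also matches "CA".
ignore_case = s.str.extract(r",\s*([a-z]{2})", flags=re.IGNORECASE, expand=False)

print(type(as_frame).__name__, type(as_series).__name__)
print(named)
```

Passing `expand` explicitly, as the answer does, avoids depending on which default the installed version uses.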

Regarding "python - Pandas DataFrame: split the data in the last column, but keep the other columns", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38441831/
