
python - pandas dataframe: insert a value based on the range another column's value falls in

Reposted. Author: 行者123. Updated: 2023-12-01 02:59:03

I have a DataFrame like the one below, and I want to insert a string based on the value in the sic2 column.

        conm                      sic2
115466  ALLEGION PLC              34.0
115471  AGILITY HEALTH INC        80.0
115473  NORDIC AMERICAN OFFSHORE  44.0
115474  AAD                       54.0
115477  DORIAN LPG LTD            44.0
115484  NOMAD FOODS LTD           20.0
115486  ATHENE HOLDING LTD        63.0
115490  MIDATECH PHARMA PLC       28.0
115495  MOTIF BIO PLC             28.0

The sic2 number ranges map to strings as follows.

1-9 Agriculture, Forestry and Fishing
10-14 Mining
15-17 Construction
18-19 not used
20-39 Manufacturing
40-49 Transportation, Communications, Electric, Gas and Sanitary service
50-51 Wholesale Trade
52-59 Retail Trade
60-67 Finance, Insurance and Real Estate
70-89 Services
91-97 Public Administration
99-99 Nonclassifiable
0 -1 Agricultural Production-Crops

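How can I make the pandas.DataFrame look like this, applied across the whole large dataset?

I tried several pieces of conditional code, but they always failed.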

        conm                      sic2  industry
115466  ALLEGION PLC              34.0  Manufacturing
115471  AGILITY HEALTH INC        80.0  Services
115473  NORDIC AMERICAN OFFSHORE  44.0  Transportation, Communications, Electric, Gas and Sanitary service
115474  AAD                       54.0  Retail Trade

Best answer

If you convert the SIC number ranges into a dictionary, looking up the industry for each row becomes fairly simple:

Code:

sic = [x.strip().split(' ', 1) for x in """
1-9 Agriculture, Forestry and Fishing
10-14 Mining
15-17 Construction
18-19 not used
20-39 Manufacturing
40-49 Transportation, Communications, ...
50-51 Wholesale Trade
52-59 Retail Trade
60-67 Finance, Insurance and Real Estate
70-89 Services
91-97 Public Administration
99-99 Nonclassifiable
""".split('\n')[1:-1]]

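# Expand each "low-high" range into one dictionary entry per code.
# range() excludes its end, so add 1 to keep the upper bound (39, 99, ...).
sic_dict = {}
for bounds, name in sic:
    low, high = (int(y) for y in bounds.split('-'))
    for code in range(low, high + 1):
        sic_dict[code] = name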
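A dictionary built from these ranges has no entry for codes that fall between two ranges, so a direct lookup on such a code raises a KeyError. The following is a minimal sketch for listing those gaps. It assumes the bounds are inclusive and copies them from the table in the question; the names `bounds`, `covered` and `missing` are illustrative only.

```python
# Inclusive (low, high) bounds copied from the range table in the question.
bounds = [(1, 9), (10, 14), (15, 17), (18, 19), (20, 39), (40, 49),
          (50, 51), (52, 59), (60, 67), (70, 89), (91, 97), (99, 99)]

# Every code that some range contains; "+ 1" keeps the upper bound itself.
covered = {code for low, high in bounds for code in range(low, high + 1)}

# Codes from 1 to 99 that no range contains.
missing = [code for code in range(1, 100) if code not in covered]
print(missing)  # [68, 69, 90, 98]
```

If the data may contain such codes, looking them up with `dict.get(code, default)` instead of indexing avoids the exception.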

Test code:

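from io import StringIO

import pandas as pd

# The columns must stay aligned so that read_fwf can infer the field widths.
# The header is the first line of the text, so the default header row applies.
df = pd.read_fwf(StringIO(u"""\
number  conm                      sic2
115466  ALLEGION PLC              34.0
115471  AGILITY HEALTH INC        80.0
115473  NORDIC AMERICAN OFFSHORE  44.0
115474  AAD                       54.0
115477  DORIAN LPG LTD            44.0
115484  NOMAD FOODS LTD           20.0
115486  ATHENE HOLDING LTD        63.0
115490  MIDATECH PHARMA PLC       28.0
115495  MOTIF BIO PLC             28.0"""))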

df['industry'] = df.sic2.apply(lambda x: sic_dict[int(x)])

print(df)

Result:

   number                      conm  sic2                             industry
0  115466              ALLEGION PLC  34.0                        Manufacturing
1  115471        AGILITY HEALTH INC  80.0                             Services
2  115473  NORDIC AMERICAN OFFSHORE  44.0  Transportation, Communications, ...
3  115474                       AAD  54.0                         Retail Trade
4  115477            DORIAN LPG LTD  44.0  Transportation, Communications, ...
5  115484           NOMAD FOODS LTD  20.0                        Manufacturing
6  115486        ATHENE HOLDING LTD  63.0   Finance, Insurance and Real Estate
7  115490       MIDATECH PHARMA PLC  28.0                        Manufacturing
8  115495             MOTIF BIO PLC  28.0                        Manufacturing
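For a large dataset, a vectorized lookup may be preferable to calling a Python function on every row. The sketch below is not the answer's method: it swaps in a binary search over the sorted lower bounds with `numpy.searchsorted`, then checks the upper bound. It assumes the ranges are inclusive, sorted and non-overlapping; `RANGES`, `label_codes` and `demo` are hypothetical names, and the labels are copied from the table in the question.

```python
import numpy as np
import pandas as pd

# Inclusive (low, high, label) rows copied from the question, sorted by low.
RANGES = [
    (1, 9, "Agriculture, Forestry and Fishing"),
    (10, 14, "Mining"),
    (15, 17, "Construction"),
    (18, 19, "not used"),
    (20, 39, "Manufacturing"),
    (40, 49, "Transportation, Communications, Electric, Gas and Sanitary service"),
    (50, 51, "Wholesale Trade"),
    (52, 59, "Retail Trade"),
    (60, 67, "Finance, Insurance and Real Estate"),
    (70, 89, "Services"),
    (91, 97, "Public Administration"),
    (99, 99, "Nonclassifiable"),
]
LOWS = np.array([low for low, _, _ in RANGES])
HIGHS = np.array([high for _, high, _ in RANGES])
LABELS = np.array([label for _, _, label in RANGES], dtype=object)


def label_codes(codes):
    """Return an object array of labels; None where no range contains the code."""
    codes = np.asarray(codes, dtype=float)
    # Index of the last range whose lower bound is <= code (-1 if there is none).
    pos = np.searchsorted(LOWS, codes, side="right") - 1
    safe = np.clip(pos, 0, None)
    # NaN compares False here, so missing values come out unmatched.
    inside = (pos >= 0) & (codes <= HIGHS[safe])
    return np.where(inside, LABELS[safe], None)


# Small illustrative frame: 90 falls in a gap and NaN is a missing code.
demo = pd.DataFrame({"sic2": [34.0, 80.0, 44.0, 54.0, 99.0, 90.0, np.nan]})
demo["industry"] = label_codes(demo["sic2"])
print(demo)
```

Codes outside every range, including missing values, get `None` rather than raising an error, which makes them easy to filter or inspect afterwards.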

For "python - pandas dataframe: insert a value based on the range another column's value falls in", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43991943/
