
python - Extracting age values from text to create a new column in Pandas

Reposted. Author: 太空宇宙. Updated: 2023-11-04 02:32:28

I have a dataset like this:

import pandas as pd

df = pd.DataFrame(
    [["Sam is 5", 2000], ["John is 3 years and 6 months", 1200],
     ["Jack is 4.5 years", 7000], ["Shane is 25 years old", 2000]],
    columns=['texts', 'amount'])
print(df)

                          texts  amount
0                      Sam is 5    2000
1  John is 3 years and 6 months    1200
2             Jack is 4.5 years    7000
3         Shane is 25 years old    2000

I want to extract the age value from df['texts'] and use it to compute a new column, df['value']:

df['value'] = df['amount'] / val

where val is the numeric value taken from df['texts'].

Here is my code:

val = df['texts'].str.extract(r'(\d+\.?\d*)', expand=False).astype(float)
df['value'] = df['amount']/val
print(df)

Output:

                          texts  amount        value
0                      Sam is 5    2000   400.000000
1  John is 3 years and 6 months    1200   400.000000
2             Jack is 4.5 years    7000  1555.555556
3         Shane is 25 years old    2000    80.000000

Expected output:

                          texts  amount        value
0                      Sam is 5    2000   400.000000
1  John is 3 years and 6 months    1200       342.85
2             Jack is 4.5 years    7000  1555.555556
3         Shane is 25 years old    2000    80.000000

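The problem with the code above is that it only picks up the first number, and I can't figure out how to convert "3 years and 6 months" into 3.5 years.

Additional information: the text column only contains age values given as years first, then months.

Any suggestions are welcome. Thanks.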

Best answer

I believe you need:

Note: if the text contains neither "years" nor "months", the solution treats the number as years.

# extract the first number in each text (fallback when no unit is given)
a = df['texts'].str.extract(r'(\d+\.?\d*)', expand=False).astype(float)
# extract only the number that is followed by "years"
b = df['texts'].str.extract(r'(\d+\.?\d*)\s+years', expand=False).astype(float)
# where b is NaN (no "years" in the text), fall back to a
y = b.combine_first(a)
print(y)
0     5.0
1     3.0
2     4.5
3    25.0
Name: texts, dtype: float64

# extract only the number that is followed by "months" and convert it to years
m = df['texts'].str.extract(r'(\d+\.?\d*)\s+months', expand=False).astype(float) / 12
print(m)
0    NaN
1    0.5
2    NaN
3    NaN
Name: texts, dtype: float64

# add years and months together, treating missing months as 0
val = y.add(m, fill_value=0)
print(val)
0     5.0
1     3.5
2     4.5
3    25.0
Name: texts, dtype: float64
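
The fill_value=0 argument in the step above matters because a plain + propagates NaN: every row without a "months" part would end up with no age at all. A minimal sketch of the difference, with the y and m values from the printed output typed in by hand:

```python
import numpy as np
import pandas as pd

# Values copied by hand from the outputs printed above
y = pd.Series([5.0, 3.0, 4.5, 25.0])          # years
m = pd.Series([np.nan, 0.5, np.nan, np.nan])  # months, already divided by 12

plain = y + m                    # NaN wherever m is missing
filled = y.add(m, fill_value=0)  # a missing month count is treated as 0

print(plain)
print(filled)
```

Only row 1 keeps a value in plain, while filled keeps all four ages.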

df['value'] = df['amount'] / val
print(df)
                          texts  amount        value
0                      Sam is 5    2000   400.000000
1  John is 3 years and 6 months    1200   342.857143
2             Jack is 4.5 years    7000  1555.555556
3         Shane is 25 years old    2000    80.000000
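
As a possible alternative, not part of the original answer, the same result can probably be reached with a single pattern that uses named groups. The sketch below assumes, as the question states, that years always come before months. The names pattern and parts are only illustrative. Like the approach above, it would misread a text that contains months only (for example "6 months") as 6 years.

```python
import pandas as pd

df = pd.DataFrame(
    [["Sam is 5", 2000],
     ["John is 3 years and 6 months", 1200],
     ["Jack is 4.5 years", 7000],
     ["Shane is 25 years old", 2000]],
    columns=["texts", "amount"],
)

# The first number is taken as years (the word "years" is optional).
# A later number is captured as months only if it is followed by "months".
pattern = r"(?P<years>\d+\.?\d*)(?:\D*?(?P<months>\d+\.?\d*)\s+months)?"

# With named groups, str.extract returns one column per group.
parts = df["texts"].str.extract(pattern).astype(float)

# Missing months count as 0; months are converted to a fraction of a year.
val = parts["years"] + parts["months"].fillna(0) / 12
df["value"] = df["amount"] / val
print(df)
```

Keeping both numbers in one extract call avoids the separate a, b and m series, at the cost of a denser regular expression.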

Regarding "python - Extracting age values from text to create a new column in Pandas", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48860897/
