
python - Pandas: how to extract a mix of integers and floats from DataFrame columns

Reposted. Author: 行者123. Updated: 2023-12-04 03:51:31

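I tried these: https://stackoverflow.com/a/37683738/13865853 , https://stackoverflow.com/a/50830098/13865853 .

My dataframes contain only strings, but the dtype is object, for reasons I read about elsewhere on SO.

The columns hold amounts of micronutrients in food, with units, and look like this: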

  Life-Stage Group Arsenic Boron (mg/d) Calcium (mg/d) Chromium Copper (μg/d)  \
0 <= 3.0 y nan g 3 mg 2500 mg nan g 1000 μg
1 <= 8.0 y nan g 6 mg 2500 mg nan g 3000 μg

Fluoride (mg/d) Iodine (μg/d) Iron (mg/d) Magnesium (mg/d) Manganese (mg/d) \
0 1.3 mg 200 μg 40 mg 65 mg 2 mg
1 2.2 mg 300 μg 40 mg 110 mg 3 mg

Molybdenum (μg/d) Nickel (mg/d) Phosphorus (g/d) Potassium Selenium (μg/d) \
0 300 μg 0.2 mg 3 g nan g 90 μg
1 600 μg 0.3 mg 3 g nan g 150 μg

Silicon Sulfate Vanadium (mg/d) Zinc (mg/d) Sodium Chloride (g/d) \
0 nan g nan g nan mg 7 mg nan g 2.3 g
1 nan g nan g nan mg 12 mg nan g 2.9 g

Vitamin A (μg/d) Vitamin C (mg/d) Vitamin D (μg/d) Vitamin E (mg/d) \
0 600.0 μg 400 mg 63.0 μg 200 mg
1 900.0 μg 650 mg 75.0 μg 300 mg

Vitamin K (μg/d) Thiamin (mg/d) Riboflavin (mg/d) Niacin (mg/d) \
0 nan μg nan mg nan mg 10 mg
1 nan μg nan mg nan mg 15 mg

Vitamin B6 (mg/d) Folate (μg/d) Vitamin B12 (μg/d) Pantothenic Acid (mg/d) \
0 30 mg 300 μg nan μg nan mg
1 40 mg 400 μg nan μg nan mg

Biotin (μg/d) Choline (mg/d) Carotenoids
0 nan μg 1.0 mg nan g
1 nan μg 1.0 mg nan g

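I want to replace the nan values with zero and keep only the numeric part, because I want to multiply the g values by 1000 and divide any μg values (\u03BCg in Python, for micro) by 1000 so that everything is in mg and I can plot it on a bar chart in Plotly Dash.
But I'm stuck on extracting the numbers. This used to work when I built csv files after downloading the data, but now it doesn't: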

# extract numbers
new_df_arr = []
for _, df in df_dict.items():
    df = df.astype(str)
    df_copy = df.copy()
    # skip the first column (Life-Stage Group)
    for i in range(1, len(df.columns)):
        df_copy[df.columns[i]] = df_copy[df.columns[i]].str.extract(r'(\d+[.]?\d*)', expand=False)  # replace(r'[^0-9]+','')
    new_df_arr.append(df_copy)
# check df's
for df in new_df_arr:
    print(df)
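
As a quick check of what the pattern above captures, here is a minimal sketch on a few sample strings shaped like the cells in the table. The Series `s` and its values are assumptions chosen for illustration, not the asker's data.

```python
import pandas as pd

# Hypothetical sample cells, shaped like the table above
s = pd.Series(["3 mg", "1.3 mg", "600.0 μg", "nan g"])

# With a single capture group, expand=False returns a Series
# instead of a one-column DataFrame
nums = s.str.extract(r"(\d+[.]?\d*)", expand=False)

# Cells with no digits ("nan g") do not match and stay missing
values = pd.to_numeric(nums)
print(values)
```

Integers and decimals are both captured, and cells without digits become missing values that can be filled with zero afterwards.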

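Best answer

I only used the first group of columns as input. You can:

  1. Loop through the columns and create a Series s by mapping each cell's unit through a dictionary d
  2. The dictionary translates each unit into the value you want to multiply by
  3. Extract the number and multiply it by s for each column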

import pandas as pd

df = pd.DataFrame({'Life-Stage Group': {0: '<= 3.0 y', 1: '<= 8.0 y'},
                   'Arsenic': {0: 'nan g', 1: 'nan g'},
                   'Boron (mg/d)': {0: '3 mg', 1: '6 mg'},
                   'Calcium (mg/d)': {0: '2500 mg', 1: '2500 mg'},
                   'Chromium': {0: 'nan g', 1: 'nan g'},
                   'Copper (μg/d)': {0: '1000 μg', 1: '3000 μg'}})

# multipliers that convert each unit to mg
d = {'μg': .001, 'g': 1000, 'mg': 1}

for col in df.columns[1:]:
    # the unit is the token after the space; map it to its multiplier
    s = df[col].str.split(' ').str[1].map(d).astype(float)
    # expand=False makes extract return a Series, so it multiplies element-wise with s
    df[col] = (df[col].str.extract(r'(\d+[.]?\d*)', expand=False).astype(float) * s).fillna(0)
df
Out[1]:
  Life-Stage Group  Arsenic  Boron (mg/d)  Calcium (mg/d)  Chromium  Copper (μg/d)
0         <= 3.0 y      0.0           3.0          2500.0       0.0            1.0
1         <= 8.0 y      0.0           6.0          2500.0       0.0            3.0
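
A possible variation on the loop above is to capture the number and the unit in one pass with named regex groups, so the unit no longer depends on splitting at a single space. This is only a sketch: the names `TO_MG` and `column_to_mg` and the sample values are hypothetical, and the multipliers are the same as in the dictionary d.

```python
import pandas as pd

# Multipliers to milligrams; same values as the dictionary d above
TO_MG = {"μg": 0.001, "mg": 1.0, "g": 1000.0}


def column_to_mg(col: pd.Series) -> pd.Series:
    """Convert strings like '2.3 g' to floats in mg; unparseable cells become 0."""
    # Named groups make extract return a DataFrame with 'value' and 'unit' columns
    parts = col.astype(str).str.extract(
        r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>μg|mg|g)\b"
    )
    value = pd.to_numeric(parts["value"])
    factor = parts["unit"].map(TO_MG)
    # Cells such as 'nan g' have no digits, so both parts are missing
    return (value * factor).fillna(0.0)


# Hypothetical sample column for illustration
sample = pd.Series(["2.3 g", "40 mg", "300 μg", "nan g"])
print(column_to_mg(sample))
```

To convert a whole frame, something like `df[df.columns[1:]].apply(column_to_mg)` should work, since `apply` passes each column to the function as a Series. Note that the micro sign (U+00B5) and the Greek letter mu (U+03BC) look identical but are different characters, so the dictionary keys must use whichever one the data contains.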

For "python - Pandas: how to extract a mix of integers and floats from DataFrame columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/64381401/
