python - Split data in one column into separate columns based on value

Reposted. Author: 行者123. Updated: 2023-11-30 23:01:07

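This is the problem code from my larger script. Under the "Measurement" column I have 5 to 7 different categories of data (e.g. height, weight, BMI) along with their corresponding values. For downstream processing, I want the values of each category in their own separate column.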

# Import Packages
# -----------------
import re
import pandas as pd


# Sample Data Input
# -----------------
result = [
'XD59876,KEN,name="height",value="5.9",name="weight",value="180",name="Ivef",value="0.09",name="o2_saturation",value="2",name="BMI",value="27",name="heart_rate",value="66"',
'FC00187,ROW,name="height",value="5.11",name="weight",value="210"',
'AN66521,ZEN,name="Ivef",value="0.7",name="o2_saturation",value="62",name="BMI",value="26"',
'NW0098,PLO,name="height",value="6.2",name="weight",value="240",name="o2_saturation",value="2.3",name="heart_rate",value="68"',
'XD57776,KIT,name="BMI",value="32"',
'FC98763,ABC,name="Ivef",value="0.87",name="o2_saturation",value="2.67",name="heart_rate",value="68"'
]


# Output List
# -----------------
output = []


# Regular Expressions Used To Pull Measurement Values
# ---------------------------------------------------
measurement_nameRegex = r'name="([^"]+)"'
measurement_valueRegex = r'value="([^"]+)"'


# Iterate through list
# ---------------------------------------------------
for line in result:
    # Split off the two leading CSV fields; the remainder holds the measurements
    key, fac, measurements = line.split(',', 2)

    # Create lists using the regular expressions
    measurement_name = re.findall(measurement_nameRegex, measurements)
    measurement_value = re.findall(measurement_valueRegex, measurements)

    # Check to see we collect only complete data
    if len(measurement_name) == len(measurement_value):

        # Pair each measurement name with its value, one output row per pair
        for name, value in zip(measurement_name, measurement_value):
            output.append([key, fac, name, value])

df = pd.DataFrame(output, columns=["Key", "Facility", "Measurement", "Value"])

# df_pivot = df.pivot_table(index=["Key", "Facility"], columns="Measurement", values="Value")

print(df)

Current output:

       Key  Facility    Measurement  Value
0  XD59876       KEN         height    5.9
1  XD59876       KEN         weight    180
2  XD59876       KEN           Ivef   0.09
3  XD59876       KEN  o2_saturation      2
4  XD59876       KEN            BMI     27
5  XD59876       KEN     heart_rate     66
6  FC00187       ROW         height   5.11

Desired output:

Key      Facility  height  weight  Ivef  o2_saturation  BMI  heart_rate
XD59876  KEN       5.9     180     0.09  2              27   66

I tried pandas pivot and pivot_table, but they perform aggregation. I don't want to aggregate anything. All I want is to change how the data is organized.
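To make the distinction concrete: pivot_table aggregates (its default aggfunc is "mean"), while DataFrame.pivot only reshapes. The following is a minimal sketch, not part of the original question; the frame name long_df and its three rows are made up for illustration, and passing a list as index to pivot assumes pandas 1.1 or newer.

```python
import pandas as pd

# Hypothetical long-format sample, shaped like the question's DataFrame
long_df = pd.DataFrame(
    [
        ("XD59876", "KEN", "height", "5.9"),
        ("XD59876", "KEN", "weight", "180"),
        ("FC00187", "ROW", "height", "5.11"),
    ],
    columns=["Key", "Facility", "Measurement", "Value"],
)

# pivot() only rearranges cells. It raises ValueError if a
# (Key, Facility, Measurement) combination repeats, rather than aggregating.
wide = long_df.pivot(index=["Key", "Facility"], columns="Measurement", values="Value")

# Turn Key and Facility back into ordinary columns and drop the axis label
wide = wide.reset_index()
wide.columns.name = None

print(wide)
```

Measurements a key does not have come out as NaN, and the values stay as the original strings because nothing is computed from them.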

Best answer

A pure pandas solution:

import pandas as pd

# some sample data...
rows = [('XD59876', 'KEN', 'height', '5.9'),
        ('XD59876', 'KEN', 'weight', '0.09'),
        ('XD59876', 'KEN', 'o2_sat', '2'),
        ('FC00187', 'ROW', 'height', '5.11')]
df = pd.DataFrame(rows, columns=['Key','Facility','Measurement','Value'])

# move everything but Value to the index
df.set_index(['Key', 'Facility', 'Measurement'], inplace=True)
# convert the Measurement index to column labels
df = df.unstack('Measurement')
# drop the outer 'Value' level from the column MultiIndex
df.columns = df.columns.droplevel()
# clear the leftover 'Measurement' name on the columns axis
df.columns.name = ''
# make Key and Facility regular columns again
df.reset_index(inplace=True)

print(df)

The output is:

       Key  Facility  height  o2_sat  weight
0  FC00187       ROW    5.11     NaN     NaN
1  XD59876       KEN     5.9       2    0.09
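As an alternative to reshaping afterwards, the wide layout can also be built straight from the raw lines, skipping the long format entirely. This is only a sketch under assumptions: the names raw_lines, PAIR_RE and parse_line are hypothetical, the two sample lines are shortened from the question's data, and it assumes every name="..." is immediately followed by its value="...".

```python
import re
import pandas as pd

# Shortened sample lines in the same layout as the question's input
raw_lines = [
    'XD59876,KEN,name="height",value="5.9",name="weight",value="180"',
    'XD57776,KIT,name="BMI",value="32"',
]

# Capture each measurement name together with the value that follows it
PAIR_RE = re.compile(r'name="([^"]+)",value="([^"]+)"')


def parse_line(line):
    """Return one dict per input line: Key, Facility, then one entry per measurement."""
    key, facility, measurements = line.split(",", 2)
    record = {"Key": key, "Facility": facility}
    record.update(PAIR_RE.findall(measurements))  # list of (name, value) tuples
    return record


# A list of dicts becomes one row per dict; missing measurements become NaN
wide = pd.DataFrame([parse_line(line) for line in raw_lines])

print(wide)
```

Unlike the original loop, this pairs names and values with a single pattern, so a line with a name but no value simply yields no entry for that measurement instead of being skipped as a whole.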

Regarding python - splitting data in one column into separate columns based on value, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35002871/
