
python - Splitting cells in a pandas DataFrame and counting values

Reposted. Author: 太空宇宙. Updated: 2023-11-03 19:54:30

I have an xlsx file containing survey data organized by question, like this:

import pandas as pd

df = pd.DataFrame({
    'Question 1': ['5-6 hours', '6-7 hours', '9-10 hours'],
    'Question 2': ['Very restful', 'Somewhat restful', 'Somewhat restful'],
    'Question 3': ['[Home (dorm; apartment)]', '[Vehicle;None of the above; Other]', '[Campus;Home (dorm; apartment);Vehicle]'],
    'Question 4': ['[Family;No one; alone]', '[Classmates; students;Family;No one; alone]', '[Family]'],
})

>>> df
   Question 1  Question 2        Question 3                               Question 4
0  5-6 hours   Very restful      [Home (dorm; apartment)]                 [Family;No one; alone]
1  6-7 hours   Somewhat restful  [Vehicle;None of the above; Other]       [Classmates; students;Family;No one; alone]
2  9-10 hours  Somewhat restful  [Campus;Home (dorm; apartment);Vehicle]  [Family]

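Questions 3 and 4 are checkbox-style inputs that allow multiple answers. How can I get value counts for each individual answer option, rather than value counts for the whole cell?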

For example:

Question 4
Family                3
No one; alone         2
Classmates; students  1

This is what I'm currently doing:

import os
import pandas as pd

files = os.listdir()
for filename in files:
    if filename.endswith(".xlsx"):
        df = pd.read_excel(filename)
        for column in df:
            # Counts whole-cell values only; multi-answer cells are not split
            x = df[column].value_counts()
            print(x)

However, this doesn't let me separate cells that contain multiple answers. Thanks!

Best answer

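This should help with your problem, though I'm not sure how your data is meant to be parsed. For example, if you use the semicolon as the delimiter in Question 3, the first cell ends up parsed as ['Home (dorm', 'apartment)'].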

>>> pd.Series([choice.strip()
...            for choice in df['Question 4'].str[1:-1].str.split(';').sum()]
...           ).value_counts()
Family        3
alone         2
No one        2
Classmates    1
students      1
dtype: int64
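
Splitting on every semicolon breaks options such as "No one; alone" in two. In the sample data, the semicolons that separate options have no space after them, while the semicolons inside an option are followed by a space. The sketch below relies on that observation, which is an assumption about the export format and not something the question confirms; count_choices is a hypothetical helper name. It splits only on semicolons not followed by whitespace and uses explode() to flatten the lists instead of summing them.

```python
import pandas as pd

df = pd.DataFrame({
    'Question 4': [
        '[Family;No one; alone]',
        '[Classmates; students;Family;No one; alone]',
        '[Family]',
    ],
})


def count_choices(column: pd.Series) -> pd.Series:
    """Count individual checkbox options in a column of '[a;b;c]' strings.

    Assumption: options are separated by ';' with no following space,
    and any '; ' (semicolon plus space) belongs to the option text.
    """
    choices = (
        column.str.strip('[]')        # drop the surrounding brackets
        # A multi-character pattern is treated as a regular expression;
        # the negative lookahead skips semicolons followed by whitespace.
        .str.split(r';(?!\s)')
        .explode()                    # one row per option
        .str.strip()
    )
    return choices.value_counts()


counts = count_choices(df['Question 4'])
print(counts)
```

On the sample frame this counts Family three times, "No one; alone" twice, and "Classmates; students" once, which matches the output asked for in the question. If the real export uses a different convention, the split pattern would need to change accordingly.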

Regarding python - Splitting cells in a pandas DataFrame and counting values, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59617712/
