
python - Removing characters from a CSV file to get column averages

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 06:18:58

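I asked for help a little while ago and thought I had found what I was looking for, but unfortunately I have run into another problem. In my CSV file, missing data in some rows of the 13 columns is marked with a ?. I have an idea of how to fix it but have not managed to implement it. My current thought is to use ord and chr to change the ? to 0, but I am not sure how to apply that to a list. This is the error I get: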

  File "C:\Users\David\Documents\Python\asdf.py", line 46, in <module>
    iList_sum[i] += float(ill_data[i])
ValueError: could not convert string to float: '?'

Just so you know, I cannot use NumPy or pandas. I am also trying to avoid map, because I want to keep the code very simple.

import csv

# Turn the CSV file into a list of lists
with open('train.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    csv_data = list(reader)

# Create two lists to handle the patients
# And two more lists to collect the 'sum' of the columns
# The one that needs to hold the sum 'must' have 0 so we
# can work with them more easily
iList = []
iList_sum = [0,0,0,0,0,0,0,0,0,0,0,0,0]

hList = []
hList_sum = [0,0,0,0,0,0,0,0,0,0,0,0,0]

# Use a single loop so the data is only traversed once
for row in csv_data:
    # If column 13 (the last field) is greater than 0, classify the patient as unhealthy
    if row and int(row[13]) > 0:
        # This appends the whole row for storing
        # (instead of saving only one cell at a time)
        iList.append(row)

    # If it failed the initial condition (greater than 0), then column 13
    # is less than or equal to 0
    else:
        hList.append(row)

# Use these to verify the data and make sure we collected the right thing
# print iList
# [['67', '1', '4', '160', '286', '0', '2', '108', '1', '1.5', '2', '3', '3', '2'], ['67', '1', '4', '120', '229', '0', '2', '129', '1', '2.6', '2', '2', '7', '1']]
# print hList
# [['63', '1', '1', '145', '233', '1', '2', '150', '0', '2.3', '3', '0', '6', '0'], ['37', '1', '3', '130', '250', '0', '0', '187', '0', '3.5', '3', '0', '3', '0']]

# We can use list comprehension, but since this is a beginner task, let's go with basics:

# Loop through all the 'rows' of the ill patient
for ill_data in iList:

    # Loop through the data within each row, and sum them up
    for i in range(0, len(ill_data) - 1):
        iList_sum[i] += float(ill_data[i])


# Now repeat the process for healthy patient
# Loop through all the 'rows' of the healthy patient
for healthy_data in hList:

    # Loop through the data within each row, and sum them up
    for i in range(0, len(healthy_data) - 1):
        hList_sum[i] += float(healthy_data[i])

# Using a list comprehension, go through each number in the ill list
# (the sum of each column) and divide it by the length of iList, i.e. the
# number of ill patients found in the CSV file. So, if there are 22 ill
# patients, len(iList) will be 22. The whole expression is wrapped in
# brackets, so the result is a Python list

ill_avg = [ ill / len(iList) for ill in iList_sum]
hlt_avg = [ hlt / len(hList) for hlt in hList_sum]

Here is a screenshot of the CSV file. [image: CSV file]

Best answer

Just check the value you get from the list:

# Loop through the data within each row, and sum them up
qmark_counter = 0
for i in range(0, len(ill_data) - 1):
    if ill_data[i] == '?':
        val = 0
        qmark_counter += 1
    else:
        val = ill_data[i]
    iList_sum[i] += float(val)

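Do the same for the other list. There are many other improvements that could be made; for example, I would put the snippet in a function so that it does not have to be repeated.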
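The function suggested here could look roughly like the sketch below. The name `column_sums` and its parameters are hypothetical (they do not appear in the original post), and the sample rows are made up. It keeps the answer's behaviour of counting each `?` as 0 and assumes the last field of each row is the label, which is left out of the sums.

```python
def column_sums(rows, n_cols=13, missing='?'):
    """Sum the first n_cols fields of every row, counting missing cells as 0.

    Hypothetical helper: returns (sums, missing_count).
    """
    sums = [0.0] * n_cols
    missing_count = 0
    for row in rows:
        for i in range(n_cols):
            if row[i] == missing:
                missing_count += 1  # a missing cell adds nothing to the sum
            else:
                sums[i] += float(row[i])
    return sums, missing_count


# Made-up sample: 3 data columns plus a label column, just to show the call
sample = [['1', '?', '3', '1'], ['2', '4', '?', '2']]
sums, n_missing = column_sums(sample, n_cols=3)
print(sums, n_missing)
```

With the lists from the question, the calls would be `column_sums(iList)` and `column_sums(hList)`. Note that treating `?` as 0 and still dividing by the full row count pulls the average of any column with missing cells downward.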

Edit: added a question mark counter. If you want to track the question marks for each list separately, you may want to use a dictionary.
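One way to keep the counts separate is a dictionary keyed by list name. This is only a sketch: the keys `'ill'` and `'healthy'`, the `groups` dictionary, and the sample rows are assumptions made for illustration, not part of the original answer. As in the question, the last field of each row is treated as the label and skipped.

```python
# Made-up rows standing in for iList and hList (3 data columns plus a label)
groups = {
    'ill': [['67', '?', '4', '2'], ['?', '1', '?', '1']],
    'healthy': [['63', '1', '?', '0']],
}

# One counter per list, all starting at zero
qmark_counts = {'ill': 0, 'healthy': 0}

for name in groups:
    for row in groups[name]:
        # Skip the last field (the label) when looking for missing data
        for cell in row[:-1]:
            if cell == '?':
                qmark_counts[name] += 1

print(qmark_counts)
```

Plain nested loops are used here to stay close to the style of the question, which avoids map and external libraries.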

Regarding python - Removing characters from a CSV file to get column averages, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36586398/
