
python - How to write a function that runs this process on 20 different csv files in Python?


I have a directory containing 19 csv files, each holding a list of student registration numbers and their names. There are two separate files named quiz1 and quiz2, both of which contain the details of every student who took those quizzes, along with their names and the total marks obtained. The marks obtained in each quiz have to go into separate columns, together with a 'noofpresent' column showing that they were present for that particular quiz.

My task is to parse all of these files and create a dataframe that basically looks like the linked image (sample dataframe with 5 batches instead of 19). The image shows 5 of the 19 batches in total.

Although I have already filled in the relevant fields for Batch4 (as shown in the image), I realised that repeating the process for 18 more files would be madness.

How can I write a program or function that does all of this for the remaining 18 batches, for both quizzes? I just need to understand the logic for automating the processing of the remaining 18 files.

Example for Batch 9:

This is the code that I need to replicate for each of the 19 batches:

import numpy as np
import pandas as pd

spath = 'd:\\a2\\studentlist.csv'
q1path = 'd:\\a2\\quiz\\quiz1.csv'
q2path = 'd:\\a2\\quiz\\quiz2.csv'
b1path = 'd:\\a2\\batchwiselist\\1.csv'
b9path = 'd:\\a2\\batchwiselist\\9.csv'
tpath = 'd:\\a2\\testcasestudent.txt'

# the final dataframe that needs to be created and filled up eventually
idx = pd.MultiIndex.from_product([['batch1', 'batch2', 'batch3', 'batch4', 'batch9'], ['quiz1', 'quiz2']])
cols=['noofpresent', 'lesserthan50', 'between50and60', 'between60and70', 'between70and80', 'greaterthan80']
statdf = pd.DataFrame('-', idx, cols)


# ============BATCH 9===================]

# ----------- QUIZ 1 -----------]

# Master list of students in Batch 9
b9 = pd.read_csv(b9path, usecols=['studentName', 'admissionNumber'])
b9.rename(columns={'studentName' : 'Firstname'}, inplace=True)
# Rename so the column name matches quiz1.csv for the merge

# Master list of all who attended Quiz1
q1 = pd.read_csv(q1path, usecols = ['Firstname', 'Grade/10.00', 'State'], na_values = ['-', 'In progress', np.NaN])
q1.dropna(inplace=True)
q1['Grade/10.00'] = q1['Grade/10.00'] * 10
# Multiplying the grades by 10 to mark against 100 instead of 10

# Merge batch9 list of names to list of quiz1 on their firstname column
q1b9 = pd.merge(b9, q1)
q1b9 = q1.loc[q1['Firstname'].isin(b9.Firstname)] # keep only the quiz1 rows whose Firstname exists in the batch list
q1b9.reset_index(inplace=True)
#print(q1b9)

lt50 = q1b9.loc[(q1b9['Grade/10.00'] < 50)]
# find out the list of students whose grades are less than 50
out9q1 = (lt50['Grade/10.00'].count())
# print(out9q1) to get the count of students from batch9 who scored <50 in quiz1

# Similar process for quiz2 below for batch9.
# -------------------- QUIZ 2 ------------------]

# Master list of all who attended Quiz2
q2 = pd.read_csv(q2path, usecols = ['Firstname', 'Grade/10.00', 'State'], na_values = ['-', 'In progress', np.NaN])
q2.dropna(inplace=True)
q2['Grade/10.00'] = q2['Grade/10.00'] * 10

# Merge B1 to Q2
q2b9 = pd.merge(b9, q2)
q2b9 = q2.loc[q2['Firstname'].isin(b9.Firstname)]
q2b9.reset_index(inplace=True)


q2b9.loc[(q2b9['Grade/10.00'] <= 50)].count()
lt50 = q2b9.loc[(q2b9['Grade/10.00'] < 50)]
out9q2 = (lt50['Grade/10.00'].count())
# print(out9q2)

The above code does the counting for all students who scored below 50 in either quiz. I did something similar for batch4. I need to replicate this so that one function can do it for all of the remaining (17-18) batches.
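For reference, the repeated quiz1/quiz2 block above boils down to a couple of small helpers. The sketch below is illustrative only: the names load_quiz and count_below_50 are made up, and it assumes every batch csv has the same studentName/admissionNumber columns as 9.csv.

import pandas as pd

def load_quiz(path):
    # Load one quiz file, drop rows that were not attempted and rescale the grade to /100
    quiz = pd.read_csv(path, usecols=['Firstname', 'Grade/10.00', 'State'],
                       na_values=['-', 'In progress'])
    quiz = quiz.dropna()
    quiz['Grade/10.00'] = quiz['Grade/10.00'] * 10
    return quiz

def count_below_50(batch_csv_path, quiz):
    # Read one batch list and count its students who scored below 50 in the given quiz
    batch = pd.read_csv(batch_csv_path, usecols=['studentName', 'admissionNumber'])
    batch = batch.rename(columns={'studentName': 'Firstname'})
    merged = quiz.loc[quiz['Firstname'].isin(batch['Firstname'])]
    return int((merged['Grade/10.00'] < 50).sum())

# e.g. count_below_50('d:\\a2\\batchwiselist\\9.csv', load_quiz(q1path)) reproduces out9q1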

Best Answer

In the code below, I generate all the csv paths and load them one by one, run all the processing, and then save the resulting dataframes in a list of dataframe pairs, e.g. [[batch1_q1_result, batch1_q2_result], [batch2_q1_result, batch2_q2_result], ...].

import numpy as np
import pandas as pd

def doAll(baseBatchPath, numberOfBatches):
    batchResultListAll = []  # this will store the resulting dataframes
    spath = 'd:\\a2\\studentlist.csv'
    q1path = 'd:\\a2\\quiz\\quiz1.csv'
    q2path = 'd:\\a2\\quiz\\quiz2.csv'
    tpath = 'd:\\a2\\testcasestudent.txt'
    # the final dataframe that needs to be created and filled up eventually
    idx = pd.MultiIndex.from_product([['batch1', 'batch2', 'batch3', 'batch4', 'batch9'], ['quiz1', 'quiz2']])
    cols = ['noofpresent', 'lesserthan50', 'between50and60', 'between60and70', 'between70and80', 'greaterthan80']
    statdf = pd.DataFrame('-', idx, cols)

    # Master list of all who attended Quiz1
    q1 = pd.read_csv(q1path, usecols=['Firstname', 'Grade/10.00', 'State'], na_values=['-', 'In progress', np.NaN])
    q1.dropna(inplace=True)
    q1['Grade/10.00'] = q1['Grade/10.00'] * 10
    # Master list of all who attended Quiz2
    q2 = pd.read_csv(q2path, usecols=['Firstname', 'Grade/10.00', 'State'], na_values=['-', 'In progress', np.NaN])
    q2.dropna(inplace=True)
    q2['Grade/10.00'] = q2['Grade/10.00'] * 10

    # generate each batch file path and do the rest of the work
    for batchId in range(numberOfBatches):
        batchCsvPath = baseBatchPath + str(batchId + 1) + ".csv"
        # Master list of students in this batch
        batch = pd.read_csv(batchCsvPath, usecols=['studentName', 'admissionNumber'])
        batch.rename(columns={'studentName': 'Firstname'}, inplace=True)
        # Merge each batch list of names with the quiz1 list on their Firstname column
        q1batch = pd.merge(batch, q1)
        q1batch = q1.loc[q1['Firstname'].isin(batch.Firstname)]  # keep only rows whose name exists in the batch list
        q1batch.reset_index(inplace=True)
        # print(q1batch)

        lt50 = q1batch.loc[q1batch['Grade/10.00'] < 50]
        # find out the list of students whose grades are less than 50
        outBatchq1 = lt50['Grade/10.00'].count()
        # print(outBatchq1) to get the count of students in this batch who scored <50 in quiz1

        # do the same for quiz 2

        # Merge each batch with Q2
        q2batch = pd.merge(batch, q2)
        q2batch = q2.loc[q2['Firstname'].isin(batch.Firstname)]
        q2batch.reset_index(inplace=True)

        lt50 = q2batch.loc[q2batch['Grade/10.00'] < 50]
        outBatchq2 = lt50['Grade/10.00'].count()
        # print(outBatchq2)

        # finally save the resulting DFs for later use
        batchResultListAll.append([q1batch, q2batch])

    return batchResultListAll


# call the function using the base path and the number of batch csv files
doAll("d:\\a2\\batchwiselist\\", 18)

Regarding "python - How to write a function that runs this process on 20 different csv files in Python?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57430111/
