
python - Loop over multiple CSV files in Python and extract the rows where a specific column is non-null

Reposted. Author: 太空宇宙. Updated: 2023-11-03 20:16:14

I wrote a script to process many CSV files. For each of them, I want to extract all rows whose cell in the column named "20201-2.0" is non-null. See the attached screenshot for an example (it is the LCE column):

https://uoe-my.sharepoint.com/personal/gpapanas_ed_ac_uk/_layouts/15/onedrive.aspx?id=%2Fpersonal%2Fgpapanas%5Fed%5Fac%5Fuk%2FDocuments%2FCSV%20File%20screenshot%2EPNG&parent=%2Fpersonal%2Fgpapanas%5Fed%5Fac%5Fuk%2FDocuments&originalPath=aHR0cHM6Ly91b2UtbXkuc2hhcmVwb2ludC5jb20vOmk6L2cvcGVyc29uYWwvZ3BhcGFuYXNfZWRfYWNfdWsvRWF5QmJsRlRIbVZKdlJmc0I2aDhWcjRCMDlJZmpRMkwxSTVPUUtVTjJwNXd6dz9ydGltZT10V2Y0c2Q1UzEwZw

I wrote the following code to do this:

import pandas as pd
import glob
import os

path = './'
#column = ['20201-2.0']

all_files = glob.glob(path + "/*.csv")

for filename in all_files:

    # Option 1 below worked, although without isolating the non-null values
    # 1. df = pd.read_csv(filename, encoding="ISO-8859-1")
    df = pd.read_csv(filename, header=0)
    df = df[df['20201-2.0'].notnull()]

    print('extracting info from csv...')
    print(df)

    # You can now export all outcomes in new csv files
    file_name = filename + 'new' + '.csv'
    save_path = os.path.abspath(os.path.join(path, file_name))
    print('saving ...')
    export_csv = df.to_csv(save_path, index=None)

    del df
    del export_csv

However, although it managed to produce the first output file, I then got the following error:

Traceback (most recent call last):
  File "/home/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2657, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '20201-2.0'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/PycharmProjects/OPTIMAT/Read_MR_from_all_csv.py", line 21, in <module>
    df = df[df['20201-2.0'].notnull()]
  File "/home/giorgos/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '20201-2.0'

I don't understand why this is happening. Any ideas would be appreciated.
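A `KeyError` on a column name usually means at least one of the CSV files in the folder does not contain that column at all. One quick way to find the offending file, sketched below (this diagnostic is not part of the original script), is to read only each file's header row and check for the column:

```python
import glob
import pandas as pd

# Read only the header (nrows=0) of every CSV and report any file
# that lacks the '20201-2.0' column before the filtering step runs.
for filename in glob.glob('./*.csv'):
    cols = pd.read_csv(filename, nrows=0).columns
    if '20201-2.0' not in cols:
        print(filename, "is missing column '20201-2.0'")
```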

Best Answer

Happy to say I found a way to do this:

import pandas as pd
import os

path = './'

all_files = os.listdir(path)
for filename in all_files:
    if not filename.endswith('.csv'):
        continue

    print('extracting info from ' + filename)
    df = pd.read_csv(filename, header=0)

    # Keep only the rows where column '20201-2.0' is non-null
    df_subset = df.dropna(subset=['20201-2.0'])
    print('processed ' + filename)

    # You can now export all outcomes in new csv files
    file_name = filename.split('.')[0] + '_new' + '.csv'

    print('saving to ' + file_name)
    export_csv = df_subset.to_csv('./' + file_name, index=None)

    del df
    del export_csv
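One caveat worth noting: `dropna(subset=['20201-2.0'])` still raises a `KeyError` if a file lacks that column entirely, which is the likely cause of the original error. A small guard on `df.columns` (a sketch, not part of the accepted answer) keeps the loop running over such files:

```python
import pandas as pd

COLUMN = '20201-2.0'  # the target column from the question

def filter_non_null(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows where COLUMN is non-null; if the frame has no such
    column, return an empty frame instead of raising KeyError."""
    if COLUMN not in df.columns:
        return pd.DataFrame()
    return df.dropna(subset=[COLUMN])
```

In the loop above, `df_subset = filter_non_null(df)` would then skip files that do not carry the column rather than aborting the whole run.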

Regarding "python - Loop over multiple CSV files in Python and extract the rows where a specific column is non-null", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58428610/
