python / Pandas : Convert multiple CSV files to have union and ordered header and fill the missing data

Reposted by: 行者123. Updated: 2023-12-04 04:00:28

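I have 36 CSV files in one folder, each with a different number of columns (between 90 and 255 per file). A single CSV file has at most 300 rows, but any given column may hold anywhere from 0 to 300 values. An example CSV file looks like this: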

Row  col1 col2 col3 ........................ col200
 1     2      3   4   ......................... 25
 2     1          8   .......................... 0.2
 3     5          2   ........................... 5
 .     .          .   ..........................  .
 .     .          .   ........................... .
 .     .          .   ........................... .
 .     .          .   ........................... .
 .     .          .   ........................... .
 .     .          .   ........................... .
 300   3          12   ..........................  1

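I want to use Python to convert these CSV files so that they have the following properties:

1. All converted CSV files must have the same columns (same order and same count). These columns are the union of the (de-duplicated) columns across all original CSV files.
2. The columns of the converted CSV files must be in the same order, i.e. the 3rd column of c1.csv must also be the 3rd column of every other CSV file.
3. If a union column is missing from an original CSV file, the converted CSV file gets that column added, filled with a default value (a fixed value in all 300 rows).
4. If a column exists both in the union and in the original CSV file:
   (a) If the column has 300 rows, copy it as is.
   (b) If the column has fewer than 300 rows, fill the remaining rows with the average of the values available in that column.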
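
The first three requirements (a shared, ordered union of columns, with a default value for the missing ones) can be sketched on a small in-memory example. This is only an illustration under assumptions: the helper `union_columns`, the two tiny frames, and the default value `10` are hypothetical stand-ins for the real files, and the union is ordered by first appearance, which is one possible choice of order.

```python
import pandas as pd

def union_columns(frames):
    """Return the union of column names, ordered by first appearance."""
    ordered = []
    for frame in frames:
        for col in frame.columns:
            if col not in ordered:
                ordered.append(col)
    return ordered

# Hypothetical stand-ins for two of the CSV files
c1 = pd.DataFrame({"col1": [2, 1], "col3": [4, 8]})
c2 = pd.DataFrame({"col2": [3, 5], "col1": [7, 9]})

master = union_columns([c1, c2])

# reindex puts the columns in the master order and fills absent ones with the default
DEFAULT = 10  # assumed default value
c1_out = c1.reindex(columns=master, fill_value=DEFAULT)
c2_out = c2.reindex(columns=master, fill_value=DEFAULT)

print(list(c1_out.columns))
print(list(c2_out.columns))
```

Because both frames are reindexed against the same `master` list, the n-th column is the same in every output, regardless of the order in which each file listed its columns.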

To achieve these properties, I wrote the following Python code:

import pandas as pd
import csv
import glob
import os

path = r'FILE PATH TO ORIGINAL FILES' # file path to original files
all_files = glob.glob(path + "/*.csv")
combined_csv = pd.concat([pd.read_csv(f) for f in all_files]) #To get common columns
master_set =list(combined_csv.columns)

for file in all_files:
    filtered_df = pd.read_csv(file)
    for cols in master_set:
        if(cols in filtered_df):
            if(filtered_df[cols].count()>300): pass
            elif (filtered_df[cols].count()<300):
                total = sum(value for value in filtered_df[cols])
                avg = total/filtered_df[cols].count()
                i = filtered_df[cols].count()
                while i<301:
                    filtered_df.at[i,cols] = avg
                    i+=1
        else:
         filtered_df[cols] = 10
         
    file_name = os.path.split(file)[-1] #Select individual file (eg. c1.csv)
    file_name_path = os.path.join('FILE PATH TO CONVERTED FILES' + file_name) 
    filtered_df.to_csv(file_name_path)

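After running the code above, I get 36 converted CSV files that share a common set of columns. The rows of the added columns (those in the union but not in the individual file) are filled with the default value. However, the code still does not satisfy the following properties:

1. The column order in the newly created CSV files does not match.
2. Property 4(b) above is not achieved, i.e. any column from the original file that is in the union but has fewer than 300 values (rows < 300) is not filled with the average.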
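
One likely cause of the second issue, assuming `pd.read_csv` reads the short columns as trailing NaN values: Python's built-in `sum` propagates NaN, so the computed average is itself NaN and the fill has no visible effect. The column below is a made-up example, not data from the real files.

```python
import math
import pandas as pd

col = pd.Series([2.0, 4.0, float("nan"), float("nan")])  # hypothetical short column

naive_total = sum(value for value in col)  # a single NaN turns the whole sum into NaN
naive_avg = naive_total / col.count()      # count() ignores NaN, but the total is already NaN

nan_aware_avg = col.mean()  # pandas skips NaN by default (skipna=True)

print(math.isnan(naive_avg))
print(nan_aware_avg)
```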

I will update/edit my question to clarify further.

Any help would be appreciated!

Best answer

I solved my problem with the following approach:

import glob
import math
import os

import pandas as pd

path = r'FILE PATH TO ORIGINAL FILES'  # file path to original files
all_files = glob.glob(path + "/*.csv")
# Concatenate every file once to collect the union of all column names
combined_csv = pd.concat([pd.read_csv(f) for f in all_files])
master_set = list(combined_csv.columns)

for file in all_files:
    filtered_df = pd.read_csv(file)
    for cols in master_set:
        if cols in filtered_df:
            total = 0
            count = filtered_df[cols].count()  # number of non-NaN values
            if count < 300:
                for value in filtered_df[cols]:
                    if(math.isnan(value) == False):  # skip NaN so the sum stays valid
                        total = total + value
                avg = total / count
                # Fill rows count..299 with the average
                # (assumes the missing values trail the column)
                i = count
                while i < 300:
                    filtered_df.at[i, cols] = avg
                    i += 1
        else:
            filtered_df[cols] = 10  # default value for a column missing from this file

    filtered_df = filtered_df[master_set]  # same column order in every file
    file_name = os.path.split(file)[-1]  # select individual file (e.g. c1.csv)
    file_name_path = os.path.join('FILE PATH TO CONVERTED FILES', file_name)
    filtered_df.to_csv(file_name_path)

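To fix issue (1), the column order, I added filtered_df = filtered_df[master_set] to the code. For 4(b), I changed the original code to use if(math.isnan(value) == False), so NaN values are skipped when summing.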
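
As an alternative to the explicit loops, the fill step for 4(b) could be written with vectorized pandas calls. This is a sketch under assumptions, not the answer's method: the helper `pad_with_column_means` and the sample frame are hypothetical, the frame is assumed to have the default 0-based integer index, and unlike the loop it fills every NaN in a column (not only trailing ones). A column with no values at all stays NaN.

```python
import pandas as pd

N_ROWS = 300  # target row count taken from the question

def pad_with_column_means(df, n_rows=N_ROWS):
    """Extend df to n_rows rows and fill gaps in each column with that column's mean."""
    padded = df.reindex(range(n_rows))  # adds all-NaN rows up to n_rows
    # df.mean() skips NaN; fillna with a Series matches values to columns by name
    return padded.fillna(df.mean(numeric_only=True))

# Hypothetical short file: col1 has 3 values, col2 only 2
small = pd.DataFrame({"col1": [2.0, 1.0, 3.0], "col2": [4.0, 8.0, None]})
result = pad_with_column_means(small)

print(result.shape)
print(result["col2"].iloc[-1])
```

Selecting `result[master_set]` afterwards, as in the answer, would still be needed to enforce the shared column order.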

For "python / Pandas : Convert multiple CSV files to have union and ordered header and fill the missing data", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63112465/
