python - Pandas: reading a CSV file with a multi-character delimiter?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 19:52:37

I have the following CSV file, which I want to read with pandas.read_csv, but I cannot get it to parse correctly.

                                                                Mat  Pur Mat    Mat  Proc ABC   TimePrice            Crncy Supplier      
Plant Material Number Material Description Grp Grp Status Type Type Class daysper each Key Consignment
-----------------------------------------------------------------------------------------------------------------------------------------
0009 076/JJJJJJJ331 DUMMY UNIT/Dummy Unit 265x225x15 ZEEJJMA9 P5 JERI F 99 99.9900 SEK 0
0009 1/JJJJJJJJJ/1R3 EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9 P8 JERI F 99 9,999.9900 SEK 0
0009 1/JJJJJJJJJ/4 EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9 P5 JERI F 99 999.9900 SEK 0
0009 1/JJJJJJJJJ/1 BASIC EQUIP.MAGAZINE/Remote IRU Enclosur305 MA9 P5 JERI F 99 9,999.9900 SEK 0
0009 1/JJJJJJ04 EQUIPPED CABINET/BYB 504 Multi-Pack Kit ZEEJJMA9 P5 JERI F 99 99,999.9900 SEK 0
0009 1/JJJJJJJJ/6 CABLE BUSHING/O-Ring id 21, th 2 for M25ZEEJJMA9 P5 JCOM F 99 9.9900 SEK 0
0009 1/JJJJJJJJJ PACKAGE/Pallet 800*114*600 ZEEJJMA9 P5 JVER F 99 999.9900 SEK 0
0009 1/JJJJJJJJJ PACKING MATERIAL/Pallet 1200*800*160 ZEEJJMA9 P5 JCOM F 999 999.9900 SEK 0
0009 1/JJJJJJJJ/06 BAG/PåSE/MINIGRIP/300*250 MM ZEEJJMA9 P5 JCOM F 9 9.9900 SEK 0
0009 1/JJJJJJJJ BAG/Antistatic zip lock bag 75x100 ZEEJJMA9 P5 JCOM F 9 9.9900 SEK 0

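I tried the code below, but ran into these problems:

  • The Material Description values contain whitespace
  • The header is hard to read, because it spans several lines
  • In rows 2, 3, and others there is no space between Material Description and Mat Grp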
import pandas as pd

df = pd.read_csv(file_path, delim_whitespace=True, skiprows=4, header=None, error_bad_lines=False, engine="python")
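The problems listed above can be seen without pandas at all. The following is a minimal sketch that splits three rows from the question on whitespace. The row strings are copied from the sample data as it appears on this page, and the variable names are only illustrative.

```python
# Three rows copied from the sample data in the question.
rows = [
    "0009 076/JJJJJJJ331 DUMMY UNIT/Dummy Unit 265x225x15 ZEEJJMA9 P5 JERI F 99 99.9900 SEK 0",
    "0009 1/JJJJJJJJJ PACKAGE/Pallet 800*114*600 ZEEJJMA9 P5 JVER F 99 999.9900 SEK 0",
    "0009 1/JJJJJJJJJ/1R3 EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9 P8 JERI F 99 9,999.9900 SEK 0",
]

# Splitting on whitespace gives a different number of fields per row,
# because the description itself contains a varying number of spaces.
field_counts = [len(row.split()) for row in rows]
print(field_counts)

# When the description fills its whole column, it fuses with the next
# field, so no whitespace-based delimiter can separate the two.
fused_tokens = [tok for tok in rows[2].split() if tok.endswith("ZEEJJMA9") and tok != "ZEEJJMA9"]
print(fused_tokens)
```

Since the number of tokens changes from row to row and some neighbouring fields share no separator, the file is better treated as fixed-width text than as delimited text.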

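Best answer

I believe you are looking for the pandas read_fwf function. Unfortunately, you have to specify the column widths manually. Here is an example for the first few columns: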

s = '''
0009 076/JJJJJJJ331 DUMMY UNIT/Dummy Unit 265x225x15 ZEEJJMA9 P5 JERI F 99 99.9900 SEK 0
0009 1/JJJJJJJJJ/1R3 EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9 P8 JERI F 99 9,999.9900 SEK 0
0009 1/JJJJJJJJJ/4 EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9 P5 JERI F 99 999.9900 SEK 0
0009 1/JJJJJJJJJ/1 BASIC EQUIP.MAGAZINE/Remote IRU Enclosur305 MA9 P5 JERI F 99 9,999.9900 SEK 0
0009 1/JJJJJJ04 EQUIPPED CABINET/BYB 504 Multi-Pack Kit ZEEJJMA9 P5 JERI F 99 99,999.9900 SEK 0
0009 1/JJJJJJJJ/6 CABLE BUSHING/O-Ring id 21, th 2 for M25ZEEJJMA9 P5 JCOM F 99 9.9900 SEK 0
0009 1/JJJJJJJJJ PACKAGE/Pallet 800*114*600 ZEEJJMA9 P5 JVER F 99 999.9900 SEK 0
0009 1/JJJJJJJJJ PACKING MATERIAL/Pallet 1200*800*160 ZEEJJMA9 P5 JCOM F 999 999.9900 SEK 0
0009 1/JJJJJJJJ/06 BAG/PåSE/MINIGRIP/300*250 MM ZEEJJMA9 P5 JCOM F 9 9.9900 SEK 0
0009 1/JJJJJJJJ BAG/Antistatic zip lock bag 75x100 ZEEJJMA9 P5 JCOM F 9 9.9900 SEK 0
'''

from io import StringIO
import pandas as pd
df = pd.read_fwf(StringIO(s), colspecs=[(0,5), (6,20), (24,64), (64,72)])

Here is the resulting DataFrame:

   Unnamed: 0      Unnamed: 1                                Unnamed: 2  \
0           9  076/JJJJJJJ331          DUMMY UNIT/Dummy Unit 265x225x15
1           9  1/JJJJJJJJJ/1R  EQUIPPED MAGAZINE/SUP 6601; Equipped mag
2           9   1/JJJJJJJJJ/4  EQUIPPED MAGAZINE/SUP 6601; Equipped mag
3           9   1/JJJJJJJJJ/1  BASIC EQUIP.MAGAZINE/Remote IRU Enclosur
4           9      1/JJJJJJ04   EQUIPPED CABINET/BYB 504 Multi-Pack Kit
5           9    1/JJJJJJJJ/6  CABLE BUSHING/O-Ring id 21, th 2 for M25
6           9     1/JJJJJJJJJ                PACKAGE/Pallet 800*114*600
7           9     1/JJJJJJJJJ      PACKING MATERIAL/Pallet 1200*800*160
8           9   1/JJJJJJJJ/06              BAG/PåSE/MINIGRIP/300*250 MM
9           9      1/JJJJJJJJ        BAG/Antistatic zip lock bag 75x100

   Unnamed: 3
0    ZEEJJMA9
1    ZEEJJMA9
2    ZEEJJMA9
3     305 MA9
4    ZEEJJMA9
5    ZEEJJMA9
6    ZEEJJMA9
7    ZEEJJMA9
8    ZEEJJMA9
9    ZEEJJMA9
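Two things stand out in the output above: the columns have no names, and the plant code 0009 has been read as the integer 9. The sketch below is one possible way to handle both, using the `widths`, `names`, `header`, and `dtype` arguments of `read_fwf`. The column widths and the small sample are assumptions made up for illustration (the real file's widths have to be measured from the file itself), and the price conversion at the end is an extra step that is not part of the original answer.

```python
from io import StringIO

import pandas as pd

# Hypothetical layout: (column name, width in characters). These widths are
# assumptions for this sketch, not measured from the real file.
layout = [
    ("Plant", 5),
    ("Material Number", 16),
    ("Material Description", 40),
    ("Mat Grp", 9),
    ("Price", 12),
]

records = [
    ("0009", "076/JJJJJJJ331", "DUMMY UNIT/Dummy Unit 265x225x15", "ZEEJJMA9", "99.9900"),
    ("0009", "1/JJJJJJJJJ/1R3", "EQUIPPED MAGAZINE/SUP 6601; Equipped mag", "ZEEJJMA9", "9,999.9900"),
]

# Build a fixed-width sample by padding every field to its column width.
# The second description is exactly 40 characters long, so it touches the
# next column with no space in between, as in the question.
sample = "\n".join(
    "".join(value.ljust(width) for value, (_, width) in zip(record, layout))
    for record in records
) + "\n"

df = pd.read_fwf(
    StringIO(sample),
    widths=[width for _, width in layout],  # consecutive column widths
    names=[name for name, _ in layout],     # explicit column names
    header=None,                            # the sample has no header row
    dtype=str,                              # keep leading zeros such as "0009"
)

# Convert the price separately: drop the thousands separator, then parse.
df["Price"] = pd.to_numeric(df["Price"].str.replace(",", "", regex=False))
print(df)
```

Reading everything as strings first and converting selected columns afterwards keeps identifiers such as the plant code intact, at the cost of one explicit conversion per numeric column.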

Regarding "python - Pandas: reading a CSV file with a multi-character delimiter?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59737399/
