python - Pandas read_table error

Reposted. Author: 可可西里. Updated: 2023-11-01 07:05:05
I am trying to read a tab-delimited text file into a DataFrame.

This is what the file looks like in Excel:

CALENDAR_DATE   ORDER_NUMBER    INVOICE_NUMBER  TRANSACTION_TYPE    CUSTOMER_NUMBER   CUSTOMER_NAME
5/13/2016 0:00 13867666 6892372 S 2026 CUSTOMER 1

Importing it into a df:

df = p.read_table("E:/FileLoc/ThisIsAFile.txt", encoding = "iso-8859-1")

Now it does not treat the first 3 columns as part of the column index (df[0] = TRANSACTION_TYPE), and all of the headers shift over to reflect this.

                                CALENDAR_DATE   ORDER_NUMBER    INVOICE_NUMBER
5/13/2016 0:00 13867666 6892372 S 2026 CUSTOMER 1

I am trying to manipulate the text file and then import it into a MySQL database as the end result.

Best answer

You can use read_csv with a separator of two or more whitespace characters:

import io

import pandas as pd

# sample data: fields are separated by two or more spaces
temp = u"""CALENDAR_DATE  ORDER_NUMBER  INVOICE_NUMBER  TRANSACTION_TYPE  CUSTOMER_NUMBER  CUSTOMER_NAME
5/13/2016 0:00  13867666  6892372  S  2026  CUSTOMER 1"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), sep=r'\s{2,}', engine='python', encoding="iso-8859-1")
print(df)
    CALENDAR_DATE  ORDER_NUMBER  INVOICE_NUMBER TRANSACTION_TYPE  \
0  5/13/2016 0:00      13867666         6892372                S

   CUSTOMER_NUMBER CUSTOMER_NAME
0             2026    CUSTOMER 1

If the separator is a tab, use sep='\t'.
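
As a minimal sketch of the tab case, the snippet below parses an in-memory string whose fields are separated by real tab characters. The sample row is copied from the question; the variable names `sample` and `df_tab` are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical in-memory sample mirroring the question's columns;
# the fields are separated by real tab characters.
sample = (
    "CALENDAR_DATE\tORDER_NUMBER\tINVOICE_NUMBER\t"
    "TRANSACTION_TYPE\tCUSTOMER_NUMBER\tCUSTOMER_NAME\n"
    "5/13/2016 0:00\t13867666\t6892372\tS\t2026\tCUSTOMER 1\n"
)

# With sep="\t", the spaces inside "5/13/2016 0:00" and "CUSTOMER 1"
# stay inside their fields instead of splitting them.
df_tab = pd.read_csv(io.StringIO(sample), sep="\t")

print(df_tab.columns.tolist())
print(df_tab.loc[0, "CUSTOMER_NAME"])
```

Because only the tab is treated as a delimiter, values that contain single spaces survive intact, which a plain whitespace separator would not guarantee.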

EDIT:

I tested it with your data and it works:

import pandas as pd

df = pd.read_csv('test/AnonymizedData.txt', sep='\t')
print (df)

CUSTOMER_NUMBER CUSTOMER_NAME CUSTOMER_BRANCH_CODE CUSTOMER_BRANCH_NAME \
0 2026 CUSTOMER 1 83 SALES BRANCH 1
1 2359 CUSTOMER 2 76 SALES BRANCH 2
2 100662 CUSTOMER 3 28 SALES BRANCH 3
3 3245 CUSTOMER 4 84 SALES BRANCH 4
4 3179 CUSTOMER 5 28 SALES BRANCH 5
5 39881 CUSTOMER 6 67 SALES BRANCH 6
6 37020 CUSTOMER 7 58 SALES BRANCH 7
7 1239 CUSTOMER 8 50 SALES BRANCH 8
8 2379 CUSTOMER 9 76 SALES BRANCH 9

CUSTOMER_CITY CUSTOMER_STATE ... PRICING_PRODUCT_TYPE_CODE \
0 TOWN 1 CO ... 11
1 TOWN 2 OH ... 11
2 TOWN 3 ME ... 11
3 TOWN 4 IL ... 11
4 TOWN 5 NH ... 11
5 TOWN 6 TX ... 11
6 TOWN 7 NC ... 11
7 TOWN 8 NY ... 11
8 TOWN 9 OH ... 11

PRICING_PRODUCT_TYPE ORGANIZATION_ID ORGANIZATION_NAME PRODUCT_LINE_CODE \
0 DISPOSABLES 83 ORGANIZATIONNAME 891
1 DISPOSABLES 83 ORGANIZATIONNAME 891
2 DISPOSABLES 83 ORGANIZATIONNAME 891
3 DISPOSABLES 83 ORGANIZATIONNAME 891
4 DISPOSABLES 83 ORGANIZATIONNAME 891
5 DISPOSABLES 83 ORGANIZATIONNAME 891
6 DISPOSABLES 83 ORGANIZATIONNAME 891
7 DISPOSABLES 83 ORGANIZATIONNAME 891
8 DISPOSABLES 83 ORGANIZATIONNAME 891

PRODUCT_LINE ROBOTIC_FLAG Unnamed: 52 Unnamed: 53 Unnamed: 54
0 PRODUCTNAME N N NaN 3
1 PRODUCTNAME N N NaN 3
2 PRODUCTNAME N N NaN 2
3 PRODUCTNAME N N NaN 7
4 PRODUCTNAME N N NaN 1
5 PRODUCTNAME N N NaN 4
6 PRODUCTNAME N N NaN 3
7 PRODUCTNAME N N NaN 5
8 PRODUCTNAME N N NaN 3

[9 rows x 55 columns]
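
The header shift described in the question matches what pandas does when data rows carry more fields than the header has names, for example because each row ends with a trailing tab. Whether that is what happened with the asker's file is an assumption; the sketch below only reproduces the symptom on made-up data and shows the `index_col=False` option of `read_csv` as one way to keep the header aligned.

```python
import io
import warnings

import pandas as pd

# Hypothetical data: three header names, but every data row ends with an extra tab.
raw = "A\tB\tC\n1\tx\ty\t\n2\tz\tw\t\n"

# Default behaviour: the surplus field makes pandas use the first column
# as the index, so the header names shift relative to the data.
shifted = pd.read_csv(io.StringIO(raw), sep="\t")

# index_col=False tells pandas not to use the first column as the index;
# the surplus trailing field is discarded instead.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # some pandas versions warn about the dropped field
    fixed = pd.read_csv(io.StringIO(raw), sep="\t", index_col=False)

print(shifted)
print(fixed)
```

If the extra fields hold real data rather than empty trailing delimiters, discarding them is not appropriate; in that case fixing the header row or passing explicit `names` is likely the safer choice.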

Regarding python - Pandas read_table error, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/37445855/
