
python - Error importing CSV and Excel files with Pandas

Reposted. Author: 太空狗. Updated: 2023-10-29 21:08:46

I am trying to import a CSV file with Python Pandas. Sample data from the file is shown below; the first row holds the comma-separated column names.

End Customer Organization ID,End Customer Organization Name,End Customer Top Parent Organization ID,End Customer Top Parent Organization Name,Reseller Top Parent ID,Reseller Top Parent Name,Business,Rev Sum Division,Rev Sum Category,Product Family,Version,Pricing Level,Summary Pricing Level,Detail Pricing Level,MS Sales Amount,MS Sales Licenses,Fiscal Year,Sales Date 
11027676,Baroda Western Uttar Pradesh Gramin Bankgfhgfnjgfnmjmhgmghmghmghmnghnmghnmhgnmghnghngh,4078446,Bank Of Barodadfhhgfjyjtkyukujkyujkuhykluiluilui;iooi';po'fserwefvegwegf,1809012,"Hcl Infosystems Ltd - Partnerdghftrutyhb frhywer5y5tyu6ui7iukluyj,lgjmfgnhfrgweffw",Server & CALsdgrgrfgtrhytrnhjdgthjtyjkukmhjmghmbhmgfngdfbndfhtgh,SQL Server & CALdfhtrhtrgbhrghrye5y45y45yu56juhydsgfaefwe,SQL CALdhdfthtrutrjurhjethfdehrerfgwerweqeadfawrqwerwegtrhyjuytjhyj,SQL CALdtrye45y3t434tjkabcjkasdhfhasdjkcbaksmjcbfuigkjasbcjkasbkdfhiwh,2005,Openfkvgjesropiguwe90fujklascnioawfy98eyfuiasdbcvjkxsbhg,Open Lklbjdfoigueroigbjvwioergyuiowerhgosdhvgfoisdhyguiserhguisrh,"Open Stddfm,vdnoghioerivnsdflierohgushdfovhsiodghuiohdbvgsjdhgouiwerho",125.85,1,FY07,12/28/2006
12835756,Uttam Strips Pvt Ltd,12835756,Uttam Strips Pvt Ltd,12565538,Redington C/O Fortis Financial Services Ltd,MBS,Dynamics ERP,Dynamics NAV,Dynamics NAV Business Essentials,Non-specific,Other,MBS SA,MBS New Customer Enhanc. Def,0,0,FY09,9/15/2008
12233135,Bhagwan Singh Tondon,12233135,Bhagwan Singh Tondon,2652941,H B S Systems Pvt Ltd,Server & CAL,SQL Server & CAL,SQL CAL,SQL CAL,Non-specific,Open,Open L&SA,Deferred Open L&SA - New,0,0,FY09,9/15/2008
11602305,Maya Academy Of Advanced Cinematics,9750934,Maya Entertainment Ltd,336146,Embee Software Pvt Ltd,Server & CAL,Windows Server & CAL,Windows Server HPC,Windows Compute Cluster Server,Non-specific,Open,Open V/MYO - Rec,OLV Perpet L&SA Recur-Def,0,0,FY09,9/25/2008
13336009,Remiel Softech Solution Pvt Ltd,13336009,Remiel Softech Solution Pvt Ltd,13335482,Redington C/O Remiel Softech Solutions Pvt Ltd,MBS,Dynamics ERP,Dynamics NAV,Dynamics NAV Business Essentials,Non-specific,Other,MBS SA,MBS New Customer Enhanc. Def,0,0,FY09,12/23/2008

I am importing it with the following code:

import pandas as pd

df=pd.read_csv('file path.csv',sep=',')

It raises the following error:

Traceback (most recent call last):
File "<pyshell#25>", line 1, in <module>
df=pd.read_csv(filename,sep=',')
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 400, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 205, in _read
return parser.read()
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 608, in read
ret = self._engine.read(nrows)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 1028, in read
data = self._reader.read(nrows)
File "parser.pyx", line 706, in pandas.parser.TextReader.read (pandas\parser.c:6745)
File "parser.pyx", line 728, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:6964)
File "parser.pyx", line 804, in pandas.parser.TextReader._read_rows (pandas\parser.c:7780)
File "parser.pyx", line 890, in pandas.parser.TextReader._convert_column_data (pandas\parser.c:8793)
File "parser.pyx", line 950, in pandas.parser.TextReader._convert_tokens (pandas\parser.c:9484)
File "parser.pyx", line 1026, in pandas.parser.TextReader._convert_with_dtype (pandas\parser.c:10642)
File "parser.pyx", line 1046, in pandas.parser.TextReader._string_convert (pandas\parser.c:10853)
File "parser.pyx", line 1278, in pandas.parser._string_box_utf8 (pandas\parser.c:15657)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 90: invalid start byte

Since it looked like a Unicode error, I changed the encoding and ran it again:

df=pd.read_csv(filename,encoding='utf-16',sep=',')

It raised the following error:

Traceback (most recent call last):
File "<pyshell#26>", line 1, in <module>
df=pd.read_csv(filename,encoding='utf-16',sep=',')
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 400, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 198, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 479, in __init__
self._make_engine(self.engine)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 586, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 957, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "parser.pyx", line 477, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:4434)
File "parser.pyx", line 592, in pandas.parser.TextReader._get_header (pandas\parser.c:5660)
File "parser.pyx", line 768, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:7451)
File "parser.pyx", line 1661, in pandas.parser.raise_parser_error (pandas\parser.c:18744)
pandas.parser.CParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'.

I'm not sure why this happens. I also tried converting the CSV file to Excel using Text to Columns and reading it with Pandas' read_excel function. That raised an error as well (below):

Traceback (most recent call last):
File "<pyshell#30>", line 1, in <module>
df=pd.read_excel('J:\dmqp on 192.168.1.41\MS Sales Dump (FY09)xls','MS Sales Dump (FY09)')
File "C:\Python33\lib\site-packages\pandas\io\excel.py", line 52, in read_excel
return ExcelFile(path_or_buf,kind=kind).parse(sheetname=sheetname,
File "C:\Python33\lib\site-packages\pandas\io\excel.py", line 68, in __init__
import xlrd # throw an ImportError if we need to
ImportError: No module named 'xlrd'
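
The ImportError above means the xlrd package is not installed in this Python environment; the traceback shows pandas importing it inside read_excel. A minimal sketch for checking this before calling read_excel follows. The helper name `has_module` is hypothetical, and installing the package with `pip install xlrd` is an assumption about how the environment is managed.

```python
import importlib.util

def has_module(name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None

if not has_module("xlrd"):
    print("xlrd is missing; read_excel in this pandas version cannot run without it.")
```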

Can someone help resolve the errors above and explain what is going wrong when importing the CSV and Excel files?

I also tried this code with a different encoding:

df=pd.read_csv(filename,encoding='iso-8859-1',sep=',')

It raised no error, but everything was imported as a single column instead of being split into separate columns.

>>>df
<class 'pandas.core.frame.DataFrame'>
Int64Index: 263244 entries, 0 to 263243
Data columns (total 1 columns):
End Customer Organization ID,End Customer Organization Name,End Customer Top Parent Organization ID,End Customer Top Parent Organization Name,Reseller Top Parent ID,Reseller Top Parent Name,Business,Rev Sum Division,Rev Sum Category,Product Family,Version,Pricing Level,Summary Pricing Level,Detail Pricing Level,MS Sales Amount,MS Sales Licenses,Fiscal Year,Sales Date 263244 non-null values
dtypes: object(1)
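
When a file loads without error but lands in a single column, looking at the raw bytes of the first few lines usually shows why, for example whole lines wrapped in quotes or a delimiter other than a comma. The post does not say which applies here, so this is only a diagnostic sketch; `peek_raw_lines` is a hypothetical helper.

```python
from itertools import islice

def peek_raw_lines(path, n=3):
    """Return the first n lines of a file as undecoded bytes."""
    with open(path, "rb") as f:
        return list(islice(f, n))

# Usage, with `filename` pointing at the CSV from the question:
# for raw in peek_raw_lines(filename):
#     print(repr(raw))
```

Printing with repr() keeps quotes, byte-order marks, and line endings visible instead of hiding them behind a decoded string.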

To check the sample data above, I saved it to a text file and imported that. This is the output I got:

>>> df =pd.read_csv(r'J:\Data.txt')
>>> print(df)
End Customer Organization ID \
0 11027676
1 12835756
2 12233135
3 11602305
4 13336009

End Customer Organization Name \
0 Baroda Western Uttar Pradesh Gramin Bankgfhgfn...
1 Uttam Strips Pvt Ltd
2 Bhagwan Singh Tondon
3 Maya Academy Of Advanced Cinematics
4 Remiel Softech Solution Pvt Ltd

End Customer Top Parent Organization ID \
0 4078446
1 12835756
2 12233135
3 9750934
4 13336009

End Customer Top Parent Organization Name Reseller Top Parent ID \
0 Bank Of Barodadfhhgfjyjtkyukujkyujkuhykluiluil... 1809012
1 Uttam Strips Pvt Ltd 12565538
2 Bhagwan Singh Tondon 2652941
3 Maya Entertainment Ltd 336146
4 Remiel Softech Solution Pvt Ltd 13335482

Reseller Top Parent Name \
0 Hcl Infosystems Ltd - Partnerdghftrutyhb frhyw...
1 Redington C/O Fortis Financial Services Ltd
2 H B S Systems Pvt Ltd
3 Embee Software Pvt Ltd
4 Redington C/O Remiel Softech Solutions Pvt Ltd

Business \
0 Server & CALsdgrgrfgtrhytrnhjdgthjtyjkukmhjmgh...
1 MBS
2 Server & CAL
3 Server & CAL
4 MBS

Rev Sum Division \
0 SQL Server & CALdfhtrhtrgbhrghrye5y45y45yu56ju...
1 Dynamics ERP
2 SQL Server & CAL
3 Windows Server & CAL
4 Dynamics ERP

Rev Sum Category \
0 SQL CALdhdfthtrutrjurhjethfdehrerfgwerweqeadfa...
1 Dynamics NAV
2 SQL CAL
3 Windows Server HPC
4 Dynamics NAV

Product Family Version \
0 SQL CALdtrye45y3t434tjkabcjkasdhfhasdjkcbaksmj... 2005
1 Dynamics NAV Business Essentials Non-specific
2 SQL CAL Non-specific
3 Windows Compute Cluster Server Non-specific
4 Dynamics NAV Business Essentials Non-specific

Pricing Level \
0 Openfkvgjesropiguwe90fujklascnioawfy98eyfuiasd...
1 Other
2 Open
3 Open
4 Other

Summary Pricing Level \
0 Open Lklbjdfoigueroigbjvwioergyuiowerhgosdhvgf...
1 MBS SA
2 Open L&SA
3 Open V/MYO - Rec
4 MBS SA

Detail Pricing Level MS Sales Amount \
0 Open Stddfm,vdnoghioerivnsdflierohgushdfovhsio... 125.85
1 MBS New Customer Enhanc. Def 0.00
2 Deferred Open L&SA - New 0.00
3 OLV Perpet L&SA Recur-Def 0.00
4 MBS New Customer Enhanc. Def 0.00

MS Sales Licenses Fiscal Year Sales Date
0 1 FY07 12/28/2006
1 0 FY09 9/15/2008
2 0 FY09 9/15/2008
3 0 FY09 9/25/2008
4 0 FY09 12/23/2008
>>>

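This adds a '\' after each column, and the column names do not appear one after another. Instead, each column seems to start on a new line after it is imported.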
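
This output does not indicate a failed import. When a DataFrame is wider than the display width, pandas prints it in chunks of columns and ends each chunk's header line with a backslash as a continuation marker. A small sketch, using a made-up three-column extract of the sample data, checks the parsed shape and prints without wrapping:

```python
import io

import pandas as pd

# Illustrative extract only: three of the eighteen columns, two of the rows.
sample = io.StringIO(
    "End Customer Organization ID,End Customer Organization Name,Fiscal Year\n"
    "12835756,Uttam Strips Pvt Ltd,FY09\n"
    "12233135,Bhagwan Singh Tondon,FY09\n"
)
df = pd.read_csv(sample)

# The shape and column list show whether the split into columns worked.
print(df.shape)
print(list(df.columns))

# Temporarily widen the display so the frame is not wrapped into chunks.
with pd.option_context("display.width", 1000, "display.max_columns", None):
    print(df)
```

Checking `df.shape` and `df.columns` is a more reliable test of the import than reading the printed table.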

Best answer

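I think your main problem is related to encoding. I have suffered through odd encodings in CSV files myself. What helped me in those cases was to detect the file's real encoding and then load it correctly with pandas.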

Try the following code:

from chardet.universaldetector import UniversalDetector

def test_encoding(file_name):
    """Feed the file to chardet line by line and report its best guess."""
    detector = UniversalDetector()
    with open(file_name, 'rb') as f:
        for line in f:
            detector.feed(line)
            # Stop early once chardet is confident enough.
            if detector.done:
                break
    detector.close()
    r = detector.result
    return "Detected encoding %s with confidence %s" % (r['encoding'], r['confidence'])

# Pass the file path to the function to see the result.
# A raw string keeps the backslashes in a Windows path from being read as escapes.
test_encoding(r'C:\Users\..\file.csv')

Output:

'Detected encoding UTF-16 with confidence 1.0'

This tries to infer the file's encoding, which you can then pass to pandas to load the file correctly. Hope this helps...
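
If chardet is not available, a rough standard-library heuristic can narrow things down. This is a different and much cruder technique than chardet's statistical detection: it checks for a byte-order mark, then tries strict decoding with a short list of candidates. The function name `guess_encoding` and the candidate list are assumptions chosen for illustration, and UTF-32 is not handled.

```python
import codecs

def guess_encoding(raw):
    """Rough guess at the encoding of a bytes sample (heuristic, not chardet)."""
    # A byte-order mark is the most reliable hint when present.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Strict decoding: the first candidate that decodes cleanly wins.
    # latin-1 maps every byte, so it goes last as a catch-all.
    for candidate in ("utf-8", "cp1252", "latin-1"):
        try:
            raw.decode(candidate)
        except UnicodeDecodeError:
            continue
        return candidate

# 0xc1 is the byte from the UnicodeDecodeError in the question.
print(guess_encoding(b"Bank Of Baroda\xc1"))
```

The result can be passed as the `encoding` argument of `pd.read_csv`. A clean decode only shows that the bytes are valid in that encoding, not that the text is correct, so the loaded data still needs a visual check.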

Regarding "python - Error importing CSV and Excel files with Pandas", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/19293316/
