python - Pandas cannot read a jagged text file past row 216

Reposted. Author: 行者123. Updated: 2023-12-01 03:36:32

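I have a jagged txt file (the number of columns differs from row to row) and am trying to read it with Pandas. For some reason it can read the first 216 rows, but reading the first 217 rows fails.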

>>> df = pd.read_table("test.txt", names = range(2000), nrows = 216)
>>> df = pd.read_table("test.txt", names = range(2000), nrows = 217)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/alexwhatley/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py", line 562, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/alexwhatley/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py", line 321, in _read
    return parser.read(nrows)
  File "/Users/alexwhatley/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py", line 815, in read
    ret = self._engine.read(nrows)
  File "/Users/alexwhatley/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py", line 1314, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 805, in pandas.parser.TextReader.read (pandas/parser.c:8748)
  File "pandas/parser.pyx", line 839, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:9208)
  File "pandas/parser.pyx", line 881, in pandas.parser.TextReader._read_rows (pandas/parser.c:9731)
  File "pandas/parser.pyx", line 868, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:9602)
  File "pandas/parser.pyx", line 1865, in pandas.parser.raise_parser_error (pandas/parser.c:23325)
pandas.io.common.CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.

The file is here: https://gist.github.com/alexanderwhatley/e07af297b1a10cd5cb57c7b75ee7f229 . Does anyone know what is going on?
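
Before picking a fix, it can help to confirm how jagged the file really is. The sketch below is not from the original post: `field_count_summary` is a hypothetical helper, and it assumes the file is tab-separated with no quoted fields. It tallies how many lines have each field count and reports which line is the widest.

```python
from collections import Counter


def field_count_summary(path, sep="\t"):
    """Return (Counter of fields-per-line, 1-based number of the widest line)."""
    counts = Counter()
    widest_line, widest = 0, 0
    with open(path, "r", newline="") as f:
        for lineno, line in enumerate(f, start=1):
            # Strip only the line terminator so empty trailing fields still count.
            n = len(line.rstrip("\r\n").split(sep))
            counts[n] += 1
            if n > widest:
                widest_line, widest = lineno, n
    return counts, widest_line
```

Calling `field_count_summary("test.txt")` on the file from the question would show how widely the row widths vary and where the widest row sits.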

Best answer

A workaround is:

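import pandas as pd

# Split each line on tabs ourselves so rows may have different lengths.
the_file = []
with open("./genes.txt", "r") as f:
    for line in f:
        the_file.append(line.rstrip("\n").split("\t"))

# Size the frame to the widest row; shorter rows are padded with None.
df = pd.DataFrame(the_file, columns=range(max(len(l) for l in the_file)))

print(df[0])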

Result:

0 KEGG_GLYCOLYSIS_GLUCONEOGENESIS
1 KEGG_CITRATE_CYCLE_TCA_CYCLE
2 KEGG_PENTOSE_PHOSPHATE_PATHWAY
3 KEGG_PENTOSE_AND_GLUCURONATE_INTERCONVERSIONS
4 KEGG_FRUCTOSE_AND_MANNOSE_METABOLISM
5 KEGG_GALACTOSE_METABOLISM
6 KEGG_ASCORBATE_AND_ALDARATE_METABOLISM
7 KEGG_FATTY_ACID_METABOLISM
8 KEGG_STEROID_BIOSYNTHESIS
9 KEGG_PRIMARY_BILE_ACID_BIOSYNTHESIS
10 KEGG_STEROID_HORMONE_BIOSYNTHESIS
11 KEGG_OXIDATIVE_PHOSPHORYLATION
12 KEGG_PURINE_METABOLISM
13 KEGG_PYRIMIDINE_METABOLISM
14 KEGG_ALANINE_ASPARTATE_AND_GLUTAMATE_METABOLISM
15 KEGG_GLYCINE_SERINE_AND_THREONINE_METABOLISM
16 KEGG_CYSTEINE_AND_METHIONINE_METABOLISM
17 KEGG_VALINE_LEUCINE_AND_ISOLEUCINE_DEGRADATION
18 KEGG_VALINE_LEUCINE_AND_ISOLEUCINE_BIOSYNTHESIS
19 KEGG_LYSINE_DEGRADATION
20 KEGG_ARGININE_AND_PROLINE_METABOLISM
21 KEGG_HISTIDINE_METABOLISM
22 KEGG_TYROSINE_METABOLISM
23 KEGG_PHENYLALANINE_METABOLISM
24 KEGG_TRYPTOPHAN_METABOLISM
25 KEGG_BETA_ALANINE_METABOLISM
26 KEGG_TAURINE_AND_HYPOTAURINE_METABOLISM
27 KEGG_SELENOAMINO_ACID_METABOLISM
28 KEGG_GLUTATHIONE_METABOLISM
29 KEGG_STARCH_AND_SUCROSE_METABOLISM
...
425 ST_GAQ_PATHWAY
426 ST_GA13_PATHWAY
427 ST_STAT3_PATHWAY
428 SA_FAS_SIGNALING
429 SA_G1_AND_S_PHASES
430 SIG_INSULIN_RECEPTOR_PATHWAY_IN_CARDIAC_MYOCYTES
431 ST_T_CELL_SIGNAL_TRANSDUCTION
432 ST_TYPE_I_INTERFERON_PATHWAY
433 ST_PAC1_RECEPTOR_PATHWAY
434 SIG_PIP3_SIGNALING_IN_B_LYMPHOCYTES
435 SIG_BCR_SIGNALING_PATHWAY
436 SA_G2_AND_M_PHASES
437 ST_B_CELL_ANTIGEN_RECEPTOR
438 ST_INTERLEUKIN_4_PATHWAY
439 ST_WNT_BETA_CATENIN_PATHWAY
440 SA_MMP_CYTOKINE_CONNECTION
441 ST_JNK_MAPK_PATHWAY
442 SA_PROGRAMMED_CELL_DEATH
443 ST_FAS_SIGNALING_PATHWAY
444 ST_MYOCYTE_AD_PATHWAY
445 SA_PTEN_PATHWAY
446 SA_REG_CASCADE_OF_CYCLIN_EXPR
447 SA_TRKA_RECEPTOR
448 ST_PHOSPHOINOSITIDE_3_KINASE_PATHWAY
449 PID_FANCONI_PATHWAY
450 PID_SMAD2_3NUCLEAR_PATHWAY
451 PID_FCER1_PATHWAY
452 PID_ENDOTHELIN_PATHWAY
453 PID_BCR_5PATHWAY
454 PID_PRL_SIGNALING_EVENTS_PATHWAY
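
As a variation on the workaround above, the same idea can be kept inside `read_csv`: scan the file once to find the widest row, then pass that many column names and let pandas pad the shorter rows. This is a minimal sketch and not part of the original answer; `read_jagged` is a hypothetical helper name, and the choice of `engine="python"` is an assumption made to sidestep the C tokenizer that raised the error in the question.

```python
import pandas as pd


def read_jagged(path, sep="\t"):
    """Read a delimited file whose rows have different numbers of fields."""
    # First pass: find the widest row so every column can be named up front.
    with open(path, "r") as f:
        max_cols = max(len(line.rstrip("\n").split(sep)) for line in f)
    # Second pass: with explicit names, shorter rows are padded with NaN.
    return pd.read_csv(
        path,
        sep=sep,
        header=None,
        names=list(range(max_cols)),
        engine="python",
    )
```

With this helper, `df = read_jagged("./genes.txt")` followed by `print(df[0])` should list the first field of each row, as in the output above. The first pass costs one extra read of the file, but it avoids guessing an upper bound such as `range(2000)`.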

Regarding "python - Pandas cannot read a jagged text file past row 216", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40290262/
