python - Pandas equivalent of Python's readlines function

Reposted. Author: 行者123. Updated: 2023-11-28 20:18:43

Using Python's readlines() function, I can retrieve a list of every line in a file:

with open('dat.csv', 'r') as dat:
    lines = dat.readlines()

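I am working on a problem that involves a very large file, and this approach raises a memory error. Is there a pandas equivalent of Python's readlines() function? The chunksize option of pd.read_csv() seems to attach numbers to my lines, which is far from ideal.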

Minimal example:

In [1]: lines = []

In [2]: for df in pd.read_csv('s.csv', chunksize=100):
   ...:     lines.append(df)
   ...:

In [3]: lines
Out[3]:
[   hello here is a line
 0  here is another line
 1  here is my last line]

In [4]: with open('s.csv', 'r') as dat:
   ...:     lines = dat.readlines()
   ...:

In [5]: lines
Out[5]: ['hello here is a line\n', 'here is another line\n', 'here is my last line\n']

In [6]: cat s.csv
hello here is a line
here is another line
here is my last line

Best answer

You should try the chunksize option of pd.read_csv(), as mentioned in some of the comments.

This forces pd.read_csv() to read a defined number of lines at a time instead of trying to read the entire file at once. It looks like this:

>>> df = pd.read_csv(filepath, chunksize=1, header=None, encoding='utf-8')

In the example above, the file is read one line at a time.
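To make the effect of chunksize concrete, here is a minimal sketch. It assumes an in-memory io.StringIO buffer as a stand-in for a real file (the sample text is made up for illustration) and uses chunksize=2 so the chunk boundaries are visible.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for a file on disk.
sample = io.StringIO("first line\nsecond line\nthird line\n")

shapes = []
for chunk in pd.read_csv(sample, chunksize=2, header=None):
    # Each chunk is a DataFrame holding at most `chunksize` rows.
    # header=None keeps the first line as data instead of column names.
    shapes.append(chunk.shape)

print(shapes)  # [(2, 1), (1, 1)]
```

Three input lines with chunksize=2 give one chunk of two rows and a final chunk with the remaining row.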

Now, according to the documentation for pandas.read_csv, what is returned here is actually not a pandas.DataFrame object but a TextFileReader object.

  • chunksize : int, default None
    Return TextFileReader object for iteration. See IO Tools docs for more information on iterator and chunksize.
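A small sketch of what that means in practice, again assuming a made-up in-memory buffer rather than a real file: the object returned by pd.read_csv() with chunksize set is an iterator, and each item it yields is a DataFrame.

```python
import io

import pandas as pd

# Hypothetical sample data, used only so the sketch runs on its own.
sample = io.StringIO("first line\nsecond line\n")
reader = pd.read_csv(sample, chunksize=1, header=None)

reader_type = type(reader).__name__   # name of the returned class
first_chunk = next(reader)            # a one-row DataFrame
first_line = first_chunk.iloc[0, 0]   # the text of that row

# The reader remembers its position, so this picks up at the second line.
remaining = [chunk.iloc[0, 0] for chunk in reader]

print(reader_type, first_line, remaining)
```

The numbers shown next to each row when a chunk is printed are the DataFrame index labels, not part of the line itself; indexing with .iloc[0, 0] returns only the text.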

So, to complete the exercise, you need to put it in a loop like this:

In [385]: cat data_sample.tsv
This is a new line
This is another line of text
And this is the last line of text in this file

In [386]: lines = []

In [387]: for line in pd.read_csv('./data_sample.tsv', encoding='utf-8', header=None, chunksize=1):
   .....:     lines.append(line.iloc[0,0])
   .....:

In [388]: print(lines)
['This is a new line', 'This is another line of text', 'And this is the last line of text in this file']
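One caveat with this approach: pd.read_csv() parses each line as CSV, so a line containing the delimiter is split into several columns, and blank lines are skipped by default. If the goal is only to get raw lines without loading the whole file, a plain-Python alternative is to iterate over the file object in batches. The sketch below is not part of the original answer; the helper name iter_line_batches and the sample file contents are hypothetical.

```python
import itertools
import os
import tempfile


def iter_line_batches(path, batch_size):
    """Yield lists of at most `batch_size` lines, with newlines stripped."""
    with open(path, "r", encoding="utf-8") as handle:
        while True:
            # islice pulls only the next `batch_size` lines from the file.
            batch = [line.rstrip("\n") for line in itertools.islice(handle, batch_size)]
            if not batch:
                break
            yield batch


# Hypothetical sample file, written only so the sketch runs on its own.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("a, with a comma\n\nlast line\n")
    sample_path = tmp.name

batches = list(iter_line_batches(sample_path, batch_size=2))
os.remove(sample_path)

print(batches)  # [['a, with a comma', ''], ['last line']]
```

Only one batch is held in memory at a time, provided each batch is processed and discarded rather than appended to a growing list.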

Hope this helps!

Regarding "python - Pandas equivalent of Python's readlines function", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/36020690/
