
python - Counting occurrences of a string in a flat file using Python

Reposted. Author: 行者123. Updated: 2023-12-01 04:26:42

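I am trying to complete an online class, and one question asks for the number of times the word "fantastic" occurs in a large file. Whenever an occurrence is found, the first element of that line (the id) needs to be stored, so as to build a list of the ids of the lines that contain the word. So far I have the code below, which reads the rows correctly, but I don't know how to check whether "fantastic" is in the row in either upper or lower case. I tried row.count('fantastic'), which doesn't work, and I'm not sure how the csv reader stores the rows. If I could count the occurrences, I could add the id to an array whenever a row has one or more of them and print the array at the end.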

#!/usr/bin/python
import sys
import csv

def main():
    f = open("test_file.txt", 'rt')
    filereader = csv.reader(f, delimiter=' ', quotechar='"')
    for row in filereader:
        print(row[0])
        print(row.count('fantastic'))

if __name__ == "__main__":
    main()

Below is a very small sample set, to which I have added a few occurrences of "fantastic".

"6361"  "When will unit 2 be online? fantastic"   "cs101 unit2"   "100003292"     "<p>When will unit 2 be online?</p>"    "question"      "\N"    "\N"    "2012-02-26 15:47:12.522262+00" "0"     "(closed)"      "51919" "100003292"     "2012-03-03 10:12:27.41521+00"  "21196" "\N"    "\N"    "186"   "t"
"7185" "Hungarian group" "cs101 hungarian nationalities" "100003268" "<p>Hi there! This is FANTASTIC</p>
<p>Any Hungarians doing the course? We could form a group!<br>
;)</p>" "question" "\N" "\N" "2012-02-27 15:09:11.184434+00" "0" "" "\N" "100003268" "2012-02-27 15:09:11.184434+00" "9322" "\N" "\N" "106" "f"
"26454" "Course Application." "cs101 application." "100003192" "<p>Please tell about the Course Application. How to use the Course for higher education and jobs?</p>" "question" "\N" "\N" "2012-03-08 08:34:06.704674+00" "-1" "" "\N" "100003192" "2012-03-08 08:34:06.704674+00" "34477" "\N" "\N" "73" "f"

I expect the output to be 6361, 7185

Best answer

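The default quote character is already ", so you don't need to specify it. However, if you have a tab-delimited file, passing '\t' as the delimiter will interpret the columns correctly.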
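To illustrate the delimiter point, here is a minimal sketch that parses an in-memory sample instead of the real file. The `sample` string is a trimmed, made-up variant of the rows in the question, and it assumes the columns really are separated by tabs. It also shows that a quoted field spanning two lines comes back as a single column.

```python
import csv
import io

# Hypothetical tab-separated sample, trimmed from the rows in the question.
# The second record has a quoted field that contains a line break.
sample = (
    '"6361"\t"When will unit 2 be online? fantastic"\n'
    '"7185"\t"Hungarian group"\t"<p>Hi there! This is FANTASTIC</p>\n'
    '<p>Any Hungarians doing the course?</p>"\n'
)

# quotechar is left at its default ('"'); only the delimiter is specified.
rows = list(csv.reader(io.StringIO(sample), delimiter='\t'))

for row in rows:
    print(len(row), row[0])
```

Each `row` is a plain list of strings, one per column, with the surrounding quotes already removed.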

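What you can then do is build a generator that filters rows based on whether the substring 'fantastic' appears in any column after the ID, and then use a list comprehension to extract the IDs, e.g.: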

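import csv

with open('test_file.txt') as fin:
    csvin = csv.reader(fin, delimiter='\t')
    # Lazily keep rows where any column after the ID contains 'fantastic', ignoring case
    has_fantastic = (row for row in csvin if any('fantastic' in col.lower() for col in row[1:]))
    # Collect the first column (the ID) of each matching row
    ids = [row[0] for row in has_fantastic]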
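
On why the attempt in the question returned nothing useful: each row is a list of column strings, so `row.count('fantastic')` only counts columns that are exactly equal to `'fantastic'`, not columns that contain it. If the number of occurrences is wanted as well as the IDs, one possible sketch follows. The helper name `ids_containing` and the in-memory `sample` are hypothetical, made up for this illustration, and the file is assumed to be tab-delimited.

```python
import csv
import io

def ids_containing(fileobj, word="fantastic", delimiter="\t"):
    """Return (ids, total): the IDs of matching rows and the overall
    number of case-insensitive occurrences of `word`."""
    needle = word.lower()
    ids = []
    total = 0
    for row in csv.reader(fileobj, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        # str.count counts non-overlapping substring matches in each column
        hits = sum(col.lower().count(needle) for col in row[1:])
        if hits:
            ids.append(row[0])
            total += hits
    return ids, total

# Hypothetical sample, trimmed from the rows in the question (tab-separated)
sample = (
    '"6361"\t"When will unit 2 be online? fantastic"\t"cs101 unit2"\n'
    '"7185"\t"Hungarian group"\t"<p>Hi there! This is FANTASTIC</p>\n<p>Any Hungarians?</p>"\n'
    '"26454"\t"Course Application."\t"cs101 application."\n'
)

ids, total = ids_containing(io.StringIO(sample))
print(ids, total)
```

With a real file, the same function could be called as `ids_containing(open('test_file.txt'))`. Because this is a substring match, words such as "fantastical" would also be counted; a word-boundary regular expression would be needed to exclude them.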

Regarding python - counting occurrences of a string in a flat file using Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33013592/
