Python: deleting lines that contain a word from a list

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:46:11
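I am writing a script in Python, but I can't seem to get it right. It takes two inputs:

  1. A data file
  2. A stop file

The data file consists of 4 tab-separated columns and is sorted. The stop file consists of a list of words, also sorted.

The goal of the script is:

  • If the string in column 1 of the data file matches a string in the stop file, the entire line is deleted.

Here is a sample of the data file: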

abandonment-n   after+n-the+n-a-j   stop-n  1
abandonment-n against+n-the+ns leave-n 1
cake-n against+n-the+vg rest-v 1
abandonment-n as+n-a+vd require-v 1
abandonment-n as+n-a-j+vg-up use-v 1

And here is a sample of the stop file:

apple-n
banana-n
cake-n
pigeon-n

This is the code I have so far:

with open("input1", "rb") as oIndexFile:
    for line in oIndexFile:
        lemma = line.split()
        #print lemma

with open ("input2", "rb") as oSenseFile:
    with open("output", "wb") as oOutFile:
        for line in oSenseFile:
            concept, slot, filler, freq = line.split()
            nounsInterest = [concept, slot, filler, freq]
            #print concept
            if concept != lemma:
                outstring = '\t'.join(nounsInterest)
                oOutFile.write(outstring + '\n')
            else:
                pass

The desired output is as follows:

abandonment-n   after+n-the+n-a-j-stop-n    1
abandonment-n against+n-the+ns-leave-n 1
abandonment-n as+n-a+vd-require-v 1
abandonment-n as+n-a-j+vg-up-use-v 1

Any insights?

As of now, the output I get is the following, which is basically just a printout of the data I started with:

abandonment-n   after+n-the+n-a-j   stop-n  1
abandonment-n against+n-the+ns leave-n 1
cake-n against+n-the+vg rest-v 1
abandonment-n as+n-a+vd require-v 1
abandonment-n as+n-a-j+vg-up use-v 1

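*** Some things I have tried that still did not work:

Instead of if concept != lemma: I first tried if concept not in lemma:

which produces the same output as shown above.

I also suspected that the code was not reading the first input file, but the problem remains even when it is merged into the code like this: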

with open ("input2", "rb") as oSenseFile:
    with open("tinput1", "rb") as oIndexFile:
        for line in oIndexFile:
            lemma = line.split()
            with open("out", "wb") as oOutFile:
                for line in oSenseFile:
                    concept, slot, filler, freq = line.split()
                    nounsInterest = [concept, slot, filler, freq]
                    if concept not in lemma:
                        outstring = '\t'.join(nounsInterest)
                        oOutFile.write(outstring + '\n')
                    else:
                        pass

This produces a blank output file.

I also tried a different approach, as follows:

filename = "input1.txt"
filename2 = "input2.txt"
filename3 = "output1"

def fixup(filename):
    fin1 = open(filename)
    fin2 = open(filename2, "r")
    fout = open(filename3, "w")
    for word in filename:
        words = word.split()
    for line in filename2:
        concept, slot, filler, freq = line.split()
        nounsInterest = [concept, slot, filler, freq]
        if True in [concept in line for word in toRemove]:
            pass
        else:
            outstring = '\t'.join(nounsInterest)
            fout.write(outstring + '\n')
    fin1.close()
    fin2.close()
    fout.close()

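This was adapted from here, without success. In this case, no output is produced at all.

Can someone point out where I am going wrong in solving this task? Although the sample files are small, I have to run this on large files. Thank you for your help.

Best answer

I think you are trying to do something like this: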

with open('input1', 'rb') as indexfile:
    lemma = {x.strip() for x in indexfile}

with open('input2', 'rb') as sensefile, open('output', 'wb') as outfile:
    for line in sensefile:
        nouns_interest = concept, slot, filler, freq = line.split()
        if concept not in lemma:
            outfile.write('\t'.join(nouns_interest) + '\n')
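
For context, here is a small sketch of why the first attempt in the question never filtered anything, and why collecting the stop words into a set helps. The in-memory list stop_lines stands in for the stop file and is only an illustration; the variable names always_true, missed, found and stop_words are not from the original post.

```python
# Stand-in for the stop file from the question (in-memory, for illustration).
stop_lines = ["apple-n\n", "banana-n\n", "cake-n\n", "pigeon-n\n"]

# What the question's first loop does: lemma is rebound on every pass,
# so after the loop it only holds the tokens of the last line.
for line in stop_lines:
    lemma = line.split()

concept = "cake-n"

# A str never compares equal to a list, so this test is always True
# and every line of the data file gets written out.
always_true = concept != lemma

# Membership is only checked against the last stop word, so 'cake-n' is missed.
missed = concept not in lemma

# What the answer does instead: read every stop word into a set once,
# then test each concept against the whole set.
stop_words = {line.strip() for line in stop_lines}
found = concept in stop_words
```

With a set, each lookup takes roughly constant time regardless of how many stop words there are, which matters for the large files mentioned in the question.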

Your desired output seems to put a hyphen between slot and filler, so you might want to use

            outfile.write('{}\t{}-{}\t{}\n'.format(*nouns_interest))
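
The code above appears to be written for Python 2. In Python 3, files opened with 'rb' yield bytes, and joining bytes with a str separator raises a TypeError. Below is a minimal Python 3 sketch of the same idea in text mode. The function name filter_lines, the hyphenate flag, and the choice to skip malformed lines are assumptions made for illustration, not part of the answer.

```python
import io


def filter_lines(data_lines, stop_lines, hyphenate=False):
    """Yield data lines whose first column is not a stop word.

    filter_lines and hyphenate are hypothetical names used for this sketch.
    """
    # Read all stop words once; set membership tests are fast.
    stop_words = {line.strip() for line in stop_lines}
    for line in data_lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # assumption: skip blank or malformed lines
        concept, slot, filler, freq = fields
        if concept in stop_words:
            continue  # first column matches a stop word: drop the line
        if hyphenate:
            # Join slot and filler with a hyphen, as in the desired output.
            yield "{}\t{}-{}\t{}\n".format(concept, slot, filler, freq)
        else:
            yield "\t".join(fields) + "\n"


# In-memory stand-ins for the two input files, taken from the question's samples.
data = io.StringIO(
    "abandonment-n\tafter+n-the+n-a-j\tstop-n\t1\n"
    "cake-n\tagainst+n-the+vg\trest-v\t1\n"
)
stops = io.StringIO("apple-n\ncake-n\n")

result = list(filter_lines(data, stops, hyphenate=True))
```

With real files, the same generator could be passed to outfile.writelines(...) with all three files opened in text mode. Because lines are produced one at a time, only the stop words are held in memory, and the data file is streamed.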

Regarding "Python: deleting lines that contain a word from a list", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/19950945/
