unix - Using sed to remove words in a stopword list (Feeding sed a list of parameters to remove from a text file)

Reposted. Author: 行者123. Updated: 2023-12-01 14:29:58

So, we all know that sed is very good at finding and replacing every occurrence of a word in a file:

sed -i 's/original_word/new_word/g' file.txt

But can someone tell me how to feed sed a list of "original_words" from a file (similar to grep -f)? I just want to replace all of them with '' (that is, delete them).

The word list file is just a set of stopwords, one per line (wordlist.txt):

a
about
above
according
across
after
afterwards

This would be a simple way to take a stopword list and remove those words from a corpus (useful for cleaning data).

file.txt looks like this:

05ricardo   RT @shakira: Immigration reform isn't about politics. It's about people mothers, kids. Obama is working for all of them. http://t.co/rAW ...    0
05ricardo ?@ItsReginaG: Don't vote Obama. Because you will lose jobs, and die.? Lol 0
05ricardo ?@shakira: Obama doubles Pell Grants - 700,000 more Latinos get help to go to college. Meet Johanny Adames http://t.co/EMg8NLGl Shak?. ? -1
05rodriguez_a My Comm teacher gave me a copy of Obama's speech that he gave the other night and I cried while reading it. It was that moving. -3

Best answer

You can also have sed write the sed script for you (tested with GNU sed):

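<stopwords sed 's:.*:s/\\b&\\b//g:' | sed -f - file.txt

Here stopwords is the word list file (wordlist.txt in the question). The first sed turns every stopword into a command of the form s/\bword\b//g, and the second sed reads that generated script from standard input via -f -. The g flag has to sit inside the generated command; otherwise only the first occurrence on each line is removed.

Output:

05ricardo   RT @shakira: Immigration reform isn't  politics. It's  people mothers, kids. Obama is working for all of them. http://t.co/rAW ...    0
05ricardo ?@ItsReginaG: Don't vote Obama. Because you will lose jobs, and die.? Lol 0
05ricardo ?@shakira: Obama doubles Pell Grants - 700,000 more Latinos get help to go to college. Meet Johanny Adames http://t.co/EMg8NLGl Shak?. ? -1
05rodriguez_a My Comm teacher gave me  copy of Obama's speech that he gave the other night and I cried while reading it. It was that moving. -3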
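
To make the two stages easier to inspect, here is a minimal sketch that wraps them in two shell functions. The function names (make_stopword_script, remove_stopwords) and the sample sentences are hypothetical and not part of the original answer. It assumes GNU sed (for \b and for reading a script from standard input with -f -) and stopwords made of plain letters only, with no slashes or regex metacharacters.

```shell
#!/bin/sh
# Sketch only. Assumes GNU sed and stopwords that contain no '/' or
# regex metacharacters. Function names are illustrative, not standard.

# Print one sed command per stopword, e.g. "about" -> s/\babout\b//g
# Blank lines in the list are skipped so they produce no command.
make_stopword_script() {
    sed -e '/^$/d' -e 's:.*:s/\\b&\\b//g:' "$1"
}

# Apply the generated script to a text file; the result goes to stdout.
remove_stopwords() {
    make_stopword_script "$1" | sed -f - "$2"
}

# Small demo on throwaway files (hypothetical sample data).
demo_dir=$(mktemp -d)
printf '%s\n' a about above > "$demo_dir/stopwords"
printf '%s\n' "It isn't about politics. It's about people." \
              "He gave me a copy." > "$demo_dir/file.txt"

make_stopword_script "$demo_dir/stopwords"
# s/\ba\b//g
# s/\babout\b//g
# s/\babove\b//g

remove_stopwords "$demo_dir/stopwords" "$demo_dir/file.txt"
# It isn't  politics. It's  people.
# He gave me  copy.

rm -r "$demo_dir"
```

Printing the generated script first is a cheap way to check the escaping before it touches the corpus. Note that each removed word leaves its surrounding spaces behind, so doubled spaces appear in the result, just as in the output above.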

Regarding unix - Using sed to remove words in a stopword list (Feeding sed a list of parameters to remove from a text file), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14744293/