
linux - Searching a directory of files based on keywords from another file

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:15:02

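Perl newbie here, looking for help.

I have a directory of files and a "keywords" file that lists the attributes to search for and the type of each attribute.

For example:

keywords.txt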

Attribute1 boolean
Attribute2 boolean
Attribute3 search_and_extract
Attribute4 chunk

For each file in the directory, I have to:

  • look up the keywords in keywords.txt
  • search according to the attribute type

Something like the below.

IF attribute_type = boolean THEN
search for attribute;
set found = Y if attribute found;
ELSIF attribute_type = search_and_extract THEN
extract string where attribute is Found
ELSIF attribute_type = chunk THEN
extract the complete chunk of paragraph where attribute is found.

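This is what I have so far, and I am sure there is a more efficient way to do it.

I am hoping someone can point me in the right direction for the above. Thanks and regards, Sima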

# Reads attributes from the config file.
# First handle boolean attributes: if the keyword is found in the text,
# the flag is set to Y, else N.
# Still to do: loop over each text file in the directory
# and run the below for each document.

use strict;
use warnings;

# open the document
open(my $doc_fh, '<', 'Final_CLP.txt')
    or die "Cannot open Final_CLP.txt: $!";
while (my $doc_line = <$doc_fh>) {
    chomp $doc_line;
    # open the attribute file (note: re-read for every document line)
    open(my $cfg_fh, '<', 'attribute_config.txt')
        or die "Cannot open attribute_config.txt: $!";
    while (my $cfg_line = <$cfg_fh>) {
        chomp $cfg_line;
        # each config line is "attribute<TAB>attribute_type"
        my ($attribute, $attribute_type) = split /\t/, $cfg_line;

        my $is_boolean = ($attribute_type eq "boolean") ? "Y" : "N";

        # For each boolean attribute, check if the keyword exists
        # in the file and return Y or N
        if ($is_boolean eq "Y") {
            print "Yes\n";
            # search for keyword in doc and assign values
        }

        print "Attribute: $attribute\n";
        print "Attribute_Type: $attribute_type\n";
        print "is_boolean: $is_boolean\n";
        print "-----------\n";
    }
    close($cfg_fh);
}
close($doc_fh);
exit;

Best answer

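It is good to start your spec/question with a story ("I have a ..."). But such a story, whether true or made up because you cannot disclose the truth, should give

  • a vivid picture of the situation/problem/task
  • the reason(s) why all the work must be done
  • definitions of uncommon(ly used) terms

So I would start with: I work in a prison and have to scan the inmates' emails for

  • names (e.g. "Al Capone") mentioned anywhere in the text; the director wants to read those mails in full
  • order lines (e.g. "weapon: AK 4711 quantity: 14"); the ordnance officer wants this information to calculate the amount of ammunition and rack space needed
  • paragraphs containing "family" keywords such as "wife", "child", ...; the chaplain wants to prepare her sermons efficiently

Taken by itself, each of the terms "keyword" (~ running text) and "attribute" (~ structured text) may be "clear", but if both are applied to "the X I have to search for", things get mushy. Instead of general ("chunk") and technical ("string") terms, you should use "real world" (line) and specific (paragraph) words. Samples of your input: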

From: Robin Hood
To: Scarface

Hi Scarface,

tell Al Capone to send a car to the prison gate on sunday.

For the riot we need:

weapon: AK 4711 quantity: 14
knife: Bowie quantity: 8

Tell my wife in Folsom to send some money to my son in
Alcatraz.

Regards
Robin

and your expected output:

--- Robin.txt ----
keywords:
Al Capone: Yes
Billy the Kid: No
Scarface: Yes
order lines:
knife:
knife: Bowie quantity: 8
machine gun:
stinger rocket:
weapon:
weapon: AK 4711 quantity: 14
social relations paragaphs:
Tell my wife in Folsom to send some money to my son in
Alcatraz.

Pseudocode should start at the top level. If you start with

for each file in folder
    load search list
    process current file('s content) using search list

it is obvious that

load search list
for each file in folder
    process current file using search list

would be much better.
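The improved top-level plan could be sketched in Perl roughly as follows. This is only a minimal sketch under stated assumptions: the search list is a tab-separated file with one "name<TAB>type" pair per line (as in the question's attribute_config.txt), the documents are the *.txt files of one folder, and the helper names load_search_list, slurp and main are made up for illustration. Only the boolean check is shown.

```perl
use strict;
use warnings;

# Load the search list once and group the items by type.
# Assumption: one "name<TAB>type" pair per line.
sub load_search_list {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my %items_by_type;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;              # skip blank lines
        my ($name, $type) = split /\t/, $line, 2;
        next unless defined $type;             # skip malformed lines
        push @{ $items_by_type{$type} }, $name;
    }
    close $fh;
    return \%items_by_type;
}

# Read a whole file into one string.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;                                  # undef separator: read everything
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Hypothetical driver: load the list once, then loop over the folder.
sub main {
    my ($config, $dir) = @_;
    my $search = load_search_list($config);
    for my $file (sort glob "$dir/*.txt") {
        my $text = slurp($file);
        print "--- $file ---\n";
        for my $name (@{ $search->{boolean} // [] }) {
            # \Q...\E makes the item match literally
            printf " %s: %s\n", $name, $text =~ /\Q$name\E/ ? 'Yes' : 'No';
        }
    }
    return 0;
}

main(@ARGV) if @ARGV == 2;
```

Called as, say, `perl scan.pl attribute_config.txt mails` (script and folder names are placeholders), the config file is opened exactly once, no matter how many documents the folder holds.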

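Based on the story, the samples, and the top-level plan, I would try to come up with proof-of-concept code for a simplified version of the "process current file('s content) using search list" task: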

given file/text to search in and list of keywords/attributes

print file name
print "keywords:"
for each boolean item
    print boolean item text
    if found anywhere in whole text
        print "Yes"
    else
        print "No"
print "order line:"
for each line item
    print line item text
    if found anywhere in whole text
        print whole line
print "social relations paragaphs:"
for each paragraph
    for each social relation item
        if found
            print paragraph
            no need to check for other items

A first attempt at an implementation:

use Modern::Perl;

#use English qw(-no_match_vars);
use English;

exit step_00();

sub step_00 {
    # given file/text to search in
    my $whole_text = <<"EOT";
From: Robin Hood
To: Scarface

Hi Scarface,

tell Al Capone to send a car to the prison gate on sunday.

For the riot we need:

weapon: AK 4711 quantity: 14
knife: Bowie quantity: 8

Tell my wife in Folsom to send some money to my son in
Alcatraz.

Regards
Robin
EOT

    # print file name
    say "--- Robin.txt ---";
    # print "keywords:"
    say "keywords:";
    # for each boolean item
    for my $bi ("Al Capone", "Billy the Kid", "Scarface") {
        # print boolean item text
        printf " %s: ", $bi;
        # if found anywhere in whole text
        if ($whole_text =~ /$bi/) {
            # print "Yes"
            say "Yes";
        # else
        } else {
            # print "No"
            say "No";
        }
    }
    # print "order line:"
    say "order lines:";
    # for each line item
    for my $li ("knife", "machine gun", "stinger rocket", "weapon") {
        # print line item text
        # if found anywhere in whole text
        if ($whole_text =~ /^$li.*$/m) {
            # print whole line
            say " ", $MATCH;
        }
    }
    # print "social relations paragaphs:"
    say "social relations paragaphs:";
    # for each paragraph
    for my $para (split /\n\n/, $whole_text) {
        # for each social relation item
        for my $sr ("wife", "son", "husband") {
            # if found
            if ($para =~ /$sr/) {
            ## if ($para =~ /\b$sr\b/) {
                # print paragraph
                say $para;
                # no need to check for other items
                last;
            }
        }
    }
    return 0;
}

Output:

perl 16953439.pl
--- Robin.txt ---
keywords:
Al Capone: Yes
Billy the Kid: No
Scarface: Yes
order lines:
knife: Bowie quantity: 8
weapon: AK 4711 quantity: 14
social relations paragaphs:
tell Al Capone to send a car to the prison gate on sunday.
Tell my wife in Folsom to send some money to my son in
Alcatraz.
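Note that the "prison gate" paragraph appears in this output even though it mentions no relative: the plain pattern /son/ also matches inside "prison". The commented-out line in the code hints at the fix, word boundaries. A minimal sketch of that variant, assuming the items are plain words (the function name paragraphs_with_words is made up for illustration):

```perl
use strict;
use warnings;

# Return the paragraphs that contain any of the given words as whole words.
sub paragraphs_with_words {
    my ($text, @words) = @_;
    # quotemeta keeps regex metacharacters in an item literal
    my $alternation = join '|', map { quotemeta($_) } @words;
    # \b anchors the match at word boundaries, so "son" no longer matches "prison"
    my $re = qr/\b(?:$alternation)\b/;
    (my $trimmed = $text) =~ s/\s+\z//;        # drop trailing whitespace
    # paragraphs are separated by one or more blank lines
    return grep { $_ =~ $re } split /\n{2,}/, $trimmed;
}

my $sample = "send a car to the prison gate.\n\n"
           . "Tell my wife to send money to my son.\n";
print "$_\n" for paragraphs_with_words($sample, 'wife', 'son', 'husband');
```

With this sample text only the second paragraph is printed. Whether whole-word matching is really wanted (think of "sons" or "wife's") is one more question for the spec.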

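Such (premature) code helps you to

  • clarify your specs (Should keywords that were not found go into the output? Is your search list really flat, or should it be structured/grouped?)
  • check your assumptions about how to do things (Should the order line search be done on an array of the lines of the whole text?)
  • identify topics for further research/rtfm (e.g. regular expressions (prison!))
  • plan your next steps (folder loop, reading the input file)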

(In addition, people in the know will point out all my bad practices, so you can avoid them from the start.)
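One way to explore the order line question from the list above is to split the text into lines and grep them per item. The sketch below is only one possible reading of the spec: it assumes an order line starts with the item name followed by a colon, and it keeps items without a match so that they still show up, as in the expected output. The function name order_lines is made up for illustration.

```perl
use strict;
use warnings;

# For each item, collect the lines that start with "<item>:".
# Items without a match keep an empty list, so they can still be printed.
sub order_lines {
    my ($text, @items) = @_;
    my @lines = split /\n/, $text;
    my %found;
    for my $item (@items) {
        # \Q...\E quotes the item; the trailing colon is an assumption
        $found{$item} = [ grep { /^\Q$item\E:/ } @lines ];
    }
    return \%found;
}

my $mail = "For the riot we need:\n\n"
         . "weapon: AK 4711 quantity: 14\n"
         . "knife: Bowie quantity: 8\n";
my $found = order_lines($mail, 'knife', 'machine gun', 'weapon');
for my $item (sort keys %$found) {
    print " $item:\n";
    print "  $_\n" for @{ $found->{$item} };
}
```

Unlike the single multi-line match in the first attempt, grep returns every matching line per item, which matters as soon as a mail orders the same kind of item twice.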

Good luck!

Regarding "linux - Searching a directory of files based on keywords from another file", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/16953439/
