perl - How can I parse a file, create records, and perform operations on the records, including term frequency and distance calculations?

Reposted by: 行者123. Updated: 2023-12-03 06:20:35

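I'm a student in an introductory Perl class, looking for suggestions and feedback on my approach to writing a small (but tricky) program that analyzes data about atoms. My professor encourages the use of forums. I'm not familiar with Perl subroutines or modules (including BioPerl), so please keep replies at an appropriate "beginner level" so that I can understand and learn from your suggestions and/or code (and please limit the "magic" too).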

The requirements of the program are as follows:

Read a file (containing data about Atoms) from the command line & create an array of atom records (one record/atom per newline). For each record the program will need to store:

• The atom's serial number (cols 7 - 11)
• The three-letter name of the amino acid to which it belongs (cols 18 - 20)
• The atom's three coordinates (x,y,z) (cols 31 - 54)
• The atom's one- or two-letter element name (e.g. C, O, N, Na) (cols 77 - 78)

Prompt for one of three commands: freq, length, density d (where d is some number):

• freq - how many atoms of each type are in the file (for example, nitrogen, sulfur, etc. would be displayed like this: N: 918 S: 23)
• length - the maximum distance between any pair of atoms, computed from their coordinates
• density d (where d is a number) - the program prompts for the name of a file to save its computations to. For each atom, it computes the distance between that atom and every other atom; if a distance is less than or equal to d, it increments that atom's count of atoms within that distance. Each atom's serial number and count are written to the file, unless the count is zero. The output will look something like:
1: 5
2: 3
3: 6
... (very big file) and will close when it finishes.

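I'm looking for feedback on what I have written (and still need to write) in the code below. I would especially appreciate any feedback on how to write my subs. I have included sample input data at the bottom.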

The program structure and function descriptions as I see them:
```
$^W = 1; # turn on warnings
use strict; # behave!

my @fields;
my @recs;

while ( <DATA> ) {
    chomp;
    @fields = split(/\s+/);
    push @recs, makeRecord(@fields);
}

for (my $i = 0; $i < @recs; $i++) {
    printRec( $recs[$i] );
}

my %command_table = (
    freq => \&freq,
    length => \&length,
    density => \&density,
    help => \&help,
    quit => \&quit
);

print "Enter a command: ";
while ( <STDIN> ) {
    chomp;
    my @line = split( /\s+/);
    my $command = shift @line;
    if ($command !~ /^freq$|^density$|length|^help$|^quit$/ ) {
        print "Command must be: freq, length, density or quit\n";
    }
    else {
        $command_table{$command}->();
    }
    print "Enter a command: ";
}

sub makeRecord
    # Read the entire line and make records from the lines that contain the
    # word ATOM or HETATM in the first column. Not sure how to do this:
{
    my %record =
    (
        serialnumber => shift,
        aminoacid => shift,
        coordinates => shift,
        element  => [ @_ ]
    );
    return \%record;
}

sub freq
    # take an array of atom records, return a hash whose keys are
    # distinct atom names and whose values are the frequencies of
    # these atoms in the array.

sub length
    # take an array of atom records and return the max distance
    # between all pairs of atoms in that array. My instructor
    # advised this would be constructed as a for loop inside a for loop.

sub density
    # take an array of atom records and a number d and will return a
    # hash whose keys are atom serial numbers and whose values are
    # the number of atoms within that distance from the atom with that
    # serial number.

sub help
{
    print "To use this program, type either\n",
          "freq\n",
          "length\n",
          "density followed by a number, d,\n",
          "help\n",
          "quit\n";
}

sub quit
{
    exit 0;
}

# truncating for testing purposes. Actual data is approx. 100 columns
# and starts with ATOM or HETATM.
__DATA__
ATOM   4743  CG  GLN A 704      19.896  32.017  54.717  1.00 66.44           C  
ATOM   4744  CD  GLN A 704      19.589  30.757  55.525  1.00 73.28           C  
ATOM   4745  OE1 GLN A 704      18.801  29.892  55.098  1.00 75.91           O 
```

Best answer

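It looks like your Perl skills are coming along nicely: you are already using references and complex data structures. Here are some tips and general advice.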

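Enable warnings with `use warnings` rather than `$^W = 1`. The former is self-documenting and has the advantage of being local to the enclosing block rather than being a global setting.
Use well-named variables, which help document the program's behavior, rather than relying on Perl's special `$_`. For example: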
```
while (my $input_record = <DATA>){
}
```
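In a user-input scenario, an endless loop provides a way to avoid repeating instructions such as "Enter a command". See below.
Your regular expression can be simplified to avoid the need for repeated anchors. See below.
As a general rule, affirmative tests are easier to understand than negative tests. See the modified if-else structure below.
Enclose each part of the program in its own subroutine. This is a good general practice for many reasons, so I would simply start the habit now.
A related good practice is to minimize the use of global variables. As an exercise, you could try to write the program so that it uses no global variables at all. Instead, any needed information would be passed between subroutines. With small programs, you don't necessarily need to be rigid about avoiding global variables, but it's not a bad idea to keep the ideal in mind.
Give your `length` subroutine a different name. That name is already used by the built-in `length` function.
Regarding your question about `makeRecord`, one approach is to ignore the filtering issue inside `makeRecord`. Instead, `makeRecord` could include an additional hash field, and the filtering logic would reside elsewhere. For example: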
```
my $record = makeRecord(@fields);
push @recs, $record if $record->{type} =~ /^(ATOM|HETATM)$/;
```
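
The record-building and filtering idea above could be sketched as follows. This is only a minimal sketch, not the answer's own code. It assumes that every line follows the fixed-column layout described in the question, that the record type (ATOM or HETATM) occupies columns 1 to 6, and that the three coordinates take 8 columns each within columns 31 to 54, as in the PDB file format. The names `make_record_from_line` and `trim` are hypothetical, and the sub takes the whole input line rather than the `@fields` list used in the snippet above.

```perl
use strict;
use warnings;

# Remove leading and trailing whitespace from a string.
sub trim {
    my $text = shift;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Build one atom record from a fixed-width line.
# substr offsets are zero-based, so column 7 is offset 6.
sub make_record_from_line {
    my $line = shift;

    # Pad short lines so that substr never reads past the end.
    my $padded = sprintf '%-80s', $line;

    my %record = (
        type         => trim( substr( $padded, 0,  6 ) ),    # cols  1 - 6 (assumed)
        serialnumber => trim( substr( $padded, 6,  5 ) ),    # cols  7 - 11
        aminoacid    => trim( substr( $padded, 17, 3 ) ),    # cols 18 - 20
        coordinates  => [
            trim( substr( $padded, 30, 8 ) ),                # x: cols 31 - 38
            trim( substr( $padded, 38, 8 ) ),                # y: cols 39 - 46
            trim( substr( $padded, 46, 8 ) ),                # z: cols 47 - 54
        ],
        element      => trim( substr( $padded, 76, 2 ) ),    # cols 77 - 78
    );
    return \%record;
}

# Sample input: one atom line from the question and one line to be skipped.
my @lines = (
    'ATOM   4743  CG  GLN A 704      19.896  32.017  54.717  1.00 66.44           C  ',
    'REMARK this line is not an atom record',
);

# Keep only the records whose type is ATOM or HETATM.
my @recs;
for my $line (@lines) {
    my $record = make_record_from_line($line);
    push @recs, $record if $record->{type} =~ /^(ATOM|HETATM)$/;
}

# Print the stored fields of each kept record on one line.
for my $rec (@recs) {
    print join( ' ',
        $rec->{serialnumber}, $rec->{aminoacid},
        @{ $rec->{coordinates} }, $rec->{element} ), "\n";
}
```

Reading by column position instead of splitting on whitespace matters here because fixed-width fields can run together or be empty, which would shift the results of `split`.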

An illustration of some of the points above:

```
use strict;
use warnings;

run();

sub run {
    my $atom_data = load_atom_data();
    print_records($atom_data);
    interact_with_user($atom_data);
}

...

sub interact_with_user {
    my $atom_data = shift;
    my %command_table = (...);

    while (1){
        print "Enter a command: ";
        chomp(my $reply = <STDIN>);

        my ($command, @line) = split /\s+/, $reply;

        if ( $command =~ /^(freq|density|length|help|quit)$/ ) {
            # Run the command.
        }
        else {
            # Print usage message for user.
        }
    }
}

...
```
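
One possible way to fill in the "Run the command" step and the three calculations is sketched below. It is an illustration under stated assumptions, not part of the original answer: it assumes each record is a hash reference with `serialnumber`, `element`, and a `coordinates` array reference holding x, y, and z. All sub names (`distance`, `element_frequencies`, `max_distance`, `neighbour_counts`, `print_counts`) are hypothetical, and the sample records are made up. For brevity, the `density` command prints to the screen instead of prompting for an output file.

```perl
use strict;
use warnings;

# Straight-line distance between two atom records.
sub distance {
    my ( $first, $second ) = @_;
    my $sum = 0;
    for my $axis ( 0 .. 2 ) {
        my $gap = $first->{coordinates}[$axis] - $second->{coordinates}[$axis];
        $sum += $gap * $gap;
    }
    return sqrt $sum;
}

# freq: count how many atoms of each element there are.
sub element_frequencies {
    my $atom_data = shift;
    my %count;
    for my $record ( @{$atom_data} ) {
        $count{ $record->{element} }++;
    }
    return \%count;
}

# length: largest distance between any pair of atoms (a loop inside a loop).
sub max_distance {
    my $atom_data = shift;
    my $max = 0;
    for my $i ( 0 .. $#{$atom_data} ) {
        for my $j ( $i + 1 .. $#{$atom_data} ) {
            my $current = distance( $atom_data->[$i], $atom_data->[$j] );
            $max = $current if $current > $max;
        }
    }
    return $max;
}

# density: for each atom, count the other atoms no further than $limit away.
# Atoms whose count would be zero never get a key in the hash.
sub neighbour_counts {
    my ( $atom_data, $limit ) = @_;
    my %count;
    for my $atom ( @{$atom_data} ) {
        for my $other ( @{$atom_data} ) {
            next if $atom == $other;    # skip the atom itself
            next if distance( $atom, $other ) > $limit;
            $count{ $atom->{serialnumber} }++;
        }
    }
    return \%count;
}

# Print a hash reference as "key: value" lines, with keys sorted as text.
sub print_counts {
    my $count = shift;
    for my $key ( sort keys %{$count} ) {
        print "$key: $count->{$key}\n";
    }
}

# Each command receives the atom data plus the rest of the input line.
my %command_table = (
    freq => sub {
        my ($atom_data) = @_;
        print_counts( element_frequencies($atom_data) );
    },
    length => sub {
        my ($atom_data) = @_;
        my $max = max_distance($atom_data);
        print "$max\n";
    },
    density => sub {
        my ( $atom_data, $limit ) = @_;
        print_counts( neighbour_counts( $atom_data, $limit ) );
    },
);

# Hand-made records, for illustration only.
my $atom_data = [
    { serialnumber => 1, element => 'C', coordinates => [ 0, 0, 0 ] },
    { serialnumber => 2, element => 'C', coordinates => [ 3, 4, 0 ] },
    { serialnumber => 3, element => 'O', coordinates => [ 0, 0, 12 ] },
];

for my $reply ( 'freq', 'length', 'density 5' ) {
    my ( $command, @line ) = split /\s+/, $reply;
    $command_table{$command}->( $atom_data, @line );
}
```

With these three made-up atoms, the pairwise distances are 5, 12, and 13, so `length` reports 13, and `density 5` lists only atoms 1 and 2, each with one neighbour. Passing `$atom_data` into every command keeps the subs free of global variables, in line with the advice above.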

This question and answer about how to parse a file, create records, and perform operations on the records in Perl come from a question on Stack Overflow: https://stackoverflow.com/questions/4351080/
