regex - Extract a matching pattern from a string with a regular expression and assign it to a variable using Perl

Reposted. Author: 行者123. Updated: 2023-12-01 00:37:00

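I'm looking for advice on using Perl and a regex to extract the part of a string that always appears as the first piece of data between parentheses, and assigning that value to a variable.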

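Here is the exact situation: I'm using Perl and a regex to extract the courseID from entries in a university catalog and assign it to a variable. Consider the following: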

  • BIO-2109-01 (12345) Introduction to Biology
  • CHM-3501-F2-01 (54321) Introduction to Chemistry
  • IDS-3250-01 (98765) History of US (1860-2000)
  • SPN-1234-02-F1 (45678) Spanish History (1900-2010)

The typical format is [course-section-name] [(courseID)] [courseName]
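The three-part format above maps directly onto a pattern with three capture groups. This is a minimal sketch, assuming the course-section-name never contains spaces and the courseID is always digits; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Assumption: every entry looks like "<section> (<digits>) <course name>",
# with no whitespace inside the section code.
my @entries = (
    'BIO-2109-01 (12345) Introduction to Biology',
    'IDS-3250-01 (98765) History of US (1860-2000)',
);

my @parsed;
for my $entry (@entries) {
    # ^(\S+)        course-section-name: a run of non-space characters
    # \s+\((\d+)\)  courseID: the digits inside the first pair of parentheses
    # \s+(.*)$      courseName: everything left over, parentheses included
    if ( my ( $section, $id, $name ) = $entry =~ /^(\S+)\s+\((\d+)\)\s+(.*)$/ ) {
        push @parsed, [ $section, $id, $name ];
        print "$section | $id | $name\n";
    }
}
```

Because the courseID group sits right after the first run of non-space characters, a second parenthesized group later in the course name simply ends up in the third capture.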

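My goal is to write a script that takes each entry one at a time, assigns it to a variable, then uses a regex to extract only the courseID and assigns just that courseID to a variable.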

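My approach was to use search and replace to replace everything that isn't the courseID with '' (the empty string), and then save whatever remains (the course ID) in a variable. Here are some examples of what I tried: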
    $string = "BIO-2109-01 (12345) Introduction to Biology";
    ($courseID = $string) =~ s/[^\d\d\d\d\d]//g;
    print $courseID;

Result: 21090112345 (the digits of the course-section-name followed by the courseID)
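This attempt fails because a bracketed character class always matches exactly one character; repeating \d inside it does not mean "five digits". The short sketch below reproduces the result above and shows that [^\d\d\d\d\d] behaves exactly like \D ("any non-digit").

```perl
use strict;
use warnings;

my $string = 'BIO-2109-01 (12345) Introduction to Biology';

# Inside [...] each \d just adds "digit" to the set again, so
# [^\d\d\d\d\d] means "one non-digit character", the same as \D.
( my $viaClass    = $string ) =~ s/[^\d\d\d\d\d]//g;
( my $viaNonDigit = $string ) =~ s/\D//g;

print "$viaClass\n";       # every digit on the line survives
print "$viaNonDigit\n";    # identical result
```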
    $string = "BIO-2109-01 (12345) Introduction to Biology";
    ($courseID = $string) =~ s/[^\b\(\d{5}\)]\b//g;
    print $courseID;

Result: 210901(12345) (the digits of the course-section-name, the parentheses, and the courseID)
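The second attempt hides two surprises: inside square brackets, \b no longer means "word boundary" (it is the backspace character), and {5} is no longer a quantifier (the braces and the 5 are just literal characters). A small sketch that probes that class on its own, using a handful of arbitrary test characters:

```perl
use strict;
use warnings;

# The bracketed class from the second attempt, tested in isolation.
my $class = qr/[\b\(\d{5}\)]/;

# Inside [...], \b is backspace (\x08), and "{", "5", "}" are literals.
my @probe = ( "\x08", '(', ')', '{', '}', '7', 'a', '-' );

my %inClass;
$inClass{$_} = ( $_ =~ $class ) ? 1 : 0 for @probe;

for my $char (@probe) {
    my $shown = $char eq "\x08" ? 'BS' : $char;    # make backspace visible
    printf "%-3s %s\n", $shown, $inClass{$char} ? 'in class' : 'not in class';
}
```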

So I haven't had much luck with search and replace. However, I did find this nugget:
    \(([^\)]+)\)

On http://regexr.com/ it matches the parenthesized part. However, it also matches other parenthesized groups, for example (abc).
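Online testers often highlight every match at once, as if /g were on. In Perl, a match without /g stops at the leftmost hit, so the pattern above captures only the first group; requiring digits additionally skips a group like (abc). A sketch comparing the three cases (the 'XYZ-1 (abc) (24680) Something' line is a made-up example):

```perl
use strict;
use warnings;

my $string = 'IDS-3250-01 (98765) History of US (1860-2000)';

# Without /g the match stops at the leftmost hit: only the first group.
my ($first) = $string =~ /\(([^)]+)\)/;

# With /g in list context, every parenthesized group is returned.
my @all = $string =~ /\(([^)]+)\)/g;

# Hypothetical line: requiring exactly five digits skips "(abc)".
my ($strict) = 'XYZ-1 (abc) (24680) Something' =~ /\((\d{5})\)/;

print "first:  $first\n";
print "all:    @all\n";
print "strict: $strict\n";
```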

What I'm not sure about is how to do something like this:
    $string = "BIO-2109-01 (12345) Introduction to Biology";
    ($courseID = $string) =~ [magicRegex_goes_here];
    print $courseID;

Result: 12345

Or, better:
    $string = "IDS-3250-01 (98765) History of US (1860-2000)";
    ($courseID = $string) =~ [magicRegex_goes_here];
    print $courseID;

Result: 98765
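One way to fill in [magicRegex_goes_here] while keeping the copy-then-substitute shape is to match the whole line and replace it with the captured digits; a direct list-context match is shorter. A sketch of both, assuming the courseID is always made of digits:

```perl
use strict;
use warnings;

my @strings = (
    'BIO-2109-01 (12345) Introduction to Biology',
    'IDS-3250-01 (98765) History of US (1860-2000)',
);

my @ids;
for my $string (@strings) {
    # Option 1: copy, then substitute. Replace the whole line with the
    # digits captured from the first (...) group. If nothing matches,
    # $courseID keeps the full line, so check before trusting it.
    ( my $courseID = $string ) =~ s/^[^(]*\((\d+)\).*/$1/s;

    # Option 2: no substitution; capture directly in list context.
    my ($captured) = $string =~ /\((\d+)\)/;

    print "$courseID $captured\n";
    push @ids, [ $courseID, $captured ];
}
```

Option 2 is usually preferable because a failed match leaves $captured undefined instead of silently keeping the whole input line.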

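Any suggestions or direction would be greatly appreciated. I've tried everything I know and am prepared to study regexes to solve this. If there is more information I can include, please ask.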

Update
    use warnings 'all';
    use strict;
    use feature 'say';

    my $file = './data/enrollment.csv';  # File this script generates
    my $course = "";                     # Complete course string [name-of-course] [(courseID)] [course_name]
    my @arrayCourses;                    # Array of courseIDs
    my $courseID = "";                   # Extracted course ID
    my $userName = "";                   # Username of the person we are enrolling
    my $action = "add,";                 # What we are doing to the user
    my $permission = "teacher,";         # What permissions to assign to the user
    my $stringToPrint = "";              # Concatenated string to write to the file
    my $n = "\n";                        # Line terminator
    my $c = ",";                         # Field separator

    # BEGIN PROGRAM

    print "Enter the username\n";

    chomp($userName = <STDIN>);          # Get the enrollee username from the user

    print "\n";

    print "Enter course name and press enter. Enter 'x' to end.\n";  # Prompt for course names

    # Read course lines until the user types 'x' (or input ends)
    while (defined($course = <STDIN>)) {
        chomp $course;
        last if $course eq 'x';

        # Extract the courseID - thanks to PerlDuck and zdim
        if (($courseID) = ($course =~ /[^(]+\(([^)]+)\)/)) {
            push @arrayCourses, $courseID;   # Put the courseID into the array
        }
        else {
            print "Cannot process last entry, check it\n";
        }
    }

    open(my $fh, '>', $file) or die "Cannot open $file: $!";  # Open the output file

    for my $i (@arrayCourses) {          # Write one line per course to the file
        $stringToPrint = join "", $action, $permission, $i, $c, $userName, $n;
        print $fh $stringToPrint;
    }

    close $fh;

That did it! Suggestions and improvements are always welcome. Thanks to @PerlDuck and @zdim!
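One possible improvement is to move the extraction and line formatting into a subroutine so that it can be tested without typing input on STDIN. This is a sketch only; enrollment_line and the username 'jdoe' are hypothetical names, and the output format is copied from the script above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): turn one catalog
# entry into an "add,teacher,<courseID>,<username>" line, or return
# undef when no courseID can be found.
sub enrollment_line {
    my ( $course, $userName ) = @_;
    my ($courseID) = $course =~ /[^(]+\(([^)]+)\)/ or return;
    return join( ',', 'add', 'teacher', $courseID, $userName ) . "\n";
}

# 'jdoe' is a made-up username for illustration.
print enrollment_line( 'SPN-1234-02-F1 (45678) Spanish History (1900-2010)', 'jdoe' );
```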

Best answer

    #!/usr/bin/env perl

    use strict;
    use warnings;

    while ( my $line = <DATA> ) {
        if ( my ($courseID) = ( $line =~ /[^(]+\(([^)]+)\)/ ) ) {
            print "course-ID = $courseID; -- line was $line";
        }
    }

    __DATA__
    BIO-2109-01 (12345) Introduction to Biology
    CHM-3501-F2-01 (54321) Introduction to Chemistry
    IDS-3250-01 (98765) History of US (1860-2000)
    SPN-1234-02-F1 (45678) Spanish History (1900-2010)

Output:
    course-ID = 12345; -- line was BIO-2109-01 (12345) Introduction to Biology
    course-ID = 54321; -- line was CHM-3501-F2-01 (54321) Introduction to Chemistry
    course-ID = 98765; -- line was IDS-3250-01 (98765) History of US (1860-2000)
    course-ID = 45678; -- line was SPN-1234-02-F1 (45678) Spanish History (1900-2010)

The pattern I used, /[^(]+\(([^)]+)\)/, can also be written as
    / [^(]+     # 1 or more characters that are not a '('
      \(        # a literal '('. You must escape it because you don't want
                # to start a capture group.
      ([^)]+)   # 1 or more chars that are not a ')'.
                # The surrounding '(' and ')' capture this match
      \)        # a literal ')'
    /x
The /x modifier lets you put whitespace, comments, and even newlines inside the pattern.
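Because /x only changes how the pattern is written, the spaced-out version and the one-line version capture exactly the same text. A quick check on one of the catalog lines:

```perl
use strict;
use warnings;

my $compact = qr/[^(]+\(([^)]+)\)/;
my $spaced  = qr/ [^(]+     # 1 or more characters that are not a '('
                  \(        # a literal '('
                  ([^)]+)   # capture 1 or more chars that are not a ')'
                  \)        # a literal ')'
                /x;

my $line = 'CHM-3501-F2-01 (54321) Introduction to Chemistry';
my ($fromCompact) = $line =~ $compact;
my ($fromSpaced)  = $line =~ $spaced;

print "$fromCompact $fromSpaced\n";
```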

In case you're unsure about /x: you really can write it inline like this:
    while ( my $line = <DATA> ) {
        if ( my ($courseID) = ( $line =~ / [^(]+   # …
                                           \(      # …
                                           ([^)]+) # …
                                           \)      # …
                                         /x ) ) {
            print "course-ID = $courseID; -- line was $line";
        }
    }

That may be hard to read, but you can also store the regex in a separate variable:
    my $pattern =
        qr/ [^(]+     # 1 or more characters that are not a '('
            \(        # a literal '(' (you must escape it)
            ([^)]+)   # 1 or more chars that are not a ')'.
                      # The surrounding '(' and ')' capture this match
            \)        # a literal ')'
          /x;

And then:
    if ( my ($courseID) = ( $line =~ $pattern ) ) {
        # use $courseID here
    }
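A pattern stored with qr// is compiled once and can be applied to any number of lines. A sketch that reuses it over a few entries and collects lines it cannot handle (the last entry is a made-up non-matching line):

```perl
use strict;
use warnings;

# Compile the pattern once with qr// and reuse it for every line.
my $pattern = qr/ [^(]+ \( ([^)]+) \) /x;

my @lines = (
    'BIO-2109-01 (12345) Introduction to Biology',
    'SPN-1234-02-F1 (45678) Spanish History (1900-2010)',
    'no course id on this line',    # hypothetical bad entry
);

my ( @found, @skipped );
for my $line (@lines) {
    if ( my ($courseID) = ( $line =~ $pattern ) ) {
        push @found, $courseID;
    }
    else {
        push @skipped, $line;       # report these to the user later
    }
}

print "found:   @found\n";
print "skipped: ", scalar(@skipped), "\n";
```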

A similar question, "regex - Extract a matching pattern from a string with a regular expression and assign it to a variable using Perl", can be found on Stack Overflow: https://stackoverflow.com/questions/40271097/
