
ruby - Interpreting this raw text - a strategy?

Reposted. Author: 数据小太阳. Updated: 2023-10-29 06:55:15

I have this raw text:

________________________________________________________________________________________________________________________________
Pos Car Competitor/Team Driver Vehicle Cap CL Laps Race.Time Fastest...Lap

1 6 Jason Clements Jason Clements BMW M3 3200 10 9:48.5710 3 0:57.3228*
2 42 David Skillender David Skillender Holden VS Commodore 6000 10 9:55.6866 2 0:57.9409
3 37 Bruce Cook Bruce Cook Ford Escort 3759 10 9:56.4388 4 0:58.3359
4 18 Troy Marinelli Troy Marinelli Nissan Silvia 3396 10 9:56.7758 2 0:58.4443
5 75 Anthony Gilbertson Anthony Gilbertson BMW M3 3200 10 10:02.5842 3 0:58.9336
6 26 Trent Purcell Trent Purcell Mazda RX7 2354 10 10:07.6285 4 0:59.0546
7 12 Scott Hunter Scott Hunter Toyota Corolla 2000 10 10:11.3722 5 0:59.8921
8 91 Graeme Wilkinson Graeme Wilkinson Ford Escort 2000 10 10:13.4114 5 1:00.2175
9 7 Justin Wade Justin Wade BMW M3 4000 10 10:18.2020 9 1:00.8969
10 55 Greg Craig Grag Craig Toyota Corolla 1840 10 10:18.9956 7 1:00.7905
11 46 Kyle Orgam-Moore Kyle Organ-Moore Holden VS Commodore 6000 10 10:30.0179 3 1:01.6741
12 39 Uptiles Strathpine Trent Spencer BMW Mini Cooper S 1500 10 10:40.1436 2 1:02.2728
13 177 Mark Hyde Mark Hyde Ford Escort 1993 10 10:49.5920 2 1:03.8069
14 34 Peter Draheim Peter Draheim Mazda RX3 2600 10 10:50.8159 10 1:03.4396
15 5 Scott Douglas Scott Douglas Datsun 1200 1998 9 9:48.7808 3 1:01.5371
16 72 Paul Redman Paul Redman Ford Focus 2lt 9 10:11.3707 2 1:05.8729
17 8 Matthew Speakman Matthew Speakman Toyota Celica 1600 9 10:16.3159 3 1:05.9117
18 74 Lucas Easton Lucas Easton Toyota Celica 1600 9 10:16.8050 6 1:06.0748
19 77 Dean Fuller Dean Fuller Mitsubishi Sigma 2600 9 10:25.2877 3 1:07.3991
20 16 Brett Batterby Brett Batterby Toyota Corolla 1600 9 10:29.9127 4 1:07.8420
21 95 Ross Hurford Ross Hurford Toyota Corolla 1600 8 9:57.5297 2 1:12.2672
DNF 13 Charles Wright Charles Wright BMW 325i 2700 9 9:47.9888 7 1:03.2808
DNF 20 Shane Satchwell Shane Satchwell Datsun 1200 Coupe 1998 1 1:05.9100 1 1:05.9100

Fastest Lap Av.Speed Is 152kph, Race Av.Speed Is 148kph
R=under lap record by greatest margin, r=under lap record, *=fastest lap time
________________________________________________________________________________________________________________________________
Issue# 2 - Printed Sat May 26 15:43:31 2012 Timing System By NATSOFT (03)63431311 www.natsoft.com.au/results
Amended

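I need to parse it into objects with the obvious fields: Position, Car, Driver and so on. The problem is that I don't know which strategy to use. If I split it on whitespace, I end up with a list like this: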

["1", "6", "Jason", "Clements", "Jason", "Clements", "BMW", "M3", "3200", "10", "9:48.5710", "3", "0:57.3228*"]

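Do you see the problem? I can't just interpret this list by index, because a person might have only one name, or three words in their name, and a vehicle might have any number of words. That makes it impossible to refer to fields by index alone.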
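To make the problem concrete, here is a small sketch that splits two of the rows above on whitespace. The variable names are only illustrative. The token counts differ, so the same index lands on different fields.

```ruby
# Two result rows copied from the listing above.
rows = [
  "1 6 Jason Clements Jason Clements BMW M3 3200 10 9:48.5710 3 0:57.3228*",
  "12 39 Uptiles Strathpine Trent Spencer BMW Mini Cooper S 1500 10 10:40.1436 2 1:02.2728"
]

# Number of whitespace-separated tokens per row.
token_counts = rows.map { |row| row.split.length }
p token_counts  # => [13, 15]

# The token at index 8 is the capacity in one row and part of the vehicle in the other.
tokens_at_8 = rows.map { |row| row.split[8] }
p tokens_at_8   # => ["3200", "Cooper"]
```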

What about using the offsets defined by the column names? I don't quite see how to use them, though.

Edit: the algorithm I am currently using works like this:

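  1. Split the text on newlines, giving a collection of lines.
  2. Find the whitespace characters common to all lines, i.e. the positions (indices) at which every line contains a space.
  3. Split the lines at those common positions.
  4. Trim the resulting pieces.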
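The steps above can be sketched roughly as follows. This is one interpretation of the description, not the asker's actual code: the helper names and the sample rows (built with assumed column widths) are hypothetical.

```ruby
# Hypothetical helper: indices at which every line has a space
# (a line that has already ended counts as blank).
def common_space_columns(lines)
  width = lines.map(&:length).max
  (0...width).select do |i|
    lines.all? { |line| i >= line.length || line[i] == " " }
  end
end

# Hypothetical helper: cut every line at the common gaps, then trim each piece.
def split_on_common_spaces(lines)
  width = lines.map(&:length).max
  content = (0...width).to_a - common_space_columns(lines)
  # Consecutive non-gap indices form one column.
  ranges = content.slice_when { |a, b| b != a + 1 }.map { |run| run.first..run.last }
  lines.map { |line| ranges.map { |r| line[r].to_s.strip } }
end

# Sample rows laid out in fixed-width columns (the widths are assumptions).
row_format = "%-4s%-4s%-20s%-22s%s"
lines = [
  format(row_format, "3", "37", "Bruce Cook", "Ford Escort", "3759"),
  format(row_format, "4", "18", "Troy Marinelli", "Nissan Silvia", "3396"),
  format(row_format, "6", "26", "Trent Purcell", "Mazda RX7", "2354")
]

result = split_on_common_spaces(lines)
result.each { |fields| p fields }
```

It works on these rows only because no space inside a name or vehicle happens to line up across all three lines.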

There are several problems:

If the first names all have the same length, like this:

Jason Adams
Bobby Sacka
Jerry Louis

then it treats each name as two separate items (["Jason", "Adams", "Bobby", "Sacka", "Jerry", "Louis"]).

Whereas if they all differ, like this:

Dominic Bou
Bob Adams
Jerry Seinfeld

then it splits correctly after the final 'd' of Seinfeld, so we get a collection of the three names (["Dominic Bou", "Bob Adams", "Jerry Seinfeld"]).
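The difference between the two cases comes down to whether a space sits at the same index in every name. A minimal check, using a hypothetical helper name:

```ruby
# Indices at which every string in the list has a space.
def shared_space_indices(strings)
  shortest = strings.map(&:length).min
  (0...shortest).select { |i| strings.all? { |s| s[i] == " " } }
end

same_length = ["Jason Adams", "Bobby Sacka", "Jerry Louis"]
mixed       = ["Dominic Bou", "Bob Adams", "Jerry Seinfeld"]

p shared_space_indices(same_length)  # => [5]  (a false column boundary)
p shared_space_indices(mixed)        # => []   (names stay intact)
```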

It is also brittle. I am looking for a better solution.

Best answer

This isn't a good case for regular expressions. What you really want is to discover the format and then unpack the lines:

# str is assumed to hold the original fixed-width report text
lines = str.split "\n"

# you know the field names so you can use them to find the column positions
fields = ['Pos', 'Car', 'Competitor/Team', 'Driver', 'Vehicle', 'Cap', 'CL Laps', 'Race.Time', 'Fastest...Lap']
header = lines.shift until header =~ /^Pos/
positions = fields.map { |f| header.index f }

# use that to construct an unpack format string;
# the trailing A* lets the last column (Fastest...Lap) run to the end of the line
format = 1.upto(positions.length - 1).map { |x| "A#{positions[x] - positions[x - 1]}" }.join + 'A*'
# A4A5A31A25A21A6A12A10A*

lines.each do |line|
  next unless line =~ /^(\d|DNF)/ # skip lines you're not interested in
  data = line.unpack(format).map { |x| x.strip }
  puts data.join(', ')
  # or better yet...
  car = Hash[fields.zip data]
  puts car['Driver']
end
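Since the question asks for objects rather than hashes, here is a self-contained variation on the same idea that can be run as is. It is only a sketch: the shortened sample text, its column widths, and the `RaceResult` struct are assumptions made for illustration, not part of the original report or answer.

```ruby
# Hypothetical, shortened sample; the column widths are assumptions.
row_format = "%-5s%-5s%-20s%-22s%s"
text = [
  format(row_format, "Pos", "Car", "Driver", "Vehicle", "Cap"),
  "",
  format(row_format, "1", "6", "Jason Clements", "BMW M3", "3200"),
  format(row_format, "12", "39", "Trent Spencer", "BMW Mini Cooper S", "1500"),
  format(row_format, "DNF", "13", "Charles Wright", "BMW 325i", "2700"),
  "",
  "Fastest Lap Av.Speed Is 152kph, Race Av.Speed Is 148kph"
].join("\n")

# Illustrative record type with one member per column.
RaceResult = Struct.new(:pos, :car, :driver, :vehicle, :cap)

fields = %w[Pos Car Driver Vehicle Cap]
lines = text.split("\n")
header = lines.find { |line| line.start_with?("Pos") }
positions = fields.map { |f| header.index(f) }

# One "A<width>" directive per column; "A*" takes the rest of the line.
widths = positions.each_cons(2).map { |a, b| "A#{b - a}" }
unpack_format = (widths + ["A*"]).join

results = lines
  .select { |line| line =~ /^(\d|DNF)/ } # keep only result rows
  .map { |line| RaceResult.new(*line.unpack(unpack_format).map(&:strip)) }

results.each { |r| puts "#{r.pos}: #{r.driver} (#{r.vehicle})" }
```

Multi-word names and vehicles survive intact because the split points come from the header, not from the data rows.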

Regarding "ruby - Interpreting this raw text - a strategy?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/10791337/
