
ruby - Parsing text with different delimiters - with grouping

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:56:06

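I'm editing my earlier post because I've made some progress but am now a bit stuck:

A sample of the text file is below. I can now read the file, parse out the data I need, and write an output file. However, the output puts the data on separate lines, and I need each record in the output file (name, expire date, last_used, address1, address2, city, state, zip) on a single comma-separated line.

The code so far: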

# True when the string can be converted to a number (used to spot account numbers).
def is_numeric?(object)
  true if Float(object) rescue false
end

def load_file
  raw_records = []

  File.open("testfile.txt", "r") do |infile|
    while (line = infile.gets)
      # A record starts on a line whose first 16 characters are numeric.
      possible_account_number = line[0, 16]
      next unless is_numeric?(possible_account_number)

      account_number = possible_account_number[5, 11]
      name        = line[21, 27].strip.delete(",")
      expire_date = line[108, 8].strip
      last_used   = line[117, 8].strip
      infile.gets                                # skip the line after the record line
      address1 = infile.gets.strip.delete(",")   # needed for some random commas
      address2 = infile.gets.strip.delete(",")
      line  = infile.gets
      city  = line[21, 20].strip.delete(",")
      state = line[42, 2]
      zip   = line[45, 5]
      raw_records << [name, expire_date, last_used, address1, address2, city, state, zip]
    end
  end

  # One comma-separated line per record, printed and written to the output file.
  rows = raw_records.map { |record| record.join(",") }
  puts rows
  File.open("test_w.txt", "w") { |f2| f2.puts rows }
end

load_file

Here is the raw data:

11111 ABC MOVINGABC, INC                   1234567891 LISTINGS                 02-06-12  MONDAY             2112-001-001  PAGE     1      1234 CUSTOMIA ROAD  SUITE 12345      LIST MANAGEMENT      NOSAOLOS        NV 12345
STATEMENTS TISSUE    STATEMENTS NAME 1                 ABC        TISSUES       TISSUE ROAD        LOC      TISSUES  PAGE ABC TISSUE
                     STATEMENTS NAME 2
                     ADDRESS LINE 1
                     ADDRESS LINE 2
                     CITY                 ST ZIP
TITLE   TISSUE NUMBER: 123456789
1234567890000030     MARILYN SMITH                  12345678911                                             05-30-12 01-28-12
                     1234 ST MARYS BLVD.
                     SUITE B
                     NOSAOLOS             MI 12345
1234567890000048     MARILYN ACTIVITA               12345678911                                             05-30-12 09-04-11
                     1234 ST MARYS BOULEVARD
                     STE. B
                     NOSAOLOS             OH 12345
1234567890000055     ANDREW WAYMENT                 12345678911                                             05-30-12 01-12-12
                     123 S. DESCRIBE ST.
                     NOSAOLOS             OH 12345

Here is the finished output, with help from Jason (thanks):

MARILYN SMITH,5-30-12 ,1-28-12,1234 ST MARYS BLVD.,SUITE B,NOSAOLOS,MI,12345
MARILYN ACTIVITA,5-30-12 ,9-04-11,1234 ST MARYS BOULEVARD,STE. B,NOSAOLOS,OH,12345
ANDREW WAYMENT,5-30-12 ,1-12-12,123 S. DESCRIBE ST.,,NOSAOLOS,OH,12345

I also wanted to save it to a file, so I used this:

File.open('test_w.txt', 'w') do |f2|
  f2.puts raw_records.map { |record| record.join(',') }
end
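
The code above strips commas from each field so that joining on ',' stays unambiguous. A possible alternative, sketched here under assumptions, is Ruby's `csv` library (part of the standard distribution, a bundled gem from Ruby 3.4), which quotes any field that contains a comma instead. `RECORDS` and `write_records` are hypothetical names for illustration, and the sample values are made up in the same field order as the output above.

```ruby
require "csv"
require "tmpdir"

# Hypothetical records in the question's field order; the first name keeps
# its comma on purpose to show the quoting.
RECORDS = [
  ["ABC, INC", "05-30-12", "01-28-12", "1234 ST MARYS BLVD.", "SUITE B", "NOSAOLOS", "MI", "12345"],
  ["MARILYN ACTIVITA", "05-30-12", "09-04-11", "1234 ST MARYS BOULEVARD", "STE. B", "NOSAOLOS", "OH", "12345"]
].freeze

# Writes one CSV row per record; CSV adds quotes only where a field needs them.
def write_records(path, records)
  CSV.open(path, "w") do |csv|
    records.each { |record| csv << record }
  end
end

# Demonstration in a temporary directory so no real file is overwritten.
Dir.mktmpdir do |dir|
  path = File.join(dir, "test_w.csv")
  write_records(path, RECORDS)
  puts File.read(path)
end
```

Whether quoting or stripping is preferable depends on what consumes the file: a CSV-aware reader handles quoted fields, while a naive split on ',' would not.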

Andrew

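Best answer

Without the input file it's hard to give you any example code, but judging from the image the file looks fairly predictable, so a state tracker with some RegExp magic should do the trick.

The file looks tab-delimited, so you can split each line on tabs: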

File.open('filename', 'r') do |file|
  # Drop the trailing newline, then split every line on tabs
  lines = file.map { |line| line.chomp.split("\t") }
  # Now you have an array of arrays that you can parse with a state tracker
end

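Your state tracker would simply keep track of what you last read, e.g. the number, the name, or the issue date, and then fill in the correct values.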
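
As a rough illustration of the state-tracker idea, here is a minimal sketch. It is not the answerer's code: it assumes the layout suggested by the question's sample (a record line starting with a 16-digit account number, followed by indented address lines and a closing "CITY ST ZIP" line) and uses regular expressions rather than tab splitting. `RECORD_LINE`, `CITY_LINE`, `parse_records`, `to_row` and `SAMPLE_LINES` are hypothetical names, and the sample lines are shortened.

```ruby
# Assumed shapes of the two "anchor" lines of a record.
RECORD_LINE = /\A(\d{16})\s+(.+?)\s{2,}\d+\s+(\d\d-\d\d-\d\d)\s+(\d\d-\d\d-\d\d)\s*\z/
CITY_LINE   = /\A\s+(.+?)\s{2,}([A-Z]{2})\s+(\d{5})\s*\z/

def parse_records(lines)
  records = []
  current = nil # the record being filled, or nil while between records

  lines.each do |raw|
    line = raw.chomp
    if (m = RECORD_LINE.match(line))
      # State change: a new record begins.
      current = { name: m[2].delete(","), expire_date: m[3], last_used: m[4], address: [] }
    elsif current.nil? || line.strip.empty?
      next # page header text or a blank line: nothing to track
    elsif (m = CITY_LINE.match(line))
      # State change: the city line closes the record.
      current.merge!(city: m[1].delete(","), state: m[2], zip: m[3])
      records << current
      current = nil
    else
      # Any other line inside a record is treated as an address line.
      current[:address] << line.strip.delete(",")
    end
  end
  records
end

# Flattens one record to a comma-separated row; a missing address2 stays empty.
def to_row(record)
  address1, address2 = record[:address]
  [record[:name], record[:expire_date], record[:last_used],
   address1.to_s, address2.to_s, record[:city], record[:state], record[:zip]].join(",")
end

# Shortened, made-up sample in the assumed layout.
SAMPLE_LINES = [
  "TITLE   TISSUE NUMBER: 123456789",
  "1234567890000030     MARILYN SMITH                  12345678911     05-30-12 01-28-12",
  "",
  "                     1234 ST MARYS BLVD.",
  "                     SUITE B",
  "                     NOSAOLOS             MI 12345",
  "1234567890000055     ANDREW WAYMENT                 12345678911     05-30-12 01-12-12",
  "",
  "                     123 S. DESCRIBE ST.",
  "                     NOSAOLOS             OH 12345"
].freeze

puts parse_records(SAMPLE_LINES).map { |record| to_row(record) }
```

Because the record is closed by the city line rather than by counting lines, a record with only one address line still ends up with an empty address2 field instead of shifting every later field.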

Regarding ruby - Parsing text with different delimiters - with grouping, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/9196199/
