ruby-on-rails - How can I find similar lines in two CSV files?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 18:27:24

Here is my code, but it takes a very long time on large files:

require 'rubygems'
require "faster_csv"

# Check the arguments before using them.
if ARGV.size != 2
  puts "Display common lines in the two files \n Usage : ruby user_in_both_files.rb <file1> <file2> "
  exit 0
end
fname1 = ARGV[0]
fname2 = ARGV[1]

puts "loading the CSV files ..."
file1 = FasterCSV.read(fname1, :headers => :first_row)
file2 = FasterCSV.read(fname2, :headers => :first_row)
puts "CSV files loaded"

#puts file2[219808].to_s.strip.gsub(/\s+/,'')

lineN1 = 0
lineN2 = 0
# count how many common lines
similarLines = 0
file1.each do |line1|
  lineN1 = lineN1 + 1
  # compare line 1 to every line of file 2 (this nested loop is the slow part)
  lineN2 = 0
  file2.each do |line2|
    puts "file1:l#{lineN1}|file2:l#{lineN2}"
    lineN2 = lineN2 + 1
    # rows match when they are equal after all whitespace is removed
    if ( line1.to_s.strip.gsub(/\s+/,'') == line2.to_s.strip.gsub(/\s+/,'') )
      puts "file1:l#{lineN1}|file2:l#{lineN2}->#{line1}\n"
      similarLines = similarLines + 1
    end
  end
end
puts "#{similarLines} similar lines."

Best answer

Ruby provides set operations on arrays:

a_ary = [1,2,3]
b_ary = [3,4,5]
a_ary & b_ary # => [3]
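
To make the behaviour of these operators more concrete, here is a small sketch with made-up sample values. It shows that `&` returns an array, drops duplicates, and keeps the order of the left operand. Array#& compares elements using `hash` and `eql?`, so on larger arrays it does not need to compare every element of one array against every element of the other, which is what the nested loop in the question does.

```ruby
# Sample values are illustrative only.
a_ary = [1, 2, 3, 3]
b_ary = [3, 4, 5, 1]

common = a_ary & b_ary   # intersection: elements present in both arrays
union  = a_ary | b_ary   # union: every distinct element from either array
only_a = a_ary - b_ary   # difference: elements of a_ary that are not in b_ary

puts common.inspect      # [1, 3]
puts union.inspect       # [1, 2, 3, 4, 5]
puts only_a.inspect      # [2]
```

Note that the intersection contains each value once, so its size can differ from the pair count produced by the nested loop when a file contains repeated rows.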

Therefore, you should try:

puts "loading the CSV files ..."
file1 = FasterCSV.read(fname1, :headers => :first_row)
file2 = FasterCSV.read(fname2, :headers => :first_row)
puts "CSV files loaded"

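# With :headers set, read returns a table rather than an Array, so convert
# each row to its array of field values before intersecting.
common_lines = file1.map { |row| row.fields } & file2.map { |row| row.fields }
puts common_lines.size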

If you need to preprocess the arrays, do it while loading them:

file1 = FasterCSV.read(fname1, :headers => :first_row).map{ |l| l.to_s.strip.gsub(/\s+/, '') }
file2 = FasterCSV.read(fname2, :headers => :first_row).map{ |l| l.to_s.strip.gsub(/\s+/, '') }
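
For very large files, one possible variation is to stream both files instead of loading them fully, keeping only the normalised rows of the second file in a Set. The following is a minimal sketch under some assumptions: it uses the standard library `CSV` class (FasterCSV became the standard `CSV` library in Ruby 1.9), and the helper names `normalize` and `common_rows` are hypothetical, introduced here only for illustration.

```ruby
require "csv"
require "set"

# Hypothetical helper: normalise a parsed row the same way the question does,
# by serialising it back to a CSV line and removing all whitespace.
def normalize(row)
  row.to_s.gsub(/\s+/, "")
end

# Hypothetical helper: return the normalised rows of path1 that also occur
# in path2. Both files are assumed to start with a header row.
def common_rows(path1, path2)
  # Remember every normalised row of the second file for fast lookup.
  seen = Set.new
  CSV.foreach(path2, headers: true) { |row| seen << normalize(row) }

  # Stream the first file and keep the rows that were seen in the second.
  common = []
  CSV.foreach(path1, headers: true) do |row|
    key = normalize(row)
    common << key if seen.include?(key)
  end
  common
end
```

It could be called as `puts "#{common_rows(fname1, fname2).size} similar lines."`. Unlike `&`, this sketch keeps repeated rows from the first file, so a row that appears twice in the first file and once in the second is reported twice.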

Regarding "ruby-on-rails - How can I find similar lines in two CSV files?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/9101691/
