
php - Regex to filter a table

Reposted. Author: 可可西里. Updated: 2023-11-01 12:53:27

Okay, I have a table that is output by some open-source software, but it isn't output in an actual table format such as

<table>
<thead>
<tr>
<td>Heading</td>
</tr>
</thead>
<tbody>
<tr>
<td>Content</td>
</tr>
</tbody>
</table>

Instead, whoever developed the software thought it would be a good idea to output the table like this

+------------+-------------+-------+-------------+------------+---------------+----------+
| HEADING 1 | HEADING 2 | ETC | ANOTHER | HEADING3 | HEADING4 | SML |
+------------+-------------+-------+-------------+------------+---------------+----------+
| content | more content | cont | More more | content | content 2.0 | litl |
| content | more content | cont | More more | content | content 2.0 | litl |
| content | more content | cont | More more | content | content 2.0 | litl |
| content | more content | cont | More more | content | content 2.0 | litl |
| content | more content | cont | More more | content | content 2.0 | litl |
| content | more content | cont | More more | content | content 2.0 | litl |
| content | more content | cont | More more | content | content 2.0 | litl |
| content | more content | cont | More more | content | content 2.0 | litl |
+------------+-------------+-------+-------------+------------+--------------+----------+
| TOTALS AGENTS:21 | total| total| total| total| total|
+------------+-------------+-------+-------------+------------+--------------+----------+

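So I can't build a web scraper to get the data, or rather I'm not sure whether I can, because the whole thing is wrapped in a single <pre> </pre> tag. I have therefore been trying to do the job with Ruby and a regex. So far I have managed to match every leading |, and I have also managed to match the start of the border, +-------+-----, but that is as far as it goes: it looks like I would have to keep repeating the pattern by hand, because it does not repeat on its own. Anyway, enough talk, here is the code I have used so far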

text.lines.to_a.each do |line|
  line.sub(/^\| |^\+*-*\+*\-*/) do |match|
    puts "Regexp Match: " << match
  end
  STDIN.getc
  puts "New Line " << line
end

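For example, the match for the first line is only +------------+------------- and nothing after it. The target is CSV format, so I am using gsub to replace the remaining | with ,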

I can use either PHP or Ruby, so any answer is very welcome
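A note on the "pattern has to be repeated" problem described above: `sub` replaces only the first match, and `\+*-*\+*\-*` can cover at most two runs of dashes, which is why the match stops after the second column. A quantified group lets a single pattern cover a border of any width. The following is a minimal sketch under that assumption; `BORDER`, `border?` and `row_to_csv` are hypothetical names used for illustration, and the sample table is shortened.

```ruby
# Hypothetical helpers for illustration; not from the original post.

# "+", followed by one or more repetitions of "dashes then +".
BORDER = /\A\+(?:-+\+)+\z/

def border?(line)
  BORDER.match?(line.strip)
end

def row_to_csv(line)
  # Drop the outer pipes, then turn every inner pipe (and its padding) into a comma.
  line.strip
      .sub(/\A\|\s*/, '')
      .sub(/\s*\|\z/, '')
      .gsub(/\s*\|\s*/, ',')
end

sample = <<~TABLE
  +-----------+-----------+
  | HEADING 1 | HEADING 2 |
  +-----------+-----------+
  | content   | more content |
  +-----------+-----------+
TABLE

# Skip the border lines and convert every remaining row.
csv_lines = sample.lines.reject { |l| border?(l) }.map { |l| row_to_csv(l) }
puts csv_lines
```

This simple version does not quote cells that themselves contain a comma, so it only suits data like the sample shown in the question.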

Best answer

This may not be the cleanest solution, but it works for this example :) In Ruby:

@text = <<END
+------------+-------------+-------+-------------+------------+---------------+----------+
| HEADING 1 | HEADING 2 | ETC | ANOTHER | HEADING3 | HEADING4 | SML |
+------------+-------------+-------+-------------+------------+---------------+----------+
| content | more content | cont | More more | content | content 2.0 | litl |
| content | more content | cont | More more | content | content 2.0 | litl |
| content | more content | cont | More more | content | content 2.0 | litl |
| content | more content | cont | More more | content | content 2.0 | litl |
| content | more content | cont | More more | content | content 2.0 | litl |
| content | more content | cont | More more | content | content 2.0 | litl |
| content | more content | cont | More more | content | content 2.0 | litl |
| content | more content | cont | More more | content | content 2.0 | litl |
+------------+-------------+-------+-------------+------------+--------------+----------+
| TOTALS AGENTS:21 | total| total| total| total| total|
+------------+-------------+-------+-------------+------------+--------------+----------+
END
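# Capture the text between the outer pipes of each row.
# Border lines start with "+" and therefore never match.
s = @text.scan(/^[|]\W(.*)[|]$/).flatten
puts s
# Split each row on the inner pipes and trim the padding around every cell.
# strip keeps spaces inside a cell, so "more content" stays intact.
arr = s.map do |row|
  row.split('|').map(&:strip)
end
arr.each do |cells|
  puts cells
end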
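Since the question asks for CSV, the same row-splitting idea can be pushed one step further. The sketch below is an assumption about what the desired output looks like, not part of the original answer; `ascii_table_to_csv` is a hypothetical helper name, and the sample table is a shortened version with one comma-containing cell added to show the quoting.

```ruby
# Hypothetical helper for illustration; not part of the original answer.
def ascii_table_to_csv(text)
  # Keep only rows that start with a pipe; "+---+" borders and blank lines are skipped.
  rows = text.each_line.map(&:strip).select { |line| line.start_with?('|') }

  csv_rows = rows.map do |line|
    # Remove the outer pipes, split on the inner ones, trim the padding.
    cells = line[1..-1].chomp('|').split('|', -1).map(&:strip)
    # Quote cells that contain a comma or a double quote (doubling inner quotes).
    quoted = cells.map do |c|
      c.match?(/[",]/) ? '"' + c.gsub('"', '""') + '"' : c
    end
    quoted.join(',')
  end

  csv_rows.join("\n")
end

table = <<~TABLE
  +-----------+--------------+------+
  | HEADING 1 | HEADING 2    | ETC  |
  +-----------+--------------+------+
  | content   | more content | cont |
  | a, b      | content 2.0  | litl |
  +-----------+--------------+------+
  | TOTALS AGENTS:21 | total| total|
  +-----------+--------------+------+
TABLE

puts ascii_table_to_csv(table)
```

The totals row is passed through as an ordinary row, so it may have fewer columns than the header, as in the original table; whether to keep or drop it depends on how the CSV will be consumed.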

Regarding "php - Regex to filter a table", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/15083241/
