
parsing - How do I read rows from CSV using the PARSE dialect?

Reposted. Author: 行者123. Updated: 2023-12-04 18:11:23

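I'm trying to use PARSE to turn a CSV row into a Rebol block. It's easy enough to write in open code, but as with my other questions, I'm trying to learn what the dialect itself can do.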

So if a row reads:

    "Look, that's ""MR. Fork"" to you!",Hostile Fork,,http://hostilefork.com

Then I want the block:

    [{Look, that's "MR. Fork" to you!} {Hostile Fork} none {http://hostilefork.com}]

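Things to note:
  • Embedded quotes in CSV strings are indicated with ""
  • Commas can appear inside quotes, in which case they are part of the literal and not a column separator
  • Adjacent column-separating commas indicate an empty field
  • Strings that contain no quotes or commas can appear without quotes
  • For the moment, we can keep things like http://rebol.com as STRING! rather than LOADing them into types such as URL!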

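To make it more uniform, the first thing I do is append a comma to the input line. Then I have a column-rule that captures a single column terminated by a comma, which may or may not be in quotes.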

I know how many columns there should be from the header row, so the code then reads:

    unless parse line compose [(column-count) column-rule] [
        print rejoin [{Expected } column-count { columns.}]
    ]

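But I'm a bit stuck on writing column-rule. I need a way to say in the dialect: "Once you find a quote, keep skipping quote pairs until you find a quote standing all by itself." What's a good way to do that?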

Best answer

As with most parsing problems, I try to build a grammar that best describes the elements of the input format.

In this case, we have nouns:

    [comma ending value-chars qmark quoted-chars value header row]

Some verbs:
    [row-feed emit-value]

And operational nouns:
    [current chunk current-row width]

I suppose I could break it down a bit more, but this is enough to work with. First, the foundation:
    comma: ","
    ending: "^/"
    qmark: {"}
    value-chars: complement charset reduce [qmark comma ending]
    quoted-chars: complement charset reduce [qmark]
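
To see what these character sets accept, here is a minimal sketch, assuming Rebol 2 (where `parse/all` is available). It reuses the definitions above and runs two illustrative inputs that are not part of the original answer:

```rebol
REBOL []

comma: ","
ending: "^/"
qmark: {"}
value-chars: complement charset reduce [qmark comma ending]

; A run of ordinary characters is consumed completely, so PARSE reaches the end.
print parse/all "Hostile Fork" [some value-chars]   ; true

; A comma is excluded from value-chars, so the rule stops before the end.
print parse/all "Hostile,Fork" [some value-chars]   ; false
```

Because `value-chars` is built with `complement`, it matches every character except the three listed, which is what lets an unquoted field run up to the next separator.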

Now the value structure. Quoted values are built up from chunks of valid characters or quotes as we find them:
    current: chunk: none
    quoted-value: [
        qmark (current: copy "")
        any [
            copy chunk some quoted-chars (append current chunk)
            |
            qmark qmark (append current qmark)
        ]
        qmark
    ]
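
This rule is what answers the "skip quote pairs until a lone quote" part of the question. As a minimal sketch, assuming Rebol 2, it can be tried on its own against the first field of the question's sample row (`sample` is an illustrative name, not part of the answer):

```rebol
REBOL []

qmark: {"}
quoted-chars: complement charset reduce [qmark]

current: chunk: none
quoted-value: [
    qmark (current: copy "")
    any [
        ; a run of non-quote characters is copied as-is
        copy chunk some quoted-chars (append current chunk)
        |
        ; a doubled quote contributes one literal quote
        qmark qmark (append current qmark)
    ]
    ; a quote that is not followed by another quote closes the value
    qmark
]

sample: {"Look, that's ""MR. Fork"" to you!"}
if parse/all sample quoted-value [print current]
; prints: Look, that's "MR. Fork" to you!
```

At the closing quote, the `qmark qmark` alternative fails on its second `qmark`, so `any` gives up and rewinds to just before that quote, leaving it for the final `qmark` to consume.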

    value: [
        copy current some value-chars
        | quoted-value
    ]

    emit-value: [
        (
            delimiter: comma
            append current-row current
        )
    ]

    emit-none: [
        (
            delimiter: comma
            append current-row none
        )
    ]

Note that delimiter is set to ending at the start of each row, then changed to comma as soon as we pass a value. Thus, an input row is defined as [ending value any [comma value]].
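
The switching delimiter can be looked at in isolation. The following is a minimal sketch, assuming Rebol 2, where PARSE looks up a word's value each time it meets that word in a rule. The names `fields`, `field` and `field-chars` are illustrative and do not appear in the answer:

```rebol
REBOL []

field-chars: complement charset ",^/"
fields: copy []

; Start by expecting a newline, as the answer does at the start of each row.
delimiter: "^/"

parse/all "^/a,b,c" [
    some [
        delimiter
        copy field some field-chars
        ; after the first field, every later field is introduced by a comma
        (append fields field  delimiter: ",")
    ]
]

probe fields   ; ["a" "b" "c"]
```

The same rule block matches a newline on its first pass and a comma on later passes, because the parenthesized code rebinds `delimiter` between iterations.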

All that remains is to define the document structure:
    current-row: none
    row-feed: [
        (
            delimiter: ending
            append/only out current-row: copy []
        )
    ]

    width: none
    header: [
        (out: copy [])
        row-feed any [
            value comma
            emit-value
        ]
        value body: ending :body
        emit-value
        (width: length? current-row)
    ]

    row: [
        row-feed width [
            delimiter [
                value emit-value
                | emit-none
            ]
        ]
    ]

    if parse/all stream [header some row opt ending][out]

Wrap it up to shield all those words, and you have:
    REBOL [
        Title: "CSV Parser"
        Date: 19-Nov-2012
        Author: "Christopher Ross-Gill"
    ]

    parse-csv: use [
        comma ending delimiter value-chars qmark quoted-chars
        value quoted-value header row
        row-feed emit-value emit-none
        out current current-row width
    ][
        comma: ","
        ending: "^/"
        qmark: {"}
        value-chars: complement charset reduce [qmark comma ending]
        quoted-chars: complement charset reduce [qmark]

        current: none
        quoted-value: use [chunk][
            [
                qmark (current: copy "")
                any [
                    copy chunk some quoted-chars (append current chunk)
                    |
                    qmark qmark (append current qmark)
                ]
                qmark
            ]
        ]

        value: [
            copy current some value-chars
            | quoted-value
        ]

        current-row: none
        row-feed: [
            (
                delimiter: ending
                append/only out current-row: copy []
            )
        ]
        emit-value: [
            (
                delimiter: comma
                append current-row current
            )
        ]
        emit-none: [
            (
                delimiter: comma
                append current-row none
            )
        ]

        width: none
        header: [
            (out: copy [])
            row-feed any [
                value comma
                emit-value
            ]
            value body: ending :body
            emit-value
            (width: length? current-row)
        ]

        row: [
            opt ending end break
            |
            row-feed width [
                delimiter [
                    value emit-value
                    | emit-none
                ]
            ]
        ]

        func [stream [string!]][
            if parse/all stream [header some row][out]
        ]
    ]

Regarding "parsing - How do I read rows from CSV using the PARSE dialect?", the corresponding question can be found on Stack Overflow: https://stackoverflow.com/questions/13451026/
