json - I have a messy JSON that I am trying to clean up using jq

Reposted. Author: 行者123. Last updated: 2023-12-04 02:00:23

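I have some messy JSON.

  • Some nodes are inconsistent across rows. In some rows these nodes are arrays, while in others they are objects or strings.
  • The example here has only two levels, but the real data is nested many more levels deep.

  • Example: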
    [
      {
        "id": 1,
        "person": {
          "addresses": {
            "address": {
              "city": "FL"
            }
          },
          "phones": [
            {
              "type": "mobile",
              "number": "555-555-5555"
            }
          ],
          "email": [
            {
              "type": "work",
              "email": "john.doe@gmail.com"
            },
            {
              "type": "work",
              "email": "john.doe@work.com"
            }
          ]
        }
      },
      {
        "id": 2,
        "person": {
          "addresses": [
            {
              "type": "home",
              "address": {
                "city": "FL"
              }
            }
          ],
          "phones": {
            "type": "mobile",
            "number": "555-555-5555"
          },
          "email": {
            "type": "work",
            "email": "jane.doe@gmail.com"
          }
        }
      }
    ]

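    I want to make the nodes consistent, so that if a node is an array in any of the rows, the corresponding node in every other row is converted into an array as well.

    Once the data is consistent, it will be easier to analyze and restructure.

    Expected result: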
    [
      {
        "id": 1,
        "person": {
          "addresses": [
            {
              "address": {
                "city": "FL"
              }
            }
          ],
          "phones": [
            {
              "type": "mobile",
              "number": "555-555-5555"
            }
          ],
          "email": [
            {
              "type": "work",
              "email": "john.doe@gmail.com"
            },
            {
              "type": "work",
              "email": "john.doe@work.com"
            }
          ]
        }
      },
      {
        "id": 2,
        "person": {
          "addresses": [
            {
              "type": "home",
              "address": {
                "city": "FL"
              }
            }
          ],
          "phones": [
            {
              "type": "mobile",
              "number": "555-555-5555"
            }
          ],
          "email": [
            {
              "type": "work",
              "email": "jane.doe@gmail.com"
            }
          ]
        }
      }
    ]

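    After making the arrays consistent, I would like to flatten the data so that objects are flattened but arrays remain arrays.

    Expected result: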
    [
      {
        "id": 1,
        "person.addresses": [
          {
            "address": {
              "city": "FL"
            }
          }
        ],
        "person.phones": [
          {
            "type": "mobile",
            "number": "555-555-5555"
          }
        ],
        "person.email": [
          {
            "type": "work",
            "email": "john.doe@gmail.com"
          },
          {
            "type": "work",
            "email": "john.doe@work.com"
          }
        ]
      },
      {
        "id": 2,
        "person.addresses": [
          {
            "type": "home",
            "address": {
              "city": "FL"
            }
          }
        ],
        "person.phones": [
          {
            "type": "mobile",
            "number": "555-555-5555"
          }
        ],
        "person.email": [
          {
            "type": "work",
            "email": "jane.doe@gmail.com"
          }
        ]
      }
    ]

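    I was able to do this partially using jq. It works when there are one or two paths to fix, but it seems to break when there are more than two.

    The approach I took:
  • Identify all possible paths
  • Group and count the data types for each path
  • Identify cases where a path has mixed data types
  • Sort the paths by decreasing depth
  • Exclude paths that have no mixed types
  • Exclude paths where none of the mixed types is an array
  • For each path, apply the fix to the original data
  • This produces a stream of N copies of the data, each with one of the N transformations applied
  • Extract the last copy, which should contain the cleaned result

  • My experiment so far: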
    def fix(data; path):
      data |= map(. | getpath(path)?=([getpath(path)?]|flatten));

    def hist:
      length as $l
      | group_by (.)
      | map( .
          | (.|length) as $c
          | {(.[0]):{
              "count": $c,
              "diff": ($l - $c)
            }} )
      | (length>1) as $mixed
      | {
          "types": .[],
          "count": $l,
          "mixed":$mixed
        };

    def summary:
      map( .
        | path(..) as $p
        | {
            path:$p,
            type: getpath($p)|type,
            key:$p|join(".")
          }
      )
      | flatten
      | group_by(.key)
      | map( .
        | {
            key: .[0].key,
            path: .[0].path,
            depth: (.[0].path|length),
            type:([(.[] | .type)]|hist)
          }
      )
      | sort_by(.depth)
      | reverse;

    . as $data
    | .
    | summary
    | map( .
      | select(.type.mixed)
      | select(.type.types| keys| contains(["array"]))
      | .path)
    | map(. as $path | $data | fix($data;$path))
    | length as $l
    | .[$l-1]

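    Only the last transformation is present in the result. I think $data is not being updated by my fix, which may be the root cause, or maybe I am just doing it wrong.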
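
That suspicion can be checked on a tiny input. The sketch below is not the code from the question: `wrap` and the two paths `["a"]` and `["b"]` are hypothetical stand-ins chosen for illustration. Mapping over the paths applies each fix to a fresh copy of `$data`, so the last copy carries only the last fix, whereas `reduce` feeds each result into the next step.

```shell
# Sketch only: wrap/1, the paths, and the data are illustrative assumptions.
result=$(jq -c -n '
  # Wrap the value at path $p in an array unless it already is one.
  def wrap($p): setpath($p; [getpath($p)] | flatten(1));

  {"a": 1, "b": 2} as $data
  | [["a"], ["b"]] as $paths
  | {
      # Each fix starts again from the original $data.
      independent: ($paths | map(. as $p | $data | wrap($p)) | last),
      # Each fix starts from the result of the previous one.
      threaded: (reduce $paths[] as $p ($data; wrap($p)))
    }
')
echo "$result"
# prints {"independent":{"a":1,"b":[2]},"threaded":{"a":[1],"b":[2]}}
```

In the `independent` result only `b` is wrapped, which matches the symptom described above; the `threaded` result has both fixes applied.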

    Here is e where this doesn't work

    Best answer

    The response below first addresses the first task, namely:

    make the nodes consistent so that if any ... node is an array in any of the nodes, then the remaining nodes should be converted into arrays.

    in a general way:
    # Collect every path (leading array index included) whose index-free
    # tail refers to an array in at least one element of the input array.
    def paths_to_array:
      [paths as $path
       | select( any(.[]; (getpath($path[1:] )? | type) == "array"))
       | $path] ;

    # If a path to a value in .[] is an array,
    # then ensure all corresponding values are also arrays
    def make_uniform:
      reduce (paths_to_array[][1:]) as $path (.;
        map( (getpath($path)? // null) as $value
             | if $value and ($value|type != "array")
               then setpath($path; [$value])
               else . end ) ) ;

    make_uniform
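
As a quick check of the behaviour, here is a minimal sketch that runs the two definitions on a made-up two-element input (the `id` and `tags` keys are assumptions for illustration, not the data from the question). The definitions are repeated so the snippet can be run on its own.

```shell
# Sketch only: the input array is invented for illustration.
result=$(jq -c -n '
  def paths_to_array:
    [paths as $path
     | select( any(.[]; (getpath($path[1:])? | type) == "array"))
     | $path];

  def make_uniform:
    reduce (paths_to_array[][1:]) as $path (.;
      map( (getpath($path)? // null) as $value
           | if $value and ($value|type != "array")
             then setpath($path; [$value])
             else . end ) );

  # "tags" is a string in the first element and an array in the second.
  [ {"id": 1, "tags": "x"},
    {"id": 2, "tags": ["y", "z"]} ]
  | make_uniform
')
echo "$result"
# prints [{"id":1,"tags":["x"]},{"id":2,"tags":["y","z"]}]
```

Because the guard is `$value and ...`, a value of `false` or `null` at such a path is left unwrapped, and a missing key is not created.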

    For the second task, let's define a utility function:
    # Input is assumed to be an object:
    def flatten_top_level_keys:
      [ to_entries[]
        | if (.value|type) == "object"
          then .key as $k
            | (.value|to_entries)[] as $kv
            | {key: ($k + "." + $kv.key), value: $kv.value}
          else .
          end ]
      | from_entries;

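    This can be used in conjunction with walk/1 to achieve recursive flattening.

    In other words, a solution to the combined problem can be obtained by: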
    make_uniform
    | walk( if type == "object" then flatten_top_level_keys else . end )
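
To see what the flattening step does on its own, here is a minimal sketch on a made-up input. It assumes a jq version in which `walk/1` is built in, and it uses `-S` only so that the key order of the output does not depend on the jq version. Note that `walk` also visits objects inside arrays, so a nested object within an array element is flattened as well, which goes slightly further than the expected result shown in the question.

```shell
# Sketch only: assumes walk/1 is available as a builtin; the input is invented.
result=$(jq -c -n -S '
  def flatten_top_level_keys:
    [ to_entries[]
      | if (.value|type) == "object"
        then .key as $k
          | (.value|to_entries)[] as $kv
          | {key: ($k + "." + $kv.key), value: $kv.value}
        else .
        end ]
    | from_entries;

  {"id": 1, "person": {"addresses": [ {"address": {"city": "FL"}} ]}}
  | walk( if type == "object" then flatten_top_level_keys else . end )
')
echo "$result"
# prints {"id":1,"person.addresses":[{"address.city":"FL"}]}
```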

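    Efficiency
    The definition of make_uniform above has an obvious efficiency problem in this line:
      reduce (paths_to_array[][1:]) as $path (.;

    paths_to_array emits a path once for every input element that contains it, so after the leading index is stripped by [1:] the same path is processed repeatedly.

    Using jq's unique would be one way to address this, but unique is implemented with a sort, which in this case would introduce a different inefficiency. So let's use this old chestnut: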
    # bag of words
    def bow(stream):
      reduce stream as $word ({}; .[$word|tostring] += 1);
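
A minimal sketch of how `bow` can deduplicate paths: each path is turned into a JSON string and used as an object key, so repeats collapse into a single key, and `fromjson` turns the keys back into paths. The three sample paths are made up for illustration.

```shell
# Sketch only: the sample paths are invented for illustration.
result=$(jq -c -n '
  # bag of words
  def bow(stream):
    reduce stream as $word ({}; .[$word|tostring] += 1);

  # Three paths, one of them repeated; each distinct path comes out once.
  [ bow( ["a"], ["a","b"], ["a"] ) | keys_unsorted[] | fromjson ]
')
echo "$result"
```

No sort is involved: the cost is one object update per item in the stream.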

    Now we can define make_uniform more efficiently:
    def make_uniform:
      def uniques(s): bow(s) | keys_unsorted[] | fromjson;
      reduce uniques(paths_to_array[][1:]) as $path (.;
        map( (getpath($path)? // null) as $value
             | if $value and ($value|type != "array")
               then setpath($path; [$value])
               else . end ) ) ;

    Regarding "json - I have a messy JSON that I am trying to clean up using jq", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57133421/
