gpt4 book ai didi

Python - 从嵌套 json 中提取数据时出现问题

转载 作者:太空宇宙 更新时间:2023-11-03 20:27:47 25 4
gpt4 key购买 nike

我在从 json 中提取数据时遇到问题,我尝试了多种不同的方法。我能够提取 ID 本身,遗憾的是我无法显示该字段的详细信息。
下面是我的 json

{
"params": {
"cid": "15482782896",
"datemax": "20190831",
"datemin": "20190601",
"domains": [
"url.com"
],

},
"results": {
"59107": {
"url.com": {
"1946592": {
"data": {
"2019-06-01": {
"ENGINE": {
"DEVICE": {
"": {
"position": 21,
"url": "url3.com"
}
}
}
},
"2019-07-01": {
"ENGINE": {
"DEVICE": {
"": {
"position": 4,
"url": "url3.com"
}
}
}
},
"2019-08-01": {
"ENGINE": {
"DEVICE": {
"": {
"position": 2,
"url": "url3.com"
}
}
}
}
},
"keyword": {
"title": "python_1",
"volume": 10
}
},
"1946602": {
"data": {
"2019-06-01": {
"ENGINE": {
"DEVICE": {
"": {
"position": 5,
"url": "url1.com"
}
}
}
},
"2019-07-01": {
"ENGINE": {
"DEVICE": {
"": {
"position": 12,
"url": "url1.com"
}
}
}
},
"2019-08-01": {
"ENGINE": {
"DEVICE": {
"": {
"position": 10.25,
"url": "url1.com"
}
}
}
}
},
"keyword": {
"title": "python_2",
"volume": 20
}
}
}
}
}
}

我尝试了以下代码,但我得到了 id 本身形式的结果

import json
import csv

def get_leaves(item, key=None):
if isinstance(item, dict):
leaves = {}
for i in item.keys():
leaves.update(get_leaves(item[i], i))
return leaves
elif isinstance(item, list):
leaves = {}
for i in item:
leaves.update(get_leaves(i, key))
return leaves
else:
return {key : item}


with open('me_filename') as f_input:
json_data = json.load(f_input)

fieldnames = set()

for entry in json_data:
fieldnames.update(get_leaves(entry).keys())

with open('output.csv', 'w', newline='') as f_output:
csv_output = csv.DictWriter(f_output, fieldnames=sorted(fieldnames))
csv_output.writeheader()
csv_output.writerows(get_leaves(entry) for entry in json_data)

我也尝试过使用pandas,但也未能正确解析

import io
import json
import pandas as pd

with open('me_filename', encoding='utf-8') as f_input:
df = pd.read_json(f_input , orient='None')

df.to_csv('output.csv', encoding='utf-8')

我需要得到的结果:

ID      Name         page   volume  url      2019-06-01 2019-07-01  2019-08-01  2019-09-01
1946592 python_1 url.com 10 url3.com 21 4 2 null
1946602 python_2 url.com 20 url1.com 5 12 10,25 null

我可能做错了什么?

最佳答案

嗯,这是一个有点复杂的解决方案,它看起来非常困惑,并且看起来不再像提供的代码,但我相信它会解决您的问题。

首先,我对提供的 Json 遇到了问题(由于第 8 行尾随“,”),但我成功生成了:

输出(temp.csv)

ID,Name,Page,Volume,Url,2019-08-01,2019-07-01,2019-06-01,
1946592,python_1,url.com,10,url3.com,2,4,21,
1946602,python_2,url.com,20,url1.com,10.25,12,5,

使用以下内容:

import json

dates: set = set()

# Collect the data
def get_breakdown(json):
collected_data = []
for result in json['results']:
for page in json['results'][result]:
for _id in json['results'][result][page]:
data_struct = {
'ID': _id,
'Name': json['results'][result][page][_id]['keyword']['title'],
'Page': page,
'Volume': json['results'][result][page][_id]['keyword']['volume'],
'Dates': {}
}
for date in dates:
if date in json['results'][result][page][_id]['data']:
data_struct['URL'] = json['results'][result][page][_id]['data'][date]['ENGINE']['DEVICE']['']['url']
data_struct['Dates'][date] = {'Position' : json['results'][result][page][_id]['data'][date]['ENGINE']['DEVICE']['']['position']}
else:
data_struct['Dates'][date] = {'Position' : 'null'}
collected_data.append(data_struct)
return collected_data

# Collect all dates across the whole data
# structure and save them to a set
def get_dates(json):
for result in json['results']:
for page in json['results'][result]:
for _id in json['results'][result][page]:
for date in json['results'][result][page][_id]['data']:
dates.add(date)


# Write to .csv file
def write_csv(collected_data, file_path):
f = open(file_path, "w")
# CSV Title
date_string = ''
for date in dates:
date_string = '{0}{1},'.format(date_string, date)
f.write('ID,Name,Page,Volume,Url,{0}\n'.format(date_string))

# Data
for data in collected_data:
position_string = ''
for date in dates:
position_string = '{0}{1},'.format(position_string, data['Dates'][date]['Position'])
f.write('{0},{1},{2},{3},{4},{5}\n'.format(
data['ID'],
data['Name'],
data['Page'],
data['Volume'],
data['URL'],
position_string
))

# Code Body
with open('me_filename.json') as f_input:
json_data = json.load(f_input)
get_dates(json_data)
write_csv(get_breakdown(json_data), "output.csv")

希望您能够按照代码进行操作,并且它可以实现预期的效果。我确信它可以变得更加可靠 - 但是正如前面提到的,我无法让它与您提供的基本代码一起工作。

关于Python - 从嵌套 json 中提取数据时出现问题,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/57705505/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com