
python - Trouble getting nested objects from a JSON array in Python

Reposted. Author: 行者123. Updated: 2023-12-04 04:09:48

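Hi, I have a JSON Lines file, and I'm trying to get all the hashtags (and the mentions, which should be the same process) from the JSONL file here: https://github.com/THsTestingGround/JsonL_Quest_SO/blob/master/output-2020-01-21.jsonl (SO wouldn't let me post the data directly, and there is a lot of it).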

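Here is a reproducible example that gets a single key object. How would I go about getting multiple hashtags (and likewise mentions)? At the moment I have to specify each one manually. Is there any way to get them all in one go? I was able to produce a CSV with this code: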

import json
import csv
import io

# creates a .csv file using a Twitter .json file
# the fields have to be set manually

def extract_json(fileobj):
    # Iterates over an open JSONL file and yields
    # decoded lines. Closes the file once it has been
    # read completely.
    with fileobj:
        for line in fileobj:
            yield json.loads(line)

# path to the jsonl file
data_json = io.open('output-2020-01-21.json', mode='r', encoding='utf-8')  # opens the JSONL file
data_python = extract_json(data_json)

csv_out = io.open('tweets_out_utf8.csv', mode='w', encoding='utf-8')  # opens the csv file

# if you're adding additional columns, don't forget to add them here too
fields = u'created_at,text,full_text,screen_name,followers,friends,rt,fav'  # field names
csv_out.write(fields)
csv_out.write(u'\n')

for line in data_python:

    # retweets are not common, so a line may not have the key; this is safer
    try:
        retweeted_status_full_text = '"' + line.get('retweeted_status').get('full_text').replace('"', '""') + '"'
    except AttributeError:
        retweeted_status_full_text = 'NA'

    # only gets me one hashtag, even when there is more than one
    try:
        entities = '"' + line.get('entities').get('hashtags')[0].get('text').replace('"', '""') + '"'
    except (AttributeError, IndexError, TypeError):
        entities = 'NA'

    # writes a row and gets the fields from the json object
    # screen_name and followers/friends are on the second level, hence two get() calls
    row = [line.get('created_at'),
           '"' + line.get('full_text').replace('"', '""') + '"',  # wraps the text in double quotes
           retweeted_status_full_text,
           line.get('user').get('screen_name'),
           str(line.get('user').get('followers_count')),
           str(line.get('user').get('friends_count')),
           str(line.get('retweet_count')),
           str(line.get('favorite_count'))]

    row_joined = u','.join(row)
    csv_out.write(row_joined)
    csv_out.write(u'\n')

csv_out.close()

I did try, but it gave me an error, and I can't seem to find a solution on SO either. My JSON is still a bit weak, so I'd appreciate any help I can get. Thanks.
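A minimal sketch of the core idea, assuming each line is a Twitter v1.1-style tweet in which entities.hashtags is a list of objects with a "text" key and entities.user_mentions is a list of objects with a "screen_name" key: iterate over the whole list instead of indexing [0]. The helper names get_hashtags and get_mentions and the sample tweet are hypothetical.

```python
import json


def get_hashtags(tweet):
    """Return every hashtag text in a tweet (empty list if none).

    Assumes the Twitter v1.1 layout: tweet["entities"]["hashtags"]
    is a list of {"text": ...} objects.
    """
    entities = tweet.get("entities") or {}
    return [tag.get("text", "") for tag in entities.get("hashtags") or []]


def get_mentions(tweet):
    """Return every mentioned screen_name in a tweet (empty list if none).

    Assumes tweet["entities"]["user_mentions"] is a list of
    {"screen_name": ...} objects.
    """
    entities = tweet.get("entities") or {}
    return [m.get("screen_name", "") for m in entities.get("user_mentions") or []]


# Hypothetical sample line from a JSONL file
sample = json.loads(
    '{"entities": {"hashtags": [{"text": "python"}, {"text": "json"}],'
    ' "user_mentions": [{"screen_name": "alice"}]}}'
)
print(get_hashtags(sample))  # ['python', 'json']
print(get_mentions(sample))  # ['alice']
```

Joining the result, for example ' '.join(get_hashtags(tweet)), gives one value per tweet that fits in a single CSV column, and a tweet without hashtags simply yields an empty list rather than an exception.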

Best answer


import json
import csv
import io

def extract_json(fileobj):
    with fileobj:
        for line in fileobj:
            yield json.loads(line)

data_json = io.open('a.json', mode='r', encoding='utf-8')
data_python = extract_json(data_json)

csv_out = io.open('tweets_out_utf8.csv', mode='w', encoding='utf-8')

fields = u'created_at,text,full_text,screen_name,followers,friends,rt,fav,hashtags'
csv_out.write(fields)
csv_out.write(u'\n')

for line in data_python:

    try:
        retweeted_status_full_text = '"' + line.get('retweeted_status').get('full_text').replace('"', '""') + '"'
    except AttributeError:
        retweeted_status_full_text = 'NA'

    # loop over every hashtag instead of taking only the first one ([0])
    try:
        temp = line.get('entities').get('hashtags')
        entities = ""
        for val in temp:
            entities += val.get('text').replace('"', '""') + ' '
        # quote the whole field once so the space-separated tags stay in one CSV column
        entities = '"' + entities.strip() + '"'
    except (AttributeError, TypeError):
        entities = ""

    row = [line.get('created_at'),
           '"' + line.get('full_text').replace('"', '""') + '"',
           retweeted_status_full_text,
           line.get('user').get('screen_name'),
           str(line.get('user').get('followers_count')),
           str(line.get('user').get('friends_count')),
           str(line.get('retweet_count')),
           str(line.get('favorite_count')),
           entities]

    print('entities ' + entities)

    row_joined = u','.join(row)
    csv_out.write(row_joined)
    csv_out.write(u'\n')

csv_out.close()

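I tried something like this: instead of taking only the first element ([0]) of the hashtags list, loop over all of its elements, and when a tweet has no entities or hashtags, fall back to an empty string (entities = '') instead of 'NA'.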
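Since csv is already imported, a sketch that lets csv.writer do the quoting may be simpler than building quoted strings by hand: it escapes commas and double quotes in the tweet text automatically, and puts all hashtags and all mentions into one space-separated column each. It assumes the same Twitter v1.1 field layout; the function name tweets_to_csv, the column names, the 'NA' default and the sample line are illustrative, not taken from the original answer.

```python
import csv
import io
import json


def tweets_to_csv(jsonl_lines, out):
    """Write a header plus one CSV row per tweet.

    jsonl_lines: any iterable of JSON Lines strings (e.g. an open file).
    out: a text file-like object; csv.writer handles all quoting.
    Field names assume the Twitter v1.1 tweet layout.
    """
    writer = csv.writer(out)
    writer.writerow(['created_at', 'text', 'retweeted_full_text', 'screen_name',
                     'followers', 'friends', 'rt', 'fav', 'hashtags', 'mentions'])
    for raw in jsonl_lines:
        if not raw.strip():
            continue  # skip blank lines; json.loads('') would raise
        tweet = json.loads(raw)
        user = tweet.get('user') or {}
        entities = tweet.get('entities') or {}
        retweeted = tweet.get('retweeted_status') or {}
        # every hashtag / mention, joined with spaces into a single column
        hashtags = ' '.join(h.get('text', '') for h in entities.get('hashtags') or [])
        mentions = ' '.join(m.get('screen_name', '') for m in entities.get('user_mentions') or [])
        writer.writerow([
            tweet.get('created_at'),
            tweet.get('full_text'),
            retweeted.get('full_text', 'NA'),
            user.get('screen_name'),
            user.get('followers_count'),
            user.get('friends_count'),
            tweet.get('retweet_count'),
            tweet.get('favorite_count'),
            hashtags,
            mentions,
        ])


# Usage with an in-memory buffer and a hypothetical sample tweet
buf = io.StringIO()
sample = ('{"created_at": "Tue Jan 21 00:00:00 +0000 2020",'
          ' "full_text": "hello, world #a #b @bob",'
          ' "user": {"screen_name": "alice", "followers_count": 3, "friends_count": 4},'
          ' "retweet_count": 0, "favorite_count": 1,'
          ' "entities": {"hashtags": [{"text": "a"}, {"text": "b"}],'
          ' "user_mentions": [{"screen_name": "bob"}]}}')
tweets_to_csv([sample, ''], buf)
print(buf.getvalue())
```

For the real data, pass an open JSONL file as jsonl_lines (iterating a text file yields its lines) and open the output CSV with newline='', as the csv module documentation recommends, e.g. open('tweets_out_utf8.csv', 'w', encoding='utf-8', newline='').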

This Q&A ("python - Trouble getting nested objects from a JSON array in Python") comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/61938920/
