python - How to slice a JSON file and extract only some of its fields

Reposted. Author: 太空宇宙. Updated: 2023-11-03 17:19:27

I am trying to slice a JSON file that looks like this:

{"price": 17.95, "categories": [["Musical Instruments", "Instrument Accessories", "General Accessories", "Sheet Music Folders"]], "imUrl": "http://ecx.images-amazon.com/images/I/41EpRmh8MEL._SY300_.jpg", "title": "Six Sonatas For Two Flutes Or Violins, Volume 2 (#4-6)", "salesRank": {"Musical Instruments": 207315}, "asin": "0006428320"}
{"description": "Composer: J.S. Bach.Peters Edition.For two violins and pianos.", "related": {"also_viewed": ["B0058DK7RA"], "buy_after_viewing": ["B0058DK7RA"]}, "categories": [["Musical Instruments"]], "brand": "", "imUrl": "http://ecx.images-amazon.com/images/I/41m6ygCqc8L._SY300_.jpg", "title": "Double Concerto in D Minor By Johann Sebastian Bach. Edited By David Oistrach. For Violin I, Violin Ii and Piano Accompaniment. Urtext. Baroque. Medium. Set of Performance Parts. Solo Parts, Piano Reduction and Introductory Text. BWV 1043.", "salesRank": {"Musical Instruments": 94593}, "asin": "0014072149", "price": 18.77}
{"asin": "0041291905", "categories": [["Musical Instruments", "Instrument Accessories", "General Accessories", "Sheet Music Folders"]], "imUrl": "http://ecx.images-amazon.com/images/I/41maAqSO9hL._SY300_.jpg", "title": "Hal Leonard Vivaldi Four Seasons for Piano (Original Italian Text)", "salesRank": {"Musical Instruments": 222972}, "description": "Vivaldi's famous set of four violin concertos certainly ranks among the all-time top ten classical favorites. Features include an introduction about the history of The Four Seasons and Vivaldi's original vivid Italian score markings. A must for classical purists."}

As you can see, the fields are not in a consistent order from line to line, and I only need a few of them. So I wrote this code:

import json, csv

infile = open("sample_output.strict", "r")
outfile = open("output.csv", "w")
writer = csv.writer(outfile)

fileds = ["asin","price"]
for product in json.loads(infile.read()):
    line = []
    for f in fields:
        if product.has_key(f):
            line.append(product[f])
        else:
            line.append("")
    writer.write(line)

I get the following error message:

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-14-3e335b184eea> in <module>()
6
7 fileds = ["asin","price"]
----> 8 for product in json.loads(infile.read()):
9 line = []
10 for f in fields:

C:\Anaconda3\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
316 parse_int is None and parse_float is None and
317 parse_constant is None and object_pairs_hook is None and not kw):
--> 318 return _default_decoder.decode(s)
319 if cls is None:
320 cls = JSONDecoder

C:\Anaconda3\lib\json\decoder.py in decode(self, s, _w)
344 end = _w(s, end).end()
345 if end != len(s):
--> 346 raise ValueError(errmsg("Extra data", s, end, len(s)))
347 return obj
348

ValueError: Extra data: line 2 column 1 - line 3 column 617 (char 339 - 1581)

Best answer

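What you have is JSON Lines, not a single JSON document. Change your program to read the file line by line and parse each line as JSON, then handle each document in turn. This format is very common; I receive data delivered this way all the time.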
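To illustrate the difference, here is a minimal sketch that reproduces the "Extra data" failure on a two-line string and then parses the same text one line at a time. The sample records are made up for the demonstration and are not taken from the asker's file.

```python
import json

# Two JSON documents, one per line (hypothetical sample data).
text = '{"asin": "A1", "price": 1.5}\n{"asin": "B2"}\n'

try:
    # json.loads expects exactly one document in the whole string.
    json.loads(text)
    whole_ok = True
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    whole_ok = False
    print("whole-file parse failed:", exc)

# Parsing each non-empty line on its own succeeds.
docs = [json.loads(line) for line in text.splitlines() if line.strip()]
print(len(docs), "documents parsed")
```

The parser reads the first object successfully and then complains about everything that follows it, which is why the error in the question points at line 2, column 1.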

If you end up processing large files, going line by line will also save a lot of memory.

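import json, csv

# newline="" stops the csv module from emitting blank rows on Windows (Python 3)
with open("sample_output.strict", "r") as infile:
    with open("output.csv", "w", newline="") as outfile:
        writer = csv.writer(outfile)

        fields = ["asin", "price"]
        for json_line in infile:
            # each line is its own JSON document
            product = json.loads(json_line)
            line = []
            for f in fields:
                # dict.has_key() does not exist in Python 3; use `in` instead
                if f in product:
                    line.append(product[f])
                else:
                    line.append("")
            writer.writerow(line)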
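As a possible variation on the same idea, `csv.DictWriter` can do the field selection itself: `extrasaction="ignore"` drops keys that are not wanted and `restval=""` fills in missing ones. This is only a sketch. The function name `extract_fields` is hypothetical, the sample records are shortened versions of the ones in the question, and unlike the code above it also writes a header row.

```python
import csv
import io
import json

def extract_fields(lines, out, fields=("asin", "price")):
    """Write the chosen fields of each JSON line in `lines` to `out` as CSV.

    Hypothetical helper: returns the number of records written.
    """
    writer = csv.DictWriter(
        out,
        fieldnames=list(fields),
        restval="",             # value used when a field is missing
        extrasaction="ignore",  # silently drop fields we did not ask for
    )
    writer.writeheader()
    count = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        writer.writerow(json.loads(line))
        count += 1
    return count

# In-memory stand-ins for the input and output files.
sample = io.StringIO(
    '{"asin": "0006428320", "price": 17.95, "title": "x"}\n'
    '{"asin": "0041291905"}\n'
)
result = io.StringIO()
n = extract_fields(sample, result)
print(result.getvalue())
```

With real files, the same function could be called with the two handles opened as in the answer above, keeping `newline=""` on the output file.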

Regarding "python - How to slice a JSON file and extract only some of its fields", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33325564/
