
python - Filtering JSON documents with pandas

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:38:18

Suppose there is a JSON document: {"glossary":{"GlossDiv":{"GlossList":{"GlossEntry":{"GlossDef":{"GlossSeeAlso":["XML","XLS"]}}}}}}

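Suppose I flatten it into a DataFrame with pandas.io.json.json_normalize. Afterwards, how can I check whether the DataFrame contains any rows that match a JSON query such as: {"glossary":{"GlossDiv":{"GlossList":{"GlossEntry":{"GlossDef":{"GlossSeeAlso":["XLS"]}}}}}}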
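To make the flattening step concrete, here is a minimal sketch of what json_normalize does with one such record. It assumes pandas 1.0 or later, where the function is available as pd.json_normalize; the variable names record, flat and column are illustrative only.

```python
import pandas as pd

# One nested record shaped like the document above.
record = {"glossary": {"GlossDiv": {"GlossList": {"GlossEntry": {
    "GlossDef": {"GlossSeeAlso": ["XML", "XLS"]}}}}}}

# json_normalize flattens nested dicts into dotted column names;
# a list value is left as a Python list inside a single cell.
flat = pd.json_normalize([record])

column = "glossary.GlossDiv.GlossList.GlossEntry.GlossDef.GlossSeeAlso"
print(list(flat.columns))
print(flat.loc[0, column])
```

Because the list stays in one cell, matching a query such as ["XLS"] against it needs a membership or subset test rather than a plain equality comparison on the column.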

file1.json:

[{
  "glossary": {
    "title": "example glossary",
    "GlossDiv": {
      "title": "S",
      "GlossList": {
        "GlossEntry": {
          "ID": "SGML",
          "SortAs": "SGML",
          "GlossTerm": "Standard Generalized Markup Language",
          "Acronym": "SGML",
          "Abbrev": "ISO 8879:1986",
          "GlossDef": {
            "para": "A meta-markup language, used to create markup languages such as DocBook.",
            "GlossSeeAlso": ["GML", "XML"]
          },
          "GlossSee": "markup"
        }
      }
    }
  }
},
{
  "glossary": {
    "title": "example glossary",
    "GlossDiv": {
      "title": "S",
      "GlossList": {
        "GlossEntry": {
          "ID": "SGML",
          "SortAs": "SGML",
          "GlossTerm": "Standard Generalized Markup Language",
          "Acronym": "SGML",
          "Abbrev": "ISO 8879:1986",
          "GlossDef": {
            "para": "A meta-markup language, used to create markup languages such as DocBook.",
            "GlossSeeAlso": ["GML", "XML", "DSG"]
          },
          "GlossSee": "markup"
        }
      }
    }
  }
}]

file2.json:

{"glossary":{"GlossDiv":{"GlossList":{"GlossEntry":{"GlossDef":{"GlossSeeAlso":["DSG"]}}}}}}

Expected output: 1 row.

How would I do this? Assume file1.json contains multiple records that must be filtered against the single JSON record in file2.json.

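import json

import pandas as pd

# Load the records to filter and the single record to filter by
with open('file1.json') as file1:
    records = json.load(file1)
with open('file2.json') as file2:
    filter_record = json.load(file2)

# pd.json_normalize (pandas >= 1.0) replaces the older pandas.io.json import
df = pd.json_normalize(records)

# Need to filter df so that every remaining row satisfies the column values in filter_record
# Code: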
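The matching rule the question describes can also be stated without pandas, which may help pin down the expected result. The sketch below is one possible reading of that rule, not something taken from the original post: dicts are compared key by key, a query list must be a subset of the record list, and everything else must be equal. The names matches and see_also are hypothetical helpers, and the subset test assumes the list items are hashable (strings here).

```python
def matches(record, query):
    """Return True if everything in `query` is also present in `record`."""
    if isinstance(query, dict):
        # Every key of the query must exist in the record and match recursively.
        return isinstance(record, dict) and all(
            key in record and matches(record[key], value)
            for key, value in query.items()
        )
    if isinstance(query, list):
        # Lists are treated as sets: the query list must be a subset.
        return isinstance(record, list) and set(query) <= set(record)
    return record == query


def see_also(values):
    # Hypothetical helper: builds a record with the nesting used in file1.json.
    return {"glossary": {"GlossDiv": {"GlossList": {"GlossEntry": {
        "GlossDef": {"GlossSeeAlso": values}}}}}}


records = [see_also(["GML", "XML"]), see_also(["GML", "XML", "DSG"])]
query = see_also(["DSG"])

kept = [r for r in records if matches(r, query)]
print(len(kept))  # 1
```

Only the second record contains "DSG", which agrees with the single row expected in the question.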

Best answer

Use:

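import numpy as np
import pandas as pd

# records and filter_record are the objects loaded from file1.json and file2.json
df1 = pd.json_normalize(records)
#print(df1)

df2 = pd.json_normalize(filter_record)
#print(df2)

# keep only the columns of df1 that also appear in df2
df = df1[df2.columns.intersection(df1.columns)]
#print(df)

# convert the list in each cell of the first shared column to a set
a = np.array([set(x) for x in df.iloc[:, 0].tolist()])
b = np.array([set(x) for x in df2.iloc[:, 0].tolist()])
print(a)
[{'XML', 'GML'} {'XML', 'DSG', 'GML'}]
print(b)
[{'DSG'}]

# test the match: set <= set is a subset test, broadcast over every row of a
matches = (b[:, None] <= a)
print(matches)
[[False  True]]

# flatten to one boolean per row of df
any_ = matches[0]

# exclude rows whose value is missing (NaN)
nul_ = df.iloc[:, 0].notnull().values
mask = any_ & nul_
print(mask)
[False  True]
# boolean indexing
df1 = df1[mask]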
print (df1)

   glossary.GlossDiv.GlossList.GlossEntry.Abbrev \
1  ISO 8879:1986

   glossary.GlossDiv.GlossList.GlossEntry.Acronym \
1  SGML

   glossary.GlossDiv.GlossList.GlossEntry.GlossDef.GlossSeeAlso \
1  [GML, XML, DSG]

   glossary.GlossDiv.GlossList.GlossEntry.GlossDef.para \
1  A meta-markup language, used to create markup ...

   glossary.GlossDiv.GlossList.GlossEntry.GlossSee \
1  markup

   glossary.GlossDiv.GlossList.GlossEntry.GlossTerm \
1  Standard Generalized Markup Language

   glossary.GlossDiv.GlossList.GlossEntry.ID \
1  SGML

   glossary.GlossDiv.GlossList.GlossEntry.SortAs  glossary.GlossDiv.title \
1  SGML                                           S

   glossary.title
1  example glossary
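The answer tests only the first shared column. If the filter record might carry several fields, one way to generalize is to loop over every flattened column of the filter and combine the per-column masks. This is a sketch under stated assumptions rather than part of the original answer: filter_by_record and entry are hypothetical names, list cells are matched by subset, other cells by equality, and pd.json_normalize requires pandas 1.0 or later.

```python
import pandas as pd


def filter_by_record(df, query):
    """Keep the rows of `df` that satisfy every flattened column of `query`."""
    wanted = pd.json_normalize(query).iloc[0]
    mask = pd.Series(True, index=df.index)
    for column, expected in wanted.items():
        if column not in df.columns:
            # A column the data never has cannot be satisfied by any row.
            return df.iloc[0:0]
        if isinstance(expected, list):
            # Subset test; cells that are not lists (e.g. NaN) never match.
            needed = set(expected)
            hit = df[column].apply(
                lambda cell: isinstance(cell, list) and needed <= set(cell)
            )
        else:
            hit = df[column] == expected
        mask &= hit.astype(bool)
    return df[mask]


def entry(see_also):
    # Hypothetical helper: a trimmed-down record shaped like file1.json.
    return {"glossary": {"title": "example glossary", "GlossDiv": {
        "GlossList": {"GlossEntry": {"GlossDef": {"GlossSeeAlso": see_also}}}}}}


df = pd.json_normalize([entry(["GML", "XML"]), entry(["GML", "XML", "DSG"])])
query = {"glossary": {"GlossDiv": {"GlossList": {"GlossEntry": {
    "GlossDef": {"GlossSeeAlso": ["DSG"]}}}}}}

result = filter_by_record(df, query)
print(result.index.tolist())  # [1]
```

Checking isinstance(cell, list) before building the set plays the role of the notnull() test in the answer, since a missing value shows up as a float NaN rather than a list.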

Regarding "python - Filtering JSON documents with pandas", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46748739/
