
python - Converting json.dumps into a Python DataFrame

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:28:36

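I am using IBM Watson's Natural Language Understanding API. I used the following code from the API documentation to return sentiment analysis for some tweets about Nike that are stored in a DataFrame: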

import json
from watson_developer_cloud import NaturalLanguageUnderstandingV1
from watson_developer_cloud.natural_language_understanding_v1 \
    import Features, EntitiesOptions, KeywordsOptions

naturalLanguageUnderstanding = NaturalLanguageUnderstandingV1(
    version='2018-09-21',
    iam_apikey='[KEY HIDDEN]',
    url='https://gateway.watsonplatform.net/natural-language-understanding/api')

for tweet in nikedf["text"]:
    response = naturalLanguageUnderstanding.analyze(
        text=tweet,
        features=Features(
            entities=EntitiesOptions(
                emotion=False,
                sentiment=True,
                limit=2),
            keywords=KeywordsOptions(
                emotion=False,
                sentiment=True,
                limit=2))).get_result()
    print(json.dumps(response, indent=2))

The loop prints a JSON dump string for each tweet, as shown below.

{
"usage": {
"text_units": 1,
"text_characters": 140,
"features": 2
},
"language": "en",
"keywords": [
{
"text": "Kaepernick7 Kapernick",
"sentiment": {
"score": 0.951279,
"label": "positive"
},
"relevance": 0.965894,
"count": 1
},
{
"text": "campaign",
"sentiment": {
"score": 0.951279,
"label": "positive"
},
"relevance": 0.555759,
"count": 1
}
],
"entities": [
{
"type": "Company",
"text": "nike",
"sentiment": {
"score": 0.899838,
"label": "positive"
},
"relevance": 0.92465,
"disambiguation": {
"subtype": [],
"name": "Nike, Inc.",
"dbpedia_resource": "http://dbpedia.org/resource/Nike,_Inc."
},
"count": 2
},
{
"type": "Company",
"text": "Kapernick",
"sentiment": {
"score": 0.899838,
"label": "positive"
},
"relevance": 0.165888,
"count": 1
}
]
}
{
"usage": {
"text_units": 1,
"text_characters": 140,
"features": 2
},
"language": "en",
"keywords": [
{
"text": "ORIGINS PAY",
"sentiment": {
"score": 0.436905,
"label": "positive"
},
"relevance": 0.874857,
"count": 1
},
{
"text": "RT",
"sentiment": {
"score": 0.436905,
"label": "positive"
},
"relevance": 0.644407,
"count": 1
}
],
"entities": [
{
"type": "Company",
"text": "Nike",
"sentiment": {
"score": 0.0,
"label": "neutral"
},
"relevance": 0.922792,
"disambiguation": {
"subtype": [],
"name": "Nike, Inc.",
"dbpedia_resource": "http://dbpedia.org/resource/Nike,_Inc."
},
"count": 1
},
{
"type": "TwitterHandle",
"text": "@IcySoleOnline",
"sentiment": {
"score": 0.0,
"label": "neutral"
},
"relevance": 0.922792,
"count": 1
}
]
}
{
"usage": {
"text_units": 1,
"text_characters": 137,
"features": 2
},
"language": "en",
"keywords": [
{
"text": "RT",
"sentiment": {
"score": 0.946834,
"label": "positive"
},
"relevance": 0.911909,
"count": 2
},
{
"text": "SPOTS",
"sentiment": {
"score": 0.946834,
"label": "positive"
},
"relevance": 0.533273,
"count": 1
}
],
"entities": [
{
"type": "TwitterHandle",
"text": "@dropssupreme",
"sentiment": {
"score": 0.0,
"label": "neutral"
},
"relevance": 0.01,
"count": 1
}
]
}
{
"usage": {
"text_units": 1,
"text_characters": 140,
"features": 2
},
"language": "en",
"keywords": [
{
"text": "Golden Touch' boots",
"sentiment": {
"score": 0,
"label": "neutral"
},
"relevance": 0.885418,
"count": 1
},
{
"text": "RT",
"sentiment": {
"score": 0,
"label": "neutral"
},
"relevance": 0.765005,
"count": 1
}
],
"entities": [
{
"type": "Company",
"text": "Nike",
"sentiment": {
"score": 0.0,
"label": "neutral"
},
"relevance": 0.33,
"disambiguation": {
"subtype": [],
"name": "Nike, Inc.",
"dbpedia_resource": "http://dbpedia.org/resource/Nike,_Inc."
},
"count": 1
},
{
"type": "Person",
"text": "Luka Modri\u0107",
"sentiment": {
"score": 0.0,
"label": "neutral"
},
"relevance": 0.33,
"disambiguation": {
"subtype": [
"Athlete",
"FootballPlayer"
],
"name": "Luka Modri\u0107",
"dbpedia_resource": "http://dbpedia.org/resource/Luka_Modri\u0107"
},
"count": 1
}
]
}

How can I convert this into a DataFrame with the headers text, score, and label (taken from the JSON dump)?

Thanks in advance!!

Best answer

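Your JSON text will not be easy to parse, because the printed output is several JSON objects one after another rather than a single JSON document. One option is to collect the responses in a list instead of printing them, and use that list to create the DataFrame.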

import json
from watson_developer_cloud import NaturalLanguageUnderstandingV1
from watson_developer_cloud.natural_language_understanding_v1 \
    import Features, EntitiesOptions, KeywordsOptions

naturalLanguageUnderstanding = NaturalLanguageUnderstandingV1(
    version='2018-09-21',
    iam_apikey='[KEY HIDDEN]',
    url='https://gateway.watsonplatform.net/natural-language-understanding/api')

responses = []
for tweet in nikedf["text"]:
    response = naturalLanguageUnderstanding.analyze(
        text=tweet,
        features=Features(
            entities=EntitiesOptions(
                emotion=False,
                sentiment=True,
                limit=2),
            keywords=KeywordsOptions(
                emotion=False,
                sentiment=True,
                limit=2))).get_result()
    responses.append(response)
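To illustrate why the printed text is awkward to parse, here is a minimal sketch using only the standard library. It assumes the printed dumps were saved as one string; the helper name `parse_concatenated_json` and the tiny sample objects are hypothetical and are not part of the original answer. Collecting the responses in a list, as above, avoids the problem altogether.

```python
import json


def parse_concatenated_json(text):
    """Parse several JSON documents written back to back in one string."""
    decoder = json.JSONDecoder()
    objects = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it by hand.
        if text[pos].isspace():
            pos += 1
            continue
        # raw_decode returns the parsed object and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        objects.append(obj)
    return objects


# Two dumps printed one after another, mimicking print() inside a loop.
printed = (json.dumps({"language": "en"}, indent=2)
           + "\n"
           + json.dumps({"language": "fr"}, indent=2))

try:
    json.loads(printed)
except json.JSONDecodeError as exc:
    # json.loads accepts exactly one document, so it rejects the second one.
    print("json.loads failed:", exc.msg)

print(parse_concatenated_json(printed))
```
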

Use the list of responses to create an RDD and parse each response to create the required columns:

from pyspark.sql import Row

# Build one Row(text, score, label) per keyword in a response
def convert_to_row(response):
    rows = []
    for keyword in response['keywords']:
        row_dict = {}
        row_dict['text'] = keyword['text']
        # The API can return the integer 0 as a score; cast so the column has one type
        row_dict['score'] = float(keyword['sentiment']['score'])
        row_dict['label'] = keyword['sentiment']['label']
        row = Row(**row_dict)
        rows.append(row)
    return rows

# sc is assumed to be an existing SparkContext (for example, in a PySpark shell)
sc.parallelize(responses) \
    .flatMap(convert_to_row) \
    .toDF().show()
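The question asks for a DataFrame with text, score, and label columns, and the answer above builds a Spark DataFrame. If a pandas DataFrame is wanted instead, one possible sketch is to flatten the collected responses into plain dictionaries first. The helper name `responses_to_rows` and the extra `source` column are illustrative assumptions, not part of the original answer, and the sample response is trimmed from the output shown in the question.

```python
def responses_to_rows(responses, sections=("keywords", "entities")):
    """Flatten Watson NLU responses into one dict per keyword or entity."""
    rows = []
    for response in responses:
        for section in sections:
            # A response may lack a section, so fall back to an empty list.
            for item in response.get(section, []):
                sentiment = item.get("sentiment", {})
                rows.append({
                    "text": item["text"],
                    "score": sentiment.get("score"),
                    "label": sentiment.get("label"),
                    "source": section,  # hypothetical extra column
                })
    return rows


# Trimmed sample taken from the first response in the question.
sample_responses = [{
    "keywords": [
        {"text": "campaign",
         "sentiment": {"score": 0.951279, "label": "positive"},
         "relevance": 0.555759, "count": 1},
    ],
    "entities": [
        {"type": "Company", "text": "nike",
         "sentiment": {"score": 0.899838, "label": "positive"},
         "relevance": 0.92465, "count": 2},
    ],
}]

rows = responses_to_rows(sample_responses)
for row in rows:
    print(row)
```

With pandas installed, passing the result to `pandas.DataFrame(rows, columns=["text", "score", "label"])` should give the three columns asked for; restricting `sections` to `("keywords",)` would mirror the PySpark answer, which only reads keywords.
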

For "python - Converting json.dumps into a Python DataFrame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53687099/
