
python - Indexing a DataFrame into Elasticsearch with Python

Reposted. Author: 行者123. Updated: 2023-12-02 22:35:49

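I'm trying to index some Pandas DataFrames into Elasticsearch. I'm having trouble parsing the generated JSON, and I think the problem comes from my mapping. Please find my code below.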

import logging
from pprint import pprint
from elasticsearch import Elasticsearch
import pandas as pd

def create_index(es_object, index_name):
    created = False
    # index settings
    settings = {
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 0
        },
        "mappings": {
            "danger": {
                "dynamic": "strict",
                "properties": {
                    "name": {
                        "type": "text"
                    },
                    "first_name": {
                        "type": "text"
                    },
                    "age": {
                        "type": "integer"
                    },
                    "city": {
                        "type": "text"
                    },
                    "sex": {
                        "type": "text",
                    },
                }
            }
        }
    }

    try:
        if not es_object.indices.exists(index_name):
            # ignore=400 means to ignore the "Index Already Exist" error
            es_object.indices.create(index=index_name, ignore=400,
                                     body=settings)
            print('Created Index')
        created = True
    except Exception as ex:
        print(str(ex))
    finally:
        return created


def store_record(elastic_object, index_name, record):
    is_stored = True
    try:
        outcome = elastic_object.index(index=index_name, doc_type='danger', body=record)
        print(outcome)
    except Exception as ex:
        print('Error in indexing data')


data = [['Hook', 'James', '90', 'Austin', 'M'], ['Sparrow', 'Jack', '15', 'Paris', 'M'], ['Kent', 'Clark', '13', 'NYC', 'M'], ['Montana', 'Hannah', '28', 'Las Vegas', 'F']]
df = pd.DataFrame(data, columns=['name', 'first_name', 'age', 'city', 'sex'])
result = df.to_json(orient='records')
result = result[1:-1]
es = Elasticsearch()
if es is not None:
    if create_index(es, 'cracra'):
        out = store_record(es, 'cracra', result)
        print('Data indexed successfully')

I get the following error:
POST http://localhost:9200/cracra/danger [status:400 request:0.016s]

Error in indexing data
RequestError(400, 'mapper_parsing_exception', 'failed to parse')
Data indexed successfully

I don't know where it comes from. I'd really appreciate it if someone could help me solve this.

Thanks a lot!

Best answer

Try removing the extra commas from the mapping:

"mappings": {
    "danger": {
        "dynamic": "strict",
        "properties": {
            "name": {
                "type": "text"
            },
            "first_name": {
                "type": "text"
            },
            "age": {
                "type": "integer"
            },
            "city": {
                "type": "text"
            },
            "sex": {
                "type": "text", <-- here
            }, <-- and here
        }
    }
}
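
Whether those commas matter depends on how the mapping reaches Elasticsearch. The sketch below is only an illustration, assuming the mapping is built as a Python dict (as in the question) and serialized with the standard `json` module; the variable names are hypothetical. It shows that trailing commas are legal in a Python literal and vanish on serialization, while the same commas in a raw JSON string are rejected.

```python
import json

# Trailing commas are allowed in Python dict literals.
mapping_fragment = {
    "sex": {
        "type": "text",
    },
}

# Serializing the dict produces JSON without any trailing commas.
as_json = json.dumps(mapping_fragment)
print(as_json)

# The same text written by hand as raw JSON is not valid.
raw_json_with_commas = '{"sex": {"type": "text",},}'
try:
    json.loads(raw_json_with_commas)
    raw_is_valid = True
except json.JSONDecodeError:
    raw_is_valid = False
print(raw_is_valid)
```

So the commas are a problem when the mapping is sent as hand-written JSON, for example from a REST console, but not when a Python dict is serialized by the client.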

Update

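It looks like the index is created successfully and the problem is with indexing the data. As Nishant Saini pointed out, you are probably trying to index several documents at once. That can be done with the Bulk API. Here is an example of a correct request that indexes two documents: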
POST cracra/danger/_bulk
{"index": {"_id": 1}}
{"name": "Hook", "first_name": "James", "age": "90", "city": "Austin", "sex": "M"}
{"index": {"_id": 2}}
{"name": "Sparrow", "first_name": "Jack", "age": "15", "city": "Paris", "sex": "M"}

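Each document in the request body must be on its own line, preceded by a line of meta-information. In this case the meta-information contains only the ID to assign to the document.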
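
As a rough sketch of building such a body by hand, the snippet below turns a list of record dicts into newline-delimited JSON with one meta-information line before each document. The function name `build_bulk_body` and the `first_id` parameter are hypothetical, and the records are assumed to be plain dicts, such as those returned by `df.to_dict(orient='records')`.

```python
import json


def build_bulk_body(records, first_id=1):
    """Return a bulk request body: one action line, then one document line, per record."""
    lines = []
    for offset, record in enumerate(records):
        # Meta-information line: here it carries only the document ID.
        lines.append(json.dumps({"index": {"_id": first_id + offset}}))
        # Document line: the record itself as a single-line JSON object.
        lines.append(json.dumps(record))
    # The Bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Sample records taken from the question's data.
records = [
    {"name": "Hook", "first_name": "James", "age": "90", "city": "Austin", "sex": "M"},
    {"name": "Sparrow", "first_name": "Jack", "age": "15", "city": "Paris", "sex": "M"},
]
body = build_bulk_body(records)
print(body)
```

The resulting string is what would be sent as the body of the `_bulk` request; sending it is left out here because it needs a running cluster.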

You can build this request by hand, or use the Elasticsearch Helpers for Python, which take care of adding the correct meta-information.
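
A minimal sketch of the helpers route, assuming the `elasticsearch` Python package is installed and a client object like the question's `es` is available. `generate_actions` and `bulk_index` are hypothetical names; `helpers.bulk` is the library call. The `_type` key is only added when `doc_type` is given, on the assumption that the cluster still uses mapping types as in the question (`danger`).

```python
def generate_actions(records, index_name, doc_type=None, first_id=1):
    """Yield one bulk action dict per record (hypothetical helper)."""
    for offset, record in enumerate(records):
        action = {
            "_index": index_name,
            "_id": first_id + offset,
            "_source": record,
        }
        # Assumption: mapping types are only relevant on older clusters.
        if doc_type is not None:
            action["_type"] = doc_type
        yield action


def bulk_index(es_client, records, index_name, doc_type=None):
    """Send all records in one bulk request; returns helpers.bulk's (success_count, errors)."""
    # Imported here so the action generator can be used without the package.
    from elasticsearch import helpers
    return helpers.bulk(es_client, generate_actions(records, index_name, doc_type))


# Records as plain dicts, e.g. from df.to_dict(orient='records').
records = [
    {"name": "Hook", "first_name": "James", "age": "90", "city": "Austin", "sex": "M"},
    {"name": "Sparrow", "first_name": "Jack", "age": "15", "city": "Paris", "sex": "M"},
]
actions = list(generate_actions(records, "cracra", doc_type="danger"))
print(actions[0])
```

With the question's client this would be called as `bulk_index(es, records, 'cracra', doc_type='danger')`. Passing dicts rather than the string from `df.to_json` lets the helper serialize each document separately.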

Regarding python - Indexing a DataFrame into Elasticsearch with Python, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/53583214/
