
python - How to insert a pandas DataFrame into Elasticsearch with column types?

Reposted. Author: 行者123. Updated: 2023-12-03 00:40:27

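I want to index a pandas DataFrame into an Elasticsearch server. One of my columns is a timestamp; some of its values are numbers and some are strings. How can I import a DataFrame like this into Elasticsearch? I know the _bulk API can be used, but I don't know how to do it.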

import pandas as pd
df = pd.read_csv('week1_features.csv',index_col=0)
df.head()

<html>
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>srcIp</th>
<th>collectionTimestamp</th>
<th>destinationBytes</th>
<th>destinationPackets</th>
<th>sourceBytes</th>
<th>sourcePackets</th>
<th>hour</th>
<th>WeekDay</th>
<th>FlowNumber</th>
<th>dstPort</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>1.180.189.18</td>
<td>2017-04-12 12:08:00</td>
<td>0.0</td>
<td>0.0</td>
<td>60.0</td>
<td>1.0</td>
<td>12</td>
<td>3</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<th>1</th>
<td>1.180.189.18</td>
<td>2017-04-12 12:08:30</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>12</td>
<td>3</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<th>2</th>
<td>1.186.141.30</td>
<td>2017-04-12 07:26:00</td>
<td>0.0</td>
<td>0.0</td>
<td>60.0</td>
<td>1.0</td>
<td>7</td>
<td>3</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<th>3</th>
<td>1.191.82.68</td>
<td>2017-04-13 03:05:00</td>
<td>0.0</td>
<td>0.0</td>
<td>60.0</td>
<td>1.0</td>
<td>3</td>
<td>4</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<th>4</th>
<td>1.214.141.149</td>
<td>2017-04-10 04:19:30</td>
<td>0.0</td>
<td>0.0</td>
<td>136.0</td>
<td>1.0</td>
<td>4</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>
</div>

</html>

Best answer

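With the function below you can insert a pandas DataFrame into Elasticsearch through the _bulk API. For the time column, however, you must apply a mapping to the time field in the index before inserting the DataFrame.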

import json
import requests

def insertDataframeIntoElastic(dataFrame, index='index', typ='test',
                               server='http://192.168.11.148:9200', chunk_size=2000):
    # The _bulk API expects newline-delimited JSON (NDJSON).
    headers = {'content-type': 'application/x-ndjson', 'Accept-Charset': 'UTF-8'}
    records = dataFrame.to_dict(orient='records')
    # One action line followed by one document line per row.
    # Note: "_type" is deprecated in Elasticsearch 7 and removed in 8; omit it there.
    actions = ['{ "index" : { "_index" : "%s", "_type" : "%s"} }\n' % (index, typ) + json.dumps(record)
               for record in records]
    serverAPI = server + '/_bulk'
    # Send the rows in chunks of chunk_size to keep each request small.
    for i in range(0, len(actions), chunk_size):
        # The bulk body must end with a newline.
        data = '\n'.join(actions[i:i + chunk_size]) + '\n'
        r = requests.post(serverAPI, data=data, headers=headers)
        print(r.content)
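
The mapping step mentioned above is not shown in the answer. The following is a minimal sketch of one way it could look, not the answerer's own code. The helper names (normalize_timestamp, build_date_mapping, create_index) are hypothetical, and the sketch assumes that numeric timestamps are Unix epoch seconds in UTC and that string timestamps look like "2017-04-12 12:08:00", as in the table in the question. It uses only the standard library, so the HTTP call is made with urllib rather than requests.

```python
import json
import urllib.request
from datetime import datetime, timezone

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # matches values such as "2017-04-12 12:08:00"


def normalize_timestamp(value):
    """Return value as a 'YYYY-MM-DD HH:MM:SS' string.

    Assumption: numbers (or numeric strings) are Unix epoch seconds in UTC.
    Missing values are not handled and raise ValueError.
    """
    try:
        seconds = float(value)
    except (TypeError, ValueError):
        # Not numeric: parse and re-format so malformed strings fail early.
        return datetime.strptime(str(value).strip(), TS_FORMAT).strftime(TS_FORMAT)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(TS_FORMAT)


def build_date_mapping(field, typ=None):
    """Build an index-creation body that maps `field` as a date."""
    properties = {field: {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"}}
    if typ is None:
        # Typeless mapping, as used by Elasticsearch 7 and later.
        return {"mappings": {"properties": properties}}
    # Older clusters nest the properties under the mapping type.
    return {"mappings": {typ: {"properties": properties}}}


def create_index(server, index, body):
    """PUT the mapping to <server>/<index>; the index must not exist yet."""
    req = urllib.request.Request(
        server + "/" + index,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a DataFrame like the one in the question, the column could be normalized with df['collectionTimestamp'] = df['collectionTimestamp'].map(normalize_timestamp), and the index created with create_index(server, index, build_date_mapping('collectionTimestamp', typ='test')) before calling insertDataframeIntoElastic. Whether the typ argument is needed depends on the Elasticsearch version in use.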

Regarding "python - How to insert a pandas DataFrame into Elasticsearch with column types?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44197916/
