
python - MongoDB collection to pandas DataFrame

Reposted. Author: 行者123. Updated: 2023-12-05 09:03:45

My MongoDB documents are structured as follows; some of the factors are NaN.

_id: ObjectId("5feddb959297bb2625db1450")
factors: Array
    0: Object
        factorId: "C24"
        Index: 0
        weight: 1
    1: Object
        factorId: "C25"
        Index: 1
        weight: 1
    2: Object
        factorId: "C26"
        Index: 2
        weight: 1
name: "Growth Led Momentum"

I want to use pymongo and pandas to convert it into a pandas DataFrame like the one below.

| name                | factorId | Index | weight |
-----------------------------------------------------
| Growth Led Momentum | C24      | 0     | 0      |
-----------------------------------------------------
| Growth Led Momentum | C25      | 1     | 0      |
-----------------------------------------------------
| Growth Led Momentum | C26      | 2     | 0      |
-----------------------------------------------------

Thanks.

Best answer

Update

I fired up good ol' Python to take a crack at this. The code below works as expected.

from pymongo import MongoClient
import pandas as pd

uri = "mongodb://<your_mongo_uri>:27017"
database_name = "<your_database_name>"
collection_name = "<your_collection_name>"

mongo_client = MongoClient(uri)
database = mongo_client[database_name]
collection = database[collection_name]

# I used this code to insert a doc into a test collection
# before querying (just in case you wanted to know).
# It is wrapped in a string so that it does not run by default.
"""
data = {
    "_id": 1,
    "name": "Growth Lead Momentum",
    "factors": [
        {
            "factorId": "C24",
            "index": 0,
            "weight": 1
        },
        {
            "factorId": "D74",
            "index": 7,
            "weight": 9
        }
    ]
}

insert_result = collection.insert_one(data)
print(insert_result)
"""

# This is the query that
# answers your question

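# Field names are case-sensitive: the documents in the question use "Index",
# while this test data uses "index". Adjust the paths to match your data.
results = collection.aggregate([
    {
        # Emit one document per element of the "factors" array.
        "$unwind": "$factors"
    },
    {
        "$project": {
            "_id": 1,  # Change to 0 if you wish to ignore the "_id" field.
            "name": 1,
            "factorId": "$factors.factorId",
            "index": "$factors.index",
            "weight": "$factors.weight"
        }
    }
])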

# This is how we turn the results into a DataFrame.
# We can simply pass `list(results)` into `DataFrame(..)`,
# due to how our query works.

results_as_dataframe = pd.DataFrame(list(results))
print(results_as_dataframe)

This outputs:

   _id                  name factorId  index  weight
0    1  Growth Lead Momentum      C24      0       1
1    1  Growth Lead Momentum      D74      7       9
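
If you would rather flatten on the client side instead of inside MongoDB, `pandas.json_normalize` can expand the array for you. The snippet below is only a sketch under a few assumptions: the documents are hard-coded in a `docs` list so that it runs without a database (with a real collection you would use something like `docs = list(collection.find())`), and the second factor deliberately lacks `weight` to show how gaps turn into NaN.

```python
import pandas as pd

# Hypothetical sample data shaped like the test document above.
# The second factor has no "weight" on purpose.
docs = [
    {
        "_id": 1,
        "name": "Growth Lead Momentum",
        "factors": [
            {"factorId": "C24", "index": 0, "weight": 1},
            {"factorId": "D74", "index": 7},
        ],
    }
]

# record_path names the array to expand into rows;
# meta lists the parent fields to repeat on every row.
flat = pd.json_normalize(docs, record_path="factors", meta=["_id", "name"])
print(flat)
```

This pulls whole documents over the wire before reshaping them, so for large collections the aggregation approach is probably the better fit.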

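Original answer

You can use an aggregation pipeline to unwind the factors array and then project the fields you want.

Something like this should do the trick.

Live demo here.

Database structure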

[
  {
    "_id": 1,
    "name": "Growth Lead Momentum",
    "factors": [
      {
        factorId: "C24",
        index: 0,
        weight: 1
      },
      {
        factorId: "D74",
        index: 7,
        weight: 9
      }
    ]
  }
]

Query

db.collection.aggregate([
  {
    $unwind: "$factors"
  },
  {
    $project: {
      _id: 1,
      name: 1,
      factorId: "$factors.factorId",
      index: "$factors.index",
      weight: "$factors.weight"
    }
  }
])
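
To make the two stages concrete, here is a plain-Python imitation of what `$unwind` followed by `$project` does to a single document. It only illustrates the semantics and is not how MongoDB implements them; the function name `unwind_and_project` is made up for this sketch.

```python
def unwind_and_project(doc, array_field="factors"):
    """Yield one flat row per element of doc[array_field].

    Like a default $unwind, a missing or empty array yields no rows.
    Unlike $project, which omits a missing field, this sketch fills
    a missing field with None.
    """
    for element in doc.get(array_field, []):
        yield {
            "_id": doc["_id"],
            "name": doc["name"],
            "factorId": element.get("factorId"),
            "index": element.get("index"),
            "weight": element.get("weight"),
        }


doc = {
    "_id": 1,
    "name": "Growth Lead Momentum",
    "factors": [
        {"factorId": "C24", "index": 0, "weight": 1},
        {"factorId": "D74", "index": 7, "weight": 9},
    ],
}

rows = list(unwind_and_project(doc))
print(rows)
```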

Result

(.csv friendly)

[
  {
    "_id": 1,
    "factorId": "C24",
    "index": 0,
    "name": "Growth Lead Momentum",
    "weight": 1
  },
  {
    "_id": 1,
    "factorId": "D74",
    "index": 7,
    "name": "Growth Lead Momentum",
    "weight": 9
  }
]
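
Since every row is already flat, writing the result out as CSV takes very little code. The sketch below assumes the rows have been copied into a hypothetical `rows` list so that it runs without a database; with pymongo you would pass the list built from the aggregation cursor instead. The column order is chosen here for readability and is not dictated by MongoDB.

```python
import pandas as pd

# Rows copied from the result above (stand-in for the aggregation output).
rows = [
    {"_id": 1, "factorId": "C24", "index": 0, "name": "Growth Lead Momentum", "weight": 1},
    {"_id": 1, "factorId": "D74", "index": 7, "name": "Growth Lead Momentum", "weight": 9},
]

# Fix the column order explicitly instead of relying on key order.
frame = pd.DataFrame(rows, columns=["_id", "name", "factorId", "index", "weight"])

# With no path argument, to_csv returns the CSV text as a string.
csv_text = frame.to_csv(index=False)
print(csv_text)
```

To write a file instead, pass a path, for example `frame.to_csv("factors.csv", index=False)`, where the file name is just a placeholder.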

Regarding python - MongoDB collection to pandas DataFrame, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/69549147/
