python - Getting a table's column names as tuples instead of strings in AWS

Reposted. Author: 行者123. Updated: 2023-11-29 18:07:45

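I am trying to use a Lambda function to insert data from an S3 bucket into a MySQL RDS instance on AWS. I connect to the MySQL endpoint with SQLAlchemy. I want to modify the data a little: I rename the columns and then reindex them so that they map onto the table in the RDS instance. The problem is in the df.columns line. Instead of getting the column names as strings, I get them as tuples.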

+-------------+----------------------+------------+------------+-----------------+
| ('col_a',)  | ('date_timestamp',)  | ('col_b',) | ('col_c',) | ('vehicle_id',) |
+-------------+----------------------+------------+------------+-----------------+
| 0.180008333 | 2017-09-28T20:36:00Z | -6.1487501 | 38.35      | 1004            |
| 0.809708333 | 2017-06-17T14:16:00Z | 8.189424   | -6.8732784 | NominalValue    |
+-------------+----------------------+------------+------------+-----------------+

Here is the code:

    from __future__ import print_function
    import boto3
    import json
    import logging
    import pymysql
    from sqlalchemy import create_engine
    from pandas.io import sql
    from pandas.io.json import json_normalize
    from datetime import datetime
    print('Loading function')

    s3 = boto3.client('s3')
    def getEngine(endpoint):
        engine_ = None
        try:
            engine_ = create_engine(endpoint)
        except Exception as e:
            # The original message referenced undefined names (key, bucket).
            print('Error creating the SQLAlchemy engine. Check the endpoint URL.')
            raise e
        return engine_
    engine = getEngine('mysql+pymysql://username:password@endpoint/database')

    configuration = {
        "aTable": {
            "from" : ['col_1','col_2','date_timestamp','operator_id'],
            "to" : ['date_timestamp','operator_id','col_1','col_2'],
            "sql_table_name" : 'sql_table_a'
        },
        "bTable" : {
            "from" : ['col_a','date_timestamp','col_b','col_c','vehicle_id'],
            "to" : ['date_timestamp','col_a','col_b','vehicle_id','col_c'],
            "sql_table_name" : 'sql_table_b'
        }
    }

    def handler(event, context):
        bucket = event['Records'][0]['s3']['bucket']['name']
        s3_object_key = event['Records'][0]['s3']['object']['key']
        obj = s3.get_object(Bucket=bucket, Key=s3_object_key)
        data = json.loads(obj['Body'].read())
        for _key in data:
            if _key not in configuration:
                print("No configuration found for {0}".format(_key))
                continue  # skip keys that have no mapping
            df = json_normalize(data[str(_key)])
            # The line in question: the column names come out as tuples.
            df.columns=[configuration[_key]['from']]
            #df = df.reindex(indexlist,axis="columns")
            #df['date_timestamp'] = df['date_timestamp'].apply(lambda x: datetime.strptime(x, "%Y-%m-%dT%H:%M:%SZ"))
            df.to_sql(name=configuration[_key]['sql_table_name'], con=engine, if_exists='append', index=False)
            print(df)
        return "Loaded data in RDS"

Best answer

We should remove the [] from this line of code:

    df.columns=[configuration[_key]['from']]

The correct code is:

    df.columns=configuration[_key]['from']
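
To see why the extra brackets matter, here is a minimal sketch that reproduces the behavior without AWS or a database. It assumes a reasonably recent pandas; the `config` dict and the `make_frame` helper are hypothetical stand-ins for `configuration['bTable']` and the normalized JSON from the question. When `df.columns` is given a list that contains another list, pandas appears to treat each inner list as one level of a MultiIndex, so every label becomes a 1-tuple such as `('col_a',)`. A flat list gives plain string labels.

```python
import pandas as pd

# Hypothetical stand-in for configuration['bTable'] from the question.
config = {
    "from": ["col_a", "date_timestamp", "col_b", "col_c", "vehicle_id"],
    "to": ["date_timestamp", "col_a", "col_b", "vehicle_id", "col_c"],
}


def make_frame():
    # One sample row taken from the table in the question.
    return pd.DataFrame(
        [[0.180008333, "2017-09-28T20:36:00Z", -6.1487501, 38.35, 1004]]
    )


# A list wrapped in another list is read as one array per index level,
# so the result is a one-level MultiIndex whose labels are 1-tuples.
wrapped = make_frame()
wrapped.columns = [config["from"]]
wrapped_labels = list(wrapped.columns)

# A flat list gives an ordinary Index of strings.
flat = make_frame()
flat.columns = config["from"]
flat_labels = list(flat.columns)

# Reorder to the target table's column order, which is the step the
# question left commented out.
reordered = flat.reindex(columns=config["to"])

print(wrapped_labels)
print(flat_labels)
print(list(reordered.columns))
```

With string labels in place, `reindex(columns=...)` is one way to put the columns in the `to` order before calling `to_sql`. Selecting with `flat[config["to"]]` should work equally well here.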

This question, "python - Getting a table's column names as tuples instead of strings in AWS", comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/47682800/
