
python - How does parse_dates work with pd.read_sql_query?

Reposted. Author: 行者123. Updated: 2023-11-29 07:32:41

How is Pandas parse_dates supposed to work when retrieving data from a MySQL database?

The Pandas 0.23 documentation provides this information:

parse_dates : list or dict, default: None

List of column names to parse as dates.

Dict of {column_name: format string} where format string is strftime compatible in case of parsing string times, or is one of (D, s, ns, ms, us) in case of parsing integer timestamps.

Dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of pandas.to_datetime(). Especially useful with databases without native Datetime support, such as SQLite.
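
The three forms described above can be tried without a MySQL server. The following is a minimal sketch, assuming an in-memory SQLite database and a hypothetical `events` table whose column names (`created`, `compact`, `epoch`) are made up for illustration. SQLite stores dates as text or integers, so here parse_dates does the actual conversion.

```python
import sqlite3

import pandas as pd

# Hypothetical table: the same moment stored three different ways.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER, created TEXT, compact TEXT, epoch INTEGER)"
)
conn.execute(
    "INSERT INTO events VALUES "
    "(1, '2018-05-17 11:08:03', '20180517 110803', 1526555283)"
)
conn.commit()

query = "SELECT id, created, compact, epoch FROM events"

# Form 1: a list of column names, parsed with default settings.
df_list = pd.read_sql_query(query, conn, parse_dates=["created"])

# Form 2: a dict of {column: strftime format} for strings,
# or {column: unit} for integer timestamps ("s" = seconds since the epoch).
df_fmt = pd.read_sql_query(
    query, conn, parse_dates={"compact": "%Y%m%d %H%M%S", "epoch": "s"}
)

# Form 3: a dict of {column: keyword arguments for pandas.to_datetime()}.
df_kwargs = pd.read_sql_query(
    query, conn, parse_dates={"created": {"format": "%Y-%m-%d %H:%M:%S"}}
)

print(df_list.dtypes)
print(df_fmt[["compact", "epoch"]])
print(df_kwargs["created"])
```

Columns that are not named in parse_dates, such as `compact` in the first call, are left as they came back from the driver.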

For example, I want to retrieve some data from the MySQL Sakila database.

create table actor
(
    actor_id smallint(5) unsigned auto_increment
        primary key,
    first_name varchar(45) not null,
    last_name varchar(45) not null,
    last_update timestamp not null on update CURRENT_TIMESTAMP,
    constraint idx_unique_id_name
        unique (actor_id, last_name)
)

Here is some sample data:

INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (1, 'PENELOPE', 'None', '2018-05-17 11:08:03');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (2, 'NICK', 'WAHLBERG', '2006-02-15 04:34:33');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (3, 'ED', 'CHASE', '2006-02-15 04:34:33');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (4, 'JENNIFER', 'DAVIS', '2006-02-15 04:34:33');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (5, 'JOHNNY', 'LOLLOBRIGIDA', '2018-05-17 11:14:15');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (6, 'BETTE', 'Echt', '2018-05-17 11:13:57');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (7, 'GRACE', 'MOSTEL', '2006-02-15 04:34:33');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (8, 'MATTHEW', 'JOHANSSON', '2006-02-15 04:34:33');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (9, 'JOE', 'SWANK', '2006-02-15 04:34:33');
INSERT INTO sakila.actor (actor_id, first_name, last_name, last_update) VALUES (10, 'CHRISTIAN', 'GABLE', '2006-02-15 04:34:33');

I use the default MySQL Python connector:

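# Build the SQLAlchemy connection URL for the mysql-connector driver.
db_connection_url = (
    'mysql+mysqlconnector://'
    + mysql_config_dict['user']
    + ':'
    + mysql_config_dict['password']
    + '@'
    + mysql_config_dict['host']
    + ':'
    + mysql_config_dict['port']
    + '/'
    + mysql_config_dict['db_name']
)

# The 'ssl_cert' key always exists in the dict, so test its value
# (os.getenv returns None when the variable is unset), not its presence.
if mysql_config_dict.get('ssl_cert'):
    ssl_args = {'ssl_ca': mysql_config_dict['ssl_ca']}
else:
    ssl_args = ''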

with these parameters:

import os

mysql_config_dict = {
    'user': 'root',
    'password': '',
    'host': '127.0.0.1',
    'port': '3306',
    'db_name': 'sakila',
    # SSL settings come from the environment and are None when unset.
    'ssl_cert': os.getenv('SSL_CERT'),
    'ssl_key': os.getenv('SSL_KEY'),
    'ssl_ca': os.getenv('SSL_CA')
}

These are used to obtain the engine.
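
Plain string concatenation produces a broken URL as soon as the user name or password contains characters such as `@` or `/`. A minimal sketch of a safer variant follows. The helper name `build_mysql_url` and the sample password are hypothetical; only the standard library is used, and the resulting string would still have to be handed to whatever creates the engine.

```python
from urllib.parse import quote_plus


def build_mysql_url(cfg, driver="mysql+mysqlconnector"):
    """Assemble a connection URL, percent-encoding the credentials."""
    user = quote_plus(cfg["user"])
    password = quote_plus(cfg["password"])
    return f"{driver}://{user}:{password}@{cfg['host']}:{cfg['port']}/{cfg['db_name']}"


# Hypothetical configuration with a password that needs escaping.
example_cfg = {
    "user": "root",
    "password": "p@ss/word",
    "host": "127.0.0.1",
    "port": "3306",
    "db_name": "sakila",
}

url = build_mysql_url(example_cfg)
print(url)
```

With an empty password, as in the configuration above, the helper yields the same URL as the concatenation.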

The Python snippet that retrieves the result set:

df = pd.read_sql_query('SELECT a.actor_id, a.last_name, a.last_update FROM sakila.actor a',parse_dates={'last_update':'%Y%m%d %H:%M:%S'},con=mysql_conn)

I get a KeyError:

Traceback (most recent call last):
File "~/Development/python-virtual-env/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 2442, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5280)
File "pandas/_libs/index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5126)
File "pandas/_libs/hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20523)
File "pandas/_libs/hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20477)
KeyError: 'last_update'

When I use

df = pd.read_sql_query('SELECT a.actor_id, a.last_name, a.last_update FROM sakila.actor a',parse_dates=True,con=mysql_conn)

it works, but in IntelliJ's DataFrame view I can see that the name of the column "last_update" is prefixed as a bytes literal: b'last_update', which is strange.

What is the correct usage here when I want several columns to be treated as date columns? Thanks!
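
The bytes-prefixed column name may be what triggers the KeyError: if the driver hands back b'last_update' as the label, a lookup for the str 'last_update' cannot find it. The sketch below reproduces that situation with a hand-built DataFrame (no MySQL involved) and shows one possible workaround, decoding the labels and then converting with pd.to_datetime. The helper `decode_column_names` is hypothetical, and whether a given driver version really returns bytes labels is an assumption.

```python
import pandas as pd


def decode_column_names(df, encoding="utf-8"):
    """Return a copy of df whose bytes column labels are decoded to str."""
    return df.rename(
        columns=lambda c: c.decode(encoding) if isinstance(c, bytes) else c
    )


# Simulated result set with bytes column labels, as seen in the question.
raw = pd.DataFrame(
    {
        b"actor_id": [1, 2],
        b"last_update": ["2018-05-17 11:08:03", "2006-02-15 04:34:33"],
    }
)

# The str label is not present, which is what a KeyError would report.
print("last_update" in raw.columns)

fixed = decode_column_names(raw)
fixed["last_update"] = pd.to_datetime(
    fixed["last_update"], format="%Y-%m-%d %H:%M:%S"
)
print(fixed.dtypes)
```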

Best answer

When I call pd.read_sql, I pass the field names to parse_dates as a list:

df = pd.read_sql(query,
                 connection,
                 parse_dates=['Date_of_creation',
                              'Date_of_termination']
                 )

You mentioned using a dict to customize the format:

fmt = '%Y%m%d %H:%M:%S'

df = pd.read_sql(query,
                 connection,
                 parse_dates={'Date_of_creation': fmt,
                              'Date_of_termination': fmt}
                 )
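
When a format string is given, it has to match the text actually stored in the column. The sketch below assumes the timestamps arrive as strings, which is the case with an in-memory SQLite table but may not be with a MySQL driver that already returns datetime objects. The cut-down `actor` table is illustrative. In recent pandas versions read_sql_query appears to coerce values it cannot parse to NaT instead of raising, so a wrong format can fail silently.

```python
import sqlite3

import pandas as pd

# Cut-down stand-in for the Sakila actor table, with dates stored as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actor (actor_id INTEGER, last_update TEXT)")
conn.executemany(
    "INSERT INTO actor VALUES (?, ?)",
    [(1, "2018-05-17 11:08:03"), (2, "2006-02-15 04:34:33")],
)
conn.commit()

query = "SELECT actor_id, last_update FROM actor"

# This format has no dashes, so it does not describe the stored text.
mismatched = pd.read_sql_query(
    query, conn, parse_dates={"last_update": "%Y%m%d %H:%M:%S"}
)

# This format matches 'YYYY-MM-DD HH:MM:SS'.
matched = pd.read_sql_query(
    query, conn, parse_dates={"last_update": "%Y-%m-%d %H:%M:%S"}
)

print(mismatched["last_update"].isna().sum(), "values failed to parse")
print(matched["last_update"])
```

Checking for NaT right after the query is a cheap way to notice a format that does not fit the data.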

Regarding "python - How does parse_dates work with pd.read_sql_query?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50510698/
