
python - Ingesting a Null Int Column: Pandas and Pandera

Reposted. Author: 行者123. Updated: 2023-12-02 01:42:40

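I am using pandas together with pandera for schema validation, but I have run into a problem because the data contains an integer column with null values.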

from prefect import task, Flow          #type:ignore
from pandera import Check, Column, DataFrameSchema
import prefect
import pandas as pd
import pandera as pa
import numpy as np


def pschema(d):
    logger = prefect.utilities.logging.get_logger() # type: ignore
    engine = connect_db(prefect.config.kv.p.staging_db_constring, logger) #type:ignore

    table_name = "MyTable"
    org = "myOrg"

    k = {}
    df = pd.read_sql(
        f"SELECT NameStrNotQuoted, FieldTypeName, SizeStr, Precision, Scale FROM dbo.vw_cx_meta WHERE [Table] = '{table_name}' and Organization='{org}' AND ETL_Active = 1",
        engine,
    )
    for row in df.itertuples(index=False):
        if row.FieldTypeName == "int":
            k.update({row.NameStrNotQuoted:Column(int,Check(lambda x: pd.Series([x.fillna(0)],dtype='Int64')),coerce=True, nullable=True)})
        elif row.FieldTypeName == 'bit':
            k.update({row.NameStrNotQuoted:Column(pa.Bool, coerce=True)})
    sch = DataFrameSchema(k)

    sch.validate(d)
    return k

Error:

ValueError: cannot convert float NaN to integer
.
.
.
File "/usr/local/lib/python3.8/site-packages/pandera/schemas.py", line 1789, in coerce_dtype
raise errors.SchemaError(
pandera.errors.SchemaError: Error while coercing 'CopySourceID' to type int64: Could not coerce <class 'pandas.core.series.Series'> data_container into type int64:

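I know about the pandas "gotcha" with null values in int columns, and I have tried every permutation of a lambda function with Check to work around it. Any help would be appreciated, thanks.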

Best answer

I just ran into this. It can be solved by combining pandas nullable integers ("Int64", see https://pandas.pydata.org/docs/user_guide/integer_na.html) with coerce=True.
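To see why the nullable dtype matters, here is a small pandas-only sketch. It assumes the column arrives as float64 with NaN for the missing value (which is what pandas usually produces for an integer column containing NULLs); the sample values are made up, and only the column name is borrowed from the traceback.

```python
import numpy as np
import pandas as pd

# Assumed sample data: an "integer" column holding one missing value.
raw = pd.Series([1.0, np.nan, 3.0], name="CopySourceID")

# Casting to NumPy's int64 fails because NaN has no integer representation.
try:
    raw.astype("int64")
    numpy_cast_failed = False
except (ValueError, TypeError) as exc:
    numpy_cast_failed = True
    print(f"int64 cast failed: {exc}")

# The nullable extension dtype "Int64" (capital I) keeps the gap as pd.NA.
nullable = raw.astype("Int64")
print(nullable.dtype)
print(nullable.isna().tolist())
```

The lowercase "int64" is the NumPy dtype that the schema in the question coerces to, while the capitalized "Int64" is the pandas extension dtype that can hold missing values.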

For example:

pa.Column("Int64",coerce=True)

Source: the original question, "python - Ingesting a Null Int Column: Pandas and Pandera", on Stack Overflow: https://stackoverflow.com/questions/71395580/
