python - How do I set a category using the return value of apply() in pandas?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:26:58

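Thanks in advance for reading.

First, I am using Python 3.7 with pandas 0.23.4 and numpy 1.15.

If I set a value in a categorical column with, for example, df.at[(...), col] = 'category', it works fine.

As the example below shows, if I set the categories through the apply() function, the column becomes the "object" dtype.

How do I set a category using the return value of the apply() function in pandas?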

<pre>
import pandas as pd
import numpy as np

phones = [5551234,5551235,5551236,5551237,5551238,5551239,5551240,5551241,5551242,5551243,5551244,5551245,5551246]

dates = ['01/01/2018','01/07/2017','01/01/2017','01/07/2016','01/01/2016','01/07/2015','01/01/2015','01/07/2014', '01/01/2014','01/07/2013','01/01/2013','01/07/2012','01/01/2012']

df = pd.DataFrame({'PHONE': phones, 'DATE': dates})

df['DATE'] = pd.to_datetime(df['DATE'], format='%d/%m/%Y', errors='coerce')

age_cats = pd.Categorical([], categories=['hot', 'warm', 'cold', 'old', 'ignored'])

df['AGE'] = pd.Series(age_cats)

df.info()
&lt;class 'pandas.core.frame.DataFrame'&gt;
RangeIndex: 13 entries, 0 to 12
Data columns (total 3 columns):
PHONE 13 non-null int64
DATE 13 non-null datetime64[ns]
AGE 0 non-null category
dtypes: category(1), datetime64[ns](1), int64(1)
memory usage: 501.0 bytes


def get_age(_date):
    # Missing dates are treated as 'old'.
    if pd.isnull(_date):
        return 'old'

    today = pd.Timestamp.today()
    d = today.day

    # Avoid building an invalid 29 February in a non-leap year.
    if today.month == 2 and d == 29:
        d = 28
    y1 = pd.Timestamp(today.year - 1, today.month, d)
    y2 = pd.Timestamp(today.year - 2, today.month, d)
    y3 = pd.Timestamp(today.year - 3, today.month, d)
    y4 = pd.Timestamp(today.year - 4, today.month, d)
    y5 = pd.Timestamp(today.year - 5, today.month, d)

    if today &lt; _date:
        raise Exception('Future dates mean there is a bug.')
    if y1 &lt; _date and _date &lt;= today:
        return 'hot'
    elif y3 &lt; _date and _date &lt;= y1:
        return 'warm'
    elif y5 &lt; _date and _date &lt;= y3:
        return 'cold'
    else:
        return 'old'

df.at[:, 'AGE'] = df.DATE.apply(get_age)
df.info()

&lt;class 'pandas.core.frame.DataFrame'&gt;
RangeIndex: 13 entries, 0 to 12
Data columns (total 3 columns):
PHONE 13 non-null int64
DATE 13 non-null datetime64[ns]
AGE 13 non-null object
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 392.0+ bytes
</pre>
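The dtype change reported above can be reproduced in a smaller, date-independent form. The sketch below is only an illustration: `label` and `REFERENCE` are hypothetical stand-ins for `get_age` and today's date, and the thresholds are simplified. It shows that `apply()` returns a plain Series of strings, and that assigning that Series with `df['AGE'] = ...` replaces the column, so the categorical dtype is lost.

```python
import pandas as pd

# Hypothetical stand-in for get_age(): a fixed reference date keeps the
# result reproducible, and the thresholds are illustrative only.
REFERENCE = pd.Timestamp("2018-12-14")

def label(date):
    if pd.isnull(date):
        return "old"
    age_days = (REFERENCE - date).days
    if age_days <= 365:
        return "hot"
    if age_days <= 3 * 365:
        return "warm"
    return "old"

cats = ["hot", "warm", "cold", "old", "ignored"]
df = pd.DataFrame({"DATE": pd.to_datetime(["2018-01-01", "2016-07-01", "2012-01-01"])})

# Start with an all-missing categorical column, as in the question.
df["AGE"] = pd.Categorical([None] * len(df), categories=cats)
dtype_before = df["AGE"].dtype.name

labels = df["DATE"].apply(label)  # a plain Series of Python strings
df["AGE"] = labels                # replaces the whole column, dtype included
dtype_after = df["AGE"].dtype.name

print(dtype_before, "->", dtype_after)
```

The second dtype name depends on the pandas version, but it is not `category`, because the strings returned by `apply()` carry no category information.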

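I added a second column, AGE2, set up the same way as the first categorical column. I used the same function inside a loop, and this time the categorical dtype was not overwritten.

Am I using the apply() function incorrectly?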

<pre>
df['AGE2'] = pd.Series(age_cats)

for i, r in df.iterrows():
    df.loc[[i],'AGE2'] = get_age(r['DATE'])

df.info()

&lt;class 'pandas.core.frame.DataFrame'&gt;
RangeIndex: 13 entries, 0 to 12
Data columns (total 4 columns):
PHONE 13 non-null int64
DATE 13 non-null datetime64[ns]
AGE 13 non-null object
AGE2 13 non-null category
dtypes: category(1), datetime64[ns](1), int64(1), object(1)
memory usage: 605.0+ bytes
</pre>
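One way to read the loop result: each iteration writes a single value into the existing categorical column, so the column is never replaced. The sketch below illustrates this with hypothetical labels (`new_labels`) standing in for the values `get_age` would return, and it uses `.at` rather than `.loc` for the scalar writes.

```python
import pandas as pd

cats = ["hot", "warm", "cold", "old", "ignored"]
df = pd.DataFrame({"PHONE": [5551234, 5551235, 5551236]})

# All-missing categorical column that already knows its categories.
df["AGE2"] = pd.Categorical([None] * len(df), categories=cats)

# Hypothetical labels standing in for get_age() results.
new_labels = ["hot", "warm", "old"]
for i, value in zip(df.index, new_labels):
    # Each value is one of the declared categories, so it is written
    # into the existing categorical column.
    df.at[i, "AGE2"] = value

dtype_name = df["AGE2"].dtype.name
print(dtype_name)
```

This keeps the dtype but runs a Python-level loop over the rows, which tends to be slow on large frames; converting the result of `apply()` in a single step avoids the loop.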

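Best answer

Why not do this with astype on the Series object, passing a CategoricalDtype that lists the categories:

<pre>
df['AGE'] = df.DATE.apply(get_age).astype(pd.api.types.CategoricalDtype(categories=['hot', 'warm', 'cold', 'old', 'ignored'], ordered=True))
</pre>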
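As a self-contained illustration of converting the result of `apply()` in one step, the sketch below builds the dtype with `pandas.api.types.CategoricalDtype` and passes it to `astype`. The `labels` Series is a hypothetical stand-in for `df.DATE.apply(get_age)`, and the category order (hot before warm before cold before old before ignored) is an assumption taken from the list in the question.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Ordered categorical dtype using the category list from the question.
age_dtype = CategoricalDtype(
    categories=["hot", "warm", "cold", "old", "ignored"],
    ordered=True,
)

# Hypothetical stand-in for df.DATE.apply(get_age).
labels = pd.Series(["hot", "old", "warm", "cold"])

ages = labels.astype(age_dtype)
codes = list(ages.cat.codes)          # position of each value in the category list
newer_than_cold = ages[ages < "cold"]  # comparisons follow the category order

print(ages.dtype.name, codes, list(newer_than_cold))
```

Values that are not in the category list become missing after the conversion, so a misspelled label surfaces as NaN rather than as a new category.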

Regarding "python - How do I set a category using the return value of apply() in pandas?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53784512/
