python - ValueError: invalid fill value with a <class 'pandas.core.frame.DataFrame'>

Reposted. Author: 行者123. Updated: 2023-11-30 08:54:06

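I am working through the Loan Prediction practice problem and trying to fill the missing values in the data. I got the data from here. To work through the problem I am following this tutorial.

You can find the complete code I am using (file name model.py) and the data here on GitHub.

The DataFrame looks like this: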

df[['Loan_ID', 'Self_Employed', 'Education', 'LoanAmount']].head(10)
Out:
    Loan_ID  Self_Employed     Education  LoanAmount
0  LP001002             No      Graduate         NaN
1  LP001003             No      Graduate       128.0
2  LP001005            Yes      Graduate        66.0
3  LP001006             No  Not Graduate       120.0
4  LP001008             No      Graduate       141.0
5  LP001011            Yes      Graduate       267.0
6  LP001013             No  Not Graduate        95.0
7  LP001014             No      Graduate       158.0
8  LP001018             No      Graduate       168.0
9  LP001020             No      Graduate       349.0

After the last line of the following code runs (it corresponds to line 60 of model.py):

import numpy as np
import pandas as pd

url = 'https://raw.githubusercontent.com/Aniruddh-SK/Loan-Prediction-Problem/master/train.csv'
df = pd.read_csv(url)
df['LoanAmount'].fillna(df['LoanAmount'].mean(), inplace=True)
df['Self_Employed'].fillna('No', inplace=True)

table = df.pivot_table(values='LoanAmount', index='Self_Employed', columns='Education', aggfunc=np.median)

# Define function to return value of this pivot_table
def fage(x):
    return table.loc[x['Self_Employed'], x['Education']]

# Replace missing values
df['LoanAmount'].fillna(df[df['LoanAmount'].isnull()].apply(fage, axis=1), inplace=True)

I get this error:

ValueError                                Traceback (most recent call last)
<ipython-input-40-5146e49c2460> in <module>()
----> 1 df['LoanAmount'].fillna(df[df['LoanAmount'].isnull()].apply(fage, axis=1), inplace=True)

/usr/local/lib/python2.7/dist-packages/pandas/core/series.pyc in fillna(self, value, method, axis, inplace, limit, downcast, **kwargs)
2368 axis=axis, inplace=inplace,
2369 limit=limit, downcast=downcast,
-> 2370 **kwargs)
2371
2372 @Appender(generic._shared_docs['shift'] % _shared_doc_kwargs)

/usr/local/lib/python2.7/dist-packages/pandas/core/generic.pyc in fillna(self, value, method, axis, inplace, limit, downcast)
3264 else:
3265 raise ValueError("invalid fill value with a %s" %
-> 3266 type(value))
3267
3268 new_data = self._data.fillna(value=value, limit=limit,

ValueError: invalid fill value with a <class 'pandas.core.frame.DataFrame'>

How can I fill the missing values without getting this error?

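Best answer

I think the author of the tutorial wants to replace NaN with the values from table.

To do that, first turn table into a Series with unstack, and call set_index on df so that the two align on the same index.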
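
A minimal sketch of why the original call probably fails, on made-up toy data (the frame toy and its values are illustrative and not taken from the dataset). Once the mean fill has removed every NaN, the boolean filter selects no rows, and apply on an empty frame hands back a DataFrame rather than a Series, which fillna does not accept. Newer pandas versions may raise a TypeError here instead of the ValueError shown in the traceback, so the sketch catches both.

```python
import pandas as pd

# Toy stand-in for the loan data (values are made up for illustration).
toy = pd.DataFrame({
    "Self_Employed": ["No", "Yes", "No", "Yes"],
    "Education": ["Graduate", "Graduate", "Not Graduate", "Graduate"],
    "LoanAmount": [100.0, None, 80.0, 200.0],
})

table = toy.pivot_table(values="LoanAmount", index="Self_Employed",
                        columns="Education", aggfunc="median")

def fage(x):
    # Look up the median for this row's (Self_Employed, Education) pair.
    return table.loc[x["Self_Employed"], x["Education"]]

# Filling with the mean first leaves no NaN behind ...
toy["LoanAmount"] = toy["LoanAmount"].fillna(toy["LoanAmount"].mean())

# ... so the filter selects no rows and apply returns a DataFrame, not a Series.
fill_values = toy[toy["LoanAmount"].isnull()].apply(fage, axis=1)
print(type(fill_values))

try:
    toy["LoanAmount"].fillna(fill_values)
    error_name = None
except (TypeError, ValueError) as exc:
    error_name = type(exc).__name__
    print(error_name, exc)
```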

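First remove the line that replaces NaN with the mean, otherwise no NaN is left for the table values to fill:

import pandas as pd

url = 'https://raw.githubusercontent.com/Aniruddh-SK/Loan-Prediction-Problem/master/train.csv'

# Read the dataset into a DataFrame
df = pd.read_csv(url)

# Dropped: the mean fill would leave no NaN for the pivot-table fill
# df['LoanAmount'].fillna(df['LoanAmount'].mean(), inplace=True)

# Assign the result back instead of relying on inplace=True on a column selection
df['Self_Employed'] = df['Self_Employed'].fillna('No')

table = df.pivot_table(values='LoanAmount',
                       index='Self_Employed',
                       columns='Education',
                       aggfunc='median')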

print(table.unstack())
Education     Self_Employed
Graduate      No               130.0
              Yes              157.5
Not Graduate  No               113.0
              Yes              130.0
dtype: float64

# Show all rows with NaN in the LoanAmount column
print(df.loc[df['LoanAmount'].isnull(), ['Self_Employed', 'Education', 'LoanAmount']])
     Self_Employed     Education  LoanAmount
  0             No      Graduate         NaN
 35             No      Graduate         NaN
 63             No      Graduate         NaN
 81            Yes      Graduate         NaN
 95             No      Graduate         NaN
102             No      Graduate         NaN
103             No      Graduate         NaN
113            Yes      Graduate         NaN
127             No      Graduate         NaN
202             No  Not Graduate         NaN
284             No      Graduate         NaN
305             No  Not Graduate         NaN
322             No  Not Graduate         NaN
338             No  Not Graduate         NaN
387             No  Not Graduate         NaN
435             No      Graduate         NaN
437             No      Graduate         NaN
479             No      Graduate         NaN
524             No      Graduate         NaN
550            Yes      Graduate         NaN
551             No  Not Graduate         NaN
605             No  Not Graduate         NaN

# For checking later, keep the index labels of the rows that hold NaN
idx = df.loc[df['LoanAmount'].isnull(), ['Self_Employed', 'Education', 'LoanAmount']].index
print(idx)
Int64Index([  0,  35,  63,  81,  95, 102, 103, 113, 127, 202, 284, 305, 322,
            338, 387, 435, 437, 479, 524, 550, 551, 605],
           dtype='int64')

# Replace missing values: align on (Education, Self_Employed), then fill from the table
df = df.set_index(['Education', 'Self_Employed'])
df['LoanAmount'] = df['LoanAmount'].fillna(table.unstack())
df = df.reset_index()

# Check the output: show only the rows that held NaN before
print(df.loc[df.index.isin(idx), ['Self_Employed', 'Education', 'LoanAmount']])
     Self_Employed     Education  LoanAmount
  0             No      Graduate       130.0
 35             No      Graduate       130.0
 63             No      Graduate       130.0
 81            Yes      Graduate       157.5
 95             No      Graduate       130.0
102             No      Graduate       130.0
103             No      Graduate       130.0
113            Yes      Graduate       157.5
127             No      Graduate       130.0
202             No  Not Graduate       113.0
284             No      Graduate       130.0
305             No  Not Graduate       113.0
322             No  Not Graduate       113.0
338             No  Not Graduate       113.0
387             No  Not Graduate       113.0
435             No      Graduate       130.0
437             No      Graduate       130.0
479             No      Graduate       130.0
524             No      Graduate       130.0
550            Yes      Graduate       157.5
551             No  Not Graduate       113.0
605             No  Not Graduate       113.0
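
The same set_index plus unstack idea can be tried on a small frame where the medians are easy to check by hand. This is only a sketch: the frame toy and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data: three (Education, Self_Employed) groups, each with one NaN.
toy = pd.DataFrame({
    "Education": ["Graduate", "Graduate", "Graduate", "Graduate", "Graduate",
                  "Not Graduate", "Not Graduate"],
    "Self_Employed": ["No", "No", "No", "Yes", "Yes", "No", "No"],
    "LoanAmount": [100.0, 140.0, np.nan, 200.0, np.nan, 90.0, np.nan],
})

table = toy.pivot_table(values="LoanAmount", index="Self_Employed",
                        columns="Education", aggfunc="median")

# unstack() turns the 2-D table into a Series keyed by (Education, Self_Employed).
medians = table.unstack()

# Give the frame the same two-level index so fillna can align on it.
indexed = toy.set_index(["Education", "Self_Employed"])
indexed["LoanAmount"] = indexed["LoanAmount"].fillna(medians)
filled = indexed.reset_index()

print(filled)
```

Each NaN takes the median of its own group: 120.0 for Graduate/No, 200.0 for Graduate/Yes and 90.0 for Not Graduate/No.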

Edit:

A better solution is groupby with apply, which replaces each NaN with the median of its group:

import pandas as pd

url = 'https://raw.githubusercontent.com/Aniruddh-SK/Loan-Prediction-Problem/master/train.csv'

# Read the dataset into a DataFrame
df = pd.read_csv(url)

# Again, skip the mean fill so the NaN values survive until the group-wise fill
# df['LoanAmount'].fillna(df['LoanAmount'].mean(), inplace=True)

df['Self_Employed'] = df['Self_Employed'].fillna('No')


print(df.loc[df['LoanAmount'].isnull(), ['Self_Employed', 'Education', 'LoanAmount']])
     Self_Employed     Education  LoanAmount
  0             No      Graduate         NaN
 35             No      Graduate         NaN
 63             No      Graduate         NaN
 81            Yes      Graduate         NaN
 95             No      Graduate         NaN
102             No      Graduate         NaN
103             No      Graduate         NaN
113            Yes      Graduate         NaN
127             No      Graduate         NaN
202             No  Not Graduate         NaN
284             No      Graduate         NaN
305             No  Not Graduate         NaN
322             No  Not Graduate         NaN
338             No  Not Graduate         NaN
387             No  Not Graduate         NaN
435             No      Graduate         NaN
437             No      Graduate         NaN
479             No      Graduate         NaN
524             No      Graduate         NaN
550            Yes      Graduate         NaN
551             No  Not Graduate         NaN
605             No  Not Graduate         NaN

idx = df.loc[df['LoanAmount'].isnull(), ['Self_Employed', 'Education', 'LoanAmount']].index
print(idx)
Int64Index([  0,  35,  63,  81,  95, 102, 103, 113, 127, 202, 284, 305, 322,
            338, 387, 435, 437, 479, 524, 550, 551, 605],
           dtype='int64')

# Replace missing values with the median of each (Education, Self_Employed) group;
# group_keys=False keeps the original row index so the result assigns back cleanly
df['LoanAmount'] = (df.groupby(['Education', 'Self_Employed'], group_keys=False)['LoanAmount']
                      .apply(lambda x: x.fillna(x.median())))

print(df.loc[df.index.isin(idx), ['Self_Employed', 'Education', 'LoanAmount']])
     Self_Employed     Education  LoanAmount
  0             No      Graduate       130.0
 35             No      Graduate       130.0
 63             No      Graduate       130.0
 81            Yes      Graduate       157.5
 95             No      Graduate       130.0
102             No      Graduate       130.0
103             No      Graduate       130.0
113            Yes      Graduate       157.5
127             No      Graduate       130.0
202             No  Not Graduate       113.0
284             No      Graduate       130.0
305             No  Not Graduate       113.0
322             No  Not Graduate       113.0
338             No  Not Graduate       113.0
387             No  Not Graduate       113.0
435             No      Graduate       130.0
437             No      Graduate       130.0
479             No      Graduate       130.0
524             No      Graduate       130.0
550            Yes      Graduate       157.5
551             No  Not Graduate       113.0
605             No  Not Graduate       113.0
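
A variation on the group-wise fill, not taken from the answer itself: groupby(...).transform('median') returns the group median for every row under the original index, so it can be passed straight to fillna. The sketch below assumes a made-up frame toy with illustrative values.

```python
import numpy as np
import pandas as pd

# Made-up data: three (Education, Self_Employed) groups, each with one NaN.
toy = pd.DataFrame({
    "Education": ["Graduate", "Graduate", "Graduate", "Graduate", "Graduate",
                  "Not Graduate", "Not Graduate"],
    "Self_Employed": ["No", "No", "No", "Yes", "Yes", "No", "No"],
    "LoanAmount": [100.0, 140.0, np.nan, 200.0, np.nan, 90.0, np.nan],
})

# One median per row, aligned with toy's own index.
group_median = toy.groupby(["Education", "Self_Employed"])["LoanAmount"].transform("median")

# Only the NaN positions take the group median; existing values are kept.
toy["LoanAmount"] = toy["LoanAmount"].fillna(group_median)

print(toy)
```

Because transform keeps the original index, no set_index or reset_index round trip is needed.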

Edit:

There is one more problem:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

The solution is to replace the NaNs:

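from sklearn.linear_model import LogisticRegression

df['Loan_Status'] = df['Loan_Status'].fillna('No')
df['Credit_History'] = df['Credit_History'].fillna(0)

outcome_var = 'Loan_Status'
model = LogisticRegression()
predictor_var = ['Credit_History']

# classification_model is the helper from the question's code; it is not defined in this snippet
classification_model(model, df, predictor_var, outcome_var)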
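
Before fitting, it can help to count the missing values in the predictor columns, since scikit-learn estimators such as LogisticRegression reject NaN input by default. A small sketch on made-up data (the frame toy and its values are illustrative only):

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the loan data.
toy = pd.DataFrame({
    "Credit_History": [1.0, np.nan, 0.0, 1.0],
    "LoanAmount": [120.0, 90.0, np.nan, 130.0],
    "Loan_Status": ["Y", "N", "Y", "N"],
})

predictor_var = ["Credit_History"]

# Count missing values per predictor to see which inputs would trip the estimator.
missing_counts = toy[predictor_var].isna().sum()
print(missing_counts)

# Fill with 0, as the answer does, and confirm that nothing is left.
toy["Credit_History"] = toy["Credit_History"].fillna(0)
remaining = int(toy[predictor_var].isna().sum().sum())
print(remaining)
```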

Regarding python - ValueError: invalid fill value with a <class 'pandas.core.frame.DataFrame'>, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/44450725/
