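I am working through the Loan Prediction practice problem and trying to fill in the missing values in the data. I got the data from here. To work through the problem, I am following this tutorial.
You can find the complete code I am using (file name model.py) and the data here on GitHub.
The DataFrame looks like this: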
df[['Loan_ID', 'Self_Employed', 'Education', 'LoanAmount']].head(10)
Out:
    Loan_ID Self_Employed     Education  LoanAmount
0  LP001002            No      Graduate         NaN
1  LP001003            No      Graduate       128.0
2  LP001005           Yes      Graduate        66.0
3  LP001006            No  Not Graduate       120.0
4  LP001008            No      Graduate       141.0
5  LP001011           Yes      Graduate       267.0
6  LP001013            No  Not Graduate        95.0
7  LP001014            No      Graduate       158.0
8  LP001018            No      Graduate       168.0
9  LP001020            No      Graduate       349.0
After the last line of the following code executes (it corresponds to line 60 of model.py):
import numpy as np
import pandas as pd

url = 'https://raw.githubusercontent.com/Aniruddh-SK/Loan-Prediction-Problem/master/train.csv'
df = pd.read_csv(url)
df['LoanAmount'].fillna(df['LoanAmount'].mean(), inplace=True)
df['Self_Employed'].fillna('No', inplace=True)
table = df.pivot_table(values='LoanAmount', index='Self_Employed', columns='Education', aggfunc=np.median)

# Define function to return value of this pivot_table
def fage(x):
    return table.loc[x['Self_Employed'], x['Education']]

# Replace missing values
df['LoanAmount'].fillna(df[df['LoanAmount'].isnull()].apply(fage, axis=1), inplace=True)
I get this error:
ValueError Traceback (most recent call last)
<ipython-input-40-5146e49c2460> in <module>()
----> 1 df['LoanAmount'].fillna(df[df['LoanAmount'].isnull()].apply(fage, axis=1), inplace=True)
/usr/local/lib/python2.7/dist-packages/pandas/core/series.pyc in fillna(self, value, method, axis, inplace, limit, downcast, **kwargs)
2368 axis=axis, inplace=inplace,
2369 limit=limit, downcast=downcast,
-> 2370 **kwargs)
2371
2372 @Appender(generic._shared_docs['shift'] % _shared_doc_kwargs)
/usr/local/lib/python2.7/dist-packages/pandas/core/generic.pyc in fillna(self, value, method, axis, inplace, limit, downcast)
3264 else:
3265 raise ValueError("invalid fill value with a %s" %
-> 3266 type(value))
3267
3268 new_data = self._data.fillna(value=value, limit=limit,
ValueError: invalid fill value with a <class 'pandas.core.frame.DataFrame'>
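How can I fill the missing values without getting this error?

Best answer

It seems the tutorial's author wanted to replace NaN with the values from table, but that first requires creating a Series with unstack and using set_index to align the data.

First, remove the line that replaces NaN with the mean; once it has run, LoanAmount has no NaN left for the later fill to replace: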
url='https://raw.githubusercontent.com/Aniruddh-SK/Loan-Prediction-Problem/master/train.csv'
df = pd.read_csv(url) #Reading the dataset in a dataframe using Pandas
#df['LoanAmount'].fillna(df['LoanAmount'].mean(), inplace=True)
df['Self_Employed'].fillna('No',inplace=True)
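
To see why the mean-fill line matters, the sketch below reproduces the situation on a small made-up frame (the three rows are invented for illustration and are not taken from train.csv). After the mean fill there are no NaN rows left, so the filter returns an empty DataFrame, and a row-wise apply on an empty DataFrame hands back a DataFrame rather than a Series. That DataFrame is what fillna then rejects; the exact exception type and message may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, only for illustration)
df = pd.DataFrame({
    "Self_Employed": ["No", "Yes", "No"],
    "Education": ["Graduate", "Graduate", "Not Graduate"],
    "LoanAmount": [100.0, np.nan, 120.0],
})

# The step from the question: fill every NaN with the overall mean
df["LoanAmount"] = df["LoanAmount"].fillna(df["LoanAmount"].mean())

table = df.pivot_table(values="LoanAmount", index="Self_Employed",
                       columns="Education", aggfunc="median")

def fage(x):
    # Look up the group median for one row
    return table.loc[x["Self_Employed"], x["Education"]]

# No NaN is left, so this selection is empty
missing_rows = df[df["LoanAmount"].isnull()]

# apply on an empty frame returns a DataFrame, not a Series
result = missing_rows.apply(fage, axis=1)
print(len(missing_rows), type(result).__name__)
```
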
table = df.pivot_table(values='LoanAmount',
index='Self_Employed',
columns='Education',
aggfunc=np.median)
print (table.unstack())
Education     Self_Employed
Graduate      No               130.0
              Yes              157.5
Not Graduate  No               113.0
              Yes              130.0
dtype: float64
#check all values with NaN in LoanAmount column
print (df.loc[df['LoanAmount'].isnull(), ['Self_Employed','Education', 'LoanAmount']])
    Self_Employed     Education  LoanAmount
0              No      Graduate         NaN
35             No      Graduate         NaN
63             No      Graduate         NaN
81            Yes      Graduate         NaN
95             No      Graduate         NaN
102            No      Graduate         NaN
103            No      Graduate         NaN
113           Yes      Graduate         NaN
127            No      Graduate         NaN
202            No  Not Graduate         NaN
284            No      Graduate         NaN
305            No  Not Graduate         NaN
322            No  Not Graduate         NaN
338            No  Not Graduate         NaN
387            No  Not Graduate         NaN
435            No      Graduate         NaN
437            No      Graduate         NaN
479            No      Graduate         NaN
524            No      Graduate         NaN
550           Yes      Graduate         NaN
551            No  Not Graduate         NaN
605            No  Not Graduate         NaN
# for checking later, get the index labels of all rows where LoanAmount is NaN
idx = df.loc[df['LoanAmount'].isnull(), ['Self_Employed','Education', 'LoanAmount']].index
print (idx)
Int64Index([  0,  35,  63,  81,  95, 102, 103, 113, 127, 202, 284, 305, 322,
            338, 387, 435, 437, 479, 524, 550, 551, 605],
           dtype='int64')
# Replace missing values
df = df.set_index(['Education','Self_Employed'])
df['LoanAmount'].fillna(table.unstack(), inplace=True)
df = df.reset_index()
# check output: show only the rows that had NaN before
print (df.loc[df.index.isin(idx), ['Self_Employed','Education', 'LoanAmount']])
    Self_Employed     Education  LoanAmount
0              No      Graduate       130.0
35             No      Graduate       130.0
63             No      Graduate       130.0
81            Yes      Graduate       157.5
95             No      Graduate       130.0
102            No      Graduate       130.0
103            No      Graduate       130.0
113           Yes      Graduate       157.5
127            No      Graduate       130.0
202            No  Not Graduate       113.0
284            No      Graduate       130.0
305            No  Not Graduate       113.0
322            No  Not Graduate       113.0
338            No  Not Graduate       113.0
387            No  Not Graduate       113.0
435            No      Graduate       130.0
437            No      Graduate       130.0
479            No      Graduate       130.0
524            No      Graduate       130.0
550           Yes      Graduate       157.5
551            No  Not Graduate       113.0
605            No  Not Graduate       113.0
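
The alignment idea in this solution can be tried on a small made-up frame. The following is a minimal sketch, assuming toy data (the six rows and their values are invented, not taken from train.csv). As a deliberate variation on the code above, it assigns the result of fillna back to the column instead of using inplace=True, and it leaves the original frame's index untouched.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, only for illustration)
df = pd.DataFrame({
    "Education": ["Graduate", "Graduate", "Graduate",
                  "Not Graduate", "Not Graduate", "Not Graduate"],
    "Self_Employed": ["No", "No", "No", "Yes", "Yes", "Yes"],
    "LoanAmount": [100.0, 140.0, np.nan, 60.0, 80.0, np.nan],
})

# Median LoanAmount per (Self_Employed, Education) combination
table = df.pivot_table(values="LoanAmount", index="Self_Employed",
                       columns="Education", aggfunc="median")

# unstack() turns the table into a Series keyed by (Education, Self_Employed)
medians = table.unstack()

# Give the rows the same two-level key so that fillna can align on it
indexed = df.set_index(["Education", "Self_Employed"])
filled = indexed["LoanAmount"].fillna(medians)

# set_index keeps the row order, so the values can be written back by position
df["LoanAmount"] = filled.to_numpy()
print(df)
```

Each NaN is replaced by the median of the rows that share its Education and Self_Employed values; rows that already had a value are left unchanged.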
Edit:

A better solution is groupby with apply, where NaN is replaced with the median of each group:
url='https://raw.githubusercontent.com/Aniruddh-SK/Loan-Prediction-Problem/master/train.csv'
df = pd.read_csv(url) #Reading the dataset in a dataframe using Pandas
#df['LoanAmount'].fillna(df['LoanAmount'].mean(), inplace=True)
df['Self_Employed'].fillna('No',inplace=True)
print (df.loc[df['LoanAmount'].isnull(), ['Self_Employed','Education', 'LoanAmount']])
    Self_Employed     Education  LoanAmount
0              No      Graduate         NaN
35             No      Graduate         NaN
63             No      Graduate         NaN
81            Yes      Graduate         NaN
95             No      Graduate         NaN
102            No      Graduate         NaN
103            No      Graduate         NaN
113           Yes      Graduate         NaN
127            No      Graduate         NaN
202            No  Not Graduate         NaN
284            No      Graduate         NaN
305            No  Not Graduate         NaN
322            No  Not Graduate         NaN
338            No  Not Graduate         NaN
387            No  Not Graduate         NaN
435            No      Graduate         NaN
437            No      Graduate         NaN
479            No      Graduate         NaN
524            No      Graduate         NaN
550           Yes      Graduate         NaN
551            No  Not Graduate         NaN
605            No  Not Graduate         NaN
idx = df.loc[df['LoanAmount'].isnull(), ['Self_Employed','Education', 'LoanAmount']].index
print (idx)
Int64Index([ 0, 35, 63, 81, 95, 102, 103, 113, 127, 202, 284, 305, 322,
338, 387, 435, 437, 479, 524, 550, 551, 605],
dtype='int64')
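# Replace missing values with the median of each (Education, Self_Employed) group.
# The parentheses allow the chained call to span two lines; group_keys=False keeps
# the original row index so the result aligns with df on assignment.
df['LoanAmount'] = (df.groupby(['Education','Self_Employed'], group_keys=False)['LoanAmount']
                      .apply(lambda x: x.fillna(x.median())))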
print (df.loc[df.index.isin(idx), ['Self_Employed','Education', 'LoanAmount']])
    Self_Employed     Education  LoanAmount
0              No      Graduate       130.0
35             No      Graduate       130.0
63             No      Graduate       130.0
81            Yes      Graduate       157.5
95             No      Graduate       130.0
102            No      Graduate       130.0
103            No      Graduate       130.0
113           Yes      Graduate       157.5
127            No      Graduate       130.0
202            No  Not Graduate       113.0
284            No      Graduate       130.0
305            No  Not Graduate       113.0
322            No  Not Graduate       113.0
338            No  Not Graduate       113.0
387            No  Not Graduate       113.0
435            No      Graduate       130.0
437            No      Graduate       130.0
479            No      Graduate       130.0
524            No      Graduate       130.0
550           Yes      Graduate       157.5
551            No  Not Graduate       113.0
605            No  Not Graduate       113.0
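
As a variation on the groupby idea, the per-group median can also be computed with transform, which returns a Series with the same index as the original frame and therefore needs no index handling. This is a minimal sketch on made-up data (the rows are invented for illustration), not the answer's exact code.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, only for illustration)
df = pd.DataFrame({
    "Education": ["Graduate", "Graduate", "Graduate",
                  "Not Graduate", "Not Graduate", "Not Graduate"],
    "Self_Employed": ["No", "No", "No", "Yes", "Yes", "Yes"],
    "LoanAmount": [100.0, 140.0, np.nan, 60.0, 80.0, np.nan],
})

# One median per row, taken from that row's (Education, Self_Employed) group;
# NaN values are skipped when the median is computed
group_median = (df.groupby(["Education", "Self_Employed"])["LoanAmount"]
                  .transform("median"))

# Fill only the missing entries; existing values are left as they are
df["LoanAmount"] = df["LoanAmount"].fillna(group_median)
print(df)
```
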
Edit:

There is one more problem:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
The solution is to replace the remaining NaNs:
df['Loan_Status'].fillna('No',inplace=True)
df['Credit_History'].fillna(0,inplace=True)
outcome_var = 'Loan_Status'
model = LogisticRegression()
predictor_var = ['Credit_History']
classification_model(model, df, predictor_var,outcome_var)
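
Before fitting, it can help to check which of the columns passed to the model still contain NaN. The sketch below is a minimal illustration on made-up data (the four rows are invented); predictor_var mirrors the name used above, while cols_with_nan is a hypothetical helper name introduced here.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, only for illustration)
df = pd.DataFrame({
    "Credit_History": [1.0, np.nan, 0.0, 1.0],
    "Loan_Status": ["Y", "N", "Y", "N"],
})
predictor_var = ["Credit_History"]

# Count missing values per predictor column and keep the affected columns
nan_counts = df[predictor_var].isnull().sum()
cols_with_nan = nan_counts[nan_counts > 0].index.tolist()
print(cols_with_nan)

# Fill the gap the same way as above, then confirm nothing is missing
df["Credit_History"] = df["Credit_History"].fillna(0)
print(df[predictor_var].isnull().any().any())
```
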
This question and answer, "python - ValueError: invalid fill value with a <class 'pandas.core.frame.DataFrame'>", come from a similar question found on Stack Overflow: https://stackoverflow.com/questions/44450725/