python - pandas outcome variable is NaN

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:58:06

I have set the outcome variable y to a column from a CSV file. When I just print y, it loads and works correctly, but when I use y = y[x:], I start getting NaN as the values.

y = previous_games_stats['Unnamed: 7'] #outcome variable (win/loss)
y = y[9:] #causes NaN for outcome variables
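Looking at the index of the sliced Series makes the behaviour clearer. The sketch below uses a small made-up Series as a hypothetical stand-in for previous_games_stats['Unnamed: 7'] and shows that slicing off the first rows keeps the original row labels instead of renumbering them from 0.

```python
import pandas as pd

# Hypothetical stand-in for previous_games_stats['Unnamed: 7']:
# twelve game results with the default RangeIndex 0..11.
outcomes = pd.Series(list("WLWWLLWLWLWW"), name="Unnamed: 7")

# Positional slice; on a default RangeIndex this selects the same rows as outcomes[9:].
y = outcomes.iloc[9:]

# The surviving rows keep their original labels 9, 10 and 11.
print(y.index.tolist())
print(y.tolist())
```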

Later in the file I print the outcome column. final_df is a DataFrame that does not have an outcome variable yet, so I set it like this:

final_df['outcome'] = y
print(final_df['outcome'])

But the result is:

0    NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    NaN
6    NaN
7    NaN
8    NaN
9      L

Only the last value looks correct (they should all be either "W" or "L").

How do I line up the DataFrame correctly so that I don't get NaN?

Full code:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
import numpy as np
import time
import matplotlib.pyplot as plt

np.random.seed(0)

from array import array

iris = load_iris()

previous_games_stats = pd.read_csv('stats/2016-2017 CANUCKS STATS.csv', header=1)
numGamesToLookBack = 10
# ... axis=1) #Predictor variables   (the start of this line is missing from the original post)

X = previous_games_stats[['GF', 'GA']]
count = 0
final_df = pd.DataFrame(columns=['GF', 'GA'])

#final_y = pd.DataFrame(columns=['Unnamed: 7'])

y = previous_games_stats['Unnamed: 7'] #outcome variable (win/loss)
y = y[numGamesToLookBack-1:]



for game in range(0, 10):
    X = previous_games_stats[['GF', 'GA']]
    X = X[count:numGamesToLookBack] #num games to look back
    stats_feature_names = list(X.columns.values)

    df = pd.DataFrame(iris.data, columns=iris.feature_names)

    stats_df = pd.DataFrame(X, columns=stats_feature_names).sum().to_frame().T
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    final_df = pd.concat([final_df, stats_df], ignore_index=True)

    count += 1
    numGamesToLookBack += 1



print("final_df:\n", final_df)



stats_target_names = np.array(['Win', 'Loss']) #don't need?...just a label it looks like

df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

final_df['outcome'] = y
final_df['outcome'].update(y) #ADDED UPDATE TO FIX NaN



df['is_train'] = np.random.uniform(0, 1, len(df)) <= .75 #for iris


final_df['is_train'] = np.random.uniform(0, 1, len(final_df)) <= .65


train, test = df[df['is_train']==True], df[df['is_train']==False]
stats_train = final_df[final_df['is_train']==True]
stats_test = final_df[final_df['is_train']==False]


features = df.columns[:4]
stats_features = final_df.columns[:2]


y = pd.factorize(train['species'])[0]
stats_y = pd.factorize(stats_train['outcome'])[0]

clf = RandomForestClassifier(n_jobs=2, random_state=0)
stats_clf = RandomForestClassifier(n_jobs=2, random_state=0)


clf.fit(train[features], y)
stats_clf.fit(stats_train[stats_features], stats_y)

stats_clf.predict_proba(stats_test[stats_features])[0:10]



preds = iris.target_names[clf.predict(test[features])]
stats_preds = stats_target_names[stats_clf.predict(stats_test[stats_features])]




pd.crosstab(stats_test['outcome'], stats_preds, rownames=['Actual Outcome'], colnames=['Predicted Outcome'])
print("~~~confusion matrix~~~\nColumns represent what we predicted for the outcome of the game, and rows represent the actual outcome of the game.\n")
print(pd.crosstab(stats_test['outcome'], stats_preds, rownames=['Actual Outcome'], colnames=['Predicted Outcome']))
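The loop in the question sums GF and GA over sliding 10-game windows (rows 0 to 9, then 1 to 10, and so on). As an alternative sketch, not the asker's approach, DataFrame.rolling builds the same windows while keeping the original row labels, so the outcome column can be assigned directly and aligns without producing NaN. The 19-game frame below is invented data standing in for previous_games_stats.

```python
import pandas as pd

# Hypothetical stand-in for previous_games_stats: 19 games with goals for (GF),
# goals against (GA) and the result column 'Unnamed: 7'.
games = pd.DataFrame({
    "GF": [3, 1, 4, 1, 5, 2, 6, 2, 3, 5, 0, 2, 4, 1, 3, 2, 5, 1, 2],
    "GA": [2, 3, 1, 4, 2, 2, 1, 3, 3, 1, 2, 2, 1, 5, 2, 3, 0, 4, 2],
    "Unnamed: 7": list("WLWLWWWLWWLWWLWLWLW"),
})

window = 10
# Row k holds the sums over games k-9..k, the same windows the loop builds with
# X[count:numGamesToLookBack]; the first 9 rows are incomplete (NaN) and dropped.
features = games[["GF", "GA"]].rolling(window).sum().dropna()

# Both objects keep the original labels 9..18, so this assignment aligns row by row.
features["outcome"] = games["Unnamed: 7"]
print(features)
```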

Best answer

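This is expected: assigning a Series to a DataFrame column aligns on index labels, not on position. After y = y[9:], y only has labels 9 and up, so final_df (labels 0 to 9) gets no values for labels 0 to 8 and those rows become NaN. Only label 9 exists in both, which is why the last row shows L.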
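A minimal reproduction under assumed shapes (12 hypothetical results, a 10-row final_df, the default RangeIndex on both) shows the alignment at work: only the label that both indexes share keeps a value, and labels of y that final_df lacks are silently dropped.

```python
import pandas as pd

# Hypothetical data mirroring the question's shapes.
outcomes = pd.Series(list("WLWWLLWLWLWW"))            # labels 0..11
final_df = pd.DataFrame({"GF": range(10), "GA": range(10)})  # labels 0..9

y = outcomes.iloc[9:]       # keeps labels 9, 10, 11
final_df["outcome"] = y     # reindexed to final_df's labels 0..9 before insertion

print(final_df["outcome"].isna().sum())  # 9: labels 0..8 have no match in y
print(final_df.loc[9, "outcome"])        # L: label 9 is the only shared label
```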

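If the column is new and y has the same length as final_df, assign the underlying NumPy array instead. An array has no index, so its values are placed by position: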

final_df['outcome'] = y.values
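A short sketch of the equal-length case, with hypothetical data: y has been sliced so its labels are 9 to 18, but it has exactly as many values as final_df has rows. It uses Series.to_numpy(), the newer spelling of .values, and also shows that pandas raises a ValueError when the array length does not match.

```python
import pandas as pd

# Hypothetical: 10 results carrying labels 9..18, and a 10-row frame labelled 0..9.
y = pd.Series(list("WLWWLLWLWL"), index=range(9, 19))
final_df = pd.DataFrame({"GF": range(10), "GA": range(10)})

# No index on a NumPy array, so the values are written by position.
final_df["outcome"] = y.to_numpy()
print(final_df["outcome"].tolist())

# Positional assignment requires the lengths to match exactly.
raised = False
try:
    final_df["outcome2"] = y.iloc[:5].to_numpy()
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```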

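If the lengths differ it is a bit more complicated, because an assigned array must have exactly the same length as the DataFrame. Sample data: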

df = pd.DataFrame({'a':range(10), 'b':range(20,30)}).astype(str).radd('a')
print(df)
    a    b
0  a0  a20
1  a1  a21
2  a2  a22
3  a3  a23
4  a4  a24
5  a5  a25
6  a6  a26
7  a7  a27
8  a8  a28
9  a9  a29

y = df['a']
y = y[4:]
print(y)
4    a4
5    a5
6    a6
7    a7
8    a8
9    a9
Name: a, dtype: object

If len(final_df) < len(y):

Slice y down to the length of final_df, then convert it to a NumPy array so that the index is not aligned:

final_df = pd.DataFrame({'new':range(100, 105)})
final_df['s'] = y.iloc[:len(final_df)].values
print(final_df)
   new   s
0  100  a4
1  101  a5
2  102  a6
3  103  a7
4  104  a8

If len(final_df) > len(y):

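Build a new Series from the values of y, indexed by the first len(y) labels of final_df's index; the remaining rows become NaN: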

final_df1 = pd.DataFrame({'new':range(100, 110)})
final_df1['s'] = pd.Series(y.values, index=final_df1.index[:len(y)])
print(final_df1)
   new    s
0  100   a4
1  101   a5
2  102   a6
3  103   a7
4  104   a8
5  105   a9
6  106  NaN
7  107  NaN
8  108  NaN
9  109  NaN
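Both cases can also be handled in one step by discarding the labels of y with reset_index(drop=True) and then reindexing to the target length, which truncates extra values or pads missing positions with NaN. This is a different technique from the one in the answer, and align_by_position is a hypothetical helper name, not a pandas API; y below mirrors the sample data above.

```python
import pandas as pd


def align_by_position(values: pd.Series, target_index: pd.Index) -> pd.Series:
    """Hypothetical helper: place `values` onto `target_index` by position.

    Extra values are dropped and missing positions become NaN.
    """
    relabeled = values.reset_index(drop=True)                        # labels 0..len-1
    relabeled = relabeled.reindex(pd.RangeIndex(len(target_index)))  # truncate or pad with NaN
    relabeled.index = target_index                                   # adopt the target's labels
    return relabeled


# Same shape as the answer's sample: values a4..a9 carrying labels 4..9.
y = pd.Series([f"a{i}" for i in range(4, 10)], index=range(4, 10), name="a")

short = pd.DataFrame({"new": range(100, 105)})   # len(final_df) < len(y)
short["s"] = align_by_position(y, short.index)

long = pd.DataFrame({"new": range(100, 110)})    # len(final_df) > len(y)
long["s"] = align_by_position(y, long.index)

print(short)
print(long)
```

Because the helper returns a Series that already carries the target's labels, the normal aligned assignment does the rest, which keeps the two branches of the answer in a single code path.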

For "python - pandas outcome variable is NaN", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49465678/
