
python - How to locate the index of a given value?

Reposted. Author: 行者123. Updated: 2023-12-01 09:22:57

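I want to compute the correlation coefficient each time a row is removed from a DataFrame. Once I have all of those coefficients, I need to delete the row whose removal causes the largest increase in the correlation coefficient.

The code below shows my attempt: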

import pandas as pd
import numpy as np

#Access the data

file='tc_yolanda2.csv'
df = pd.read_csv(file)

x = df['dist']
y = df['mps']

#compute the correlation coefficient

def correlation_coefficient_4u(a,b):
    correl_mat = np.corrcoef(a,b)
    correlation = correl_mat[0,1]
    return correlation

c = correlation_coefficient_4u(x,y)
print('Correlation coeffcient is:',c)

#Let us try this one

lenght = len(df)
print(lenght)
a = 0
while lenght != 0:
    df.drop([a], inplace=True)
    c = correlation_coefficient_4u(df.dist,df.mps)
    a += 1
    print(round(c,4))

It successfully produced 50 correlation coefficients, but it also produced a number of warnings and an error, such as:

RuntimeWarning: Degrees of freedom <= 0 for slice

RuntimeWarning: divide by zero encountered in double_scalars

RuntimeWarning: invalid value encountered in multiply

RuntimeWarning: Mean of empty slice.

RuntimeWarning: invalid value encountered in true_divide

ValueError: labels [50] not contained in axis

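My next question is how to get rid of these errors, and how to find the index of the correlation coefficient with the most negative value, so that I can permanently drop that row and repeat the process above.

By the way, here is my data.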

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 2 columns):
dist 50 non-null float64
mps 50 non-null int64
dtypes: float64(1), int64(1)
memory usage: 880.0 bytes
None

Result:

dist  mps
0 441.6 2
1 385.4 7
2 470.7 1
3 402.2 0
4 361.6 0
5 458.6 3
6 453.9 6
7 425.2 4
8 336.6 8
9 265.4 5
10 207.0 5
11 140.5 28
12 229.9 4
13 175.2 6
14 244.5 2
15 455.7 4
16 396.4 12
17 261.8 7
18 291.5 9
19 233.9 2
20 167.8 9
21 88.9 15
22 110.1 25
23 97.1 15
24 160.4 10
25 344.0 0
26 381.6 21
27 391.9 3
28 314.7 2
29 320.7 14
30 252.9 10
31 323.1 12
32 256.0 6
33 281.6 5
34 280.4 5
35 339.8 10
36 301.9 12
37 381.8 0
38 320.2 10
39 347.6 8
40 301.0 4
41 369.7 6
42 378.4 4
43 446.8 4
44 397.4 3
45 454.2 2
46 475.1 0
47 427.0 8
48 463.4 8
49 464.6 2
Correlation coeffcient is: -0.529328951782
49
-0.5209
-0.5227
-0.5091
-0.4998
-0.4975
-0.4879
-0.4903
-0.4838
-0.4845
-0.4908
-0.5085
-0.4541
-0.4736
-0.4962
-0.5273
-0.5189
-0.5452
-0.5494
-0.5485
-0.5882
-0.5999
-0.5711
-0.4321
-0.3251
-0.296
-0.3214
-0.4595
-0.4516
-0.5018
-0.5
-0.4524
-0.431
-0.4514
-0.4955
-0.5603
-0.5263
-0.385
-0.4764
-0.3229
-0.194
-0.3029
-0.1961
-0.2572
-0.2572
-0.6454
-0.7041
-0.5241
-1.0

Warning (from warnings module):
File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\function_base.py", line 3159
c = cov(x, y, rowvar)
RuntimeWarning: Degrees of freedom <= 0 for slice

Warning (from warnings module):
File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\function_base.py", line 3093
c *= 1. / np.float64(fact)
RuntimeWarning: divide by zero encountered in double_scalars

Warning (from warnings module):
File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\function_base.py", line 3093
c *= 1. / np.float64(fact)
RuntimeWarning: invalid value encountered in multiply
nan

Warning (from warnings module):
File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\function_base.py", line 1110
avg = a.mean(axis)
RuntimeWarning: Mean of empty slice.

Warning (from warnings module):
File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\core\_methods.py", line 73
ret, rcount, out=ret, casting='unsafe', subok=False)
RuntimeWarning: invalid value encountered in true_divide
nan
Traceback (most recent call last):
File "C:/Users/User/Desktop/CARDS 2017 Research Study/Python/methodology.py", line 28, in <module>
df.drop([a], inplace=True)
File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\generic.py", line 2530, in drop
obj = obj._drop_axis(labels, axis, level=level, errors=errors)
File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\generic.py", line 2562, in _drop_axis
new_axis = axis.drop(labels, errors=errors)
File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\indexes\base.py", line 3744, in drop
labels[mask])
ValueError: labels [50] not contained in axis
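
One way to read the output above: `lenght` never changes inside the `while` loop, and `df.drop(..., inplace=True)` removes rows cumulatively. The frame therefore shrinks until `np.corrcoef` receives a single row and then none, which produces the RuntimeWarnings and the `nan` values, and the loop finally asks for label 50, which does not exist. A minimal sketch that avoids both problems, assuming the goal is one coefficient per left-out row, iterates over the existing index labels and uses the non-in-place `drop`. The small frame and the name `leave_one_out_corr` are illustrative and not part of the original post.

```python
import numpy as np
import pandas as pd

# Small synthetic frame standing in for tc_yolanda2.csv (illustrative values only).
df = pd.DataFrame({"dist": [441.6, 385.4, 470.7, 140.5, 88.9],
                   "mps": [2, 7, 1, 28, 15]})

def leave_one_out_corr(frame):
    """Correlation of dist vs. mps with each row left out in turn."""
    result = {}
    for label in frame.index:        # visits only labels that actually exist
        rest = frame.drop(label)     # returns a new frame; `frame` is unchanged
        result[label] = np.corrcoef(rest["dist"], rest["mps"])[0, 1]
    return pd.Series(result)

loo = leave_one_out_corr(df)
print(loo.round(4))
print(len(df))  # the original frame still has all of its rows
```

Because nothing is dropped in place, every coefficient is computed from all rows but one, so `np.corrcoef` never sees an empty or single-row input.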

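Best answer

You can use the following code to find and drop the row whose removal causes the largest increase in the correlation coefficient.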

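length=len(df)

def dropcc(df):
    # Drop the row whose removal gives the most negative correlation coefficient
    df_temp=df.copy()
    idxmax=0
    c=0

    for i,v in df_temp.iterrows():
        # Leave row i out and recompute the correlation
        df_temp.drop([i], inplace=True)
        c_temp = correlation_coefficient_4u(df_temp.dist,df_temp.mps)
        # Remember the row that yields the lowest coefficient so far
        if c > c_temp:
            idxmax=i
            c=c_temp
        # Restore the full frame before trying the next row
        df_temp=df.copy()
        #print(round(c_temp,4))

    df.drop([idxmax], inplace=True)
    return df

# Keep dropping rows until the correlation falls below -0.9
for i in range(0, length-1):
    cc=correlation_coefficient_4u(df.dist,df.mps)
    if cc < -0.9:
        break
    else:
        df=dropcc(df)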
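
A more compact variant of the same idea can be sketched with `Series.idxmin`, which returns the index label of the smallest value, here the most negative leave-one-out coefficient. The helper names `most_negative_drop` and `prune`, the `min_rows` guard, and the six-row sample data are assumptions made for illustration; the -0.9 threshold is taken from the code above.

```python
import numpy as np
import pandas as pd

def most_negative_drop(frame, x="dist", y="mps"):
    """Return (label, r) for the row whose removal gives the lowest correlation."""
    loo = pd.Series({
        label: np.corrcoef(frame[x].drop(label), frame[y].drop(label))[0, 1]
        for label in frame.index
    })
    label = loo.idxmin()  # index label of the most negative coefficient
    return label, loo[label]

def prune(frame, threshold=-0.9, min_rows=3):
    """Drop rows one at a time until r < threshold or only min_rows remain."""
    frame = frame.copy()  # leave the caller's frame untouched
    dropped = []
    while len(frame) > min_rows:  # hypothetical guard against tiny samples
        r = np.corrcoef(frame["dist"], frame["mps"])[0, 1]
        if r < threshold:
            break
        label, _ = most_negative_drop(frame)
        frame = frame.drop(label)
        dropped.append(label)
    return frame, dropped

# Illustrative data: rows 0-4 lie on a falling line, row 5 is an outlier.
df = pd.DataFrame({"dist": [1, 2, 3, 4, 5, 6],
                   "mps": [6, 5, 4, 3, 2, 10]})
pruned, dropped = prune(df)
print(dropped)
print(len(pruned), len(df))
```

Returning the dropped labels alongside the pruned frame keeps a record of which rows were removed and in what order, and the `min_rows` guard stops the loop before `np.corrcoef` is given too few points.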

Regarding "python - How to locate the index of a given value?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/50674363/
