
python - Problem handling NaN in a distance calculation?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:44:33

I have a DataFrame like the following (simplified), with the points as the index column:

import numpy as np
import pandas as pd
a = {'a': [0.6, 0.7, 0.4, np.nan, 0.5, 0.4, 0.5, np.nan], 'b': ['cat', 'bat', 'cat', 'cat', 'bat', np.nan, 'bat', np.nan]}
df = pd.DataFrame(a, index=['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8'])
df

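Because the data contains NaN, I want object columns that actually hold numbers to be treated as numeric, so I do the following: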

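for col in df.select_dtypes(include=['object']):
    # Try to parse the column as numbers; values that cannot be parsed become NaN
    s = pd.to_numeric(df[col], errors='coerce')
    # Only replace the column if at least one value parsed successfully
    if s.notnull().any():
        df[col] = s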

After converting the columns to numeric types, I want to compute the distance matrix as follows:

def distmetric(x, y):
    # Split both frames into numeric and non-numeric (categorical) columns
    numeric5 = x.select_dtypes(include=["number"])
    others5 = x.select_dtypes(exclude=["number"])
    numeric6 = y.select_dtypes(include=["number"])
    others6 = y.select_dtypes(exclude=["number"])
    numnp5 = numeric5.values
    catnp5 = others5.values
    numnp6 = numeric6.values
    catnp6 = others6.values
    # Pair every row of x with every row of y: squared numeric differences
    result3 = np.around((np.repeat(numnp5, len(numnp6), axis=0) - np.tile(numnp6, (len(numnp5), 1)))**2, 3)
    # Categorical mismatch flags (True where the two values differ)
    catres3 = ~(np.equal((np.repeat(catnp5, len(catnp6), axis=0)), (np.tile(catnp6, (len(catnp5), 1)))))
    # Sum over columns, then take the square root of the combined total
    sumtogeth3 = result3.sum(axis=1)
    sumcattoget3 = catres3.sum(axis=1)
    sum_result3 = sumtogeth3 + sumcattoget3
    final_result3 = np.around(np.sqrt(sum_result3), 3)
    final_result20 = np.reshape(final_result3, (len(x.index), len(y.index)))
    return final_result20

metric=distmetric(df,df)
print(metric)

The distance matrix I get is:

[[0.    1.005 0.2     nan 1.005 1.02  1.005   nan]
 [1.005 0.    1.044   nan 0.2   1.044 0.2     nan]
 [0.2   1.044 0.      nan 1.005 1.    1.005   nan]
 [  nan   nan   nan   nan   nan   nan   nan   nan]
 [1.005 0.2   1.005   nan 0.    1.005 0.      nan]
 [1.02  1.044 1.      nan 1.005 1.    1.005   nan]
 [1.005 0.2   1.005   nan 0.    1.005 0.      nan]
 [  nan   nan   nan   nan   nan   nan   nan   nan]]
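A small sketch of why NaN behaves this way in the matrix above. It uses made-up two-element arrays, not the original data, and only standard NumPy calls: any arithmetic involving NaN yields NaN, so a single missing value turns the whole row sum into NaN, and NaN never compares equal to itself, which would be consistent with the 1. on the x6/x6 diagonal.

```python
import numpy as np

# Squared differences: the second pair involves NaN, so its result is NaN.
nan_diff = (np.array([0.6, np.nan]) - np.array([0.7, 0.5])) ** 2

# Summing a row that contains NaN gives NaN, which then survives the sqrt.
row_sum = nan_diff.sum()

# NaN is not equal to NaN, so two missing categories count as a mismatch.
nan_equal = np.nan == np.nan

print(np.isnan(nan_diff))
print(row_sum)
print(nan_equal)
```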

I would like to get the following output instead:

    x1     x2     x3     x4     x5     x6     x7     x8
x1  0.0    1.005  0.2    1.0    1.005  1.02   1.005  1.414
x2  1.005  0.0    1.044  1.414  0.2    1.044  0.2    1.414
x3  0.2    1.044  0.0    1.0    1.005  1.0    1.005  1.414
x4  1.0    1.414  1.0    0.0    1.414  1.414  1.414  1.0
x5  1.005  0.2    1.005  1.414  0.0    1.005  0.0    1.414
x6  1.02   1.044  1.0    1.414  1.005  0.0    1.005  1.0
x7  1.005  0.2    1.005  1.414  0.0    1.005  0.0    1.414
x8  1.414  1.414  1.414  1.0    1.414  1.0    1.414  0.0

I want the distance between two NaN values to be 0, and the distance between NaN and any number or any string to be 1. Is there a way to do this?

Edit: I compute the distance in the following way:

for each row:
    if col is numerical:
        then calculate ((x1 element)-(x2 element))**2 and return this value to squareresult
    if col is categorical:
        then compare x1 element and x2 element.
        if they are equal then cateresult=0
        else cateresult=1
    totaldistanceresultforrow=sqrt(squareresult+cateresult)

Note: NaN-NaN=0 and NaN-(any number or string)=1 (here "-" is the subtraction used in the distance calculation)
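The rule described above can be spelled out for a single pair of rows. This is only a minimal sketch: `pair_distance` is a hypothetical helper name, rows are passed as plain tuples, and a value is treated as categorical whenever it is a Python `str`, which is an assumption rather than something stated in the question.

```python
import math
import pandas as pd

def pair_distance(row_x, row_y):
    """Distance between two rows: NaN vs NaN adds 0, NaN vs anything else adds 1."""
    total = 0.0
    for vx, vy in zip(row_x, row_y):
        mx, my = pd.isna(vx), pd.isna(vy)
        if mx and my:
            term = 0.0                         # both missing
        elif mx or my:
            term = 1.0                         # exactly one missing
        elif isinstance(vx, str) or isinstance(vy, str):
            term = 0.0 if vx == vy else 1.0    # categorical: mismatch flag
        else:
            term = (vx - vy) ** 2              # numeric: squared difference
        total += term
    return math.sqrt(total)

print(round(pair_distance((0.6, 'cat'), (0.7, 'bat')), 3))                   # 1.005
print(round(pair_distance((0.6, 'cat'), (float('nan'), float('nan'))), 3))   # 1.414
```

Looping over every pair of rows with a function like this would be slow on large frames, but it is handy as a reference to check a vectorized version against.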

Best answer

This worked for me:

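# Pairwise squared differences for column 'a' (NaN wherever either value is NaN)
a_vals = df['a'].to_numpy()
square_res = (a_vals - a_vals[:, None]) ** 2
numeric = pd.DataFrame(square_res)
# A column that is entirely NaN belongs to a row whose 'a' value is NaN
idx = numeric.isnull().all()
alltrueindices = np.where(idx)

for index in alltrueindices:
    # NaN vs NaN counts as 0
    numeric.loc[index, index] = 0
# NaN vs any number counts as 1
numeric = numeric.fillna(1)
# Use a placeholder so that two missing categories compare as equal
df['b'] = df['b'].replace(np.nan, '?')
b_vals = df['b'].to_numpy()
cat_res = (b_vals != b_vals[:, None])
res = (numeric + cat_res) ** .5

print(res.round(3))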
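The same idea can be generalized to any number of columns and returned with the point labels attached. The following is a sketch under stated assumptions, not part of the original answer: `nan_aware_distance` is a hypothetical name, numeric columns are detected with `pd.api.types.is_numeric_dtype`, and every other column is treated as categorical.

```python
import numpy as np
import pandas as pd

def nan_aware_distance(frame):
    """Pairwise row distance; per column, NaN vs NaN adds 0 and NaN vs a value adds 1."""
    n = len(frame)
    total = np.zeros((n, n))
    for col in frame.columns:
        s = frame[col]
        miss = s.isna().to_numpy()
        both = miss[:, None] & miss[None, :]      # both values missing
        either = miss[:, None] | miss[None, :]    # at least one value missing
        if pd.api.types.is_numeric_dtype(s):
            v = s.to_numpy(dtype=float, na_value=np.nan)
            term = (v[:, None] - v[None, :]) ** 2
        else:
            v = s.to_numpy(dtype=object)
            term = (v[:, None] != v[None, :]).astype(float)
        term = np.where(either, 1.0, term)        # NaN vs anything -> 1
        term = np.where(both, 0.0, term)          # NaN vs NaN -> 0
        total += term
    return pd.DataFrame(np.sqrt(total), index=frame.index, columns=frame.index)

data = {'a': [0.6, 0.7, 0.4, np.nan, 0.5, 0.4, 0.5, np.nan],
        'b': ['cat', 'bat', 'cat', 'cat', 'bat', np.nan, 'bat', np.nan]}
points = pd.DataFrame(data, index=['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8'])
dist = nan_aware_distance(points)
print(dist.round(3))
```

Unlike the snippet above, this version does not modify the input frame and keeps x1 to x8 as row and column labels, so the result can be compared directly with the desired table in the question.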

Regarding "python - Problem handling NaN in a distance calculation?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52587239/
