
python - Removing columns and rows with a certain percentage of 0's in pandas

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:16:17

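I have two-dimensional data (columns: Cell1, Cell2, ...; rows: Gene1, Gene2, ...). I want to remove the rows that are 99% zeros and then, in the resulting matrix, remove the columns that are 99% zeros. I wrote the following code to do this, but because the matrix is very large it takes a long time to run. Is there a better way to approach this?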

import pandas as pd
import numpy as np

def read_in(matrix_file):
    matrix_df=pd.read_csv(matrix_file,index_col=0)
    return(matrix_df)

def genes_less_exp(matrix_df):
    num_columns=matrix_df.shape[1]
    for index, row in matrix_df.iterrows():
        zero_els=np.count_nonzero(row.values==0)
        gene_per_zero=(float(zero_els)/float(num_columns))*100
        if gene_per_zero >= 99:
            matrix_df.drop([index],axis=0,inplace=True)
    return(matrix_df)

def cells_less_exp(matrix_df):
    num_rows=matrix_df.shape[0]
    # DataFrame.iteritems() was removed in pandas 2.0; items() is the equivalent.
    for label,content in matrix_df.items():
        zero_els=np.count_nonzero(content.values==0)
        cells_per_zero=(float(zero_els)/float(num_rows))*100
        if cells_per_zero >= 99:
            matrix_df.drop(label,axis=1,inplace=True)
    return(matrix_df)


if __name__ == "__main__":
    matrix_df=read_in("Data/big-matrix.csv")
    print("original:"+str(matrix_df.shape))
    filtered_genes=genes_less_exp(matrix_df)
    print("filtered_genes:"+str(filtered_genes.shape))
    filtered_cells=cells_less_exp(filtered_genes)
    print("filtered_cells:"+str(filtered_cells.shape))
    filtered_cells.to_csv("abi.99.percent.filtered.csv", sep=',')

Best answer

This becomes easier if you reframe the problem as "keep the rows and columns that are less than 99% zeros".

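import numpy as np
import pandas as pd

def drop_almost_zero(df, percentage):
    # Keep rows whose zero count does not exceed the cut-off.
    row_cut_off = int(percentage/100*len(df.columns))
    df = df[(df==0).sum(axis='columns') <= row_cut_off]

    # Count zeros per column on the row-filtered frame, then keep the columns within the cut-off.
    column_cut_off = int(percentage/100*len(df))
    b = (df == 0).sum(axis='rows')
    df = df[ b[ b <= column_cut_off].index.values ]

    return df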
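The function above keeps a row or column whose zero count is exactly at the cut-off, whereas the question's code drops anything at 99% or more. Below is a minimal sketch of the same vectorized idea written with zero fractions and a strict comparison, assuming the question's rule is the one wanted. The names `drop_mostly_zero` and `max_zero_pct` and the small example frame are hypothetical and not part of the original answer.

```python
import pandas as pd


def drop_mostly_zero(df: pd.DataFrame, max_zero_pct: float = 99.0) -> pd.DataFrame:
    """Drop rows, then columns, whose share of zeros is >= max_zero_pct."""
    threshold = max_zero_pct / 100.0

    # Fraction of zeros in each row: the mean of a boolean mask across the columns.
    row_zero_frac = (df == 0).mean(axis="columns")
    df = df.loc[row_zero_frac < threshold]

    # Fraction of zeros in each column, measured on the row-filtered frame.
    col_zero_frac = (df == 0).mean(axis="index")
    return df.loc[:, col_zero_frac < threshold]


# Hypothetical toy data: Gene1 is all zeros, and so is Cell3.
example = pd.DataFrame(
    {"Cell1": [0, 0, 5], "Cell2": [0, 1, 2], "Cell3": [0, 0, 0]},
    index=["Gene1", "Gene2", "Gene3"],
)
filtered = drop_mostly_zero(example, max_zero_pct=99)
print(filtered)
```

Working with fractions avoids the `int()` truncation of the cut-off, at the cost of comparing floating-point values.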


#test
size = 50
percentage = 90

rows = size//2
columns = size

a = np.random.choice(2, size=(rows, columns), p=[(1-0.1), 0.1])
df = pd.DataFrame(a, columns=[f'c{i}' for i in range(size)])

df = drop_almost_zero(df,percentage)

assert (df == 0).sum(axis='rows').max() <= percentage/100*rows
assert (df == 0).sum(axis='columns').max() <= percentage/100*columns
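The random test above checks the invariants but not a specific outcome. As a complement, here is a small deterministic check, sketched under the assumption that a 50% threshold on a hand-built 4x4 matrix is a fair stand-in for the real data. The function body is copied from the answer so the example runs on its own. The matrix values are hypothetical.

```python
import pandas as pd


def drop_almost_zero(df, percentage):
    # Same logic as the answer's function, copied so this example is self-contained.
    row_cut_off = int(percentage / 100 * len(df.columns))
    df = df[(df == 0).sum(axis="columns") <= row_cut_off]
    column_cut_off = int(percentage / 100 * len(df))
    zeros_per_column = (df == 0).sum(axis="rows")
    return df[zeros_per_column[zeros_per_column <= column_cut_off].index.values]


# Hypothetical matrix. Gene1 has 3 zeros out of 4 and exceeds the row cut-off of 2.
# Cell4 has only 2 zeros in the full matrix, but once Gene1 is gone it has
# 2 zeros out of 3 rows, which exceeds the column cut-off of int(1.5) == 1.
toy = pd.DataFrame(
    [[0, 0, 0, 7],
     [1, 2, 0, 0],
     [3, 0, 4, 0],
     [5, 6, 7, 9]],
    index=["Gene1", "Gene2", "Gene3", "Gene4"],
    columns=["Cell1", "Cell2", "Cell3", "Cell4"],
)
result = drop_almost_zero(toy, 50)
print(result)
```

The example shows that the column filter is evaluated on the row-filtered matrix, as the question asks: Cell4 would survive a filter applied to the original matrix but is removed here.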

Regarding python - removing columns and rows with a certain percentage of 0's in pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55352193/
