
python - What is the fastest way to ensure a specific column is last (or first) in a dataframe

Reposted. Author: 行者123. Updated: 2023-11-28 20:03:58

Given df:

df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=list('abcd'))


Suppose I need to put column 'b' at the end. I can do this:

df[['a', 'c', 'd', 'b']]


But what is the most efficient way to ensure that a given column is at the end?

This is what I have been doing. How would others do it?

def put_me_last(df, column):
    return pd.concat([df.drop(column, axis=1), df[column]], axis=1)

put_me_last(df, 'b')



Timing results

Conclusion: mfripp is the winner. reindex_axis appears to be more efficient than indexing with []. This is really good information.
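For readers on current pandas: `reindex_axis` was deprecated in pandas 0.21 and removed in 1.0, so the winning approach would today be written with `reindex(columns=...)`. The following is a minimal sketch under that assumption, using a small frame like the one in the question. The name `put_me_last_reindex` is made up for illustration and was not part of the benchmark, so the timing conclusion above does not automatically carry over to it.

```python
import numpy as np
import pandas as pd

def put_me_last_reindex(df, column):
    # Hypothetical helper: same column reordering as the reindex_axis variant,
    # expressed with DataFrame.reindex, which exists in current pandas.
    new_cols = [c for c in df.columns if c != column] + [column]
    return df.reindex(columns=new_cols)

df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=list('abcd'))
result = put_me_last_reindex(df, 'b')
print(list(result.columns))
```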


Code

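from string import ascii_lowercase as lowercase  # string.lowercase is Python 2 only
from timeit import timeit

import numpy as np
import pandas as pd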

df_small = pd.DataFrame(np.arange(8).reshape(2, 4), columns=list('abcd'))
df_large = pd.DataFrame(np.arange(1000000).reshape(10000, 100),
                        columns=pd.MultiIndex.from_product([list(lowercase[:-1]), ['One', 'Two', 'Three', 'Four']]))


def pir1(df, column):
    return pd.concat([df.drop(column, axis=1), df[column]], axis=1)

def pir2(df, column):
    if df.columns[-1] == column:
        return df
    else:
        # look up the position of the requested column (was hard-coded to 'b')
        pos = df.columns.get_loc(column)
        # rotate all columns so that the requested one ends up last
        return df[np.roll(df.columns, len(df.columns) - 1 - pos)]

def pir3(df, column):
    if df.columns[-1] == column:
        return df
    else:
        # look up the position of the requested column (was hard-coded to 'b')
        pos = df.columns.get_loc(column)
        cols = df.columns.values
        return df[np.concatenate([cols[:pos], cols[1+pos:], cols[[pos]]])]

def pir4(df, column):
    if df.columns[-1] == column:
        return df
    else:
        return df[np.roll(df.columns.drop(column).insert(0, column), -1)]

def carsten1(df, column):
    cols = list(df)
    if cols[-1] == column:
        return df
    else:
        return pd.concat([df.drop(column, axis=1), df[column]], axis=1)

def carsten2(df, column):
    cols = list(df)
    if cols[-1] == column:
        return df
    else:
        idx = cols.index(column)
        new_cols = cols[:idx] + cols[idx + 1:] + [column]
        return df[new_cols]

def mfripp1(df, column):
    new_cols = [c for c in df.columns if c != column] + [column]
    return df[new_cols]

def mfripp2(df, column):
    new_cols = [c for c in df.columns if c != column] + [column]
    # reindex_axis was removed in pandas 1.0; on current pandas use
    # df.reindex(columns=new_cols) instead
    return df.reindex_axis(new_cols, axis='columns', copy=False)

def ptrj1(df, column):
    return df.reindex(columns=df.columns.drop(column).append(pd.Index([column])))

def shivsn1(df, column):
    column_list = list(df)
    column_list.remove(column)
    column_list.append(column)
    return df[column_list]

def merlin1(df, column):
    # use the column argument (was hard-coded to 'b'); inserting at the
    # length of the remaining index appends the column at the end
    rest = df.columns.drop([column])
    return df[rest.insert(len(rest), column)]


list_of_funcs = [pir1, pir2, pir3, pir4, carsten1, carsten2, mfripp1, mfripp2, ptrj1, shivsn1]

def test_pml(df, pml):
    for c in df.columns:
        pml(df, c)

summary = pd.DataFrame([], [f.__name__ for f in list_of_funcs], ['Small', 'Large'])

for f in list_of_funcs:
    summary.at[f.__name__, 'Small'] = timeit(lambda: test_pml(df_small, f), number=100)
    summary.at[f.__name__, 'Large'] = timeit(lambda: test_pml(df_large, f), number=10)
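The large benchmark frame uses MultiIndex columns, so each column label passed to the functions is a tuple rather than a string. A reduced sketch of that setup shows the list-reordering approach working with tuple labels. It assumes a 2 x 6 frame instead of the 10000 x 100 one, and the helper name `put_last_by_list` is chosen here for illustration.

```python
from string import ascii_lowercase

import numpy as np
import pandas as pd

# Smaller stand-in for df_large: 3 letters x 2 sub-labels = 6 tuple-labelled columns
cols = pd.MultiIndex.from_product([list(ascii_lowercase[:3]), ['One', 'Two']])
df_multi = pd.DataFrame(np.arange(12).reshape(2, 6), columns=cols)

def put_last_by_list(df, column):
    # Hypothetical helper: rebuild the column list with `column` moved to the end
    new_cols = [c for c in df.columns if c != column] + [column]
    return df[new_cols]

out = put_last_by_list(df_multi, ('a', 'One'))
print(list(out.columns))
```

Checking a candidate on both flat and MultiIndex columns before timing it helps catch functions that only work for one of the two cases.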

Best answer

I would rearrange the list of columns rather than dropping one of them and appending it:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=list('abcd'))

def put_me_last(df, column):
    return pd.concat([df.drop(column, axis=1), df[column]], axis=1)

def put_me_last_fast(df, column):
    new_cols = [c for c in df.columns if c != column] + [column]
    return df[new_cols]

def put_me_last_faster(df, column):
    new_cols = [c for c in df.columns if c != column] + [column]
    return df.reindex_axis(new_cols, axis='columns', copy=False)

Timings (in IPython):

%timeit put_me_last(df, 'b')
# 1000 loops, best of 3: 741 µs per loop

%timeit put_me_last_fast(df, 'b')
# 1000 loops, best of 3: 295 µs per loop

%timeit put_me_last_faster(df, 'b')
# 1000 loops, best of 3: 239 µs per loop

%timeit put_me_last_faster(df, 'd') # not changing order
# 1000 loops, best of 3: 125 µs per loop
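Outside IPython, a similar comparison can be approximated with the standard-library `timeit` module. This is only a sketch: it covers the concat version and the list-reordering version, the loop count `n` is an arbitrary choice, and absolute numbers will differ by machine and pandas version, so no expected output is shown.

```python
from timeit import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=list('abcd'))

def put_me_last(df, column):
    return pd.concat([df.drop(column, axis=1), df[column]], axis=1)

def put_me_last_fast(df, column):
    new_cols = [c for c in df.columns if c != column] + [column]
    return df[new_cols]

n = 1000  # number of calls per measurement (arbitrary)
timings = {
    # f=f binds the current function so each lambda times the right one
    f.__name__: timeit(lambda f=f: f(df, 'b'), number=n) / n
    for f in (put_me_last, put_me_last_fast)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.1f} µs per call")
```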

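Note: you could define new_cols with the line below instead, but it is about 80 times slower than the line used above (2 µs vs. 160 µs). The insert position must be the length of the remaining index; a position of -1 would place the column before the last one instead of at the end.

new_cols = df.columns.drop(column).insert(len(df.columns) - 1, column)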
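One detail worth checking when using `Index.insert` for this: like `list.insert`, a position of `-1` inserts before the last element rather than appending, so the end position has to be given as the length of the index. A small sketch on a standalone index (the variable names are illustrative):

```python
import pandas as pd

cols = pd.Index(list('abcd'))
rest = cols.drop('b')                  # remaining labels: a, c, d

before_last = rest.insert(-1, 'b')     # 'b' lands before 'd'
at_end = rest.insert(len(rest), 'b')   # 'b' lands at the end

print(list(before_last))
print(list(at_end))
```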

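Also note: if you frequently try to move a column to the end when it is already there, you can cut the time for those cases to under 1 microsecond by adding this check at the top of the function, as noted by @Carsten:

if df.columns[-1] == column:
    return df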
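Putting the pieces together, and covering the "or first" half of the title, one possible combined helper is sketched below. The name `move_column` and its `last` parameter are assumptions made for this example, not something from the original answer.

```python
import numpy as np
import pandas as pd

def move_column(df, column, last=True):
    # Hypothetical helper: list reordering plus the early-return check
    cols = df.columns
    already_there = cols[-1] == column if last else cols[0] == column
    if already_there:
        return df  # nothing to move; this is the same object, not a copy
    others = [c for c in cols if c != column]
    new_cols = others + [column] if last else [column] + others
    return df[new_cols]

df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=list('abcd'))
print(list(move_column(df, 'b').columns))
print(list(move_column(df, 'b', last=False).columns))
```

Because the early return hands back the original frame, callers that go on to modify the result in place should take a copy first.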

Regarding python - What is the fastest way to ensure a specific column is last (or first) in a dataframe, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38601841/
