
python - Move non-empty cells to the left within grouped columns in pandas

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:12:42

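I have a DataFrame with several columns that share similar names. Within each group of similarly named columns, I want to fill the empty cells with the values from the columns to their right that do contain data.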

Address1     Address2     Address3     Address4     Phone1     Phone2     Phone3     Phone4
ABC nan def nan 9091-XYz nan nan XYZ-ABZ

The columns should be shifted so that the result looks like this:

Address1     Address2     Address3     Address4     Phone1     Phone2     Phone3     Phone4
ABC def nan nan 9091-XYz XYZ-ABZ nan nan

There is another question that solves a similar problem.

import pandas as pd

pdf = pd.read_csv('Data.txt', sep='\t')

# get the set of column-name prefixes by stripping the trailing digits
columns = set(map(lambda x: x.rstrip('0123456789'), pdf.columns))

for col_pattern in columns:
    # get columns with similar names
    current = [col for col in pdf.columns if col_pattern in col]
    coldf = pdf[current]
    # shift columns to the left

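The columns in Data.txt are sorted by column name, so all columns with similar names sit next to each other.

Any help is appreciated.

I tried adding the following to the code from the linked question above, but it ran out of memory: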

    newdf=pd.read_csv(StringIO(u''+re.sub(',+',',',df.to_csv()).decode('utf-8')))
    list_.append(newdf)
    pd.concat(list_,axis=0).to_csv('test.txt')

Best answer

A solution using MultiIndex and dropna:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Address1': {0: 'ABC', 1: 'ABC'},
                   'Address2': {0: np.nan, 1: np.nan},
                   'Address3': {0: 'def', 1: 'def'},
                   'Phone4': {0: 'XYZ-ABZ', 1: 'XYZ-ABZ'},
                   'Address4': {0: np.nan, 1: np.nan},
                   'Phone1': {0: '9091-XYz', 1: 'Z9091-XYz'},
                   'Phone3': {0: np.nan, 1: 'aaa'},
                   'Phone2': {0: np.nan, 1: np.nan}})
# sort the columns by name; recent pandas keeps the dict insertion order,
# while the output below shows them sorted
df = df.sort_index(axis=1)

print (df)
Address1 Address2 Address3 Address4 Phone1 Phone2 Phone3 Phone4
0 ABC NaN def NaN 9091-XYz NaN NaN XYZ-ABZ
1 ABC NaN def NaN Z9091-XYz NaN aaa XYZ-ABZ
# build a MultiIndex from the column names: (text prefix, number)
cols = df.columns.str.extract(r'([A-Za-z]+)(\d+)', expand=True).values.tolist()

mux = pd.MultiIndex.from_tuples(cols)
df.columns = mux
print (df)
   Address                     Phone
         1    2    3    4          1    2    3        4
0      ABC  NaN  def  NaN   9091-XYz  NaN  NaN  XYZ-ABZ
1      ABC  NaN  def  NaN  Z9091-XYz  NaN  aaa  XYZ-ABZ

#unstack, remove NaN rows, convert to df (because cumcount)
df1 = df.unstack().dropna().reset_index(level=1, drop=True).to_frame()
#create new level of index
df1['g'] = (df1.groupby(level=[0,1]).cumcount() + 1).astype(str)
#add column g to multiindex
df1.set_index('g', append=True, inplace=True)
#reshape to original
df1 = df1.unstack(level=[0,2])
#remove first level of multiindex of column (0 from to_frame)
df1.columns = df1.columns.droplevel(0)
#reindex and replace None to NaN
df1 = df1.reindex(columns=mux).replace({None: np.nan})
#'reset' multiindex in columns
df1.columns = [''.join(col) for col in df1.columns]
print (df1)
Address1 Address2 Address3 Address4 Phone1 Phone2 Phone3 Phone4
0 ABC def NaN NaN 9091-XYz XYZ-ABZ NaN NaN
1 ABC def NaN NaN Z9091-XYz aaa XYZ-ABZ NaN
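
The first step of the solution above, splitting each column name into a text prefix and a number, can be checked in isolation. The following is only a minimal sketch under a few assumptions: the four column names are a shortened sample, the level names kind and number are made up for illustration, and pd.MultiIndex.from_frame is used in place of from_tuples.

```python
import pandas as pd

# shortened, illustrative sample of the column names from the question
columns = pd.Index(["Address1", "Address2", "Phone1", "Phone2"])

# one row per column name: group 0 is the text prefix, group 1 the number
parts = columns.str.extract(r"([A-Za-z]+)(\d+)", expand=True)

# "kind" and "number" are hypothetical level names, not part of the original answer
mux = pd.MultiIndex.from_frame(parts, names=["kind", "number"])
print(mux.tolist())
```

Note that the number level holds strings, which is why the answer can later rebuild the flat names with ''.join(col).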

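Old solution:

I found another problem: if the DataFrame has more rows, the solution above does not work well. In that case you can use a double apply. The drawback of this solution is that the order of the values within each row is not preserved: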

df = pd.DataFrame({'Address1': {0: 'ABC', 1: 'ABC'}, 'Address2': {0: np.nan, 1: np.nan}, 'Address3': {0: 'def', 1: 'def'}, 'Phone4': {0: 'XYZ-ABZ', 1: 'XYZ-ABZ'}, 'Address4': {0: np.nan, 1: np.nan}, 'Phone1': {0: '9091-XYz', 1: '9091-XYz'}, 'Phone3': {0: np.nan, 1: 'aaa'}, 'Phone2': {0: np.nan, 1: np.nan}})

print (df)
Address1 Address2 Address3 Address4 Phone1 Phone2 Phone3 Phone4
0 ABC NaN def NaN 9091-XYz NaN NaN XYZ-ABZ
1 ABC NaN def NaN 9091-XYz NaN aaa XYZ-ABZ

cols = df.columns.str.extract(r'([A-Za-z]+)(\d+)', expand=True).values.tolist()
mux = pd.MultiIndex.from_tuples(cols)
df.columns = mux

# note: groupby(axis=1) is deprecated since pandas 2.1
df = (df.groupby(axis=1, level=0)
        .apply(lambda x: x.apply(lambda y: y.sort_values().values, axis=1)))

df.columns = [''.join(col) for col in df.columns]
print (df)
Address1 Address2 Address3 Address4 Phone1 Phone2 Phone3 Phone4
0 ABC def NaN NaN 9091-XYz XYZ-ABZ NaN NaN
1 ABC def NaN NaN 9091-XYz XYZ-ABZ aaa NaN
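
The ordering problem comes from sort_values ordering the strings themselves rather than only pushing NaN to the end. As a rough sketch of the difference (the single row below is row 1 of the Phone group from the example; the stable argsort on the null mask is an alternative suggested here, not part of the original answer):

```python
import numpy as np
import pandas as pd

# row 1 of the Phone columns from the example above
row = pd.Series(
    ["9091-XYz", np.nan, "aaa", "XYZ-ABZ"],
    index=["Phone1", "Phone2", "Phone3", "Phone4"],
)

# sort_values compares the strings, so 'XYZ-ABZ' moves ahead of 'aaa'
by_value = row.sort_values().tolist()

# a stable argsort on the null mask only moves NaN to the end
order = np.argsort(row.isna().to_numpy(), kind="stable")
by_position = row.to_numpy()[order].tolist()

print(by_value)
print(by_position)
```

With the stable sort, 'aaa' stays ahead of 'XYZ-ABZ', which matches the order of the original columns.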

I also tried modifying piRSquared's solution; in that case you do not need a MultiIndex:

coltype = df.columns.str.extract(r'([A-Za-z]+)', expand=False)
print (coltype)
Index(['Address', 'Address', 'Address', 'Address', 'Phone', 'Phone', 'Phone',
'Phone'],
dtype='object')

# note: groupby(axis=1) is deprecated since pandas 2.1
df = (df.groupby(coltype, axis=1)
        .apply(lambda x: x.apply(lambda y: y.sort_values().values, axis=1)))
print (df)
Address1 Address2 Address3 Address4 Phone1 Phone2 Phone3 Phone4
0 ABC def NaN NaN 9091-XYz XYZ-ABZ NaN NaN
1 ABC def NaN NaN 9091-XYz XYZ-ABZ aaa NaN
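
For recent pandas versions, where groupby(axis=1) is deprecated, one possible alternative is to handle each column group as a NumPy array. This is a sketch, not the method from the answers above: shift_group_left is a hypothetical helper, and it assumes that every column name ends in digits and that the columns of each group already appear in the desired order, as the question states for Data.txt.

```python
import numpy as np
import pandas as pd


def shift_group_left(frame, prefix):
    """Return the columns named prefix + digits with non-null cells moved left.

    Hypothetical helper for illustration; assumes the group's columns are
    already in the desired left-to-right order.
    """
    cols = [c for c in frame.columns if c.rstrip("0123456789") == prefix]
    values = frame[cols].to_numpy(dtype=object)
    # stable argsort of the null mask: non-null cells keep their relative order
    order = np.argsort(pd.isna(values), axis=1, kind="stable")
    shifted = np.take_along_axis(values, order, axis=1)
    return pd.DataFrame(shifted, index=frame.index, columns=cols)


df = pd.DataFrame(
    {
        "Address1": ["ABC", "ABC"],
        "Address2": [np.nan, np.nan],
        "Address3": ["def", "def"],
        "Address4": [np.nan, np.nan],
        "Phone1": ["9091-XYz", "9091-XYz"],
        "Phone2": [np.nan, np.nan],
        "Phone3": [np.nan, "aaa"],
        "Phone4": ["XYZ-ABZ", "XYZ-ABZ"],
    }
)

# unique prefixes in order of first appearance
prefixes = dict.fromkeys(c.rstrip("0123456789") for c in df.columns)
result = pd.concat([shift_group_left(df, p) for p in prefixes], axis=1)
print(result)
```

Because the null mask is sorted with a stable sort, row 1 keeps 'aaa' before 'XYZ-ABZ', unlike the sort_values variants.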

Regarding python - Move non-empty cells to the left within grouped columns in pandas, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/39361839/
