
python - Representative sampling across multiple columns in Pandas

Reposted. Author: 行者123. Updated: 2023-12-03 23:42:25

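I have a dataframe representing a population, where each column denotes a different quality/characteristic of a person. How can I draw a sample of that dataframe/population that is representative of the population as a whole across all characteristics?
Suppose I have a dataframe representing a workforce of 650 people, as follows: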

import pandas as pd
import numpy as np
c = np.random.choice

colours = ['blue', 'yellow', 'green', 'green... no, blue']
knights = ['Bedevere', 'Galahad', 'Arthur', 'Robin', 'Lancelot']
qualities = ['wise', 'brave', 'pure', 'not quite so brave']

df = pd.DataFrame({'name_id': c(range(3000), 650, replace=False),
                   'favourite_colour': c(colours, 650),
                   'favourite_knight': c(knights, 650),
                   'favourite_quality': c(qualities, 650)})
I can get a sample of the above that reflects the distribution of a single column, like so:
# Find the distribution of a particular column using value_counts and normalize:
knight_weight = df['favourite_knight'].value_counts(normalize=True)

# Add this to my dataframe as a weights column:
df['knight_weight'] = df['favourite_knight'].apply(lambda x: knight_weight[x])

# Then sample my dataframe using the weights column I just added as the 'weights' argument:
df_sample = df.sample(140, weights=df['knight_weight'])
This returns a sample dataframe (df_sample) such that:
df_sample['favourite_knight'].value_counts(normalize=True)
is approximately equal to
df['favourite_knight'].value_counts(normalize=True)
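One way to make "approximately equal" concrete is to measure the largest gap between the population share and the sample share of any category. The sketch below is not part of the original post: `max_proportion_gap` is a hypothetical helper name, and `demo` is seeded stand-in data rather than the asker's dataframe.

```python
import numpy as np
import pandas as pd


def max_proportion_gap(population, sample, column):
    """Largest absolute difference in category share between population and sample."""
    pop = population[column].value_counts(normalize=True)
    smp = sample[column].value_counts(normalize=True)
    # Align on the population's categories; a category absent from the sample counts as 0.
    smp = smp.reindex(pop.index, fill_value=0.0)
    return float((pop - smp).abs().max())


# Hypothetical stand-in data, seeded so the run is repeatable.
rng = np.random.default_rng(0)
knights = ['Bedevere', 'Galahad', 'Arthur', 'Robin', 'Lancelot']
demo = pd.DataFrame({'favourite_knight': rng.choice(knights, 650)})

# An unweighted sample, used here only to exercise the helper.
demo_sample = demo.sample(140, random_state=0)

gap = max_proportion_gap(demo, demo_sample, 'favourite_knight')
print(f"largest share difference: {gap:.3f}")
```

A gap of 0 means the sample reproduces the column's shares exactly; how small a gap counts as "approximately equal" is a judgment call for the use case.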
My question is this:
How do I generate a sample dataframe (df_sample) such that the above, i.e.:
df_sample[column].value_counts(normalize=True)
is approximately equal to
df[column].value_counts(normalize=True)
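holds for all columns (except 'name_id'), not just one of them? A sample of 140 from a population of 650 is roughly the scale I am working with, so performance is not much of a concern. I would happily accept a solution that takes a few minutes to run, since that would still be far faster than producing such a sample by hand. Thanks for any help.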

Best answer

Create a combined feature column, compute the weight of each combination, and pass those weights when drawing the sample:

df["combined"] = list(zip(df["favourite_colour"],
                          df["favourite_knight"],
                          df["favourite_quality"]))

combined_weight = df['combined'].value_counts(normalize=True)

df['combined_weight'] = df['combined'].apply(lambda x: combined_weight[x])

df_sample = df.sample(140, weights=df['combined_weight'])
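Because a row's chance of being drawn grows with its weight, frequency weights tend to favour common combinations beyond their population share. As an alternative that is not from the original answer, the sketch below draws a proportional stratified sample with `groupby(...).sample(frac=...)` (available in pandas 1.1 and later), taking roughly the same fraction from every combination. The names `pop`, `strat_cols`, `stratified` and `gaps` are hypothetical, and the data is a seeded stand-in for the question's dataframe.

```python
import numpy as np
import pandas as pd

# Seeded stand-in for the question's 650-person dataframe (an assumption, not the asker's data).
rng = np.random.default_rng(42)
colours = ['blue', 'yellow', 'green', 'green... no, blue']
knights = ['Bedevere', 'Galahad', 'Arthur', 'Robin', 'Lancelot']
qualities = ['wise', 'brave', 'pure', 'not quite so brave']

pop = pd.DataFrame({'name_id': rng.choice(3000, 650, replace=False),
                    'favourite_colour': rng.choice(colours, 650),
                    'favourite_knight': rng.choice(knights, 650),
                    'favourite_quality': rng.choice(qualities, 650)})

strat_cols = ['favourite_colour', 'favourite_knight', 'favourite_quality']

# Take the same fraction from every combination of the three columns.
# Each group's size is rounded, so the total is close to 140 but not exact.
frac = 140 / len(pop)
stratified = pop.groupby(strat_cols).sample(frac=frac, random_state=42)

# Largest per-column difference in category share, population versus sample.
gaps = {}
for col in strat_cols:
    pop_share = pop[col].value_counts(normalize=True)
    smp_share = stratified[col].value_counts(normalize=True)
    smp_share = smp_share.reindex(pop_share.index, fill_value=0.0)
    gaps[col] = float((pop_share - smp_share).abs().max())

print(len(stratified), gaps)
```

With many small combinations the per-group rounding can drop rare combinations entirely, so it is worth inspecting `gaps` and the sample size before relying on the result.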

For more on python - Representative sampling across multiple columns in Pandas, see the similar question on Stack Overflow: https://stackoverflow.com/questions/64967847/
