
python - How to create a dummy variable in a Pandas DataFrame if a column matches specific values?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 02:28:10

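I have a Pandas DataFrame with a column (ip) holding certain values, and a separate Pandas Series, not part of this DataFrame, holding a collection of such values. I want to create a column in the DataFrame that is 1 whenever a given row's ip appears in my Pandas Series (black_ip, named blacklist in the code below).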

import pandas as pd

# Named `data` rather than `dict` to avoid shadowing the built-in type
data = {'ip': {0: 103022, 1: 114221, 2: 47902, 3: 23550, 4: 84644}, 'os': {0: 23, 1: 19, 2: 17, 3: 13, 4: 19}}

df = pd.DataFrame(data)

df
       ip  os
0  103022  23
1  114221  19
2   47902  17
3   23550  13
4   84644  19

blacklist = pd.Series([103022, 23550])

blacklist
0    103022
1     23550

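My question is: how can I create a new column in df that shows 1 when a given ip is in the blacklist and 0 otherwise?

Sorry if this is too silly, I am still new to programming. Thanks a lot!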

Best answer

Use isin with astype:

import numpy as np

df['new'] = df['ip'].isin(blacklist).astype(np.int8)

You can also convert the column to a categorical:

df['new'] = pd.Categorical(df['ip'].isin(blacklist).astype(np.int8))

print(df)
       ip  os new
0  103022  23   1
1  114221  19   0
2   47902  17   0
3   23550  13   1
4   84644  19   0
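
For readers who want to run the isin/astype approach end to end, here is a minimal self-contained sketch. The helper name add_blacklist_flag and its parameters are hypothetical, introduced only for illustration; the core line is the same one shown in the answer above.

```python
import numpy as np
import pandas as pd


def add_blacklist_flag(frame, values, source="ip", target="new"):
    """Return a copy of `frame` with a 0/1 column marking rows
    whose `source` value appears in `values`.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = frame.copy()
    # isin() yields a boolean Series; astype(np.int8) maps True/False to 1/0
    out[target] = out[source].isin(values).astype(np.int8)
    return out


df = pd.DataFrame({"ip": [103022, 114221, 47902, 23550, 84644],
                   "os": [23, 19, 17, 13, 19]})
blacklist = pd.Series([103022, 23550])

flagged = add_blacklist_flag(df, blacklist)
print(flagged)
```

Working on a copy leaves the original df untouched, which is a design choice of this sketch rather than something the answer requires. Assigning df['new'] directly, as in the answer, works just as well.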

As a point of interest, converting to Categorical does not save memory on a large DataFrame:

df = pd.concat([df] * 10000, ignore_index=True)

df['new1'] = pd.Categorical(df['ip'].isin(blacklist).astype(np.int8))
df['new2'] = df['ip'].isin(blacklist).astype(np.int8)
df['new3'] = df['ip'].isin(blacklist)
print(df.memory_usage())
Index        80
ip       400000
os       400000
new1      50096
new2      50000
new3      50000
dtype: int64
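
The numbers above follow from element size: a bool and an int8 both take one byte per row, so the cast to int8 costs nothing extra, while a 64-bit integer column takes eight. The sketch below checks this per column with Series.memory_usage. The sizes dictionary is only an illustrative name, and the exact categorical overhead may differ between pandas versions, so it is not hard-coded here.

```python
import numpy as np
import pandas as pd

# 5 distinct ips repeated 10000 times gives 50000 rows, as in the answer
df = pd.DataFrame({"ip": [103022, 114221, 47902, 23550, 84644] * 10000})
blacklist = pd.Series([103022, 23550])

mask = df["ip"].isin(blacklist)  # boolean Series

# Bytes used by the values only (index excluded) for each representation
sizes = {
    "bool": mask.memory_usage(index=False),
    "int8": mask.astype(np.int8).memory_usage(index=False),
    "int64": mask.astype(np.int64).memory_usage(index=False),
    "categorical": pd.Series(
        pd.Categorical(mask.astype(np.int8))
    ).memory_usage(index=False),
}
print(sizes)
```

The categorical column stores one small integer code per row plus the list of categories, so for a two-valued flag it cannot be smaller than the plain int8 column.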

Timings:

np.random.seed(4545)

N = 10000
df = pd.DataFrame(np.random.randint(1000,size=N), columns=['ip'])
print(len(df))
10000

blacklist = pd.Series(np.random.randint(500,size=N // 100))
print(len(blacklist))
100

In [320]: %timeit df['ip'].isin(blacklist).astype(np.int8)
465 µs ± 21.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [321]: %timeit pd.Categorical(df['ip'].isin(blacklist).astype(np.int8))
915 µs ± 49.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [322]: %timeit pd.Categorical(df['ip'], categories = blacklist.unique()).notnull().astype(int)
1.59 ms ± 20.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [323]: %timeit df['new_column'] = [1 if x in blacklist.values else 0 for x in df.ip]
81.8 ms ± 2.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
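
The %timeit magic only works inside IPython. As a rough sketch for a plain Python script, the four approaches can be checked for identical results and timed with the standard timeit module. This assumes a different random generator (np.random.default_rng) than the answer, so the data and the timings will not match the figures above. The names a, b, c, d, t_isin and t_loop are illustrative.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(4545)
N = 10000
df = pd.DataFrame({"ip": rng.integers(1000, size=N)})
blacklist = pd.Series(rng.integers(500, size=N // 100))

# The four approaches from the timings, each reduced to a plain array
a = df["ip"].isin(blacklist).astype(np.int8).to_numpy()
b = np.asarray(pd.Categorical(df["ip"].isin(blacklist).astype(np.int8)))
c = pd.Categorical(df["ip"], categories=blacklist.unique()).notnull().astype(int)
d = np.array([1 if x in blacklist.values else 0 for x in df["ip"]])

# Time the vectorized version against the list comprehension (seconds per call)
t_isin = timeit.timeit(
    lambda: df["ip"].isin(blacklist).astype(np.int8), number=20) / 20
t_loop = timeit.timeit(
    lambda: [1 if x in blacklist.values else 0 for x in df["ip"]], number=3) / 3
print(f"isin: {t_isin:.6f} s, list comprehension: {t_loop:.6f} s")
```

One detail in the list comprehension matters: x in blacklist without .values tests membership in the Series index, not in its values, so the .values is needed for a correct result.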

Regarding "python - How to create a dummy variable in a Pandas DataFrame if a column matches specific values?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49843864/
