
Python pandas: how to convert a list of pair mappings to a row-vector format?

Reposted. Author: 行者123. Updated: 2023-11-28 20:15:41

I have a two-column DataFrame: the first column holds the customer and the second holds a city that customer has visited. The DataFrame looks like this:

print(df)

  customer visited_city
0     John       London
1     Mary    Melbourne
2    Steve        Paris
3     John     New_York
4    Peter     New_York
5     Mary       London
6     John    Melbourne
7     John     New_York

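I want to convert the DataFrame above into a row-vector format, where each row represents a unique customer and the row vector indicates which cities that customer has visited.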

print(wide_format_df)

       London  Melbourne  New_York  Paris
John      1.0        1.0       1.0    0.0
Mary      1.0        1.0       0.0    0.0
Steve     0.0        0.0       0.0    1.0
Peter     0.0        0.0       1.0    0.0

Below is the code I use to generate the wide format. It loops over the customers one at a time. Is there a more efficient way to do this?

import pandas as pd
import numpy as np

# Sorted unique cities become the columns of the wide table
UNIQUE_CITIESS = np.sort(df['visited_city'].unique())
p = len(UNIQUE_CITIESS)
unique_customers = df['customer'].unique().tolist()

X = []
for customer in unique_customers:
    x = np.zeros(p)
    # Unique cities this customer has visited
    city_visited = np.sort(df[df['customer'] == customer]['visited_city'].unique())
    # Positions of those cities in the sorted column list
    visited_idx = np.searchsorted(UNIQUE_CITIESS, city_visited)
    x[visited_idx] = 1
    X.append(x)
wide_format_df = pd.DataFrame(np.array(X), columns=UNIQUE_CITIESS, index=unique_customers)
wide_format_df

Best answer

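Note that your question was edited, and the answers provided at the time no longer answer it. They have to be adjusted to return only 1 for John and New_York, despite the fact that he has been there twice.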

Option 1 pir1
I like this answer because I think it's elegant.

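# dtype=int keeps the dummies numeric (newer pandas versions default to bool)
pd.get_dummies(df.customer, dtype=int).T.dot(pd.get_dummies(df.visited_city, dtype=int)).clip(0, 1)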

       London  Melbourne  New_York  Paris
John        1          1         1      0
Mary        1          1         0      0
Peter       0          0         1      0
Steve       0          0         0      1
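
To see why the product in Option 1 works, here is a small sketch on the sample data from the question. The names `cust`, `city`, `counts`, and `indicator` are illustrative and not part of the original answer. Each `get_dummies` call builds an indicator matrix with one row per visit; multiplying the transposed customer matrix by the city matrix sums over visits, which gives visit counts, and `clip` caps those counts at 1.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'customer': ['John', 'Mary', 'Steve', 'John', 'Peter', 'Mary', 'John', 'John'],
    'visited_city': ['London', 'Melbourne', 'Paris', 'New_York',
                     'New_York', 'London', 'Melbourne', 'New_York'],
})

cust = pd.get_dummies(df.customer, dtype=int)      # one row per visit, one column per customer
city = pd.get_dummies(df.visited_city, dtype=int)  # one row per visit, one column per city

counts = cust.T.dot(city)       # visits per (customer, city) pair
indicator = counts.clip(0, 1)   # cap repeat visits at 1

print(counts.loc['John', 'New_York'])     # John visited New_York twice
print(indicator.loc['John', 'New_York'])  # capped at 1
```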

Option 2 pir2
This answer should be fast.

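# i, j are integer codes per row; r, c are the unique customers and cities
i, r = pd.factorize(df.customer.values)
j, c = pd.factorize(df.visited_city.values)
n, m = r.size, c.size
b = np.zeros((n, m), dtype=int)
# Repeated (i, j) pairs simply set the same cell to 1 again
b[i, j] = 1

pd.DataFrame(b, r, c).sort_index().sort_index(axis=1)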

       London  Melbourne  New_York  Paris
John        1          1         1      0
Mary        1          1         0      0
Peter       0          0         1      0
Steve       0          0         0      1
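
As a rough illustration of what `pd.factorize` contributes in Option 2, the sketch below prints the integer codes and unique labels for the sample data. The variable name `wide` is illustrative only. One possible variation, shown here as an assumption about what the asker wants rather than as part of the original answer: because `factorize` numbers labels in order of first appearance, sorting only the columns keeps the customers in the row order used in the question (John, Mary, Steve, Peter).

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'customer': ['John', 'Mary', 'Steve', 'John', 'Peter', 'Mary', 'John', 'John'],
    'visited_city': ['London', 'Melbourne', 'Paris', 'New_York',
                     'New_York', 'London', 'Melbourne', 'New_York'],
})

# Codes are assigned in order of first appearance
i, r = pd.factorize(df['customer'].to_numpy())
j, c = pd.factorize(df['visited_city'].to_numpy())
print(i)  # one integer code per row of df
print(r)  # unique customers, first-appearance order

b = np.zeros((r.size, c.size), dtype=int)
b[i, j] = 1  # duplicate (customer, city) pairs just write 1 again

# Sort only the columns; rows stay in first-appearance order
wide = pd.DataFrame(b, index=r, columns=c).sort_index(axis=1)
print(wide)
```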

Option 3 pir3
Practical and quite fast.

df.groupby(['customer', 'visited_city']).size().unstack(fill_value=0).clip(0, 1)

visited_city  London  Melbourne  New_York  Paris
customer
John               1          1         1      0
Mary               1          1         0      0
Peter              0          0         1      0
Steve              0          0         0      1
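
The steps inside Option 3 can be looked at one at a time. The sketch below is an illustration on the sample data, with `counts`, `wide`, and `wide_q` as made-up names. The last step is an optional extra, assuming the float values and first-appearance row order shown in the question are actually wanted; the original answer does not include it.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'customer': ['John', 'Mary', 'Steve', 'John', 'Peter', 'Mary', 'John', 'John'],
    'visited_city': ['London', 'Melbourne', 'Paris', 'New_York',
                     'New_York', 'London', 'Melbourne', 'New_York'],
})

# Step 1: number of rows per (customer, city) pair, as a MultiIndex Series
counts = df.groupby(['customer', 'visited_city']).size()

# Step 2: pivot cities into columns, fill missing pairs with 0, cap at 1
wide = counts.unstack(fill_value=0).clip(0, 1)

# Optional: restore the question's row order and float dtype
wide_q = wide.reindex(df['customer'].unique()).astype(float)
print(wide_q)
```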

Timing
Code below

# Multiples of minimum time
#
           pir1  pir2      pir3       wen       vai
10     1.392237   1.0  1.521555  4.337469  5.569029
30     1.445762   1.0  1.821047  5.977978  7.204843
100    1.679956   1.0  1.901502  6.685429  7.296454
300    1.568407   1.0  1.825047  5.556880  7.210672
1000   1.622137   1.0  1.613983  5.815970  5.396008
3000   1.808637   1.0  1.852953  4.159305  4.224724
10000  1.654354   1.0  1.502092  3.145032  2.950560
30000  1.555574   1.0  1.413612  2.404061  2.299856


from timeit import timeit

# df is the sample DataFrame from the question; pd and np are imported as above
wen = lambda d: d.pivot_table(index='customer', columns='visited_city', aggfunc=len, fill_value=0)
vai = lambda d: pd.crosstab(d.customer, d.visited_city)
# dtype=int keeps the dummies numeric (newer pandas versions default to bool)
pir1 = lambda d: pd.get_dummies(d.customer, dtype=int).T.dot(pd.get_dummies(d.visited_city, dtype=int)).clip(0, 1)
pir3 = lambda d: d.groupby(['customer', 'visited_city']).size().unstack(fill_value=0).clip(0, 1)

def pir2(d):
    i, r = pd.factorize(d.customer.values)
    j, c = pd.factorize(d.visited_city.values)
    n, m = r.size, c.size
    b = np.zeros((n, m), dtype=int)
    b[i, j] = 1

    return pd.DataFrame(b, r, c).sort_index().sort_index(axis=1)

results = pd.DataFrame(
    index=[10, 30, 100, 300, 1000, 3000, 10000, 30000],
    columns='pir1 pir2 pir3 wen vai'.split(),
    dtype=float
)

for i in results.index:
    # Repeat the sample frame i times to scale the input
    d = pd.concat([df] * i, ignore_index=True)
    for j in results.columns:
        stmt = '{}(d)'.format(j)
        setp = 'from __main__ import d, {}'.format(j)
        results.at[i, j] = timeit(stmt, setp, number=10)

# Express each timing as a multiple of the fastest method for that input size
print((lambda r: r.div(r.min(axis=1), axis=0))(results))

results.plot(loglog=True)
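
One thing worth checking when reading the benchmark: `pd.crosstab`, used as `vai` above, returns visit counts rather than 0/1 flags, so it needs the same capping step as the other options to match the edited question. A minimal sketch on the sample data follows; `counts` and `indicator` are illustrative names.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'customer': ['John', 'Mary', 'Steve', 'John', 'Peter', 'Mary', 'John', 'John'],
    'visited_city': ['London', 'Melbourne', 'Paris', 'New_York',
                     'New_York', 'London', 'Melbourne', 'New_York'],
})

counts = pd.crosstab(df['customer'], df['visited_city'])  # visit counts
indicator = counts.clip(upper=1)                          # 0/1 visited flags

print(counts.loc['John', 'New_York'])     # count of John's New_York visits
print(indicator.loc['John', 'New_York'])  # capped at 1
```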

Regarding "Python pandas: how to convert a list of pair mappings to a row-vector format?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45911087/
