python - Different order of names in pandas

Reposted. Author: 太空宇宙. Updated: 2023-11-04 00:53:09

I have a dataframe:

    used_at  common users  pair of websites
0      2014          1364  avito.ru and e1.ru
1      2014          1716  avito.ru and drom.ru
2      2014          1602  avito.ru and auto.ru
3      2014           299  avito.ru and avtomarket.ru
4      2014           579  avito.ru and am.ru
5      2014           602  avito.ru and irr.ru/cars
6      2014           424  avito.ru and cars.mail.ru/sale
7      2014           634  e1.ru and drom.ru
8      2014           475  e1.ru and auto.ru
9      2014           139  e1.ru and avtomarket.ru
10     2014           224  e1.ru and am.ru
11     2014           235  e1.ru and irr.ru/cars
12     2014           154  e1.ru and cars.mail.ru/sale
13     2014           874  drom.ru and auto.ru
14     2014           247  drom.ru and avtomarket.ru
15     2014           394  drom.ru and am.ru
....

When I write graph_by_common_users = common_users.pivot(index='pair of websites', columns='used_at', values='common users')
I get:

used_at                         2014   2015
pair of websites
am.ru and cars.mail.ru/sale    166.0    NaN
am.ru and irr.ru/cars          223.0    NaN
auto.ru and am.ru              408.0  224.0
auto.ru and avtomarket.ru      243.0  162.0
auto.ru and cars.mail.ru/sale  330.0  195.0
auto.ru and drom.ru              NaN  799.0
auto.ru and irr.ru/cars        409.0  288.0
avito.ru and am.ru             579.0  262.0
....

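I get NaN because some pairs are written in a different order in different years. For example, in 2014 I have "am.ru and cars.mail.ru/sale", but in 2015 I have "cars.mail.ru/sale and am.ru". How can I fix this?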
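The effect described above can be reproduced on a tiny frame. This is only a sketch: the column names come from the question, while the `toy` frame and the 2015 count are made up for illustration.

```python
import pandas as pd

# Toy data (made-up numbers): the same pair is spelled in a different
# order in each year.
toy = pd.DataFrame({
    "used_at": [2014, 2015],
    "common users": [166, 120],
    "pair of websites": ["am.ru and cars.mail.ru/sale",
                         "cars.mail.ru/sale and am.ru"],
})

# pivot treats the two spellings as two distinct index labels, so each
# row has a value for one year and NaN for the other.
toy_pivot = toy.pivot(index="pair of websites", columns="used_at",
                      values="common users")
print(toy_pivot)
```

Each of the two rows ends up with one filled cell and one NaN, which matches the pattern in the output shown above.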

Here is my code:

import pandas as pd
import itertools
import matplotlib.pyplot as plt

df = pd.read_csv("avito_trend.csv", parse_dates=[2])


def f(df):
    # For every pair of addresses, count the IDs that appear under both.
    dfs = []
    for x in [list(x) for x in itertools.combinations(df['address'].unique(), 2)]:
        c1 = df.loc[df['address'].isin([x[0]]), 'ID']
        c2 = df.loc[df['address'].isin([x[1]]), 'ID']
        c = pd.Series(list(set(c1).intersection(set(c2))))
        dfs.append(pd.DataFrame({'common users': len(c), 'pair of websites': ' and '.join(x)}, index=[0]))
    return pd.concat(dfs)

# Apply f to each year, then turn the index back into ordinary columns.
common_users = df.groupby([df['used_at'].dt.year]).apply(f).reset_index(drop=True, level=1).reset_index()
print(common_users)

graph_by_common_users = common_users.pivot(index='pair of websites', columns='used_at', values='common users')
print(graph_by_common_users)
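A note on where the differing order probably comes from: `unique()` returns values in order of first appearance, so within each year's group the pairs are built in whatever order the addresses happen to occur. One possible way to avoid that at the source is to sort the addresses before pairing them. The sketch below assumes the `ID`, `address` and `used_at` columns from the code above; the helper name `common_users_per_pair` and the toy rows are hypothetical.

```python
import itertools
import pandas as pd


def common_users_per_pair(group):
    """Count IDs shared by each pair of addresses, pairing in sorted order."""
    rows = []
    # Sorting makes the pair label independent of row order within the group.
    for a, b in itertools.combinations(sorted(group["address"].unique()), 2):
        ids_a = set(group.loc[group["address"] == a, "ID"])
        ids_b = set(group.loc[group["address"] == b, "ID"])
        rows.append({"pair of websites": f"{a} and {b}",
                     "common users": len(ids_a & ids_b)})
    return pd.DataFrame(rows)


# Toy data (made up): the two sites first appear in a different order each year.
toy = pd.DataFrame({
    "ID": [1, 1, 2, 2, 1, 1],
    "address": ["e1.ru", "avito.ru", "e1.ru", "avito.ru", "avito.ru", "e1.ru"],
    "used_at": pd.to_datetime(["2014-01-01", "2014-02-01", "2014-03-01",
                               "2014-03-02", "2015-01-01", "2015-01-02"]),
})

# Build the per-year counts with an explicit loop over the groups.
pieces = []
for year, group in toy.groupby(toy["used_at"].dt.year):
    counts = common_users_per_pair(group)
    counts.insert(0, "used_at", year)
    pieces.append(counts)
common_users = pd.concat(pieces, ignore_index=True)

wide = common_users.pivot(index="pair of websites", columns="used_at",
                          values="common users")
print(wide)
```

With sorted pairing, both years produce the label "avito.ru and e1.ru", so the pivot has a single row with a value in each year.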

Picture with graph

Best answer

Perhaps, before pivoting, try splitting on ' and ' and then sorting, so that every pair is in the same order:

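# df here is the frame that holds the 'pair of websites' column (common_users in the question)
df['pair of websites'] = df['pair of websites'].str.split(' and ')
# a frozenset is hashable and ignores element order, so both spellings of a pair compare equal
df['pair of websites'] = df['pair of websites'].apply(lambda x: frozenset(sorted(x)))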

This should work as long as the spacing around ' and ' is the same in every entry. If it is not, you may also need to use str.strip().
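The split, strip and sort steps can be sketched end to end as follows. This variant joins the sorted names back into a plain string instead of using a frozenset, which is a substitution on my part to keep the pivot index readable. The helper name `normalize_pair` and the toy numbers are hypothetical, and the sketch assumes no site name itself contains " and ".

```python
import pandas as pd

# Toy data (made-up 2015 count): one pair is reversed and has stray spaces.
toy = pd.DataFrame({
    "used_at": [2014, 2015, 2014, 2015],
    "common users": [166, 120, 408, 224],
    "pair of websites": ["am.ru and cars.mail.ru/sale",
                         "cars.mail.ru/sale  and am.ru ",
                         "auto.ru and am.ru",
                         "auto.ru and am.ru"],
})


def normalize_pair(label):
    """Return the pair label with both site names stripped and sorted."""
    sites = [site.strip() for site in label.split(" and ")]
    return " and ".join(sorted(sites))


# Normalize the labels first, then pivot as in the question.
toy["pair of websites"] = toy["pair of websites"].apply(normalize_pair)
result = toy.pivot(index="pair of websites", columns="used_at",
                   values="common users")
print(result)
```

After normalization both spellings collapse to "am.ru and cars.mail.ru/sale", so each pair occupies one row with a value for both years.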

Regarding "python - Different order of names in pandas", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/36252018/
