
python - How to reclassify records from two categories into four categories

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 05:18:13

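I have a pandas DataFrame with products [a, b, c, d, e, f, g, h, i, j, k, l] for millions of customers. For each product, the data records whether the customer used the product in that month (1) or did not (0).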

Original classification of customers: used = 1, not used = 0.
I want to reclassify product usage into four categories:

S: used
M: maintained use (used for several consecutive months)
N: not used
D: maintained non-use (not used for several consecutive months)
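The four labels depend only on this month's flag and the previous month's flag for the same customer and product. Below is a minimal sketch of that rule for a single customer and product, assuming the flags are in month order. The rule is inferred from the expected output table in this question, and `classify_sequence` is a hypothetical name.

```python
def classify_sequence(flags):
    """Map a month-ordered list of 0/1 usage flags to S/M/N/D labels.

    Assumed rule (inferred from the expected output):
      1 after a month that was not 1 (or in the first month) -> "S" (used)
      1 after a month that was also 1                        -> "M" (maintained use)
      0 after a month that was not 0 (or in the first month) -> "N" (not used)
      0 after a month that was also 0                        -> "D" (maintained non-use)
    """
    labels = []
    prev = None  # there is no previous month for the first entry
    for flag in flags:
        if flag == 1:
            labels.append("M" if prev == 1 else "S")
        else:
            labels.append("D" if prev == 0 else "N")
        prev = flag
    return labels


# product d for customer 19509, Jan..Sep
print(classify_sequence([0, 0, 0, 1, 1, 1, 0, 0, 1]))
# ['N', 'D', 'D', 'S', 'M', 'M', 'N', 'D', 'S']
```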

The original data looks like this:

+-------------+-------+---+---+---+---+---+---+---+---+---+---+---+---+
| Customer_ID | Month | a | b | c | d | e | f | g | h | i | j | k | l |
+-------------+-------+---+---+---+---+---+---+---+---+---+---+---+---+
| 19509 | Jan | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19509 | Feb | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 19509 | Mar | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19509 | Apr | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19509 | May | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19509 | Jun | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19509 | Jul | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19509 | Aug | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 19509 | Sep | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 |
| 19510 | Jan | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19510 | Feb | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 19510 | Mar | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19510 | Apr | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19510 | May | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19510 | Jun | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19510 | Jul | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19510 | Aug | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 19510 | Sep | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 |
| 19511 | Jan | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19511 | Feb | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 19511 | Mar | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19511 | Apr | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19511 | May | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19511 | Jun | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 19511 | Jul | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 19511 | Aug | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 19511 | Sep | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 |
+-------------+-------+---+---+---+---+---+---+---+---+---+---+---+---+
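To experiment with the tables in this question, one option is to paste them into a string and parse them. This is a sketch that assumes exactly the `+---+` border and `| ... |` row layout shown above, with every cell except `Month` holding an integer; `read_ascii_table` is a hypothetical helper, not a pandas function.

```python
import pandas as pd


def read_ascii_table(text):
    """Parse a '+---+' / '| a | b |' style text table into a DataFrame."""
    # keep the header and data rows; border lines start with "+"
    rows = [line.strip() for line in text.strip().splitlines()
            if line.strip().startswith("|")]
    # drop the outer pipes, split on the inner ones, trim the padding
    cells = [[c.strip() for c in row.strip("|").split("|")] for row in rows]
    header, body = cells[0], cells[1:]
    # every cell except the month name is an integer (customer ID or 0/1 flag)
    records = [[v if col == "Month" else int(v) for col, v in zip(header, row)]
               for row in body]
    return pd.DataFrame(records, columns=header)
```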

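I want to reclassify the customers into four categories so that customers who keep using a product, or keep not using it, for several months are identified.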

The result should look like this:

+-------------+-------+---+---+---+---+---+---+---+---+---+---+---+---+
| Customer_ID | Month | a | b | c | d | e | f | g | h | i | j | k | l |
+-------------+-------+---+---+---+---+---+---+---+---+---+---+---+---+
| 19509 | Jan | S | N | S | N | N | S | N | S | N | S | S | N |
| 19509 | Feb | M | N | N | D | D | M | D | M | D | N | M | D |
| 19509 | Mar | M | S | S | D | D | M | D | M | D | S | M | D |
| 19509 | Apr | N | M | N | S | D | M | D | M | D | N | N | D |
| 19509 | May | D | N | D | M | S | M | D | M | D | D | D | D |
| 19509 | Jun | D | D | D | M | N | M | D | M | D | D | D | D |
| 19509 | Jul | S | S | S | N | D | M | D | M | D | S | S | D |
| 19509 | Aug | N | M | N | D | D | M | D | N | D | N | N | D |
| 19509 | Sep | S | M | S | S | D | M | D | D | S | S | S | D |
| 19510 | Jan | S | N | S | N | N | S | N | S | N | S | S | N |
| 19510 | Feb | M | N | N | D | D | M | D | M | D | N | M | D |
| 19510 | Mar | M | S | S | D | D | M | D | M | D | S | M | D |
| 19510 | Apr | N | M | N | S | D | M | D | M | D | N | N | D |
| 19510 | May | D | N | D | M | S | M | D | M | D | D | D | D |
| 19510 | Jun | D | D | D | M | N | M | D | M | D | D | D | D |
| 19510 | Jul | S | S | S | N | D | M | D | M | D | S | S | D |
| 19510 | Aug | N | M | N | D | D | M | D | N | D | N | N | D |
| 19510 | Sep | S | M | S | S | D | M | D | D | S | S | S | D |
| 19511 | Jan | S | N | S | N | N | S | N | S | N | S | S | N |
| 19511 | Feb | M | N | N | D | D | M | D | M | D | N | M | D |
| 19511 | Mar | M | S | S | D | D | M | D | M | D | S | M | D |
| 19511 | Apr | N | M | N | S | D | M | D | M | D | N | N | D |
| 19511 | May | D | N | D | M | S | M | D | M | D | D | D | D |
| 19511 | Jun | D | D | D | M | N | M | D | M | D | D | D | D |
| 19511 | Jul | S | S | S | N | D | M | D | M | D | S | S | D |
| 19511 | Aug | N | M | N | D | D | M | D | N | D | N | N | D |
| 19511 | Sep | S | M | S | S | D | M | D | D | S | S | S | D |
+-------------+-------+---+---+---+---+---+---+---+---+---+---+---+---+
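The per-customer, per-product loop can be replaced by a grouped `shift`, which compares each month with the previous month of the same customer and never across two customers. A minimal sketch, assuming rows are in chronological order within each customer and that product column names are unique; `reclassify` is a hypothetical name. For customer 19509 this rule reproduces columns c through l of the expected table; in column a (May to Jul) and column b (Feb) the expected table does not follow the same rule, so those cells will differ.

```python
import pandas as pd


def reclassify(df, id_col="Customer_ID", month_col="Month"):
    """Relabel 0/1 usage flags as S/M/N/D for every customer and product.

    Assumes rows are in chronological order within each customer and that
    the product column names are unique.
    """
    flag_cols = [c for c in df.columns if c not in (id_col, month_col)]
    flags = df[flag_cols]
    # previous month's flag for the same customer (NaN in each customer's first month)
    prev = flags.groupby(df[id_col]).shift(1)
    # True where this month repeats last month; comparisons with NaN are False
    same_as_prev = flags.eq(prev)
    used = flags.eq(1)
    labels = pd.DataFrame("N", index=df.index, columns=flag_cols)  # 0 after a 1, or first-month 0
    labels = labels.mask(used & ~same_as_prev, "S")  # 1 after a 0, or first-month 1
    labels = labels.mask(used & same_as_prev, "M")   # 1 after a 1
    labels = labels.mask(~used & same_as_prev, "D")  # 0 after a 0
    return pd.concat([df[[id_col, month_col]], labels], axis=1)
```

Because the shift is grouped by customer, the first month of customer 19510 is labeled S or N even if customer 19509's last month had the same value.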

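The algorithm for this looks complicated, and I am still working out the right order of steps.

I want to do this for all customers and all products (columns). I think we could start like this: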

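```python
# df is the DataFrame above, with rows in month order within each customer
for customer_id, rows in df.groupby("Customer_ID"):
    for product in df.columns.drop(["Customer_ID", "Month"]):
        pass  # classify rows[product] here
```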

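Note: this is not really a case of use vs. non-use, but of join (1), cancel (0), stay idle (0), rejoin (1), and so on. A 0 means the customer cancelled the service, and if it stays 0 for the next three months, he is no longer a customer; he may later join and cancel again. We need to know how many times he cancelled the service: simply counting the totals does not tell us how many times a customer joined, or how many times he cancelled a particular product or service.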
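Counting joins and cancellations only needs the previous month's flag: a join is a switch from 0 to 1 and a cancellation a switch from 1 to 0. A minimal sketch, assuming rows are in month order within each customer and treating a 1 in a customer's first month as a join (the question does not say how the first month should count); `count_joins_and_cancels` is a hypothetical name. The three-month lapse rule from the note is not modeled here.

```python
import pandas as pd


def count_joins_and_cancels(df, id_col="Customer_ID", month_col="Month"):
    """Count, per customer and product, how often usage switches on and off.

    Assumptions: rows are in chronological order within each customer, a 1 in
    a customer's first month counts as a join, and a 0 in the first month is
    not a cancellation (there was nothing to cancel).
    """
    flag_cols = [c for c in df.columns if c not in (id_col, month_col)]
    flags = df[flag_cols]
    prev = flags.groupby(df[id_col]).shift(1)  # NaN in each customer's first month
    joins = flags.eq(1) & prev.ne(1)           # 0 -> 1, or 1 in the first month
    cancels = flags.eq(0) & prev.eq(1)         # 1 -> 0
    # summing booleans per customer gives the number of switches per product
    return joins.groupby(df[id_col]).sum(), cancels.groupby(df[id_col]).sum()
```

For product a of customer 19509 (1, 1, 1, 0, 1, 1, 1, 0, 1) this gives 3 joins (Jan, May, Sep) and 2 cancellations (Apr, Aug).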

I would appreciate any suggestions or ideas for solving this problem.

Best answer

Hints:

Prefix sum:

  • if the sum increases - used
  • if it grows slowly over a stretch of months, but by December the sum exceeds a threshold - maintained use

Work out the rest yourself.
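The prefix-sum hint can be sketched with a grouped `cumsum`. The running total rises exactly in the months where the flag is 1, so the first bullet reduces to the flag itself. The second bullet needs a cutoff the answer does not give, so `threshold` below is a hypothetical parameter; and since the sample data ends in September, the sketch uses each customer's last available month instead of December. `prefix_sums` and `maintained_use` are hypothetical names.

```python
import pandas as pd


def prefix_sums(df, id_col="Customer_ID", month_col="Month"):
    """Running count of months used, per customer and product (rows in month order)."""
    flag_cols = [c for c in df.columns if c not in (id_col, month_col)]
    return df[flag_cols].groupby(df[id_col]).cumsum()


def maintained_use(df, threshold, id_col="Customer_ID", month_col="Month"):
    """True where the final prefix sum reaches `threshold` (a hypothetical cutoff)."""
    running = prefix_sums(df, id_col, month_col)
    # the last row per customer holds the total over all months in the data
    return running.groupby(df[id_col]).last().ge(threshold)
```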

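Kadane's algorithm - maximum subarray - if you map use to +1 and non-use to -1, it finds the contiguous period in which use most outweighs non-use (the subarray with the largest sum).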
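Under the +1/-1 mapping, Kadane's algorithm returns the contiguous stretch of months with the largest sum, that is, the largest surplus of months used over months not used; this is not necessarily the longest such stretch. A minimal pure-Python sketch for one customer and one product; `kadane_usage` is a hypothetical name.

```python
def kadane_usage(flags):
    """Kadane's maximum-subarray algorithm on 0/1 usage flags mapped to +1/-1.

    Returns (best_sum, start, end) such that flags[start:end + 1] is a
    contiguous stretch with the largest (months used - months not used).
    """
    if not flags:
        raise ValueError("flags must be non-empty")
    values = [1 if f == 1 else -1 for f in flags]  # used -> +1, not used -> -1
    best_sum, best_start, best_end = values[0], 0, 0
    cur_sum, cur_start = values[0], 0
    for i in range(1, len(values)):
        if cur_sum < 0:
            # a negative running total can only hurt: start a new stretch at month i
            cur_sum, cur_start = values[i], i
        else:
            cur_sum += values[i]  # extend the current stretch
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, i
    return best_sum, best_start, best_end
```

For product a of customer 19509, `kadane_usage([1, 1, 1, 0, 1, 1, 1, 0, 1])` returns `(5, 0, 6)`, i.e. Jan through Jul.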

Regarding python - How to reclassify records from two categories into four categories, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40692159/
