python - One-hot encoding: missing columns

Reposted by: 行者123. Last updated: 2023-12-01 02:43:54

I have a training set of 1,000,000 records and a test set of 100 records. To build a recommender system, I created two data frames organized as follows:

[in]print(training_df.head(n=5))

[out]                     product_id
transaction_id                      
0000001                   [P06, P09]
0000002         [P01, P05, P06, P09]
0000003                   [P01, P06]
0000004                   [P01, P09]
0000005                   [P06, P09]

Then I use sklearn to build a matrix with the product_id values as columns and transaction_id as rows (the index).

Here is the code:

# Create a matrix for the transactions
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# pop() removes the list column from training_df; the binarized columns are joined in its place
training_df1 = training_df.join(pd.DataFrame(mlb.fit_transform(training_df.pop('product_id')),
                          columns=mlb.classes_,
                          index=training_df.index))

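The product IDs range from P01 to P10. The problem is that P04 and P08 never appear in the training data, so my training_df1 has only 8 columns instead of 10. How can I add these two columns and fill them with 0 for every transaction?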

Best answer

When you initialize MultiLabelBinarizer, you can pass the predefined product IDs P01-P10 as classes, so the output always contains these categories as columns:

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Full catalogue of product IDs, including ones absent from the training data
product_ids = ['P{:02d}'.format(i+1) for i in range(10)]
print(product_ids)
# ['P01', 'P02', 'P03', 'P04', 'P05', 'P06', 'P07', 'P08', 'P09', 'P10']

mlb = MultiLabelBinarizer(classes=product_ids)
training_df.join(pd.DataFrame(mlb.fit_transform(training_df['product_id']),
                              columns=mlb.classes_,
                              index=training_df.index))
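
As a self-contained check of the approach above, here is a minimal sketch that rebuilds the five sample transactions from the question and confirms that all ten columns appear, with P04 and P08 filled with zeros. The `test_df` frame and its contents are hypothetical, added only to illustrate that the same fitted binarizer can encode the test set with an identical column layout.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Sample transactions copied from the question's training_df.head(n=5)
training_df = pd.DataFrame(
    {"product_id": [["P06", "P09"],
                    ["P01", "P05", "P06", "P09"],
                    ["P01", "P06"],
                    ["P01", "P09"],
                    ["P06", "P09"]]},
    index=pd.Index(["0000001", "0000002", "0000003", "0000004", "0000005"],
                   name="transaction_id"),
)

# Full catalogue P01..P10, passed up front so unseen products still get a column
product_ids = ["P{:02d}".format(i) for i in range(1, 11)]
mlb = MultiLabelBinarizer(classes=product_ids)

train_encoded = pd.DataFrame(
    mlb.fit_transform(training_df["product_id"]),
    columns=mlb.classes_,
    index=training_df.index,
)

# Hypothetical test transaction (not from the original post)
test_df = pd.DataFrame(
    {"product_id": [["P04", "P10"]]},
    index=pd.Index(["0000101"], name="transaction_id"),
)
test_encoded = pd.DataFrame(
    mlb.transform(test_df["product_id"]),
    columns=mlb.classes_,
    index=test_df.index,
)

print(train_encoded.shape)
print(train_encoded[["P04", "P08"]].sum())
```

Because both frames come from the same binarizer, their columns line up one to one, which is usually what a downstream model or similarity computation expects.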


To get only the matrix:

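# Use the columns= keyword: the positional axis argument (drop('product_id', 1)) was removed in pandas 2.0
training_df.drop(columns='product_id').join(
    pd.DataFrame(mlb.fit_transform(training_df['product_id']), columns=mlb.classes_, index=training_df.index)
)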
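
A different technique, not part of the answer above, is to fit the binarizer without `classes` and add the missing columns afterwards with pandas `DataFrame.reindex`. The sketch below assumes a small made-up `baskets` series and the same P01-P10 catalogue; the variable names are illustrative only.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Made-up transactions for illustration; only four products are ever seen
baskets = pd.Series(
    [["P06", "P09"], ["P01", "P05"]],
    index=["0000001", "0000002"],
    name="product_id",
)

mlb = MultiLabelBinarizer()  # no classes given: columns follow the observed labels
observed = pd.DataFrame(
    mlb.fit_transform(baskets),
    columns=mlb.classes_,
    index=baskets.index,
)

# Add the absent products as all-zero columns and enforce a fixed column order
all_ids = ["P{:02d}".format(i) for i in range(1, 11)]
full = observed.reindex(columns=all_ids, fill_value=0)

print(full.columns.tolist())
```

Note that this only patches the data frame: the binarizer itself still knows just the observed labels, so passing `classes` at construction is probably the simpler choice when the same encoder must also transform the test set.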

Regarding "python - One-hot encoding: missing columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45400744/
