python - Hstacking features somehow causes an extra slowdown in prediction

Reposted by: 太空宇宙. Updated: 2023-11-03 15:17:24

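I use scipy.sparse.hstack to combine several sparse matrices produced by CountVectorizer and similar tools so that I can use them together for regression, but for some reason the combined matrix is much slower: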

X1 has 10000 features from analyzer="char"
X2 has 10000 features from analyzer="word"
X3 has 20000 features from analyzer="char"
X4 has 20000 features from analyzer="word"

When you hstack X1 and X2, you would expect it to be about as fast as X3 or X4 (the same number of features). But it is not even close:

>>> from scipy.sparse import hstack
>>> from sklearn import linear_model
>>> a = linear_model.Ridge(alpha=30).fit(hstack((X1, X2)), y).predict(hstack((t1, t2)))
time:  57.85
>>> b = linear_model.Ridge(alpha=30).fit(X1, y).predict(t1)
time:  6.75
>>> c = linear_model.Ridge(alpha=30).fit(X2, y).predict(t2)
time:  7.33
>>> d = linear_model.Ridge(alpha=30).fit(X3, y).predict(t3)
time:  6.80
>>> e = linear_model.Ridge(alpha=30).fit(X4, y).predict(t4)
time:  11.67

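At some point I even noticed that the model also gets slower when I hstack just a single extra feature. What causes this, what am I doing wrong, and of course, what can be improved?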

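Notable edit:

I want to describe an approach that I thought would solve this: build the combined vocabulary and fit a single vectorizer with it: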

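from sklearn.feature_extraction.text import CountVectorizer

feats = []
# Word n-grams (1 to 3 words), top 10000 features
method = CountVectorizer(analyzer="word", max_features=10000, ngram_range=(1,3))
method.fit(train["tweet"])
X = method.transform(...)
feats.extend(method.vocabulary_.keys())
# Character 4-grams, top 10000 features
method = CountVectorizer(analyzer="char", max_features=10000, ngram_range=(4,4))
method.fit(train["tweet"])
X2 = method.transform(...)
feats.extend(method.vocabulary_.keys())
# Combined vocabulary; this vectorizer keeps the default analyzer settings
newm = CountVectorizer(vocabulary=feats)
newm.fit(train["tweet"])
X3 = newm.transform(...)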

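When I fit these, something strange happens to the number of stored elements (I am not surprised that there are fewer than 20,000 features, since the two vocabularies may overlap). How can there be so few "ones"?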

X
<49884x10000 sparse matrix of type '<class 'numpy.int64'>'
    with 927131 stored elements in Compressed Sparse Row format>
X2
<49884x10000 sparse matrix of type '<class 'numpy.int64'>'
    with 3256162 stored elements in Compressed Sparse Row format>
X3
<49884x19558 sparse matrix of type '<class 'numpy.int64'>'
    with 593712 stored elements in Compressed Sparse Row format>
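
One possible explanation for the low count in X3, offered here as an assumption and not as part of the original answer: a CountVectorizer given a fixed vocabulary still tokenizes with its own analyzer settings, and the defaults are analyzer="word" with ngram_range=(1, 1). Vocabulary entries that are word bigrams, word trigrams, or character 4-grams would then rarely or never be matched. The sketch below uses a small made-up corpus to show the effect.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat", "the cat ran"]  # made-up corpus for illustration

# Learn a character 4-gram vocabulary, as in the question.
char_vec = CountVectorizer(analyzer="char", ngram_range=(4, 4))
X_char = char_vec.fit_transform(docs)
vocab = sorted(char_vec.vocabulary_)

# Reuse that vocabulary with the default (word, unigram) analyzer.
mismatched = CountVectorizer(vocabulary=vocab)
X_mismatched = mismatched.transform(docs)

# Reuse it with the analyzer settings that produced it.
matched = CountVectorizer(analyzer="char", ngram_range=(4, 4), vocabulary=vocab)
X_matched = matched.transform(docs)

print("original nnz:  ", X_char.nnz)
print("mismatched nnz:", X_mismatched.nnz)
print("matched nnz:   ", X_matched.nnz)
```

If this is what happened in the question, keeping the two vectorizers separate and stacking their outputs avoids the mismatch, because each matrix is produced by the analyzer that matches its vocabulary.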

Best answer

Hstacking converts the result to COO format:

>>> from scipy.sparse import csr_matrix, hstack
>>> hstack((csr_matrix([1]), csr_matrix([2])))
<1x2 sparse matrix of type '<type 'numpy.int64'>'
    with 2 stored elements in COOrdinate format>

Try hstack(...).tocsr() and check whether that speeds things up.
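
A minimal sketch of that suggestion, using two tiny made-up CSR blocks in place of the vectorizer outputs. It shows two ways to end up with CSR: calling .tocsr() on the result, or passing format="csr" to hstack. The default output format can differ between SciPy versions, so the sketch prints it instead of assuming it.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

# Made-up stand-ins for the char and word feature matrices.
X_char = csr_matrix(np.array([[1, 0, 2], [0, 3, 0]]))
X_word = csr_matrix(np.array([[0, 4], [5, 0]]))

# Default call: inspect the format actually returned by this SciPy version.
stacked_default = hstack((X_char, X_word))
print("default format:", stacked_default.format)

# Option 1: convert to CSR after stacking.
stacked_csr = hstack((X_char, X_word)).tocsr()

# Option 2: request CSR directly through the `format` argument.
stacked_direct = hstack((X_char, X_word), format="csr")

print("after tocsr():", stacked_csr.format)
print("format='csr': ", stacked_direct.format)
print(stacked_csr.toarray())
```

The same conversion would apply to the test-side matrices, for example hstack((t1, t2)).tocsr(), so that fit and predict both receive CSR input.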

Source: a similar question, "python - Hstacking features somehow causes an extra slowdown in prediction", was found on Stack Overflow: https://stackoverflow.com/questions/19864932/
