python - Stanford GloVe: Dimension anomaly in glove.twitter.27B.200d

Reposted by: 行者123  Updated: 2023-11-28 19:14:55

I downloaded the GloVe Twitter pre-trained vectors from http://nlp.stanford.edu/data/glove.twitter.27B.zip

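When I load the vectors into memory (using glove.twitter.27B.200d.txt), I find 900 words whose vectors have 199 dimensions, while every other word has a 200-dimensional vector. As I understand it, every vector in this file should be exactly 200-dimensional. Isn't that right?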

I reached this conclusion with the following Python code:

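import numpy as np

glove_model_path = './glove.twitter.27B.200d.txt'

model = {}
counter = 0        # number of lines read
vary_length = 0    # number of lines whose vector is not 200-dimensional
anomalies = []     # words whose vector has an unexpected length

# Open explicitly as UTF-8 (assumed encoding of the GloVe text files) so Python 3 decodes it
with open(glove_model_path, 'r', encoding='utf-8') as f:
    for line in f:
        counter += 1
        # strip \r and \n, then split on single spaces
        items = line.replace('\r', '').replace('\n', '').split(' ')
        word = items[0]
        # keep only tokens longer than one character (this also drops any single-character token)
        vect = np.array([float(i) for i in items[1:] if len(i) > 1])
        if len(vect) != 200:
            vary_length += 1
            anomalies.append(word)

print(vary_length)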

Output: 900

Best answer

Correct, every vector should have 200 elements.

More specifically, I suspect the problem is in your code, in particular this line: items = line.replace('\r','').replace('\n','').split(' ')
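To see what that line, together with the `len(i) > 1` filter in the question's list comprehension, actually keeps, here is a minimal sketch that applies the same steps to a few made-up lines. The `tokenize` helper and the sample strings are hypothetical; whether lines like these actually occur in glove.twitter.27B.200d.txt is not established here.

```python
def tokenize(line):
    """Apply the question's steps to one line (hypothetical helper)."""
    # Same as the question: strip \r and \n, then split on single spaces.
    items = line.replace('\r', '').replace('\n', '').split(' ')
    word = items[0]
    # Same filter as the question: drops empty strings, but also any
    # single-character token such as "0" or "1".
    kept = [i for i in items[1:] if len(i) > 1]
    return word, items[1:], kept

samples = [
    "hello 0.25 -0.5 0.125\n",     # clean line: 3 raw fields, 3 kept
    "world 0.25  -0.5 0.125\r\n",  # double space: 4 raw fields ('' included), 3 kept
    "zero 0.25 0 0.125\n",         # bare "0": 3 raw fields, only 2 kept
]

for s in samples:
    word, raw, kept = tokenize(s)
    print(repr(word), len(raw), len(kept))
```

The double-space case shows that `split(' ')` produces an empty string, which the filter happens to discard; the last case shows the filter silently shortening a vector.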

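Why not print a few of those 900 lines and see what they look like? Depending on how the tokenization is done, you may end up with \r or \n being treated as words, so you drop some elements. I am surprised, though, that consecutive spaces are not collapsed by default.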
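A minimal diagnostic along those lines, assuming each line should hold one word followed by `expected_dim` space-separated values. `find_anomalies` is a hypothetical helper; it reports both the single-space count (as in the question) and the `str.split()` count, which collapses runs of whitespace, and prints each hit with `repr()` so hidden characters become visible.

```python
import io

def find_anomalies(lines, expected_dim=200, limit=5):
    """Collect up to `limit` lines whose value count differs from expected_dim.

    Each hit is (line_number, count_with_split_space, count_with_split, line).
    """
    hits = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.rstrip('\r\n')
        n_space = len(stripped.split(' ')) - 1  # single-space split, like the question
        n_any = len(stripped.split()) - 1       # split() collapses runs of whitespace
        if n_space != expected_dim or n_any != expected_dim:
            hits.append((lineno, n_space, n_any, line))
            if len(hits) >= limit:
                break
    return hits

# Hypothetical 3-dimensional sample standing in for the real file.
sample = io.StringIO(
    "good 0.1 0.2 0.3\n"
    "short 0.1 0.2\n"
    "spaced 0.1  0.2 0.3\n"
)
for lineno, n_space, n_any, line in find_anomalies(sample, expected_dim=3):
    # repr() makes stray \r, \t, double spaces and odd Unicode visible
    print(lineno, n_space, n_any, repr(line))

# Against the real file (path from the question, UTF-8 assumed):
# with open('./glove.twitter.27B.200d.txt', encoding='utf-8') as f:
#     for hit in find_anomalies(f):
#         print(hit[:3], repr(hit[3][:80]))
```

If the two counts disagree for a line, the split convention is involved. Note that `str.split()` with no argument also treats other Unicode whitespace (for example U+00A0) as a separator, so it is not automatically the right choice either.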

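Also, you may want to check whether an existing API can read these vectors instead of rolling your own. Your code makes some assumptions about the format that may not hold.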
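As for existing loaders: gensim 4.x can read GloVe's header-less text format via `KeyedVectors.load_word2vec_format(path, binary=False, no_header=True)`; check the documentation for the version you have installed. If you do roll your own, here is a more defensive sketch. It assumes single-space separators and a known dimension, takes the vector from the last `dim` fields so a word that contains a space stays intact, and records unparseable lines instead of silently storing a shorter vector. `load_glove` is a hypothetical name, and vectors are kept as plain lists (wrap them with `np.array` if you need NumPy).

```python
import io

def load_glove(lines, dim=200):
    """Parse GloVe-style text lines into ({word: [floats]}, [bad line numbers])."""
    model = {}
    bad = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip('\r\n').split(' ')
        if len(fields) < dim + 1:
            bad.append(lineno)  # too few fields: report instead of guessing
            continue
        # Everything before the last `dim` fields is treated as the word.
        word = ' '.join(fields[:-dim])
        try:
            vec = [float(x) for x in fields[-dim:]]
        except ValueError:
            bad.append(lineno)  # some field (e.g. '' from a double space) is not a number
            continue
        model[word] = vec
    return model, bad

# Hypothetical 3-dimensional sample.
sample = io.StringIO(
    "cat 0.1 0.2 0.3\n"
    "new york 0.4 0.5 0.6\n"  # hypothetical word containing a space
    "broken 0.7 0.8\n"        # one value missing
)
model, bad = load_glove(sample, dim=3)
print(sorted(model))  # ['cat', 'new york']
print(bad)            # [3]
```

Reporting bad lines rather than accepting them keeps a count like the question's 900 from hiding inside the model.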

Regarding python - Stanford GloVe: Dimension anomaly in glove.twitter.27B.200d, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34695225/
