
python - Error: IndexError: index 6319 is out of bounds for axis 0 with size 0


The code below is taken from https://github.com/arunarn2/HierarchicalAttentionNetworks/blob/master/HierarchicalAttn.py with some minor adjustments. Although I understand what the error means, I can't figure out how it creeps into the code below or how to correct it. I've been stuck on this for quite a while and would really appreciate some help. Thanks!
(Here is the full code.)

# imports (omitted in the question) reconstructed from the symbols used below
import io
import os
import re

import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
from nltk import tokenize

from keras import backend as K
from keras import initializers
from keras.layers import Layer, Input, Embedding, GRU, Bidirectional, TimeDistributed, Dense
from keras.models import Model
from keras.preprocessing.text import Tokenizer, text_to_word_sequence
from keras.utils import np_utils

maxlen = 100
max_sentences = 15
max_words = 20000
embedding_dim = 100
validation_split = 0.2
reviews = []
labels = []
texts = []
glove_dir = "./glove.6B"
embeddings_index = {}


# class defining the custom attention layer
class HierarchicalAttentionNetwork(Layer):
    def __init__(self, attention_dim):
        self.init = initializers.get('normal')
        self.supports_masking = True
        self.attention_dim = attention_dim
        super(HierarchicalAttentionNetwork, self).__init__()

    def build(self, input_shape):
        assert len(input_shape) == 3
        self.W = K.variable(self.init((input_shape[-1], self.attention_dim)))
        self.b = K.variable(self.init((self.attention_dim,)))
        self.u = K.variable(self.init((self.attention_dim, 1)))
        self.trainable_weights = [self.W, self.b, self.u]
        super(HierarchicalAttentionNetwork, self).build(input_shape)

    def compute_mask(self, inputs, mask=None):
        return mask

    def call(self, x, mask=None):
        # size of x: [batch_size, sel_len, attention_dim]
        # size of u: [batch_size, attention_dim]
        # uit = tanh(xW + b)
        uit = K.tanh(K.bias_add(K.dot(x, self.W), self.b))

        ait = K.exp(K.squeeze(K.dot(uit, self.u), -1))

        if mask is not None:
            # cast the mask to floatX to avoid float64 upcasting
            ait *= K.cast(mask, K.floatx())
        ait /= K.cast(K.sum(ait, axis=1, keepdims=True) + K.epsilon(), K.floatx())
        weighted_input = x * K.expand_dims(ait)
        output = K.sum(weighted_input, axis=1)

        return output

    def compute_output_shape(self, input_shape):
        return input_shape[0], input_shape[-1]


def remove_html(str_a):
    p = re.compile(r'<.*?>')
    return p.sub('', str_a)


# strip all non-ASCII (outside \x00-\x7f) characters
def replace_non_ascii(str_a):
    return re.sub(r'[^\x00-\x7f]', r'', str_a)


# tokenization/string cleaning for the dataset
def clean_str(string):
    string = string.decode("utf-8")
    string = re.sub(r"\\", "", string)
    string = re.sub(r"\'", "", string)
    string = re.sub(r"\"", "", string)
    return string.strip().lower()



input_data = pd.read_csv(io.BytesIO(uploaded['labeledTrainData.tsv']), sep='\t')

for idx in range(input_data.review.shape[0]):
    text = BeautifulSoup(input_data.review[idx], features="html5lib")
    text = clean_str(text.get_text().encode('ascii', 'ignore'))
    texts.append(text)
    sentences = tokenize.sent_tokenize(text)
    reviews.append(sentences)
    np.append(labels, input_data.sentiment[idx])

tokenizer = Tokenizer(num_words=max_words)
tokenizer.fit_on_texts(texts)

data = np.zeros((len(texts), max_sentences, maxlen), dtype='int32')

for i, sentences in enumerate(reviews):
    for j, sent in enumerate(sentences):
        if j < max_sentences:
            wordTokens = text_to_word_sequence(sent)
            k = 0
            for _, word in enumerate(wordTokens):
                if k < maxlen and tokenizer.word_index[word] < max_words:
                    data[i, j, k] = tokenizer.word_index[word]
                    k = k + 1

word_index = tokenizer.word_index
print('Total %s unique tokens.' % len(word_index))

if np.any(np.array(labels)):
    labels = np_utils.to_categorical(np.array(labels))
    # labels = to_categorical(np.asarray(labels))

print('Shape of reviews (data) tensor:', data.shape)
print('Shape of sentiment (label) tensor:', np.shape(labels))

indices = np.arange(data.shape[0])
np.random.shuffle(indices)
data = data[indices]
labels = np.asarray(labels)[indices.astype(int)]
#labels = labels[indices]
nb_validation_samples = int(validation_split * data.shape[0])

x_train = data[:-nb_validation_samples]
y_train = labels[:-nb_validation_samples]
x_val = data[-nb_validation_samples:]
y_val = labels[-nb_validation_samples:]

print('Number of positive and negative reviews in training and validation set')
print (y_train.sum(axis=0))
print (y_val.sum(axis=0))


f = open(os.path.join(glove_dir, 'glove.6B.100d.txt'))
for line in f:
    values = line.split()
    word = values[0]
    coefs = np.asarray(values[1:], dtype='float32')
    embeddings_index[word] = coefs
f.close()

print('Total %s word vectors.' % len(embeddings_index))

# building Hierachical Attention network
embedding_matrix = np.random.random((len(word_index) + 1, embedding_dim))
for word, i in word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        # words not found in the embedding index keep their random initialization
        embedding_matrix[i] = embedding_vector

embedding_layer = Embedding(len(word_index) + 1, embedding_dim, weights=[embedding_matrix],
                            input_length=maxlen, trainable=True, mask_zero=True)

sentence_input = Input(shape=(maxlen,), dtype='int32')
embedded_sequences = embedding_layer(sentence_input)
lstm_word = Bidirectional(GRU(100, return_sequences=True))(embedded_sequences)
attn_word = HierarchicalAttentionNetwork(100)(lstm_word)
sentenceEncoder = Model(sentence_input, attn_word)

review_input = Input(shape=(max_sentences, maxlen), dtype='int32')
review_encoder = TimeDistributed(sentenceEncoder)(review_input)
lstm_sentence = Bidirectional(GRU(100, return_sequences=True))(review_encoder)
attn_sentence = HierarchicalAttentionNetwork(100)(lstm_sentence)
preds = Dense(2, activation='softmax')(attn_sentence)
model = Model(review_input, preds)

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

print("model fitting - Hierachical attention network")
model.fit(x_train, y_train, validation_data=(x_val, y_val), nb_epoch=10, batch_size=100)
The full error:
[Error stack trace]

Best Answer

Python is complaining because you are trying to access the labels array by index, but the array is empty, as the console output shows:

Shape of sentiment (label) tensor: (0,)
The problem is in this line:
np.append(labels, input_data.sentiment[idx])
In the original code you referenced, a new value is appended to the labels list; that change happens in place, i.e. the list itself is modified. By contrast, as the numpy documentation states when describing the value returned by np.append, what you get back is a copy of the original array arr:

A copy of arr with values appended to axis. Note that append does not occur in-place: a new array is allocated and filled.


That is, your original labels list (an empty array) is never modified, which is what later causes the error when the code tries to access it by index.
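Here is a minimal standalone sketch (not from the question; the values are illustrative) that reproduces the failure mode, i.e. the return value of np.append being silently discarded:

import numpy as np

labels = np.array([])   # empty, like the labels list in the question
np.append(labels, 1)    # returns a NEW array; here the result is thrown away
print(labels.shape)     # (0,) -- labels is still empty
# labels[6319]          # would raise IndexError: index 6319 is out of bounds
#                       # for axis 0 with size 0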
If you want equivalent behavior, you need to modify your code like this:
labels = np.append(labels, input_data.sentiment[idx])
Note that, for the reason just explained, this will be very inefficient: each call to np.append allocates a brand-new array and copies every existing value into it. It is better to append the sentiment value directly to the original labels list, as the original code does:
labels.append(input_data.sentiment[idx])
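For reference, a hedged sketch of how the question's loop would look with the list-based fix (variable names taken from the question; the one-time conversion to a numpy array happens after the loop):

labels = []                                    # plain Python list
for idx in range(input_data.review.shape[0]):
    # ... review cleaning and tokenization as in the question ...
    labels.append(input_data.sentiment[idx])   # in-place append, O(1) amortized

labels = np_utils.to_categorical(np.asarray(labels))   # convert once, at the end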
See this related SO question as well.

Regarding "python - Error: IndexError: index 6319 is out of bounds for axis 0 with size 0", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/69074018/
