When I try to pass my own validation set through validation_data, I get an error.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-53-e2816bdbad19> in <module>
2 np.array(X_char_tr).reshape((len(X_char_tr), max_len, max_len_char))],
3 np.array(y_tr).reshape(len(y_tr), max_len, 1),
----> 4 batch_size=32, epochs=10, validation_data=[X_word_te, y_te], verbose=1)
ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 2 array(s), but instead got the following list of 1 arrays: [array([[ 7993, 30540, 29051, ..., 0, 0, 0],
[ 9571, 24132, 14066, ..., 0, 0, 0],
[19338, 15304, 7322, ..., 0, 0, 0],
...,
[ 5062, 2713...
This is an export of my Jupyter notebook; it is a copy of the example published on this blog.
The NER dataset is taken from here: https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus
#!/usr/bin/env python
# coding: utf-8
# In[1]:
import pandas as pd
import numpy as np
# In[11]:
data = pd.read_csv("ner_dataset.csv", encoding="latin1")
# In[13]:
data = data.fillna(method="ffill")
# In[15]:
words = list(set(data["Word"].values))
n_words = len(words); n_words
# In[16]:
tags = list(set(data["Tag"].values))
n_tags = len(tags); n_tags
# In[17]:
class SentenceGetter(object):
    def __init__(self, data):
        self.n_sent = 1
        self.data = data
        self.empty = False
        agg_func = lambda s: [(w, p, t) for w, p, t in zip(s["Word"].values.tolist(),
                                                           s["POS"].values.tolist(),
                                                           s["Tag"].values.tolist())]
        self.grouped = self.data.groupby("Sentence #").apply(agg_func)
        self.sentences = [s for s in self.grouped]

    def get_next(self):
        try:
            s = self.grouped["Sentence: {}".format(self.n_sent)]
            self.n_sent += 1
            return s
        except:
            return None
# In[18]:
getter = SentenceGetter(data)
# In[19]:
sent = getter.get_next()
# In[21]:
sentences = getter.sentences
# In[22]:
max_len = 75
max_len_char = 10
# In[23]:
word2idx = {w: i + 2 for i, w in enumerate(words)}
word2idx["UNK"] = 1
word2idx["PAD"] = 0
idx2word = {i: w for w, i in word2idx.items()}
tag2idx = {t: i + 1 for i, t in enumerate(tags)}
tag2idx["PAD"] = 0
idx2tag = {i: w for w, i in tag2idx.items()}
# In[25]:
from keras.preprocessing.sequence import pad_sequences
X_word = [[word2idx[w[0]] for w in s] for s in sentences]
# In[26]:
X_word = pad_sequences(maxlen=max_len, sequences=X_word, value=word2idx["PAD"], padding='post', truncating='post')
# In[27]:
max_len_char
# In[28]:
chars = set([w_i for w in words for w_i in w])
n_chars = len(chars)
print(n_chars)
# In[29]:
char2idx = {c: i + 2 for i, c in enumerate(chars)}
char2idx["UNK"] = 1
char2idx["PAD"] = 0
# In[30]:
X_char = []
for sentence in sentences:
    sent_seq = []
    for i in range(max_len):
        word_seq = []
        for j in range(max_len_char):
            try:
                word_seq.append(char2idx.get(sentence[i][0][j]))
            except:
                word_seq.append(char2idx.get("PAD"))
        sent_seq.append(word_seq)
    X_char.append(np.array(sent_seq))
# In[31]:
y = [[tag2idx[w[2]] for w in s] for s in sentences]
# In[32]:
y = pad_sequences(maxlen=max_len, sequences=y, value=tag2idx["PAD"], padding='post', truncating='post')
# In[33]:
from sklearn.model_selection import train_test_split
# In[34]:
X_word_tr, X_word_te, y_tr, y_te = train_test_split(X_word, y, test_size=0.1, random_state=2018)
X_char_tr, X_char_te, _, _ = train_test_split(X_char, y, test_size=0.1, random_state=2018)
# In[35]:
from keras.models import Model, Input
from keras.layers import LSTM, Embedding, Dense, TimeDistributed, Dropout, Conv1D
from keras.layers import Bidirectional, concatenate, SpatialDropout1D, GlobalMaxPooling1D
# In[37]:
# input and embedding for words
word_in = Input(shape=(max_len,))
emb_word = Embedding(input_dim=n_words + 2, output_dim=20,
                     input_length=max_len, mask_zero=True)(word_in)
# input and embeddings for characters
char_in = Input(shape=(max_len, max_len_char,))
emb_char = TimeDistributed(Embedding(input_dim=n_chars + 2, output_dim=10,
                                     input_length=max_len_char, mask_zero=True))(char_in)
# character LSTM to get word encodings by characters
char_enc = TimeDistributed(LSTM(units=20, return_sequences=False,
                                recurrent_dropout=0.5))(emb_char)
# main LSTM
x = concatenate([emb_word, char_enc])
x = SpatialDropout1D(0.3)(x)
main_lstm = Bidirectional(LSTM(units=50, return_sequences=True,
                               recurrent_dropout=0.6))(x)
out = TimeDistributed(Dense(n_tags + 1, activation="softmax"))(main_lstm)
model = Model([word_in, char_in], out)
# In[38]:
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["acc"])
# In[39]:
model.summary()
# In[52]:
history = model.fit([X_word_tr,
                     np.array(X_char_tr).reshape((len(X_char_tr), max_len, max_len_char))],
                    np.array(y_tr).reshape(len(y_tr), max_len, 1),
                    batch_size=32, epochs=10, validation_data=(X_word_te, y_te), verbose=1)
Edit: added the model summary
model.summary()
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_2 (InputLayer)            (None, 75, 10)       0
__________________________________________________________________________________________________
input_1 (InputLayer)            (None, 75)           0
__________________________________________________________________________________________________
time_distributed_1 (TimeDistrib (None, 75, 10, 10)   1000        input_2[0][0]
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, 75, 20)       703600      input_1[0][0]
__________________________________________________________________________________________________
time_distributed_2 (TimeDistrib (None, 75, 20)       2480        time_distributed_1[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 75, 40)       0           embedding_1[0][0]
                                                                 time_distributed_2[0][0]
__________________________________________________________________________________________________
spatial_dropout1d_1 (SpatialDro (None, 75, 40)       0           concatenate_1[0][0]
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, 75, 100)      36400       spatial_dropout1d_1[0][0]
__________________________________________________________________________________________________
time_distributed_3 (TimeDistrib (None, 75, 18)       1818        bidirectional_1[0][0]
==================================================================================================
Total params: 745,298
Trainable params: 745,298
Non-trainable params: 0
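As a sanity check on this summary, the parameter counts can be reproduced by hand. The sketch below assumes the standard LSTM parameterization (four gates, each with input weights, recurrent weights and a bias). The helper names are made up for illustration, and the vocabulary sizes are inferred from the summary rather than stated in the post.

```python
# Hypothetical helpers: reproduce the "Param #" column of the summary by hand.

def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # Weight matrix plus bias vector.
    return input_dim * units + units

# Embedding parameter counts are taken directly from the summary.
word_emb = 703600   # (n_words + 2) * 20
char_emb = 1000     # (n_chars + 2) * 10

# Vocabulary sizes inferred from those counts (an inference, not given in the post).
n_words = word_emb // 20 - 2
n_chars = char_emb // 10 - 2

char_lstm = lstm_params(input_dim=10, units=20)        # character LSTM
main_lstm = 2 * lstm_params(input_dim=40, units=50)    # bidirectional: two LSTMs
dense = dense_params(input_dim=100, units=18)          # n_tags + 1 = 18 outputs

total = word_emb + char_emb + char_lstm + main_lstm + dense
print(n_words, n_chars)             # 35178 98
print(char_lstm, main_lstm, dense)  # 2480 36400 1818
print(total)                        # 745298
```

The total matches the 745,298 parameters reported by model.summary(), and the two InputLayer rows confirm that the model expects two input arrays.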
Best answer
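The problem is most likely the validation_data argument of model.fit(). The model has two inputs (word_in and char_in), so the validation inputs must also be a list of two arrays:
validation_data=([X_word_te, X_char_te], y_te)
This matches what your model's inputs require.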
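A minimal sketch of how the validation tuple could be assembled, assuming the validation arrays need the same reshaping the question applies to the training arrays. The helper make_validation_data and the zero-filled stand-in data are hypothetical and only illustrate the shapes; the real X_word_te, X_char_te and y_te come from train_test_split in the notebook.

```python
import numpy as np

def make_validation_data(X_word_te, X_char_te, y_te, max_len, max_len_char):
    """Hypothetical helper: build an (inputs, targets) tuple for a two-input model."""
    n = len(X_word_te)
    x_word = np.asarray(X_word_te)                                       # (n, max_len)
    x_char = np.asarray(X_char_te).reshape((n, max_len, max_len_char))  # (n, max_len, max_len_char)
    y = np.asarray(y_te).reshape((n, max_len, 1))                       # (n, max_len, 1)
    # One array per model input, in the same order as Model([word_in, char_in], out).
    return [x_word, x_char], y

# Zero-filled stand-ins with the shapes used in the question (illustration only).
max_len, max_len_char = 75, 10
X_word_te = np.zeros((4, max_len), dtype=int)
X_char_te = [np.zeros((max_len, max_len_char), dtype=int) for _ in range(4)]
y_te = np.zeros((4, max_len), dtype=int)

val_inputs, val_targets = make_validation_data(
    X_word_te, X_char_te, y_te, max_len, max_len_char)
print(len(val_inputs), val_inputs[0].shape, val_inputs[1].shape, val_targets.shape)

# The fit call from the question would then become (not executed here):
# history = model.fit([X_word_tr,
#                      np.array(X_char_tr).reshape((len(X_char_tr), max_len, max_len_char))],
#                     np.array(y_tr).reshape(len(y_tr), max_len, 1),
#                     batch_size=32, epochs=10,
#                     validation_data=(val_inputs, val_targets), verbose=1)
```

Reshaping the validation targets to (n, max_len, 1) mirrors what the question does for y_tr. Whether that step is strictly required may depend on the Keras version.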
Source: a similar question on Stack Overflow, "python - Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected": https://stackoverflow.com/questions/54368377/