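- Keywords: data dictionary, custom input
- Problem description: I wrote my own sentences and ran them through a trained model for prediction. When calling `fluid.create_lod_tensor` to convert the data into a tensor for inference, a data type error was raised.
- Error message: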
[['paddlepaddle', 'from', 'baidu'], ['this', 'is', 'a', 'great', 'movie'], ['this', 'is', 'very', 'bad', 'fack']]
[[None, 34, None], [9, 5, 2, 78, 16], [9, 5, 51, 81, None]]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-22-9af2e9ab36a2> in <module>
9 if __name__ == '__main__':
10 use_cuda = True # set to True if training with GPU
---> 11 main(use_cuda)
<ipython-input-22-9af2e9ab36a2> in main(use_cuda)
4 params_dirname = "understand_sentiment_stacked_lstm.inference.model"
5 # train(use_cuda, train_program, params_dirname)
----> 6 infer(use_cuda, inference_program, params_dirname)
7
8
<ipython-input-21-4bf435dffc3f> in infer(use_cuda, inference_program, params_dirname)
15 print(lod)
16 base_shape = [[len(c) for c in lod]]
---> 17 tensor_words = fluid.create_lod_tensor(lod, base_shape, place)
18 results = inferencer.infer({'words': tensor_words})
19
/opt/conda/envs/py35-paddle1.0.0/lib/python3.5/site-packages/paddle/fluid/lod_tensor.py in create_lod_tensor(data, recursive_seq_lens, place)
75 new_recursive_seq_lens
76 ] == recursive_seq_lens, "data and recursive_seq_lens do not match"
---> 77 flattened_data = np.concatenate(data, axis=0).astype("int64")
78 flattened_data = flattened_data.reshape([len(flattened_data), 1])
79 return create_lod_tensor(flattened_data, recursive_seq_lens, place)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
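- Reproduction: I defined my own sentences, converted each word to an integer ID with `word_dict.get(words.encode('utf-8'))`, and then built a tensor from those IDs, which raised the error above. The failing code: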
inferencer = Inferencer(
    infer_func=partial(inference_program, word_dict),
    param_path=params_dirname,
    place=place)

reviews_str = ['paddlepaddle from baidu', 'this is a great movie', 'this is very bad fack']
reviews = [c.split() for c in reviews_str]
print(reviews)

lod = []
for c in reviews:
    # get() returns None for any word that is not in word_dict
    lod.append([word_dict.get(words.encode('utf-8')) for words in c])
print(lod)

base_shape = [[len(c) for c in lod]]
# Fails here: the None entries cannot be cast to int64
tensor_words = fluid.create_lod_tensor(lod, base_shape, place)
results = inferencer.infer({'words': tensor_words})
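The printed `lod` already shows `None` in some positions. To see which tokens produced them, a check along the following lines can be run before building the tensor. This is only a sketch: the small `word_dict` is a toy stand-in that reuses a few IDs from the printed output above, and `find_missing_words` is a hypothetical helper, not part of Paddle.

```python
# Toy stand-in for the real word_dict (assumption for illustration).
# Bytes keys mirror the words.encode('utf-8') lookup used above.
word_dict = {b'this': 9, b'is': 5, b'a': 2, b'great': 78, b'movie': 16}


def find_missing_words(sentences, vocab):
    """Return the tokens for which vocab.get() would yield None."""
    missing = []
    for sentence in sentences:
        for word in sentence.split():
            if vocab.get(word.encode('utf-8')) is None:
                missing.append(word)
    return missing


missing = find_missing_words(
    ['this is a great movie', 'paddlepaddle is great'], word_dict)
print(missing)
```

Any token reported here would become a `None` entry in `lod` and trigger the `TypeError` inside `create_lod_tensor`.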
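- Solution: The error occurs because the sentences contain words that never appear in the dataset dictionary, so `word_dict.get(words.encode('utf-8'))` returns `None` for them. Defining `UNK = word_dict['<unk>']` and calling `word_dict.get(words.encode('utf-8'), UNK)` maps every unknown word to the same integer ID, which avoids the error. The corrected code: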
inferencer = Inferencer(
    infer_func=partial(inference_program, word_dict),
    param_path=params_dirname,
    place=place)

reviews_str = ['paddlepaddle from baidu', 'this is a great movie', 'this is very bad fack']
reviews = [c.split() for c in reviews_str]
# ID shared by every word that is not in the dictionary
UNK = word_dict['<unk>']
print(reviews)

lod = []
for c in reviews:
    # Fall back to UNK instead of None for unknown words
    lod.append([word_dict.get(words.encode('utf-8'), UNK) for words in c])
print(lod)

base_shape = [[len(c) for c in lod]]
tensor_words = fluid.create_lod_tensor(lod, base_shape, place)
results = inferencer.infer({'words': tensor_words})
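The fallback lookup can be tried without Paddle or a trained model. The sketch below uses a toy dictionary: the word IDs are taken from the printed output above, while the `'<unk>'` ID of 100 and the helper name `encode_sentences` are assumptions made for illustration.

```python
# Toy stand-in for the real word_dict (assumption for illustration).
# Word keys are bytes, matching the encode('utf-8') lookup; the '<unk>' ID is made up.
word_dict = {b'this': 9, b'is': 5, b'a': 2, b'great': 78, b'movie': 16,
             '<unk>': 100}


def encode_sentences(sentences, vocab, unk_id):
    """Map each whitespace-separated word to its ID, using unk_id if absent."""
    return [[vocab.get(word.encode('utf-8'), unk_id) for word in s.split()]
            for s in sentences]


UNK = word_dict['<unk>']
lod = encode_sentences(
    ['paddlepaddle is great', 'this is a great movie'], word_dict, UNK)

# One sequence length per sentence, as passed to create_lod_tensor above.
base_shape = [[len(seq) for seq in lod]]

print(lod)
print(base_shape)
```

Because every entry in `lod` is now an integer, the `astype("int64")` cast shown in the traceback has nothing to reject.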