PaddlePaddle: data type error when predicting sentences with the sentiment analysis model

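Keywords: data dictionary, custom input
Problem description: I wrote my own sentences and ran prediction on them with a trained model. When calling fluid.create_lod_tensor to convert the data into a tensor for prediction, a data type error is raised.
Error message: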

[['paddlepaddle', 'from', 'baidu'], ['this', 'is', 'a', 'great', 'movie'], ['this', 'is', 'very', 'bad', 'fack']]
[[None, 34, None], [9, 5, 2, 78, 16], [9, 5, 51, 81, None]]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-22-9af2e9ab36a2> in <module>
      9 if __name__ == '__main__':
     10     use_cuda = True  # set to True if training with GPU
---> 11     main(use_cuda)

<ipython-input-22-9af2e9ab36a2> in main(use_cuda)
      4     params_dirname = "understand_sentiment_stacked_lstm.inference.model"
      5 #     train(use_cuda, train_program, params_dirname)
----> 6     infer(use_cuda, inference_program, params_dirname)
      7 
      8 

<ipython-input-21-4bf435dffc3f> in infer(use_cuda, inference_program, params_dirname)
     15     print(lod)
     16     base_shape = [[len(c) for c in lod]]
---> 17     tensor_words = fluid.create_lod_tensor(lod, base_shape, place)
     18     results = inferencer.infer({'words': tensor_words})
     19 

/opt/conda/envs/py35-paddle1.0.0/lib/python3.5/site-packages/paddle/fluid/lod_tensor.py in create_lod_tensor(data, recursive_seq_lens, place)
     75             new_recursive_seq_lens
     76         ] == recursive_seq_lens, "data and recursive_seq_lens do not match"
---> 77         flattened_data = np.concatenate(data, axis=0).astype("int64")
     78         flattened_data = flattened_data.reshape([len(flattened_data), 1])
     79         return create_lod_tensor(flattened_data, recursive_seq_lens, place)

TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

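Reproducing the problem: Define some sentences yourself, convert each word to its integer id with word_dict.get(words.encode('utf-8')), and then create a tensor from those ids. The error above is raised at that point. The faulty code is as follows: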

inferencer = Inferencer(
    infer_func=partial(inference_program, word_dict),
    param_path=params_dirname,
    place=place)
reviews_str = ['paddlepaddle from baidu', 'this is a great movie', 'this is very bad fack']
reviews = [c.split() for c in reviews_str]
print(reviews)
lod = []
for c in reviews:
    lod.append([word_dict.get(words.encode('utf-8')) for words in c])
print(lod)
base_shape = [[len(c) for c in lod]]
tensor_words = fluid.create_lod_tensor(lod, base_shape, place)
results = inferencer.infer({'words': tensor_words})
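
The mechanism can be seen without PaddlePaddle. The sketch below is only an illustration: toy_word_dict, lookup and can_cast_to_int are hypothetical names, and the vocabulary is assumed to use bytes keys, as the encode('utf-8') lookup above suggests. dict.get returns None for a missing key, and converting None to an integer raises a TypeError of the kind shown in the traceback.

```python
# Hypothetical toy vocabulary; the real word_dict comes from the dataset.
toy_word_dict = {b'this': 9, b'is': 5}

def lookup(sentence, word_dict):
    """Map each word to its id with dict.get and no default value."""
    return [word_dict.get(w.encode('utf-8')) for w in sentence.split()]

def can_cast_to_int(values):
    """Return True if every value converts to int, False on TypeError."""
    try:
        [int(v) for v in values]
        return True
    except TypeError:
        return False

ids = lookup('this is paddlepaddle', toy_word_dict)
print(ids)                   # the out-of-vocabulary word maps to None
print(can_cast_to_int(ids))  # the None entry cannot be cast to an integer
```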

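Solution: The error occurs because the sentences contain words that never appear in the dataset dictionary. For such words, word_dict.get(words.encode('utf-8')) returns None, and None cannot be converted to int64 when the tensor is built. To avoid this, look up the id of the unknown-word token with UNK = word_dict['<unk>'] and pass it as the default value, word_dict.get(words.encode('utf-8'), UNK), so that every unknown word maps to the same integer id. The corrected code is as follows: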

inferencer = Inferencer(
    infer_func=partial(inference_program, word_dict),
    param_path=params_dirname,
    place=place)
reviews_str = ['paddlepaddle from baidu', 'this is a great movie', 'this is very bad fack']
reviews = [c.split() for c in reviews_str]
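# Id shared by every out-of-vocabulary word
UNK = word_dict['<unk>']
print(reviews)
lod = []
for c in reviews:
    # Fall back to UNK when a word is missing from the dictionary
    lod.append([word_dict.get(words.encode('utf-8'), UNK) for words in c])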
print(lod)
base_shape = [[len(c) for c in lod]]
tensor_words = fluid.create_lod_tensor(lod, base_shape, place)
results = inferencer.infer({'words': tensor_words})
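
The lookup-with-fallback step can also be factored into a small helper. This is a minimal sketch and not part of the PaddlePaddle API: encode_reviews and toy_word_dict are hypothetical names, the ids are made up, and the vocabulary is assumed to have bytes keys for words plus a '<unk>' entry, mirroring the code above. The function returns the nested id list and the sequence lengths in the form passed to fluid.create_lod_tensor above.

```python
def encode_reviews(reviews, word_dict, unk_key='<unk>'):
    """Convert sentences to id lists, mapping unknown words to the UNK id."""
    unk_id = word_dict[unk_key]
    lod = [
        [word_dict.get(w.encode('utf-8'), unk_id) for w in review.split()]
        for review in reviews
    ]
    # One length per sentence, wrapped in a list (a single LoD level)
    base_shape = [[len(seq) for seq in lod]]
    return lod, base_shape

# Hypothetical toy vocabulary with made-up ids
toy_word_dict = {b'this': 9, b'is': 5, b'a': 2, b'great': 78,
                 b'movie': 16, '<unk>': 100}

lod, base_shape = encode_reviews(
    ['paddlepaddle from baidu', 'this is a great movie'], toy_word_dict)
print(lod)
print(base_shape)
```

Because every entry is now an integer, the nested list no longer contains None values by the time it is converted to int64.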
