(Translation) Neural Architectures for Named Entity Recognition

Abstract

　　...

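　　State-of-the-art named entity recognition systems rely heavily on hand-crafted features and domain-specific knowledge in order to learn effectively from small, labeled corpora. In this paper we introduce two neural architectures: one based on bidirectional LSTMs and conditional random fields, and another that constructs and labels segments using a transition-based approach inspired by shift-reduce parsing. Our models rely on two sources of information about words: character-based word representations learned from the supervised corpus, and unsupervised word representations learned from unannotated corpora. Our models obtain state-of-the-art NER (named entity recognition) performance in four languages without resorting to language-specific knowledge or resources such as gazetteers.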
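　　To make the transition-based idea more concrete, here is a minimal sketch of how a shift-reduce style action sequence can construct and label segments. It is an illustration under stated assumptions, not the authors' implementation: the function name `apply_actions`, the string encoding of actions (`SHIFT`, `OUT`, `REDUCE-<label>`), and the rule that `OUT` is rejected while a segment is open are all choices made for this example. In the paper's model the actions are predicted by a neural network; here they are simply given as input.

```python
from typing import List, Tuple


def apply_actions(words: List[str], actions: List[str]) -> List[Tuple[str, str]]:
    """Replay a shift-reduce style action sequence over a sentence.

    Action encoding (an assumption made for this sketch):
      "SHIFT"       move the next word from the buffer onto the stack
      "OUT"         emit the next word as a non-entity token (label "O")
      "REDUCE-<y>"  pop the whole stack as one segment labeled <y>
    """
    buffer = list(words)                    # words not yet processed
    stack: List[str] = []                   # words of the segment being built
    output: List[Tuple[str, str]] = []      # finished (segment, label) pairs

    for action in actions:
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "OUT":
            # Simplifying assumption: no OUT while a segment is still open.
            if stack:
                raise ValueError("OUT is not allowed while a segment is open")
            output.append((buffer.pop(0), "O"))
        elif action.startswith("REDUCE-"):
            if not stack:
                raise ValueError("REDUCE needs at least one shifted word")
            label = action[len("REDUCE-"):]
            output.append((" ".join(stack), label))
            stack.clear()
        else:
            raise ValueError(f"unknown action: {action}")

    if buffer or stack:
        raise ValueError("action sequence did not consume the sentence")
    return output


words = ["Mark", "Watney", "visited", "Mars"]
actions = ["SHIFT", "SHIFT", "REDUCE-PER", "OUT", "SHIFT", "REDUCE-LOC"]
segments = apply_actions(words, actions)
print(segments)
```

　　Because a whole segment is labeled in a single `REDUCE` step, a multi-word entity such as "Mark Watney" receives one label as a unit rather than one tag per word.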

1 Introduction

　　...

2 LSTM-CRF Model

　　...

2.1 LSTM

　　...

2.2 CRF Tagging Models

　　...

2.3 Parameterization and Training

　　...

2.4 Tagging Schemes

　　...

3 Transition-Based Chunking Model

　　...

3.1 Chunking Algorithm

　　...

3.2 Representing Labeled Chunks

　　...

4 Input Word Embeddings

　　...

4.1 Character-based models of words

　　...

4.2 Pretrained embeddings

　　...

4.3 Dropout training

　　...

5 Experiments

　　...

5.1 Training

　　...

5.2 Data Sets

　　...

5.3 Results

　　...

5.4 Network Architectures

　　...

6 Related Work

　　...

7 Conclusion

　　...

Acknowledgments

　　...

