论文《A Generative Entity-Mention Model for Linking Entities with Knowledge Base》

A Generative Entity-Mention Model for Linking Entities with Knowledge Base

主要方法：

In our model, each name mention to be linked is modeled as a sample generated through a three-step generative story, and the entity knowledge is encoded in the distribution of entities in document P(e), the distribution of possible names of a specific entity P(s|e), and the distribution of possible contexts of a specific entity P(c|e). To find the referent entity of a name mention, our method combines the evidences from all the three distributions P(e), P(s|e) and P(c|e).

The P(e), P(s|e) and P(c|e) are respectively called the entity popularity model, the entity name model and the entity context model

The Generative Entity-Mention Model for Entity Linking

　　the probability of a name mention m (its context is c and its name is s) referring to a specific entity e can be expressed as the following formula (here we assume that s and c are independent given e):

Given a name mention m, to perform entity linking, we need to find the entity e which maximizes the probability P(e|m). Then we can resolve the entity linking task as follows:

Candidate Selection.：

　　　　building a name-to-entity dictionary using the redirect links, disambiguation pages, anchor texts of Wikipedia, then the candidate entities of a name mention are selected by finding its name’s corresponding 　　entry in the dictionary

Model Estimation

Entity Popularity Model

　　　　where Count(e) is the count of the name mentions whose referent entity is e, and the |M| is the total name mention size.

　　2.Entity Name Model

　　　　　　where e is a normalization factor, f is the full name of entity e, lf is the length of f, ls is the length of the name s, si the i th word of s, fj is the j th word of f and t(si|fj) is the lexical translation probability which 　　　　indicates the probability of a word fj in the full name will be written as si in the output name.

　　3.Entity Context Model

　　　　given a context c containing n terms t1t2…tn, the entity context model estimates the probability P(c|e) as

　　where Pg(t) is a general language model which is estimated using the whole Wikipedia data, and the optimal value of λ is set to 0.2

　　　　where Counte(t) is the frequency of occurrences of a term t in the contexts of the name mentions whose referent entity is e

4. The NIL Entity Problem

the referent entity may not be contained in the given knowledge base

first add a pseudo entity, the NIL entity, into the knowledge base

if the probability of a name mention is generated by the NIL entity is higher than all other entities in Knowledge base, we link the name mention to the NIL entity.

论文《A Generative Entity-Mention Model for Linking Entities with Knowledge Base》

猜你喜欢