IdentityRank algorithm: Homepage
What is IdentityRank?

The news business is based on producing news items about current events and delivering them to customers. Customers want to receive information about events as soon as they occur, but they do not want to be bothered with useless information: they want to hear only about the events that interest them.

The NEWS EU IST project aims to provide solutions that help news agencies overcome limitations in their current workflows and increase their productivity and revenues. To reach this aim, the NEWS project makes use of state-of-the-art Semantic Web technologies.

To apply Semantic Web technologies to the news domain, the NEWS project developed a set of components. One of them is the NEWS ontology, a lightweight RDFS ontology that provides a formal model of the domain. Another is an annotation component, which uses natural language processing techniques to provide capabilities such as categorization and named entity extraction.

Within the semantic annotation process, one of the key problems that we found in NEWS was the disambiguation of the entities detected by the natural language processing engine. This engine extracts named entities from the news items, but to give the user of the NEWS system a fine-grained semantic search, these entities have to be matched against instances of the NEWS ontology. That is, the natural language processing engine can detect that a certain occurrence of the text "Bush" represents a person, but we also need to deduce which URI represents this person in the NEWS ontology.
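The matching problem described above can be illustrated with a minimal Java sketch. It is not the IdentityRank algorithm and does not use the NEWS ontology or the IdentityRank API: the class names, the example.org URIs and the labels are all hypothetical, chosen only to show why a lookup by entity text and type is not enough.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EntityMatchingSketch {

    /** A named entity as a hypothetical NLP engine might report it. */
    static final class Mention {
        final String text; // surface form found in the news item, e.g. "Bush"
        final String type; // coarse type assigned by the engine, e.g. "person"

        Mention(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    /** An ontology instance. The URIs and labels used below are made up. */
    static final class Instance {
        final String uri;
        final String type;
        final List<String> labels;

        Instance(String uri, String type, String... labels) {
            this.uri = uri;
            this.type = type;
            this.labels = Arrays.asList(labels);
        }
    }

    /** Returns the URIs of all instances whose type and one label match the mention. */
    static List<String> candidates(Mention mention, List<Instance> ontology) {
        List<String> uris = new ArrayList<>();
        for (Instance instance : ontology) {
            if (!instance.type.equals(mention.type)) {
                continue; // the entity type rules this instance out
            }
            for (String label : instance.labels) {
                if (label.equalsIgnoreCase(mention.text)) {
                    uris.add(instance.uri);
                    break; // one matching label is enough
                }
            }
        }
        return uris;
    }

    public static void main(String[] args) {
        // Hypothetical instances, assumed for illustration only.
        List<Instance> ontology = Arrays.asList(
            new Instance("http://example.org/news#George_W_Bush", "person", "George W. Bush", "Bush"),
            new Instance("http://example.org/news#George_H_W_Bush", "person", "George H. W. Bush", "Bush"),
            new Instance("http://example.org/news#Bush_House", "location", "Bush House", "Bush"));

        List<String> uris = candidates(new Mention("Bush", "person"), ontology);
        System.out.println(uris.size() + " candidates: " + uris);
    }
}
```

In this toy data the entity type removes the location, but two person instances still share the label "Bush". Choosing between such candidates is the disambiguation step, and it needs additional context beyond the mention itself.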

To deal with this problem, the NEWS consortium has developed the IdentityRank algorithm. This algorithm uses all the information provided by the natural language processing engine (categories, entities), together with the news item timestamp, as context for entity disambiguation. It is based on two principles:

Related resources
Download IdentityRank: IdentityRank source code (v1.0, Java)
Javadoc documentation: Javadoc (v1.0)
EU-IST project NEWS: NEWS homepage @ UC3M
Publications
NEWS: Bringing Semantic Web Technologies into News Agencies
Fernández-García, N.; Blázquez-del-Toro, J.; Arias Fisteus, J.; Sánchez-Fernández, L.; Sintek, M.; Bernardi, A.; Fuentes, M.; Marrara, A.; Ben-Asher, Z.
In 5th International Semantic Web Conference, ISWC 2006, Athens, GA, USA, November 5-9, 2006.
Springer link
IdentityRank: Named entity disambiguation in the context of the NEWS project
Fernández-García, N.; Blázquez-del-Toro, J.; Sánchez-Fernández, L.; Bernardi, A.
In 4th European Semantic Web Conference, ESWC 2007, Innsbruck, Austria, June 3-7, 2007.
Springer link
Semantic Annotation of Web Resources Using IdentityRank and Wikipedia
Fernández-García, N.; Blázquez-del-Toro, J.; Sánchez-Fernández, L.; Luque, V.
In 5th Atlantic Web Intelligence Conference, AWIC 2007, Fontainebleau, France, June 25-27, 2007.
Springer link
Contact
Contact person: Norberto Fernández
Institution: Web technologies lab (webTlab), Telematics Engineering Department, Universidad Carlos III de Madrid
