Use of semantic annotation in news monitoring explained

The main limitations of manual annotation (tagging) of articles are speed and consistency.

Event Registry’s media monitoring and analysis service uses advanced natural language processing technology to analyze every news article entering its news feed. This involves several steps, one of which is semantic annotation – a process of computer-based recognition of key terms appearing in the articles. Semantic annotation is a way for the system to understand what the article is about, which subsequently enables the sorting and analysis functions that deliver value to the user of our news monitoring service.

Annotation of news articles is not new

Annotation of news articles is by no means a new idea. Virtually all news publishers currently tag their articles with keywords and assign them to specific categories (politics, business, sport, etc.) to allow quick identification of the topic covered and to make it easier to locate the article in the database at a later stage. The example below shows metadata in an article published by the Slovenian Press Agency.

The highlighted section includes manual tags added to the news articles.

For decades, media publishers have relied on manual annotation of articles. The main limitations of tagging articles by hand are speed (it takes additional time to enter the tags) and consistency. Two different authors may assign different keywords to articles discussing the same topic. An article about a doping scandal, for example, may belong in the sports section and also qualify for the health or science sections of a news site.

Semantic annotation means automation…

Semantic annotation automates the process of tagging articles. Systems using artificial intelligence are taught to recognize concepts appearing in the articles. This involves identifying people, locations, organizations and other pre-determined concepts. The example below shows a semantically annotated article (you can try it on your own text on this demo page).

Annotations of locations (red), people (green), organizations (purple) and other concepts (blue) in an article by the STA.

In the case of Event Registry, Wikipedia is used as the underlying knowledge base for recognizing concepts that appear in articles. Any concept that has an entry in the largest free online encyclopedia in the world can be recognized by the system in the news articles that enter the news feed. Semantic annotations are currently available for the 100 most spoken languages in the world.
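To make the idea of matching text against a knowledge base concrete, here is a minimal sketch in Python. It is not Event Registry's implementation: the `KNOWLEDGE_BASE` dictionary and the `annotate` function are hypothetical names invented for illustration, and the three entries stand in for what would in practice be a resource the size of Wikipedia.

```python
import re

# Hypothetical mini knowledge base: surface form -> (concept label, type).
# A real system would draw on a far larger resource such as Wikipedia.
KNOWLEDGE_BASE = {
    "ljubljana": ("Ljubljana", "location"),
    "slovenian press agency": ("Slovenian Press Agency", "organization"),
    "neymar": ("Neymar", "person"),
}

def annotate(text):
    """Return (start, end, concept, type) for every known term found in text."""
    annotations = []
    for surface, (concept, kind) in KNOWLEDGE_BASE.items():
        # Match the surface form as a whole word or phrase, ignoring case.
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append((match.start(), match.end(), concept, kind))
    # Sort by position so annotations follow the reading order of the text.
    return sorted(annotations)

text = "Neymar visited Ljubljana, the Slovenian Press Agency reported."
for start, end, concept, kind in annotate(text):
    print(f"{text[start:end]!r} -> {concept} ({kind})")
```

A plain dictionary lookup like this only finds exact surface forms. It cannot tell apart terms that share a spelling, which is where the harder part of semantic annotation begins.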

Automated systems improve the speed and consistency of tagging. Terms appearing in news articles are interpreted the same way every time, eliminating human variance. On the other hand, a key challenge for semantic annotation is disambiguation: recognizing which of several possible meanings a term carries (e.g. Chicago the city or the musical). Systems attempt to overcome this by taking the surrounding context into account.
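One simple way to picture context-based disambiguation is to score each candidate meaning by how many of its typical context words appear in the text. The sketch below is an illustrative simplification and not the method used by Event Registry. The `CANDIDATES` table, its context words and the `disambiguate` function are all assumptions made up for this example.

```python
import re

# Hypothetical candidate meanings for an ambiguous term, each with a few
# context words that tend to appear near that meaning.
CANDIDATES = {
    "Chicago": {
        "Chicago (city)": {"city", "illinois", "mayor", "downtown", "residents"},
        "Chicago (musical)": {"musical", "broadway", "stage", "cast", "theatre"},
    }
}

def disambiguate(term, text):
    """Pick the candidate whose context words overlap most with the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {
        meaning: len(context & words)
        for meaning, context in CANDIDATES[term].items()
    }
    # On a tie, max() returns the candidate listed first in CANDIDATES.
    return max(scores, key=scores.get)

print(disambiguate("Chicago", "The Broadway cast of Chicago returns to the stage."))
print(disambiguate("Chicago", "The mayor of Chicago addressed residents downtown."))
```

The first call selects the musical and the second selects the city, because each sentence shares three context words with one candidate and none with the other. Production systems typically rely on much richer signals than word overlap, but the principle of letting surrounding words decide is the same.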

… and enables simpler search and analysis

Once a news article has been annotated, the metadata added to it allows for further analysis. This includes the creation of summaries and an array of visualizations, generation of events from clusters of similar articles, categorization of articles, and advanced search functionality. The graph below is a visualization of top concepts identified in events related to footballer Neymar in Event Registry’s news feed for the past month.

Graph of top concepts for Neymar
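A "top concepts" view of this kind can be approximated by counting how often each concept occurs across a set of annotated articles. The following sketch assumes the annotations are already available as plain lists of concept labels. The sample data and the `top_concepts` function are hypothetical and do not reflect Event Registry's actual data or API.

```python
from collections import Counter

# Hypothetical annotated articles: each is the list of concepts found in it.
annotated_articles = [
    ["Neymar", "Paris Saint-Germain", "FC Barcelona"],
    ["Neymar", "Brazil national football team"],
    ["Neymar", "Paris Saint-Germain", "Transfer (association football)"],
]

def top_concepts(articles, n=3):
    """Count in how many articles each concept appears and return the top n."""
    counts = Counter()
    for concepts in articles:
        # Use a set so a concept counts once per article, however often
        # it is mentioned within that article.
        counts.update(set(concepts))
    return counts.most_common(n)

for concept, count in top_concepts(annotated_articles, n=2):
    print(f"{concept}: {count} articles")
```

Counting per article rather than per mention keeps a single long article from dominating the ranking, which is one reasonable design choice among several.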

Semantic annotation is also an increasingly important feature for the world’s news media in their quest to utilize advanced technologies to optimize work processes, improve content delivery and drive additional traffic to their website. For example, semantic annotation can be used to create special thematic pages on news websites, where users can browse new and historical content on the same topic, and for promoting related content.

