Simplifying the data access with iterators

A simplified way to iterate over articles and events that are results of a search query

This blog post describes a recent update to the Python library that can be used to access the data in Event Registry. The update significantly simplifies how you iterate through the articles or events returned by a search query.

We will illustrate the change with an example and describe both the old and the new way of iterating over the results. We will assume that you’d like to find news articles about “George Clooney”.

Old approach

Assuming that there are hundreds of news articles about Clooney, the code to download all of them would look something like this:

from eventregistry import *
er = EventRegistry()
q = QueryArticles(conceptUri = er.getConceptUri("George Clooney"))
page = 1
while True:
    # request one page of matching articles
    q.setRequestedResult(RequestArticlesInfo(page = page))
    res = er.execQuery(q)
    for article in res["articles"]["results"]:
        print(article)  # do something with the article here
    # stop once the last page of results has been processed
    if page >= res["articles"]["pages"]:
        break
    page += 1

With QueryArticles() we say that we want to search over news articles (and not events), and with the conceptUri parameter we specify what to search for. q.setRequestedResult() specifies the form in which we want to obtain the results of the query; in our case this is RequestArticlesInfo(), which is simply the list of matching articles (other options are described on the documentation page). The code becomes tedious because the results are returned one page at a time, just like the results of a Google search, so we have to request each page separately. For that reason, we create a loop and exit it once we have gone through all the available pages of results.
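To make the paging logic concrete without calling the service, here is a minimal offline sketch. The function fetch_page() is a hypothetical stand-in for er.execQuery(q): it returns a dictionary shaped like the res object in the loop above (res["articles"]["results"] and res["articles"]["pages"]), and the articles it returns are made up.

```python
# Offline sketch of the manual paging loop. fetch_page() is a hypothetical
# stand-in for er.execQuery(q); its return value mimics the shape of `res`.
FAKE_ARTICLES = ["article %d" % i for i in range(1, 8)]  # 7 made-up articles
PAGE_SIZE = 3


def fetch_page(page):
    """Return one page of fake results in the same nested layout as `res`."""
    start = (page - 1) * PAGE_SIZE
    pages = (len(FAKE_ARTICLES) + PAGE_SIZE - 1) // PAGE_SIZE  # ceiling division
    return {"articles": {"results": FAKE_ARTICLES[start:start + PAGE_SIZE],
                         "pages": pages}}


collected = []
page = 1
while True:
    res = fetch_page(page)
    collected.extend(res["articles"]["results"])
    # Stop after the last page; otherwise request the next one.
    if page >= res["articles"]["pages"]:
        break
    page += 1

print(len(collected), page)  # 7 articles spread over 3 pages
```

The caller has to track the page counter and the stopping condition itself, which is exactly the bookkeeping the iterator classes remove.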

New approach

Now let’s look at how we can do this more elegantly, without manually iterating through the pages of results. We’ve added new classes that can be used as iterators, namely QueryArticlesIter, QueryEventsIter and QueryEventArticlesIter.

The example above can now simply be written as follows:

from eventregistry import *
er = EventRegistry()
q = QueryArticlesIter(conceptUri = er.getConceptUri("George Clooney"))
for art in q.execQuery(er):
    print(art)    # do something with the article here

We have replaced QueryArticles with the QueryArticlesIter class. Calling execQuery() on it performs the search for articles and returns an iterator. Since the iterator has to query the successive pages of results automatically, execQuery() also expects the EventRegistry instance as its argument.
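The idea behind such an iterator can be sketched with a Python generator. This is not the library’s implementation: iter_results() and fetch_page() are hypothetical names, and fetch_page() again returns made-up data in the same layout as res. The generator requests a new page only when the consumer has used up the previous one, so the calling loop never sees page numbers.

```python
# Hypothetical sketch of how a paging iterator can hide the page loop.
# It is NOT the eventregistry implementation; fetch_page() returns fake data.
FAKE_ARTICLES = ["article %d" % i for i in range(1, 8)]  # 7 made-up articles
PAGE_SIZE = 3
requested_pages = []  # records which pages were actually fetched


def fetch_page(page):
    """Stand-in for executing the query for one page of results."""
    requested_pages.append(page)
    start = (page - 1) * PAGE_SIZE
    pages = (len(FAKE_ARTICLES) + PAGE_SIZE - 1) // PAGE_SIZE
    return {"articles": {"results": FAKE_ARTICLES[start:start + PAGE_SIZE],
                         "pages": pages}}


def iter_results(fetch, key="articles"):
    """Yield results one at a time, fetching the next page only when needed."""
    page = 1
    while True:
        res = fetch(page)
        for item in res[key]["results"]:
            yield item
        if page >= res[key]["pages"]:
            return
        page += 1


for art in iter_results(fetch_page):
    print(art)  # do something with the article here
```

The page counter now lives inside the generator’s own state, which is why the consuming loop can stay as short as in the QueryArticlesIter example.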

When calling the execQuery() method, you can of course also specify the order in which you would like to obtain the articles, as well as which details of the articles should be returned. All details about the class and its parameters are available in the documentation.

Other Iterators

In the above example, we have only described the QueryArticlesIter that can be used to iterate over the results of a search for articles. As mentioned, we have added two more iterators.

The QueryEventsIter class can be used to iterate over the events returned by an event search. It can be used as a replacement for the QueryEvents class; its details, along with an example, are described here.

Lastly, the QueryEventArticlesIter class lets you simply iterate over the articles associated with a particular event. Again, the details and an example are described here.
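All three iterators follow the same pattern; only the kind of result differs. The sketch below reuses one hypothetical paging generator for event results and stops after the first three items. It assumes, purely for illustration, that event pages are nested under an "events" key in the same way articles are nested under "articles"; the events themselves are fake. Because a generator is lazy, stopping early means later pages are never requested.

```python
# Hypothetical sketch: one lazy paging generator reused for another result type.
# The "events" key and the event data are assumptions made for illustration.
from itertools import islice

FAKE_EVENTS = ["event %d" % i for i in range(1, 6)]  # 5 made-up events
PAGE_SIZE = 2
requested_pages = []  # records which pages were actually fetched


def fetch_event_page(page):
    """Stand-in for one paged event query; records which page was asked for."""
    requested_pages.append(page)
    start = (page - 1) * PAGE_SIZE
    pages = (len(FAKE_EVENTS) + PAGE_SIZE - 1) // PAGE_SIZE
    return {"events": {"results": FAKE_EVENTS[start:start + PAGE_SIZE],
                       "pages": pages}}


def iter_results(fetch, key):
    """Yield results from successive pages stored under res[key]."""
    page = 1
    while True:
        res = fetch(page)
        yield from res[key]["results"]
        if page >= res[key]["pages"]:
            return
        page += 1


# Take only the first three events; page 3 is never fetched.
first_three = list(islice(iter_results(fetch_event_page, "events"), 3))
print(first_three, requested_pages)
```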

