A number of changes for API users

Today we released a number of changes and additions that affect users who use our API to search for articles or events. In brief, these changes are:

  • A new query language that supports arbitrarily complex queries.
  • Support for phrase search in keyword conditions.
  • Normalization of source URIs.

New query language

To date, users of our Python library have been able to search for articles and events using the QueryArticles and QueryEvents classes. Both classes allow users to provide multiple conditions such as concepts, keywords, sources, categories, etc. However, they do not allow arbitrarily complex queries that combine several different conditions with AND, OR and NOT operators.

Today we released an updated Python library in which, in addition to simple queries, users can specify arbitrarily complex queries using a syntax that resembles the one used in MongoDB. The detailed documentation for the query language is available on GitHub, on the search articles and search events pages, so here we will provide just two short examples.

The new queries can be made in two ways: directly using a JSON object, or using the new query classes.

A JSON query that returns the list of articles that are annotated with the concept Artificial Intelligence, or that mention the phrase “deep learning” or “machine learning”, would look like this:

{ "$query": 
   { "$or": [ 
      { 
         "conceptUri": "http://en.wikipedia.org/wiki/Artificial_Intelligence" 
      }, 
      { 
         "keyword": { 
            "$or": [ "deep learning", "machine learning" ] 
         } 
      } 
      ] 
   } 
}

As you can see, the query consists of two conditions inside the $or list. One condition finds the articles about AI; the other finds articles based on the keywords mentioned in them. Since we are specifying two keywords, we have to state explicitly which operator combines them (in this case OR).
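If you prefer to assemble such a query in Python rather than write the JSON by hand, the sketch below builds the same structure as a plain dictionary and serializes it with the standard json module. The helper name build_or_query is hypothetical and not part of our library; how the resulting object is handed to the library is covered in the GitHub documentation and is not shown here.

```python
import json


def build_or_query(concept_uri, phrases):
    """Hypothetical helper: match articles with the concept OR any of the phrases."""
    return {
        "$query": {
            "$or": [
                # Condition 1: articles annotated with the given concept.
                {"conceptUri": concept_uri},
                # Condition 2: articles mentioning any of the phrases.
                # The operator between the keywords is stated explicitly.
                {"keyword": {"$or": list(phrases)}},
            ]
        }
    }


query = build_or_query(
    "http://en.wikipedia.org/wiki/Artificial_Intelligence",
    ["deep learning", "machine learning"],
)

# json.dumps quotes every key, including "$or", so the output is valid JSON.
query_json = json.dumps(query, indent=2)
print(query_json)
```

Building the query as a dictionary and serializing it at the end avoids quoting mistakes that are easy to make when writing nested JSON by hand.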

As the JSON query above already illustrates, you can now search for a phrase (for example “deep learning”). In a JSON query, you specify a phrase by simply putting the whole phrase inside quotation marks.

If you are using the QueryArticles() or QueryEvents() classes and specifying keywords, there is now a change in how your input is treated. Previously, specifying:

q = QueryArticles(keywords = "deep learning")

would return articles that mentioned both words, but not necessarily together. Now, with the phrase search, the results will only include articles where “deep learning” appears as a phrase.
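To make the difference concrete, here is a small local illustration of the two matching behaviours described above. It is only a sketch on a toy string, assuming simple lowercase word tokenization; the function names are hypothetical and this is not how the search is implemented on our servers.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_all_words(text, keywords):
    """Previous behaviour: every word appears somewhere in the text."""
    tokens = set(tokenize(text))
    return all(word in tokens for word in tokenize(keywords))


def matches_phrase(text, phrase):
    """New behaviour: the words appear next to each other, in order."""
    tokens = tokenize(text)
    wanted = tokenize(phrase)
    n = len(wanted)
    return n > 0 and any(
        tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1)
    )


article = "Learning to swim in deep water"
print(matches_all_words(article, "deep learning"))  # True: both words occur
print(matches_phrase(article, "deep learning"))     # False: not adjacent
```

An article like the one in this toy example would have been returned before the change, but is no longer returned now that the keywords are treated as a phrase.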

Normalization of URIs for news sources

This last change only affects users who explicitly specify URIs for news sources in their queries, such as:

q = QueryArticles(sourceUri = "www.euronews.com")

Until now, some sources had the “www.” prefix in their URI and some did not, depending on the URLs of the articles we obtained from the news source. Starting today, all source URIs are normalized by stripping the “www.” prefix if the source includes one.

If you have a script with a hard-coded query like the one above, you now need to change it to:

q = QueryArticles(sourceUri = "euronews.com")

The reason for this normalization is that some sources provided articles in both forms, sometimes with the “www.” prefix and sometimes without. In these cases, the two forms would appear as different sources even though they were the same, and a query that used one form of the URI would only return results from that form.
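If your scripts read source URIs from a configuration file or from older saved queries, you could normalize them on your side before building a query. The helper below is a minimal sketch of the rule described above (strip a leading “www.”); the name normalize_source_uri is hypothetical, and our own normalization may handle cases this sketch does not.

```python
def normalize_source_uri(uri):
    """Strip a leading "www." from a source URI, if present."""
    prefix = "www."
    uri = uri.strip()
    if uri.startswith(prefix):
        return uri[len(prefix):]
    return uri


# Both forms map to the same normalized source URI.
print(normalize_source_uri("www.euronews.com"))
print(normalize_source_uri("euronews.com"))
```

The normalized value can then be passed as the sourceUri argument, as in the QueryArticles example above.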