How to Query Elasticsearch with Python


(sandralynn) #1

Elasticsearch is an open-source distributed search server built on top of Apache Lucene. It’s a great tool that allows to quickly build applications with full-text search capabilities. The core implementation is in Java, but it provides a nice REST interface which allows to interact with Elasticsearch from any programming language.

This article provides an overview on how to query Elasticsearch from Python. There are two main options:

Implement the REST-API calls to Elasticsearch
Use one of the Python libraries that does the above for you

Quick Intro on Elasticsearch

Elasticsearch is developed in Java on top of Lucene, but the format for configuring the index and querying the server is JSON. Once the server is running, by default it’s accessible at localhost:9200 and we can start sending our commands via e.g. curl:

curl -XPOST http://localhost:9200/test/articles/1 -d '{
"content": "The quick brown fox"
}'
This commands creates a new document, and since the index didn’t exist, it also creates the index. Specifically, the format for the URL is:

http://hostname:port/index_name/doc_type/doc_id
so we have just created an index “test” which contains documents of type “articles”. The document has only one field, “content”. Since we didn’t specify, the content is indexed using the default Lucene analyzer (which is usually a good choice for standard English). The document id is optional and if we don’t explicitly give one, the server will create a random hash-like one.

We can insert a few more documents, see for example the file create_index.sh from the code snippets on github.

Once the documents are indexed, we can perform a simple search, e.g.:

curl -XPOST http://localhost:9200/test/articles/_search?pretty=true -d '{
"query": {
"match": {
"content": "dog"
}
}
}'
Using the sample documents above, this query should return only one document. Performing the same query over the term “fox” rather than “dog” should give instead four documents, ranked according to their relevance.

How the Elasticsearch/Lucene ranking function works, and all the countless configuration options for Elasticsearch, are not the focus of this article, so bear with me if we’re not digging into the details. For the moment, we’ll just focus on how to integrate/query Elasticsearch from our Python application.

Querying Elasticsearch via REST in Python

One of the option for querying Elasticsearch from Python is to create the REST calls for the search API and process the results afterwards. The requests library is particularly easy to use for this purpose. We can install it with:

sudo pip install requests

The sample query used in the previous section can be easily embedded in a function:

1
2
3
4
5
6
7
8
9
10
11
12
def search(uri, term):
"""Simple Elasticsearch Query"""
query = json.dumps({
"query": {
"match": {
"content": term
}
}
})
response = requests.get(uri, data=query)
results = json.loads(response.text)
return results
The “results” variable will be a dictionary loaded from the JSON response. We can pretty-print the JSON, to observe the full output and understand all the information it provides, but again this is beyond the scope of this post. So we can simply print the results nicely, one document per line, as follows:

1
2
3
4
5
6
7
def format_results(results):
"""Print results nicely:
doc_id) content
"""
data = [doc for doc in results['hits']['hits']]
for doc in data:
print("%s) %s" % (doc['_id'], doc['_source']['content']))
Similarly, we can create new documents:

1
2
3
4
5
def create_doc(uri, doc_data={}):
"""Create new document."""
query = json.dumps(doc_data)
response = requests.post(uri, data=query)
print(response)
with the doc_data variable being a (Python) dictionary which resembles the structure of the document we’re creating.

You can see a full working toy example in the rest.py file in the Gist on github.

Querying Elasticsearch Using elasticsearch-py

The requests library is fairly easy to use, but there are several options in terms of libraries that abstract away the concepts related to the REST API and focus on Elasticsearch concepts. In particular, the official Python extension for Elasticsearch, called elasticsearch-py, can be installed with:

sudo pip install elasticsearch
It’s fairly low-level compared to other client libraries with similar capabilities, but it provides a consistent and easy to extend API.

We can replicate the search used with the requests library, as well as the result print-out, just using a few lines of Python:

1
2
3
4
5
6
7
from elasticsearch import Elasticsearch

es = Elasticsearch()
res = es.search(index="test", doc_type="articles", body={"query": {"match": {"content": "fox"}}})
print("%d documents found" % res['hits']['total'])
for doc in res['hits']['hits']:
print("%s) %s" % (doc['_id'], doc['_source']['content']))
In a similar fashion, we can re-create the functionality of adding an extra document:

1
es.create(index="test", doc_type="articles", body={"content": "One more fox"})
The full functionality of this client library are well described in the documentation.

Source: macro bonzanini


(system) #2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.