Dec 13th, 2023: [EN] 10 reasons to upgrade to elasticsearch-py 8.x

This article is also available in French.

Back in February 2022, when Elasticsearch 8.0 was released, the Python client also got an 8.0 release. It was a partial rewrite of the 7.x client, and came with a bunch of nice features (outlined below) but also deprecation warnings and breaking changes. Today, version 7.17 of the client is still relatively popular, with more than one million monthly downloads, which represents ~50% of the 8.x downloads.

As the new maintainer of the Elasticsearch Python client, I want our community to benefit from the improvements we're putting in the client. That means supporting all elasticsearch-py users:

  • helping 7.17 users to migrate to 8.x,
  • helping 8.x users to take advantage of new features.

I know from my experience as an urllib3 maintainer that investing in a urllib3 2.0 migration guide and helping users to migrate pays off. In a similar vein, we're now making efforts to make elasticsearch-py 8.x easier to use, by removing deprecation warnings and breaking changes. This post highlights the good reasons to upgrade to 8.x: the reasons that have been true for nearly two years, but also the recent ones. By the way, in case you're already convinced, check out our migration guide (and make sure to reach out if you're stuck!).

Without further ado, here are the ten reasons to use elasticsearch-py 8.x.

1. Support for the latest Elasticsearch APIs

Elasticsearch 8 is the best version of Elasticsearch, with many improvements across the board and vastly increased scale. More importantly for our discussion of the Python client, various new APIs have been added.

Since the client is generated from the Elasticsearch specification, it is guaranteed to get the latest and greatest APIs.

2. Elasticsearch DSL client 8.x

The Elasticsearch DSL client is a high-level library whose aim is to help with writing and running queries against Elasticsearch in a more concise way. Using the same christmas_characters index as the Dec 6th Advent Calendar post as an example:

response = client.search(
    index="christmas_characters",
    query={
        "bool": {
            "must": [{"match": {"behavior": "good"}}]
        }
    }
)

for hit in response['hits']['hits']:
    print(hit['_score'], hit['_source']['title'])

becomes:

from elasticsearch_dsl import Search

s = Search(using=client, index="christmas_characters").query("match", behavior="good")

for hit in s.execute():
    print(hit.meta.score, hit.title)

You will either love its conciseness or hate to have to learn a new DSL, but this client is highly popular, with more than 3 million downloads every month and a dedicated user base. It used to be poorly maintained and stuck at version 7.4.1, but I released version 8.9.0 in September, the first that is compatible with version 8 of the main Python client.

I am committed to maintaining it going forward, and released 8.11.0 last month to support Python 3.12 and allow collapsing queries (the relevant GitHub issue had accumulated 38 votes!).

3. Type hints and a more Pythonic API

Since the very early days of the Elasticsearch Python client (back in July 2013!), the body parameter has been the way to specify the request body for requests that accept one. API calls using body look like this:

es.search(
    index="christmas_characters",
    body={
        "query": {"match_all": {}},
        "size": 50,
    }
)

However, this parameter is an untyped Python dictionary that the client does not validate, so you can't tell whether your request is correct before sending it to the server. And you don't want to find out about basic problems in production! As a result, elasticsearch-py 8.0 introduced a better API by taking advantage of the Elasticsearch specification, which provides the full types of each API. The first level of body keys can be specified using Python parameters:

es.search(
    index="christmas_characters",
    query={"match_all": {}},
    size=50,
)

This has various advantages, including better auto-completion and type checks. For example, mypy will raise an error if size is not an integer. We also realized that an existing body dictionary could be unpacked into typed parameters like this:

es.search(
    index="christmas_characters",
    **{"query": {"match_all": {}}, "size": 50}
)

We decided to deprecate the body parameter altogether in elasticsearch-py 8.0.
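To make the benefit concrete, here is a minimal sketch that does not use the real client. The `search` function below is a hypothetical stand-in with a few typed, keyword-only parameters that mimics the shape of the 8.x API. It shows how an existing body dictionary unpacks into keyword arguments, and how a key that the signature does not know about is rejected before any request would be sent.

```python
from typing import Any, Optional


def search(
    *,
    index: str,
    query: Optional[dict[str, Any]] = None,
    size: Optional[int] = None,
) -> dict[str, Any]:
    """Hypothetical stand-in for a typed client method (not the real API).

    It only rebuilds the request body it would send.
    """
    body: dict[str, Any] = {}
    if query is not None:
        body["query"] = query
    if size is not None:
        body["size"] = size
    return {"index": index, "body": body}


# A body dictionary written for the 7.x-style API...
legacy_body = {"query": {"match_all": {}}, "size": 50}

# ...unpacks into the typed keyword parameters.
request = search(index="christmas_characters", **legacy_body)

# A key that is missing from the signature fails fast with a TypeError.
try:
    search(index="christmas_characters", **{"sub_searches": []})
    rejected = False
except TypeError:
    rejected = True
```

Against this signature, a type checker such as mypy would flag `search(index="christmas_characters", size="50")`, since `size` is annotated as an integer.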

4. Bringing back the body parameter

However, deprecating body had the following downsides:

  • A lot of code written in the past decade was now triggering a deprecation warning.
  • Unknown parameters such as sub_searches or unintentional omissions from the Elasticsearch specification were rejected, causing queries to outright fail, unnecessarily forcing the use of raw requests.
  • Optimizations such as passing an already encoded body to avoid paying the cost of serializing JSON were no longer possible.

The original author of the client, Honza Král, pointed out those issues, and we decided to allow body to work as before, without any warnings, alongside the new API. This will be available in elasticsearch-py 8.12, and we're hopeful that this will help with the adoption of elasticsearch-py 8.x.

5. Logging requests for debugging

The elasticsearch-py 8.x client is based on the elastic-transport library, which can be used as a base for different clients. This library introduced a very useful feature for debugging requests and responses, enabled by calling elastic_transport.debug_logging():

import elastic_transport
from elasticsearch import Elasticsearch

# In this example we're debugging an Elasticsearch client:
client = Elasticsearch(...)
# Use `elastic_transport.debug_logging()` before the request
elastic_transport.debug_logging()

client.search(
    index="christmas_characters",
    query={
        "bool": {
            "must": [{"match": {"behavior": "good"}}]
        }
    }
)

The above script will output logs like the following (this sample was captured with a different index and query):

[2021-11-23T14:11:20] > POST /example-index/_search?typed_keys=true HTTP/1.1
> Accept: application/json
> Accept-Encoding: gzip
> Authorization: Basic <hidden>
> Connection: keep-alive
> Content-Encoding: gzip
> Content-Type: application/json
> User-Agent: elastic-transport-python/8.11.0+dev
> X-Elastic-Client-Meta: es=8.11.0p,py=3.12.0,t=8.11.0p,ur=2.1.0
> {"query":{"match":{"text-field":"value"}}}
< HTTP/1.1 200 OK
< Content-Encoding: gzip
< Content-Length: 165
< Content-Type: application/json;charset=utf-8
< Date: Tue, 12 Dec 2022 20:11:20 GMT
< X-Cloud-Request-Id: ctSE59hPSCugrCPM4A2GUQ
< X-Elastic-Product: Elasticsearch
< X-Found-Handling-Cluster: 40c9b5837c8f4dd083f05eac950fd50c
< X-Found-Handling-Instance: instance-0000000001
< {"hits":{...}}

This feature became an essential part of my development workflow the day I learned about it and I now miss it in other clients: it's that good!
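If you would rather send these logs to a file or to your own handler, the standard `logging` module can be used directly. The sketch below assumes that elastic-transport emits its request logs through a logger named `elastic_transport.node`; treat that name as an assumption and check it against the version you have installed. The handler writes into an in-memory buffer, and the record is emitted by hand only to show the wiring.

```python
import io
import logging

# Assumed logger name; verify it against your installed elastic-transport.
LOGGER_NAME = "elastic_transport.node"

buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))

logger = logging.getLogger(LOGGER_NAME)
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

# Emitted manually for illustration; with a real client, the transport
# would produce these records while performing requests.
logger.debug("> POST /christmas_characters/_search HTTP/1.1")

captured = buffer.getvalue()
```

Swapping the `StreamHandler` for a `logging.FileHandler` would persist the same records to disk.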

6. Full chain SSL/TLS fingerprint pinning

When communicating with Elasticsearch over HTTPS, which is the default starting with Elasticsearch 8, the client needs to be able to verify the certificate the server is using, just like your browser had to verify the certificate of discuss.elastic.co before fetching the words you're currently reading. This works by following the certificate chain up to the root certificate authority (root CA). However, that root CA isn't necessarily a common one that is already trusted, but could be a corporate root CA used across your company or even a root CA generated by Elasticsearch for a single cluster.

In those cases, there are two ways to verify the certificates correctly:

  1. Store the relevant certificate authority in a file, and configure the ca_certs parameter. Storing this file requires an extra step, however, and getting access to the public certificate isn't always easy.
  2. Specify the SSL fingerprint of each node in the cluster, to ensure each node certificate never changes. However, you need to do this for each node, which is impossible in practice for larger clusters.

Thankfully, as part of his work on Python trust stores, past Elasticsearch Python client maintainer Seth Larson realized that, using a private API available in Python 3.10+, it's possible to pin the fingerprint of the root CA. This allows verifying the certificates of all nodes, bringing the best of both worlds. See the Configuration page of the Elasticsearch Python client 8.11 documentation for all the options around TLS in the Python client.
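For reference, a certificate fingerprint is a hash of the certificate's DER encoding. The sketch below computes a SHA-256 fingerprint from a PEM string using only the standard library. `fingerprint_from_pem` is a hypothetical helper, and the PEM used here wraps placeholder bytes rather than a real certificate.

```python
import hashlib
import ssl


def fingerprint_from_pem(pem: str) -> str:
    """Return the SHA-256 fingerprint (lowercase hex) of a PEM certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem)
    return hashlib.sha256(der).hexdigest()


# Placeholder bytes wrapped in PEM armor; a real CA certificate would be
# read from a file instead.
placeholder_der = b"not a real certificate"
placeholder_pem = ssl.DER_cert_to_PEM_cert(placeholder_der)

fingerprint = fingerprint_from_pem(placeholder_pem)
```

With a real root CA certificate, a value computed this way is the kind of hex string that the client's `ssl_assert_fingerprint` option takes; the configuration documentation covers the exact accepted formats.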

7. options() API

In elasticsearch-py 7.x, per-request options like api_key and ignore were allowed within client API methods. However, that was confusing as it was mixing transport-level parameters and API-level parameters. This is now deprecated, as elasticsearch-py 8.x introduced the options() API, transforming:

client.search(index="christmas_characters", request_timeout=10)

to:

client.options(request_timeout=10).search(index="christmas_characters")

See the migration guide for details.
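When migrating a large codebase, it can help to separate the two kinds of arguments mechanically. The helper below is a hypothetical sketch, not part of the client: it splits 7.x-style keyword arguments into transport-level options and API-level parameters. The set of option names and the `ignore` to `ignore_status` rename are assumptions about the 8.x `options()` signature and should be checked against the migration guide.

```python
from typing import Any

# Assumed list of transport-level names accepted by options() in 8.x.
TRANSPORT_OPTIONS = {
    "api_key", "basic_auth", "bearer_auth", "headers", "opaque_id",
    "request_timeout", "ignore_status", "max_retries",
    "retry_on_status", "retry_on_timeout",
}
# Assumed 7.x -> 8.x renames.
RENAMED = {"ignore": "ignore_status"}


def split_kwargs(kwargs: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split 7.x-style kwargs into (transport options, API parameters)."""
    options: dict[str, Any] = {}
    params: dict[str, Any] = {}
    for name, value in kwargs.items():
        name = RENAMED.get(name, name)
        if name in TRANSPORT_OPTIONS:
            options[name] = value
        else:
            params[name] = value
    return options, params


options, params = split_kwargs(
    {"index": "christmas_characters", "request_timeout": 10, "ignore": 404}
)
```

The migrated call would then read `client.options(**options).search(**params)`.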

8. Improved documentation

A major current focus is to improve the documentation of the Python client.

The API reference was split by namespace to reduce confusion between similarly-named APIs like es.exists and es.indices.exists.

It also includes inline type hints.

Finally, Quickstart and Interactive examples pages were added.

9. Serverless

Elastic's latest offering, Serverless, has a dedicated Elasticsearch Python client, elasticsearch-serverless-python, which contains just the APIs and options supported by Serverless.

That said, the default Python client, elasticsearch-py, also supports Serverless, which makes it easy to try out Serverless with your existing code!

10. Generative AI

Elastic invests a lot in Generative AI, and Elasticsearch is the most downloaded vector database! The best way to get started is Elastic Search Labs. It contains blog posts and Python notebooks for every use case with elasticsearch-py 8.x.


That's it! Thanks for reading. When you're ready to upgrade, the migration guide is the best place to start.

If you're stuck, make sure to ask questions right here on Discuss, using the language-clients and python tags. If you believe you found a bug, please open an issue on GitHub instead. Thanks!
