There's an 8.0.0 pre-release out for the Elasticsearch client, fresh off the press! You can give it a try yourself or skip to the "What's new" section to read about all the new features coming in 8.0.0.
Getting started
Start by installing the pre-release with pip, using the --pre flag to opt in to pre-releases:
$ python -m pip install --pre 'elasticsearch>=8'
Then you can start an Elasticsearch 8.0.0 preview cluster and connect to it. Note that security is enabled by default, which means you need to capture the password of the elastic user. If your cluster generated a certificate bundle, you should copy the http_ca.crt file and use ca_certs to specify its location.
from elasticsearch import Elasticsearch
client = Elasticsearch(
"https://localhost:9200",
basic_auth=("elastic", "<password>"),
ca_certs="/path/to/http_ca.crt"
)
If you're unable to get this working and want to do a quick test, you can use verify_certs=False, but don't use this setting for a production cluster!
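For example, a throwaway local test might look like the sketch below; the URL and password are placeholders, and verify_certs=False is only acceptable because the cluster is disposable:
from elasticsearch import Elasticsearch

# Quick local test only -- never disable certificate verification
# against a production cluster.
client = Elasticsearch(
    "https://localhost:9200",
    basic_auth=("elastic", "<password>"),
    verify_certs=False,
)
print(client.info())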
Find an issue? Report it to us!
At the time of writing, the released version is only an 8.0.0 pre-release, which means there's still time for us to fix bugs! If you happen to find something, please open an issue on our repository.
What's new in the 8.0.0 pre-release?
Now that you've set up your cluster and client, what sorts of features are new in the pre-release?
Richer responses
In previous versions of the client, responses were only raw deserialized JSON. Because plain dict and list objects were used, you couldn't access fine-grained details about the transport layer like the HTTP status, headers, and more. In 8.0.0 you can access resp.meta for lots of details about the response:
>>> resp = client.info()
>>> resp.meta.status # HTTP status code
200
>>> resp.meta.headers # HTTP headers
{'Content-Encoding': 'gzip',
'Content-Length': '348',
'Content-Type': 'application/json', ...}
>>> resp.meta.http_version # HTTP version
'1.1'
>>> resp.meta.node # Which node serviced the request?
NodeConfig(scheme='https', host='6d1f...a5af.us-west2.gcp.elastic-cloud.com', port=443, ...)
More fields and data will be added to this meta object in the future as they become available to the transport library.
Because responses are no longer simple dict instances, if you really need access to the underlying dict you can use the .raw property. Keep in mind that typical access patterns like indexing, calling methods, and special Python functions like len, iter, etc. will all work identically:
# The response object isn't a dictionary, it's an object:
>>> resp
ObjectApiResponse({'name': 'instance-0000000000', ..., 'tagline': 'You Know, for Search'})
# You can access the response via indexing like in 7.x:
>>> resp["tagline"]
'You Know, for Search'
# Methods of the underlying type are forwarded similarly:
>>> resp.keys()
dict_keys(['name', 'cluster_name', 'cluster_uuid', 'version', 'tagline'])
# If you need the underlying 'dict' you can use .raw:
>>> resp.raw
{'name': 'instance-0000000000', ..., 'tagline': 'You Know, for Search'}
Body field parameters
In previous versions of the client all information encoded into the HTTP body was provided via an opaque body parameter. This meant that in some of the most complex parts of the Elasticsearch API you'd get no help with auto-complete or type hints. In 8.0.0, thanks to the Elasticsearch specification project, we now have a complete API specification of all parts of the Elasticsearch API, including HTTP bodies!
This means we can provide top-level parameters for bodies in addition to path and query parameters. You don't have to figure out where parameters are serialized; just use the parameters and let the client figure that out for you:
client.ingest.put_pipeline(
id="pipeline-id",
master_timeout="5m",
description="My ingest pipeline",
processors=[
{"set": {"field": "ingested_at", "value": "{{_ingest.timestamp}}"}}
],
on_failure=[
{"set": {"field": "_index", "value": "failed-docs"}}
]
)
Here's the same API request in 7.x; notice that you have to know which parameters go into body and which don't:
client.ingest.put_pipeline(
id="pipeline-id",
master_timeout="5m",
body={
"description": "My ingest pipeline"
"processors": [
{"set": {"field": "ingested_at", "value": "{{_ingest.timestamp}}"}}
],
"on_failure": [
{"set": {"field": "_index", "value": "failed-docs"}}
]
}
)
Per-request options
To change a transport option for a single request you could previously pass a small subset of options like api_key, request_timeout, and opaque_id (and a few more) directly to the API method:
# This usage is deprecated, don't use in v8.0+
client.search(..., api_key=("id", "api-key"))
This worked for most APIs; however, there are conflicts with certain APIs that have query parameters with the same name as transport parameters (like the sql.query API accepting request_timeout). In these situations, is the parameter meant for the transport or for the API?
We make the distinction clear with the new Elasticsearch.options() method, which can be used to build new client instances that reuse the existing transport instance with slightly different options:
client.options(api_key=("id", "api-key")).search(...)
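As a rough sketch of how this resolves the sql.query conflict mentioned above (the query and timeout values are placeholders, and we're assuming the API keyword matches the body field name), the transport timeout goes through options() while the API's own request_timeout stays a regular parameter:
# Transport-level timeout for this single request, set via .options():
resp = client.options(request_timeout=30).sql.query(
    query="SELECT author, name FROM library LIMIT 10",
    request_timeout="2s",  # API-level parameter, sent to Elasticsearch
)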
Because a new client instance is returned from the options() method, you can also build contextualized clients with different authentication and use them over and over instead of having to pass the authentication into every API call:
auth_client = not_auth_client.options(basic_auth=("username", "password"))
# This `auth_client` instance has the username:password privileges
resp = auth_client.indices.create(...)
Explicit client configuration
Previously, when instantiating an Elasticsearch client instance, the default URL would be scheme="http", host="localhost", and port=9200. However, the method that resolved each parameter was very complicated and didn't make a lot of sense in many situations (should a scheme of https give a default port of 443, 9200, 9243, or 9443? Elastic Cloud used to use one of these ports).
There can be even more confusion now that 8.0.0 enables HTTPS by default (which is a huge win for user security!). This means the 7.x default value (http://localhost:9200) doesn't work with a default 8.0 Elasticsearch server instance; instead it should be https://localhost:9200.
Because of this, starting in 8.0.0 the client must be explicitly configured with either a URL including a scheme and port (like https://localhost:9200) or with an Elastic Cloud ID (cloud_id="deployment:..."). This change ensures we're never using bad or confusing defaults.
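For instance, both of the configurations below are explicit about where the cluster lives; the URL, Cloud ID, password, and API key values are placeholders:
from elasticsearch import Elasticsearch

# Explicit URL including scheme and port:
client = Elasticsearch(
    "https://localhost:9200",
    basic_auth=("elastic", "<password>"),
    ca_certs="/path/to/http_ca.crt",
)

# Or an Elastic Cloud deployment via its Cloud ID:
client = Elasticsearch(
    cloud_id="deployment:...",
    api_key=("<id>", "<api-key>"),
)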
Keyword-only arguments
In v7.15.0 of the client using positional arguments would raise a DeprecationWarning
. Starting in 8.0.0 all parameters on API methods are keyword-only and passing positional arguments to APIs will raise a TypeError
.
# Keyword arguments only!
client.search(index="target-index")
# This will raise a TypeError:
client.search("target-index")
Requiring keyword arguments makes your code resilient to API parameters being added or changed, and it makes your code more readable: double win!