There's an 8.0.0 pre-release out for the Elasticsearch client, fresh off the press! You can give it a try yourself or skip to the "What's new" section to read about all the new features coming in 8.0.0.
Getting started
Start by installing the pre-release with pip, using the --pre flag to opt in to pre-releases:
$ python -m pip install --pre 'elasticsearch>=8'
Then you can start an Elasticsearch 8.0.0 preview cluster and connect to it. Note that security is enabled by default, which means you need to capture the password of the elastic user. If your cluster generated a certificate bundle, you should copy the http_ca.crt file and use ca_certs to specify its location.
from elasticsearch import Elasticsearch
client = Elasticsearch(
"https://localhost:9200",
basic_auth=("elastic", "<password>"),
ca_certs="/path/to/http_ca.crt"
)
If you're unable to get this working and want to do a quick test, you can use verify_certs=False, but don't use this setting for a production cluster!
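For example, a throwaway local test might look like the sketch below; the URL and password are placeholders, and verify_certs=False is only acceptable because the cluster is disposable:
from elasticsearch import Elasticsearch

# Quick local test only -- never disable certificate verification
# against a production cluster.
client = Elasticsearch(
    "https://localhost:9200",
    basic_auth=("elastic", "<password>"),
    verify_certs=False,
)
print(client.info())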
Find an issue? Report it to us!
At the time of writing, the released version is only an 8.0.0 pre-release, which means there's still time for us to fix bugs! If you happen to find something, please open an issue on our repository.
What's new in the 8.0.0 pre-release?
Now that you've set up your cluster and client, what sorts of features are new in the pre-release?
Richer responses
In previous versions of the client, responses were only raw deserialized JSON. Because plain dict and list objects were used, you couldn't access fine-grained details about the transport layer like the HTTP status, headers, and more. In 8.0.0 you can access resp.meta for lots of details about the response:
>>> resp = client.info()
>>> resp.meta.status # HTTP status code
200
>>> resp.meta.headers # HTTP headers
{'Content-Encoding': 'gzip',
'Content-Length': '348',
'Content-Type': 'application/json', ...}
>>> resp.meta.http_version # HTTP version
'1.1'
>>> resp.meta.node # Which node serviced the request?
NodeConfig(scheme='https', host='6d1f...a5af.us-west2.gcp.elastic-cloud.com', port=443, ...)
More fields and data will be added to this meta object in the future as they become available to the transport library.
Because responses are no longer simple dict instances, if you really need access to the underlying dict you can use the .raw property. Keep in mind that typical access patterns like indexing, calling methods, and special Python functions like len, iter, etc. will all work identically:
# The response object isn't a dictionary, it's an object:
>>> resp
ObjectApiResponse({'name': 'instance-0000000000', ..., 'tagline': 'You Know, for Search'})
# You can access the response via indexing like in 7.x:
>>> resp["tagline"]
'You Know, for Search'
# Methods of the underlying type are forwarded similarly:
>>> resp.keys()
dict_keys(['name', 'cluster_name', 'cluster_uuid', 'version', 'tagline'])
# If you need the underlying 'dict' you can use .raw:
>>> resp.raw
{'name': 'instance-0000000000', ..., 'tagline': 'You Know, for Search'}
Body field parameters
In previous versions of the client all information encoded into the HTTP body was provided via an opaque body parameter. This meant that in some of the most complex parts of the Elasticsearch API you'd get no help with auto-complete or type hints. In 8.0.0, thanks to the Elasticsearch specification project, we now have a complete API specification of all parts of the Elasticsearch API, including HTTP bodies!
This means we can provide top-level parameters for bodies in addition to path and query parameters. You don't have to figure out where parameters are serialized; just use the parameters and let the client figure that out for you:
client.ingest.put_pipeline(
id="pipeline-id",
master_timeout="5m",
description="My ingest pipeline",
processors=[
{"set": {"field": "ingested_at", "value": "{{_ingest.timestamp}}"}}
],
on_failure=[
{"set": {"field": "_index", "value": "failed-docs"}}
]
)
Here's the same API request in 7.x; notice that you have to know which parameters go into body and which don't:
client.ingest.put_pipeline(
id="pipeline-id",
master_timeout="5m",
body={
"description": "My ingest pipeline"
"processors": [
{"set": {"field": "ingested_at", "value": "{{_ingest.timestamp}}"}}
],
"on_failure": [
{"set": {"field": "_index", "value": "failed-docs"}}
]
}
)
Per-request options
To change a transport option for a single request you could previously pass a small subset of options like api_key, request_timeout, and opaque_id (and a few more) directly to the API method:
# This usage is deprecated, don't use in v8.0+
client.search(..., api_key=("id", "api-key"))
This worked for most APIs; however, there are conflicts with certain APIs that have query parameters with the same name as transport parameters (like the sql.query API accepting request_timeout). In these situations, is the parameter meant for the transport or for the API?
We make the distinction clear with the new Elasticsearch.options() method, which can be used to build new client instances that reuse the existing transport instance with slightly different options:
client.options(api_key=("id", "api-key")).search(...)
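As a rough sketch of how this resolves the sql.query conflict mentioned above (the query and timeout values are placeholders, and we're assuming the API keyword matches the body field name), the transport timeout goes through options() while the API's own request_timeout stays a regular parameter:
# Transport-level timeout for this single request, set via .options():
resp = client.options(request_timeout=30).sql.query(
    query="SELECT author, name FROM library LIMIT 10",
    request_timeout="2s",  # API-level parameter, sent to Elasticsearch
)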
Because a new client instance is returned from the options() method, you can also build contextualized clients with different authentication and use them over and over instead of having to pass the authentication into every API call:
auth_client = not_auth_client.options(basic_auth=("username", "password"))
# This `auth_client` instance has the username:password privileges
resp = auth_client.indices.create(...)
Explicit client configuration
Previously, when instantiating an Elasticsearch client instance, the default URL would be scheme="http", host="localhost", and port=9200. However, the method that resolved each parameter was very complicated and didn't make a lot of sense in many situations (should a scheme of https give a default port of 443, 9200, 9243, or 9443? Elastic Cloud used to use one of these ports).
There can be even more confusion now that 8.0.0 enables HTTPS by default (which is a huge win for user security!). This means the 7.x default value (http://localhost:9200) doesn't work with a default 8.0 Elasticsearch server instance; instead it should be https://localhost:9200.
Because of this, starting in 8.0.0 the client must be explicitly configured with either a URL including a scheme and port (like https://localhost:9200) or with an Elastic Cloud ID (cloud_id="deployment:..."). This change ensures we're never using bad or confusing defaults.
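For instance, both of the configurations below are explicit about where the cluster lives; the URL, Cloud ID, password, and API key values are placeholders:
from elasticsearch import Elasticsearch

# Explicit URL including scheme and port:
client = Elasticsearch(
    "https://localhost:9200",
    basic_auth=("elastic", "<password>"),
    ca_certs="/path/to/http_ca.crt",
)

# Or an Elastic Cloud deployment via its Cloud ID:
client = Elasticsearch(
    cloud_id="deployment:...",
    api_key=("<id>", "<api-key>"),
)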
Keyword-only arguments
In v7.15.0 of the client using positional arguments would raise a DeprecationWarning
. Starting in 8.0.0 all parameters on API methods are keyword-only and passing positional arguments to APIs will raise a TypeError
.
# Keyword arguments only!
client.search(index="target-index")
# This will raise a TypeError:
client.search("target-index")
Requiring keyword arguments makes your code resilient to API parameters being added or changed, and it makes your code more readable: double win!