Why do I need an ML model?
An ML model in Elasticsearch can be used to enrich your data during indexing. Some examples of what an ML model can do:
- extract entities from text (NER)
- predict classes (classification)
- identify language (classification)
- generate text embeddings (for subsequent vector search).
For more information, see Machine learning in the Elastic Stack.
How do I get an ML model?
To manage trained models in Elasticsearch, you need at least a Platinum license, but if you just want to experiment with the feature, you can start a trial.
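A trial can also be started through Elasticsearch's start-trial license endpoint instead of the UI. The following is a minimal sketch that only builds the request, assuming a local cluster at http://localhost:9200 and the elastic / changeme credentials used in the example commands in this article. The helper name build_start_trial_request is hypothetical.

```python
import base64
import urllib.request


def build_start_trial_request(base_url, username, password):
    """Build (but do not send) a POST to the start-trial license endpoint."""
    url = base_url.rstrip("/") + "/_license/start_trial?acknowledge=true"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )


# Assumed values: adjust them to your own cluster and credentials.
request = build_start_trial_request("http://localhost:9200/", "elastic", "changeme")
print(request.get_method(), request.full_url)
```

To actually send the request, pass it to urllib.request.urlopen(request) once the URL and credentials match your cluster.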
After starting a trial, you should see "Analytics" -> "Machine learning" in the Kibana menu. Click it, and you will find the "Trained models" tab under the "Model management" section.
The Hugging Face website is a great resource for trained models.
What is Eland?
Eland is an open-source tool for data analysis, specifically for data stored in Elasticsearch. Currently, it's also the recommended tool for importing an ML model into Elasticsearch, which is the use case covered here.
How do I run it?
You'll need to either install Eland, or run it in Docker.
With Docker
First, you have to install Docker.
Then, you need to clone the Eland repository. Assuming you have git:
git clone https://github.com/elastic/eland.git
Then, you have to build the elastic/eland image with Docker:
cd /path-to-repository/eland
docker build -t elastic/eland .
After that, run a command similar to this in your terminal:
docker run -it --rm --network host \
elastic/eland \
eland_import_hub_model \
--url http://host.docker.internal:9200/ \
--hub-model-id philschmid/distilroberta-base-ner-conll2003 \
--task-type ner \
--es-username elastic \
--es-password changeme \
--start
The command above:
- runs a Docker container with Eland installed in it
- in the container, calls eland_import_hub_model to import the model philschmid/distilroberta-base-ner-conll2003 from the Hugging Face website
- assumes that Elasticsearch is running and accessible on localhost, port 9200
- provides the username and password of the elastic user.
host.docker.internal is a special DNS name that resolves to the internal IP address used by the host. Using this name, services inside the container can access services running on the host machine.
Without Docker
Eland is a Python module, which means it has to be installed using one of the Python package installers, either pip or conda. On most modern operating systems, Python is pre-installed, and if you have Python, you usually have pip as well.
Prerequisites
There are a few prerequisites (OS packages) that should be installed before you can install and run Eland. The following command is for Debian-based distributions:
sudo apt-get install -y \
build-essential pkg-config cmake \
python3-dev libzip-dev libjpeg-dev
Other Linux distributions such as CentOS, RedHat, Arch, etc. may require using a different package manager and specifying different package names.
If you're using Windows, it's likely that you don't need to install any of those.
If you're using macOS, you might need to install Xcode command-line tools:
xcode-select --install
Compatibility
Eland has a few compatibility requirements:
- Python 3.7+ and Pandas 1.3.
- Elasticsearch 7.11+, recommended 8.3 or later.
- PyTorch 1.11.0 or earlier.
First, open your preferred terminal application and check your Python version:
python --version
- If the python command is not available, see if you have python3.
- If you have both python and python3, pick the one with the higher version.
- If you have either python or python3, but the version is below 3.7, download and install a newer Python.
- If you don't have python or python3, download and install Python.
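The decision steps above can be sketched in Python. This is an illustration only: the helper names probe_version and pick_interpreter are hypothetical, and the sketch assumes both candidates print a line of the form "Python X.Y.Z" when called with --version.

```python
import re
import shutil
import subprocess


def probe_version(command):
    """Return the (major, minor) version reported by `command --version`, or None."""
    path = shutil.which(command)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Python 2 prints its version to stderr, Python 3 to stdout.
    match = re.search(r"Python (\d+)\.(\d+)", result.stdout + result.stderr)
    return (int(match.group(1)), int(match.group(2))) if match else None


def pick_interpreter(versions, minimum=(3, 7)):
    """Pick the command with the highest version, or None if none is new enough."""
    usable = {cmd: ver for cmd, ver in versions.items() if ver and ver >= minimum}
    return max(usable, key=usable.get) if usable else None


found = {cmd: probe_version(cmd) for cmd in ("python", "python3")}
print(found, "->", pick_interpreter(found))
```

If pick_interpreter returns None, neither command is usable, which corresponds to the "download and install Python" cases in the list.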
On macOS, Python 3 can be installed with homebrew:
brew install python
As of November 2022, you can download and install Python versions 3.7.x through 3.11.x. Usually, you want the latest version unless there's a specific reason to use an older one, and here there is one: we need to install PyTorch 1.11.x, which requires a Python version lower than 3.10.
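The resulting version window can be checked from inside the interpreter you plan to use. This is a small sketch; the bounds are taken from the text above (Eland needs 3.7 or later, PyTorch 1.11.x needs a version lower than 3.10), and the function name python_is_suitable is hypothetical.

```python
import sys

MINIMUM = (3, 7)       # Eland requires Python 3.7+
UPPER_BOUND = (3, 10)  # per the text above, PyTorch 1.11.x needs Python below 3.10


def python_is_suitable(version=None):
    """True if the (major, minor) version satisfies 3.7 <= version < 3.10."""
    major_minor = tuple(version or sys.version_info)[:2]
    return MINIMUM <= major_minor < UPPER_BOUND


print("This interpreter is suitable:", python_is_suitable())
```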
Once Python is installed, check its version with --version, as described above.
It's usually a good idea to upgrade Python packages related to the package installer. Even if you just installed Python, they might not be at their latest version:
python -m pip install --upgrade pip setuptools wheel
Installing Eland and PyTorch
After that, you're ready to install the eland package:
python -m pip install eland
Eland lists pandas as one of its default requirements, so it is installed at this point with no extra action needed. By default, Eland doesn't install PyTorch, so we need to install it explicitly:
python -m pip install 'eland[pytorch]'
This might take a while. Once the command completes successfully, you should be able to run eland_import_hub_model, the Eland command for importing a model.
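A quick way to confirm the installation is to check that the module can be found and that the command is on your PATH. The sketch below uses only the standard library; the function name installation_status is hypothetical, and it should be run with the same Python interpreter that you used for pip.

```python
import importlib.util
import shutil


def installation_status(module="eland", command="eland_import_hub_model"):
    """Report whether the module is importable and the command is on PATH."""
    return {
        "module_found": importlib.util.find_spec(module) is not None,
        "command_found": shutil.which(command) is not None,
    }


print(installation_status())
```

If module_found is true but command_found is false, the directory where pip places scripts is probably missing from your PATH.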
If you're getting an error during PyTorch installation that says "No matching distribution found", this means that PyTorch doesn't provide binaries (wheels) for your combination of Python version, OS and architecture yet. In this case, your best bet is to open an issue in PyTorch. See a similar issue here.
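When reporting such an issue, it helps to include the exact Python version, OS and architecture. The following sketch collects them with the standard library; the function name environment_summary is hypothetical.

```python
import platform
import sys


def environment_summary():
    """Collect the details that decide which binary wheels pip can use."""
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "architecture": platform.machine(),
        "64bit": sys.maxsize > 2**32,
    }


for key, value in environment_summary().items():
    print(f"{key}: {value}")
```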
Importing the model
The eland_import_hub_model command has the following options:
> eland_import_hub_model --help
usage: eland_import_hub_model [-h] (--url URL | --cloud-id CLOUD_ID) --hub-model-id HUB_MODEL_ID [--es-model-id ES_MODEL_ID] [-u ES_USERNAME] [-p ES_PASSWORD] [--es-api-key ES_API_KEY]
                              [--task-type {fill_mask,question_answering,zero_shot_classification,text_embedding,text_classification,ner}] [--quantize] [--start] [--clear-previous] [--insecure]
                              [--ca-certs CA_CERTS]

optional arguments:
  -h, --help            show this help message and exit
  --url URL             An Elasticsearch connection URL, e.g. http://localhost:9200
  --cloud-id CLOUD_ID   Cloud ID as found in the 'Manage Deployment' page of an Elastic Cloud deployment
  --hub-model-id HUB_MODEL_ID
                        The model ID in the Hugging Face model hub, e.g. dbmdz/bert-large-cased-finetuned-conll03-english
  --es-model-id ES_MODEL_ID
                        The model ID to use in Elasticsearch, e.g. bert-large-cased-finetuned-conll03-english.When left unspecified, this will be auto-created from the `hub-id`
  -u ES_USERNAME, --es-username ES_USERNAME
                        Username for Elasticsearch
  -p ES_PASSWORD, --es-password ES_PASSWORD
                        Password for the Elasticsearch user specified with -u/--username
  --es-api-key ES_API_KEY
                        API key for Elasticsearch
  --task-type {fill_mask,question_answering,zero_shot_classification,text_embedding,text_classification,ner}
                        The task type for the model usage. Will attempt to auto-detect task type for the model if not provided. Default: auto
  --quantize            Quantize the model before uploading. Default: False
  --start               Start the model deployment after uploading. Default: False
  --clear-previous      Should the model previously stored with `es-model-id` be deleted
  --insecure            Do not verify SSL certificates
  --ca-certs CA_CERTS   Path to CA bundle
Example command:
eland_import_hub_model \
--url http://localhost:9200/ \
--hub-model-id philschmid/distilroberta-base-ner-conll2003 \
--task-type ner \
--es-username elastic \
--es-password changeme \
--start
Note that if you're importing the model into an Elastic Cloud deployment, you can provide a --cloud-id instead of the cluster URL.
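If you script the import, the choice between --url and --cloud-id can be made explicit. The sketch below only assembles the argument list from the flags shown in the help output; the function name build_import_command is hypothetical, and the Cloud ID is a placeholder rather than a real value.

```python
import shlex


def build_import_command(hub_model_id, task_type, url=None, cloud_id=None,
                         username=None, password=None, start=False):
    """Assemble eland_import_hub_model arguments; --url and --cloud-id are exclusive."""
    if (url is None) == (cloud_id is None):
        raise ValueError("provide exactly one of url or cloud_id")
    command = ["eland_import_hub_model"]
    command += ["--url", url] if url else ["--cloud-id", cloud_id]
    command += ["--hub-model-id", hub_model_id, "--task-type", task_type]
    if username:
        command += ["--es-username", username]
    if password:
        command += ["--es-password", password]
    if start:
        command.append("--start")
    return command


command = build_import_command(
    "philschmid/distilroberta-base-ner-conll2003", "ner",
    cloud_id="<your-cloud-id>",  # placeholder, not a real Cloud ID
    username="elastic", password="changeme", start=True,
)
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in command))
```

The returned list can be passed to subprocess.run(command, check=True) to execute the import.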
The model is imported and started. Now what?
When you log in to Kibana and proceed to "Machine learning" -> "Trained models", you'll see a message informing you that you have new objects to synchronize. After clicking "Synchronize", the model will be accessible, and you can test it by clicking the "..." button in the "Actions" column and choosing "Test model".
The next step is to configure an ingest pipeline with an inference processor, and to use that pipeline when indexing your data.
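As a rough illustration, such a pipeline body might look like the one built below. Several things here are assumptions rather than facts from this article: the rule used to derive the Elasticsearch model ID from the Hugging Face ID, the "text_field" input name, and the "message" source field. Check the actual model ID on the "Trained models" page before using it.

```python
import json


def default_es_model_id(hub_model_id):
    """Assumed naming rule for the imported model; verify it in 'Trained models'."""
    return hub_model_id.replace("/", "__").lower()


def build_pipeline(model_id, source_field):
    """Build an ingest pipeline body with a single inference processor."""
    return {
        "description": "Enrich documents with the imported model",
        "processors": [
            {
                "inference": {
                    "model_id": model_id,
                    # Map the document field to the input field the model reads;
                    # "text_field" is an assumption here.
                    "field_map": {source_field: "text_field"},
                }
            }
        ],
    }


model_id = default_es_model_id("philschmid/distilroberta-base-ner-conll2003")
print(json.dumps(build_pipeline(model_id, "message"), indent=2))
```

The printed JSON is the kind of body you would send to the ingest pipeline API (PUT _ingest/pipeline/<pipeline-name>), and then reference that pipeline when indexing.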
Troubleshooting
If something doesn't work with Eland, I suggest the following sequence of actions:
- Google the error.
- Search the existing issues in the Eland repository for the error.
- Open a new issue in the Eland repository.
The first two steps are often enough to resolve the problem, but if not, don't hesitate to open an issue and ask for help. You'd be helping others who may be struggling with the same problem.