Import Pretrained Model to Elasticsearch Cluster

Khanh_Dao_Minh · November 4, 2023, 3:00pm

Hello everyone. I have a question about importing a sentence-transformers model into an Elasticsearch cluster. When I run the Python script below, the model is allocated on only 1 node, but I want it allocated on both nodes to improve the performance of semantic embedding search. If anyone has run into this problem, please help!

I am using Elasticsearch 8.7 with a free trial license to test my cluster.
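Trained-model deployment is a paid machine learning feature, so the trial license only helps while it is still active (self-managed trials expire after 30 days). Below is a small sketch, assuming the standard shapes of the `es.info()` and `es.license.get()` responses, that flags the two prerequisites most likely to block a deployment. `check_ml_prerequisites` and `ML_LICENSE_TYPES` are hypothetical names for illustration. The 8.6 cutoff refers to the release that added the update trained model deployment API.

```python
from typing import Any, Dict, List

# Hypothetical: license types assumed to include trained-model deployment.
# A "trial" license only counts while its status is "active".
ML_LICENSE_TYPES = {"trial", "platinum", "enterprise"}


def check_ml_prerequisites(info: Dict[str, Any], license_resp: Dict[str, Any]) -> List[str]:
    """Return problems found in es.info() and es.license.get() responses (empty list = OK)."""
    problems: List[str] = []

    # es.info() reports the version as a string such as "8.7.0"
    number = info["version"]["number"]
    major, minor = (int(part) for part in number.split(".")[:2])
    if (major, minor) < (8, 6):
        problems.append(f"version {number} predates the update trained model deployment API (8.6)")

    lic = license_resp["license"]
    if lic["type"] not in ML_LICENSE_TYPES:
        problems.append(f"license type {lic['type']!r} does not include ML model deployment")
    if lic["status"] != "active":
        problems.append(f"license status is {lic['status']!r}, not 'active'")
    return problems
```

With a live client this would be called as `check_ml_prerequisites(es.info(), es.license.get())`.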

I have 2 Elasticsearch nodes with the following configuration:

Node 1:

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: elk1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /home/elk1/elasticsearch-8.7.0/data
#
# Path to log files:
#
path.logs: /home/elk1/elasticsearch-8.7.0/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# By default Elasticsearch is only accessible on localhost. Set a different
# address here to expose this node on the network:
#
network.host: 192.168.0.107
#
# By default Elasticsearch listens for HTTP traffic on the first free port it
# finds starting at 9200. Set a specific HTTP port here:
#
http.port: 9200
transport.host: 192.168.0.107
transport.port: 8880
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.seed_hosts: ["192.168.0.108", "192.168.0.107"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["elk1", "elk2"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Allow wildcard deletion of indices:
#
#action.destructive_requires_name: false
xpack.security.enabled: true 

xpack.security.http.ssl:
  enabled: false
  keystore.path: /home/elk1/elasticsearch-8.7.0/config/elastic-certificates.p12

xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate 
xpack.security.transport.ssl.client_authentication: required
xpack.security.transport.ssl.keystore.path: elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: elastic-certificates.p12
  #xpack.security.transport.ssl:
  #  enabled: true
  #  verification_mode: certificate
  #keystore.path: /home/elk1/elasticsearch-8.7.0/config/elastic-certificates.p12
  #truststore.path: /home/elk1/elasticsearch-8.7.0/config/elastic-certificates.p12

xpack.ml.enabled: true

Node 2:

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: elk2
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /home/elk2/elasticsearch-8.7.0/data
#
# Path to log files:
#
path.logs: /home/elk2/elasticsearch-8.7.0/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# By default Elasticsearch is only accessible on localhost. Set a different
# address here to expose this node on the network:
#
network.host: 192.168.0.108
#
# By default Elasticsearch listens for HTTP traffic on the first free port it
# finds starting at 9200. Set a specific HTTP port here:
#
http.port: 9200
transport.host: 192.168.0.108
transport.port: 8880
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.seed_hosts: ["192.168.0.108", "192.168.0.107"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["elk1", "elk2"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Allow wildcard deletion of indices:
#
#action.destructive_requires_name: false
xpack.security.enabled: true 

xpack.security.http.ssl:
  enabled: false
  keystore.path: /home/elk2/elasticsearch-8.7.0/config/elastic-certificates.p12

xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate 
xpack.security.transport.ssl.client_authentication: required
xpack.security.transport.ssl.keystore.path: elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: elastic-certificates.p12
  #xpack.security.transport.ssl:
  #  enabled: true
  #  verification_mode: certificate
  #keystore.path: /home/elk2/elasticsearch-8.7.0/config/elastic-certificates.p12
  #truststore.path: /home/elk2/elasticsearch-8.7.0/config/elastic-certificates.p12

xpack.ml.enabled: true
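Neither configuration sets `node.roles`, so both nodes should have the default roles, which include `ml`. (`xpack.ml.enabled: true` is already the default.) Below is a minimal sketch for confirming that both nodes are ML-eligible. It assumes the response shape of the nodes info API (`es.nodes.info()`), where each node entry lists its `name` and `roles`. `ml_node_names` is a hypothetical helper.

```python
from typing import Any, Dict, List


def ml_node_names(nodes_info: Dict[str, Any]) -> List[str]:
    """Return the sorted names of nodes whose roles include "ml", from an es.nodes.info() response."""
    names: List[str] = []
    # "nodes" maps node id -> node details (name, roles, attributes, ...)
    for node in nodes_info["nodes"].values():
        # With node.roles unset, a node gets the default roles, "ml" included
        if "ml" in node.get("roles", []):
            names.append(node["name"])
    return sorted(names)
```

For this cluster, `ml_node_names(es.nodes.info())` should return `['elk1', 'elk2']` if both nodes can host model allocations.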

I use the eland library to import my model into the cluster:

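from pathlib import Path

import elasticsearch
from eland.common import es_version
from eland.ml.pytorch import PyTorchModel
from eland.ml.pytorch.transformers import TransformerModel

# Connect to elk1 (elasticsearch-py 8.x uses request_timeout instead of timeout)
es = elasticsearch.Elasticsearch(
    "http://192.168.0.107:9200",
    basic_auth=("elastic", "KRvLadZXe3Ky01aYBwMx"),
    request_timeout=60,
)
# Keep the version in its own variable so the es_version() function is not shadowed
es_cluster_version = es_version(es)

# Download the model from the Hugging Face hub and trace it to TorchScript
tm = TransformerModel(
    model_id="sentence-transformers/quora-distilbert-multilingual",
    task_type="text-embedding",
    es_version=es_cluster_version,
)
tmp_path = "models"
Path(tmp_path).mkdir(parents=True, exist_ok=True)
model_path, config, vocab_path = tm.save(tmp_path)

# Upload the traced model, its config and vocabulary to the cluster
ptm = PyTorchModel(es, tm.elasticsearch_model_id())
ptm.import_model(model_path=model_path, config_path=None, vocab_path=vocab_path, config=config)
# start() does not set number_of_allocations, so the deployment gets the default of 1 allocation
ptm.start()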
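As far as I can tell, eland's `ptm.start()` does not pass `number_of_allocations`. The deployment therefore gets the server default of a single allocation, and a single allocation lives on exactly one node. Below is a sketch of requesting more allocations through the elasticsearch-py 8.x client, assuming `client` is the `es` object from the script above. The helper names are hypothetical. `start_trained_model_deployment` and `update_trained_model_deployment` are client methods, but the latter needs Elasticsearch 8.6 or later and a client version that exposes it.

```python
from typing import Any


def start_with_allocations(client: Any, model_id: str, allocations: int = 2, threads: int = 1) -> Any:
    """Start a deployment of an already imported model with several allocations.

    `client` is assumed to be an elasticsearch.Elasticsearch (8.x) instance.
    """
    return client.ml.start_trained_model_deployment(
        model_id=model_id,
        number_of_allocations=allocations,  # goal here: one allocation per node
        threads_per_allocation=threads,     # CPU threads used by each allocation
        wait_for="started",
    )


def scale_allocations(client: Any, model_id: str, allocations: int) -> Any:
    """Change the allocation count of a deployment that is already running (8.6+)."""
    return client.ml.update_trained_model_deployment(
        model_id=model_id,
        number_of_allocations=allocations,
    )
```

For example, you could stop the current deployment with `es.ml.stop_trained_model_deployment(model_id=tm.elasticsearch_model_id())` and then call `start_with_allocations(es, tm.elasticsearch_model_id())` to request 2 allocations. Alternatively, `scale_allocations(es, tm.elasticsearch_model_id(), 2)` changes the running deployment in place. Elasticsearch, not the client, decides where each allocation goes. Whether the second allocation lands on elk2 depends on the processors and memory each node has available for ML.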

Khanh_Dao_Minh · November 5, 2023, 3:09am

Here is the result in Elasticsearch 8.7: the model is allocated on only 1 node.
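To see where allocations actually landed, use the trained model statistics API. Its response includes a `deployment_stats.nodes` list with a per-node `number_of_allocations`. Below is a small sketch that condenses that response into a node-name-to-allocation-count map, assuming the 8.x response shape. `allocations_per_node` is a hypothetical helper.

```python
from typing import Any, Dict


def allocations_per_node(stats: Dict[str, Any]) -> Dict[str, int]:
    """Map node name -> allocation count, from an es.ml.get_trained_models_stats() response."""
    result: Dict[str, int] = {}
    for model in stats["trained_model_stats"]:
        deployment = model.get("deployment_stats")
        if deployment is None:  # model imported but not deployed
            continue
        for entry in deployment.get("nodes", []):
            # "node" is a single-key dict: {node_id: {"name": ..., ...}}
            for node in entry["node"].values():
                name = node["name"]
                result[name] = result.get(name, 0) + entry.get("number_of_allocations", 0)
    return result
```

You could call it as `allocations_per_node(es.ml.get_trained_models_stats(model_id=tm.elasticsearch_model_id()))`. A deployment spread over both nodes would show entries for both elk1 and elk2.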

system · December 3, 2023, 3:09am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
