Hello everyone. I have a question about import sentence-transformer model to elasticsearch cluster. When I run the python script below, I see only 1 node has allocated my mode, but I want to allocate my model in 2 nodes to improve the performance of the semantic embedding search. If someone has encountered this problem, please help me!!
I use elasticsearch version 8.7 and free trial license for testing my cluster.
I have 2 elasticsearch node with the below config:
Node 1:
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: elk1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /home/elk1/elasticsearch-8.7.0/data
#
# Path to log files:
#
path.logs: /home/elk1/elasticsearch-8.7.0/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# By default Elasticsearch is only accessible on localhost. Set a different
# address here to expose this node on the network:
#
network.host: 192.168.0.107
#
# By default Elasticsearch listens for HTTP traffic on the first free port it
# finds starting at 9200. Set a specific HTTP port here:
#
http.port: 9200
transport.host: 192.168.0.107
transport.port: 8880
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.seed_hosts: ["192.168.0.108", "192.168.0.107"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["elk1", "elk2"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Allow wildcard deletion of indices:
#
#action.destructive_requires_name: false
xpack.security.enabled: true
xpack.security.http.ssl:
enabled: false
keystore.path: /home/elk1/elasticsearch-8.7.0/config/elastic-certificates.p12
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.client_authentication: required
xpack.security.transport.ssl.keystore.path: elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: elastic-certificates.p12
#xpack.security.transport.ssl:
# enabled: true
# verification_mode: certificate
#keystore.path: /home/elk1/elasticsearch-8.7.0/config/elastic-certificates.p12
#truststore.path: /home/elk1/elasticsearch-8.7.0/config/elastic-certificates.p12
xpack.ml.enabled: true
Node 2:
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: elk2
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /home/elk2/elasticsearch-8.7.0/data
#
# Path to log files:
#
path.logs: /home/elk2/elasticsearch-8.7.0/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# By default Elasticsearch is only accessible on localhost. Set a different
# address here to expose this node on the network:
#
network.host: 192.168.0.108
#
# By default Elasticsearch listens for HTTP traffic on the first free port it
# finds starting at 9200. Set a specific HTTP port here:
#
http.port: 9200
transport.host: 192.168.0.108
transport.port: 8880
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.seed_hosts: ["192.168.0.108", "192.168.0.107"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["elk1", "elk2"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Allow wildcard deletion of indices:
#
#action.destructive_requires_name: false
xpack.security.enabled: true
xpack.security.http.ssl:
enabled: false
keystore.path: /home/elk2/elasticsearch-8.7.0/config/elastic-certificates.p12
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.client_authentication: required
xpack.security.transport.ssl.keystore.path: elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: elastic-certificates.p12
#xpack.security.transport.ssl:
# enabled: true
# verification_mode: certificate
#keystore.path: /home/elk2/elasticsearch-8.7.0/config/elastic-certificates.p12
#truststore.path: /home/elk2/elasticsearch-8.7.0/config/elastic-certificates.p12
xpack.ml.enabled: true
I use eland library for import my model to cluster.
from pathlib import Path
from eland.common import es_version
from eland.ml.pytorch import PyTorchModel
from eland.ml.pytorch.transformers import TransformerModel
import os
es = elasticsearch.Elasticsearch("http://elastic:KRvLadZXe3Ky01aYBwMx@192.168.0.107:9200", timeout=60)
tm = TransformerModel(model_id="sentence-transformers/quora-distilbert-multilingual", task_type="text-embedding")
es_version = es_version(es)
tmp_path = "models"
if os.path.exists(tmp_path) is False:
os.makedirs(tmp_path, exist_ok=True)
model_path, config, vocab_path = tm.save(tmp_path)
ptm = PyTorchModel(es, tm.elasticsearch_model_id())
ptm.import_model(model_path=model_path, config_path=None, vocab_path=vocab_path, config=config)
ptm.start()