Most optimal ElasticSearch

ekrx · April 27, 2021, 10:14pm

This is a fairly open-ended question, since I don't know exactly where to start. We need the best possible ES performance for a mission-critical system. Currently our ES queries take around 100-200 ms to complete, and the goal is to reduce this to 20 ms.

There are two aspects I would like your input on:
1 - What are the most important considerations and best practices for defining an ES schema that supports the fastest possible requests?
2 - What are the most important considerations and best practices for querying ES as efficiently as possible?

Additionally, how can I troubleshoot where the time is being spent and why the queries are so slow?

A little context on the system: it is a typeahead system that takes an input string and returns a list of suggestions that match the input.

The schema that I have looks like this:

{
	"domain": String,
	"documentId": String,
	"localizedText": Map<String, String>,
	"score": Map<String, double>
}

Here is a sample of the document:

{
	"domain": "fast-food",
	"documentId": "123",
	"localizedText": {
		"en_US": "Hamburger",
		"es_ES": "Hamburguesa"
	},
	"score": {
		"en_US": 0.5,
		"es_ES": 0.4
	}
}
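
For illustration only, here is a minimal sketch of how the document shape above could be written as an explicit Elasticsearch mapping, built as a plain Python dict. The choice of `keyword` for `domain` and `documentId`, `text` for each locale under `localizedText`, and `double` for each locale under `score` is an assumption based on the sample document, not the mapping actually in use; the helper name `build_mapping` is hypothetical, and the custom analyzers mentioned in this thread are not modeled.

```python
import json


def build_mapping(locales):
    """Return an index-creation body for the sample document shape.

    Assumed field types (not confirmed by the thread):
      domain, documentId -> keyword (exact-match filtering)
      localizedText.<locale> -> text (analyzed full-text matching)
      score.<locale> -> double (numeric sorting)
    """
    return {
        "mappings": {
            "properties": {
                "domain": {"type": "keyword"},
                "documentId": {"type": "keyword"},
                "localizedText": {
                    "properties": {loc: {"type": "text"} for loc in locales}
                },
                "score": {
                    "properties": {loc: {"type": "double"} for loc in locales}
                },
            }
        }
    }


if __name__ == "__main__":
    # Only the two locales from the sample document are listed here.
    print(json.dumps(build_mapping(["en_US", "es_ES"]), indent=2))
```

The resulting dict could be sent as the body of an index-creation request; with around 40 locales, each locale becomes its own sub-field under `localizedText` and `score`.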

I also expect to have thousands of different domains.

The queries first filter on the domain to trim down the search (a request can specify multiple domains). The second step is a text match on localizedText for the locale passed in the request. Lastly, the candidates are sorted by score, also for that same locale.
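
The three steps described above can be sketched as a search request body. This is an assumption about the query shape rather than the exact query in use: it puts a `terms` clause on `domain` in the `filter` context of a `bool` query, a `match` clause on `localizedText.<locale>`, and a descending sort on `score.<locale>`. The function name `build_typeahead_query` and the `size` default are hypothetical.

```python
import json


def build_typeahead_query(domains, text, locale, size=10):
    """Build a search body: filter by domain, match text, sort by score."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Filter context: no relevance scoring for the domain clause.
                "filter": [{"terms": {"domain": list(domains)}}],
                # Text match against the field for the requested locale.
                "must": [{"match": {f"localizedText.{locale}": text}}],
            }
        },
        # Order candidates by the per-locale score, highest first.
        "sort": [{f"score.{locale}": {"order": "desc"}}],
    }


if __name__ == "__main__":
    body = build_typeahead_query(["fast-food"], "hamb", "en_US")
    print(json.dumps(body, indent=2))
```

Keeping the domain clause in `filter` rather than `must` means it does not contribute to scoring, which fits a request that sorts on a stored score field anyway.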

Any insights and resources will be much appreciated.

Thank you

warkolm · April 28, 2021, 12:22am

That's a pretty small document.

What is the output from the _cluster/stats?pretty&human API?

ekrx · April 28, 2021, 12:41am

I simplified this a little bit; the score and localizedText fields each contain around 40 entries. It is a small document, but we have over 250 GB of data.

The localized text is treated with several tokenizers and analyzers to allow a more relevant search.

warkolm · April 28, 2021, 12:44am

What is the output from the _cluster/stats?pretty&human API?

ekrx · April 28, 2021, 12:48am

{
  "_nodes" : {
    "total" : 6,
    "successful" : 6,
    "failed" : 0
  },
  "cluster_name" : "prod-elasticsearch",
  "cluster_uuid" : "uuid",
  "timestamp" : 1619570675725,
  "status" : "green",
  "indices" : {
    "count" : 21,
    "shards" : {
      "total" : 51,
      "primaries" : 25,
      "replication" : 1.04,
      "index" : {
        "shards" : {
          "min" : 2,
          "max" : 10,
          "avg" : 2.4285714285714284
        },
        "primaries" : {
          "min" : 1,
          "max" : 5,
          "avg" : 1.1904761904761905
        },
        "replication" : {
          "min" : 1.0,
          "max" : 2.0,
          "avg" : 1.0476190476190477
        }
      }
    },
    "docs" : {
      "count" : 5569088,
      "deleted" : 1496605
    },
    "store" : {
      "size" : "309.4gb",
      "size_in_bytes" : 332314644510
    },
    "fielddata" : {
      "memory_size" : "24kb",
      "memory_size_in_bytes" : 24648,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size" : "133.6mb",
      "memory_size_in_bytes" : 140108427,
      "total_count" : 2635966,
      "hit_count" : 750835,
      "miss_count" : 1885131,
      "cache_size" : 25629,
      "cache_count" : 57235,
      "evictions" : 31606
    },
    "completion" : {
      "size" : "0b",
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 669,
      "memory" : "66.3mb",
      "memory_in_bytes" : 69587804,
      "terms_memory" : "41.1mb",
      "terms_memory_in_bytes" : 43194276,
      "stored_fields_memory" : "12.2mb",
      "stored_fields_memory_in_bytes" : 12881592,
      "term_vectors_memory" : "0b",
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory" : "6.4mb",
      "norms_memory_in_bytes" : 6761728,
      "points_memory" : "884.6kb",
      "points_memory_in_bytes" : 905874,
      "doc_values_memory" : "5.5mb",
      "doc_values_memory_in_bytes" : 5844334,
      "index_writer_memory" : "2.2gb",
      "index_writer_memory_in_bytes" : 2379896102,
      "version_map_memory" : "5.9mb",
      "version_map_memory_in_bytes" : 6260384,
      "fixed_bit_set" : "426.6kb",
      "fixed_bit_set_memory_in_bytes" : 436840,
      "max_unsafe_auto_id_timestamp" : 1619568003863,
      "file_sizes" : { }
    }
  },
  "nodes" : {
    "count" : {
      "total" : 6,
      "coordinating_only" : 0,
      "data" : 3,
      "ingest" : 6,
      "master" : 3,
      "ml" : 6,
      "voting_only" : 0
    },
    "versions" : [
      "7.5.2"
    ],
    "os" : {
      "available_processors" : 18,
      "allocated_processors" : 18,
      "names" : [
        {
          "name" : "Linux",
          "count" : 6
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "Ubuntu 16.04.6 LTS",
          "count" : 6
        }
      ],
      "mem" : {
        "total" : "134.6gb",
        "total_in_bytes" : 144553402368,
        "free" : "2.7gb",
        "free_in_bytes" : 2962989056,
        "used" : "131.8gb",
        "used_in_bytes" : 141590413312,
        "free_percent" : 2,
        "used_percent" : 98
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 224
      },
      "open_file_descriptors" : {
        "min" : 353,
        "max" : 594,
        "avg" : 462
      }
    },
    "jvm" : {
      "max_uptime" : "6d",
      "max_uptime_in_millis" : 525273695,
      "versions" : [
        {
          "version" : "13.0.1",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "13.0.1+9",
          "vm_vendor" : "AdoptOpenJDK",
          "bundled_jdk" : true,
          "using_bundled_jdk" : true,
          "count" : 6
        }
      ],
      "mem" : {
        "heap_used" : "25.5gb",
        "heap_used_in_bytes" : 27419750528,
        "heap_max" : "67.3gb",
        "heap_max_in_bytes" : 72282537984
      },
      "threads" : 332
    },
    "fs" : {
      "total" : "3.6tb",
      "total_in_bytes" : 3994254385152,
      "free" : "3.3tb",
      "free_in_bytes" : 3649938743296,
      "available" : "3.1tb",
      "available_in_bytes" : 3446900875264
    },
    "plugins" : [
      {
        "name" : "repository-s3",
        "version" : "7.5.2",
        "elasticsearch_version" : "7.5.2",
        "java_version" : "1.8",
        "description" : "The S3 repository plugin adds S3 repositories",
        "classname" : "org.elasticsearch.repositories.s3.S3RepositoryPlugin",
        "extended_plugins" : [ ],
        "has_native_controller" : false
      }
    ],
    "network_types" : {
      "transport_types" : {
        "security4" : 6
      },
      "http_types" : {
        "security4" : 6
      }
    },
    "discovery_types" : {
      "zen" : 6
    },
    "packaging_types" : [
      {
        "flavor" : "default",
        "type" : "deb",
        "count" : 6
      }
    ]
  }
}
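
A few ratios can be derived from the numbers in this output. The sketch below is only arithmetic on values copied from the JSON above. It assumes that `docs.count` excludes deleted documents and that memory outside the JVM heap is what remains for the operating system page cache; both figures are cluster-wide across all 6 nodes, so they are rough indicators rather than per-data-node measurements.

```python
# Values copied from the _cluster/stats output above.
DOCS_COUNT = 5_569_088
DOCS_DELETED = 1_496_605
QUERY_CACHE_TOTAL = 2_635_966
QUERY_CACHE_HITS = 750_835
OS_MEM_TOTAL_BYTES = 144_553_402_368
JVM_HEAP_MAX_BYTES = 72_282_537_984
STORE_SIZE_BYTES = 332_314_644_510


def ratio(part, whole):
    """Return part / whole, or 0.0 when whole is zero."""
    return part / whole if whole else 0.0


# Share of documents in the segments that are marked deleted.
deleted_fraction = ratio(DOCS_DELETED, DOCS_COUNT + DOCS_DELETED)

# Share of query cache lookups that were hits.
cache_hit_rate = ratio(QUERY_CACHE_HITS, QUERY_CACHE_TOTAL)

# RAM not reserved for heap, compared with the on-disk store size.
non_heap_bytes = OS_MEM_TOTAL_BYTES - JVM_HEAP_MAX_BYTES
cacheable_fraction = ratio(non_heap_bytes, STORE_SIZE_BYTES)

if __name__ == "__main__":
    print(f"deleted docs fraction: {deleted_fraction:.3f}")
    print(f"query cache hit rate:  {cache_hit_rate:.3f}")
    print(f"non-heap RAM / store:  {cacheable_fraction:.3f}")
```

Under those assumptions, roughly 21% of documents are deleted, roughly 28% of query cache lookups hit, and non-heap RAM covers roughly 22% of the store size.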

warkolm · April 28, 2021, 12:57am

Upgrading would be advisable. 6.5 reached EOL on 2020-05-14.

ekrx:

"versions" : [
        {
          "version" : "13.0.1",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "13.0.1+9",
          "vm_vendor" : "AdoptOpenJDK",
          "bundled_jdk" : true,
          "using_bundled_jdk" : true,
          "count" : 6
        }

As would upgrading your JVM.

Christian_Dahlqvist · April 28, 2021, 5:16am

I would recommend that you read this part of the docs, which provides a lot of good recommendations. For optimal search speed it is important that the query is as efficient as possible, and you can use the profile API to tune this. It is, however, also vital that disk I/O, for example, is not slowing the queries down. You can reduce or eliminate the impact of disk I/O by using fast SSDs or by simply having enough RAM for the full data set to be cached in the operating system page cache.
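
As a rough sketch of how the profile API output might be inspected: adding `"profile": true` to a search body makes the response include a `profile` section with per-shard query timings. The helper names `with_profile` and `slowest_query_components` are hypothetical, and the sample response in the usage section is hand-written for illustration, not real cluster output.

```python
def with_profile(search_body):
    """Return a copy of a search body with profiling enabled."""
    return {**search_body, "profile": True}


def slowest_query_components(response, top=5):
    """List the slowest query components found in a profiled response.

    Walks profile.shards[].searches[].query[] and their children.
    A parent's time_in_nanos includes the time of its children.
    """
    rows = []

    def walk(node, shard_id):
        rows.append(
            (node["time_in_nanos"], shard_id, node["type"], node["description"])
        )
        for child in node.get("children", []):
            walk(child, shard_id)

    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for query_node in search.get("query", []):
                walk(query_node, shard["id"])

    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:top]


if __name__ == "__main__":
    # Hand-written sample in the shape of a profiled search response.
    sample = {
        "profile": {
            "shards": [
                {
                    "id": "[node][index][0]",
                    "searches": [
                        {
                            "query": [
                                {
                                    "type": "BooleanQuery",
                                    "description": "+text:hamb #domain:fast-food",
                                    "time_in_nanos": 900,
                                    "children": [
                                        {
                                            "type": "TermQuery",
                                            "description": "text:hamb",
                                            "time_in_nanos": 600,
                                        }
                                    ],
                                }
                            ]
                        }
                    ],
                }
            ]
        }
    }
    for nanos, shard_id, kind, description in slowest_query_components(sample):
        print(f"{nanos:>8} ns  {shard_id}  {kind}  {description}")
```

Sorting by `time_in_nanos` is just one way to read the output; the per-component `breakdown` section of a real response gives finer detail on where the time goes.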

It would help if you could describe the cluster setup, e.g. what is the specification of the data nodes? How much heap and RAM do the data nodes have? What type of storage are you using?

system · May 26, 2021, 5:16am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
