Optimize Elasticsearch performance

I am trying to benchmark my Elasticsearch setup by posting documents against a large schema. The two variations of schema are:

  1. indexing enabled for each attribute.
  2. indexing disabled for all the attributes.

My benchmark consists only of opening the cluster in ElasticHQ and checking for spikes in CPU.
However, I don't see the CPU spikes dropping when using option 2.

Question: Should disabling indexing result in better performance?

Setup:

Running Elasticsearch in Docker with 1 shard and 1 replica for the index.

Schema with index enabled (pastebin link) : { "aliases": {}, "mappings": { "properties": { " - Pastebin.com

Schema with index disabled (pastebin link) : { "aliases": {}, "mappings": { "properties": { " - Pastebin.com

Document:

{
  "status": "open",
  "created_at": "2022-02-14",
  "long_12": 123456789,
  "division": {
    "prop_1": 112211,
    "prop_2": false,
    "currency": "a brief text"
  },
  "emails": {
    "email": "abc@gmail.com"
  }
}
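
For illustration, the two schema variants for the sample document above could be generated along these lines. This is a minimal sketch in Python, not the actual schema: the real mappings are only linked, so the field types below are assumptions inferred from the sample values, and the names `FIELDS`, `NESTED`, `leaf` and `build_mapping` are hypothetical.

```python
import json

# Field types inferred from the sample document; the real schema is only
# linked, so these types are assumptions made for illustration.
FIELDS = {
    "status": "keyword",
    "created_at": "date",
    "long_12": "long",
}
NESTED = {
    "division": {"prop_1": "long", "prop_2": "boolean", "currency": "text"},
    "emails": {"email": "keyword"},
}


def leaf(field_type, indexed):
    """Return a leaf field mapping, adding "index": false when not indexed."""
    mapping = {"type": field_type}
    if not indexed:
        mapping["index"] = False
    return mapping


def build_mapping(indexed):
    """Build the mappings body for either variant of the schema."""
    properties = {name: leaf(t, indexed) for name, t in FIELDS.items()}
    for obj_name, children in NESTED.items():
        properties[obj_name] = {
            "properties": {name: leaf(t, indexed) for name, t in children.items()}
        }
    return {"mappings": {"properties": properties}}


if __name__ == "__main__":
    # Print the "indexing disabled" variant as the JSON body for index creation.
    print(json.dumps(build_mapping(indexed=False), indent=2))
```

Note that `index: false` is set on leaf fields, whereas `enabled: false` can only be set on object fields (or the top-level mapping) and makes Elasticsearch skip parsing that object's contents entirely.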

Load test scenario: I created 10 Java threads running on an i7 laptop, and each thread posted 100,000 documents with a small modification (to keep the documents distinct, the status field value was randomly generated).
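
A load generator along these lines could be sketched as follows. It is written in Python rather than Java for brevity, and it differs from the scenario above in two ways that are assumptions of this sketch: it batches documents in the `_bulk` request format instead of posting them one by one, and it reports documents per second instead of relying on CPU graphs. The `send` callable and the index name `bench_index` are hypothetical placeholders; nothing here talks to a cluster.

```python
import json
import random
import string
import time


def make_doc(rng):
    """Return the sample document with a random status so each one is distinct."""
    status = "".join(rng.choice(string.ascii_lowercase) for _ in range(12))
    return {
        "status": status,
        "created_at": "2022-02-14",
        "long_12": 123456789,
        "division": {"prop_1": 112211, "prop_2": False, "currency": "a brief text"},
        "emails": {"email": "abc@gmail.com"},
    }


def bulk_body(index_name, docs):
    """Serialize docs as newline-delimited JSON in the _bulk request format."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


def measure(send, index_name, batches, batch_size, seed=0):
    """Pass each batch to `send` and return documents per second."""
    rng = random.Random(seed)
    start = time.perf_counter()
    for _ in range(batches):
        send(bulk_body(index_name, [make_doc(rng) for _ in range(batch_size)]))
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    return batches * batch_size / elapsed


if __name__ == "__main__":
    # Stand-in sender that only collects the bodies; a real run would POST
    # each body to the cluster's _bulk endpoint instead.
    sent = []
    rate = measure(sent.append, "bench_index", batches=5, batch_size=100)
    print(f"{rate:.0f} docs/s (serialization only, no cluster involved)")
```

Running the same harness against both schema variants and comparing the reported rates gives a number to compare, which is easier to reason about than the shape of a CPU graph.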

More detail on why I am doing this:
My production Elasticsearch (ES) cluster is performing very badly, with reads taking upwards of 10 seconds. Apart from the read optimizations I can still make, I am also noticing that the ES cluster is generally very busy. I also noticed that my ES index schema doesn't have indexing disabled for any attribute (and we have around 350 attributes).
So my expectation was that if I disabled indexing for unnecessary attributes, I would get some wins. However, that's not happening.

Can you please shed some light on:

  1. Should setting index: false and enabled: false have improved performance?
  2. Am I disabling the index on attributes the right way?
  3. Is my benchmarking technique right?

NOTE: The document and schema are for reference purposes only; the actual schema and document in PROD are quite large. The result was consistent when benchmarked using a large document.

I would recommend first trying to identify what is causing bad performance in your production cluster before trying to run some benchmarks. Can you please answer the following questions to give us a better idea of your use case?

Which version of Elasticsearch are you using?

What is the size and specification of your cluster in terms of number of nodes, RAM, CPU and storage?

How much data do you have in the cluster? How many indices and shards is this distributed across?

What does the workload look like? Is it mainly querying? Do you do a lot of indexing? Do you perform updates and deletes?

What do the slow queries look like? How large a portion of your data do they target?

Thanks Christian.
I know why my production cluster is slow. It has all sorts of issues you can imagine:

  1. Default partitioning based on doc ID (for my use case this is bad).
  2. Read queries are like select *, with some even having _scripts.
  3. The index schema itself is huge, and so are the documents (300 attributes).
  4. Indexing is enabled on all 300 attributes.

Regardless of my production cluster, I am evaluating the assumption that disabling indexing on all the attributes of the schema should result in better performance, i.e. a less busy CPU.
But I don't see that happening.

Below is the screenshot of CPU utilization when indexing on all attributes is enabled; the graph was almost the same when indexing on all attributes was disabled (I lost the second screenshot).

Coming to your questions, the information I have for my PROD cluster is:
Version: 7.7.1
Number of nodes: 6
Workload on ES is 60% querying and 40% writes (including updates).
We don't delete data from the index that often.
Slow queries are trying to get all the attributes of the index.
Number of indices: 4000

I will ask the Infra team for all the remaining details.

Please upgrade, that version is very much past EOL and is no longer supported.
