I am trying to benchmark my Elasticsearch setup by posting documents against a large schema. The two variations of schema are:
- indexing enabled for each attribute.
- indexing disabled for all the attributes.
My benchmark consists only of opening the cluster in ElasticHQ and watching for CPU spikes.
However, I don't see the CPU spikes drop when using option 2.
Question: Shouldn't disabling indexing result in better performance?
Setup:
Running Elasticsearch in a Docker container, with 1 shard and 1 replica for the index.
Schema with index enabled (pastebin link) : { "aliases": {}, "mappings": { "properties": { " - Pastebin.com
Schema with index disabled (pastebin link) : { "aliases": {}, "mappings": { "properties": { " - Pastebin.com
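Since the linked schemas are long, here is a minimal sketch of how the two variations differ for a few fields of the sample document below. It is not the actual schema: the field types (keyword, long), the choice to put enabled: false on the emails object, and the class and method names are all assumptions made for illustration.

```java
public class MappingSketch {

    // Builds one "properties" entry for a leaf field.
    // When indexed is false, the entry carries "index": false.
    static String leaf(String name, String type, boolean indexed) {
        String options = indexed ? "" : ", \"index\": false";
        return "\"" + name + "\": { \"type\": \"" + type + "\"" + options + " }";
    }

    // Builds a small mapping body for either variation.
    // Assumption: in the "disabled" variation the emails object uses
    // "enabled": false, which applies to object fields rather than leaf fields.
    static String mapping(boolean indexed) {
        String emails = indexed
            ? "\"emails\": { \"properties\": { " + leaf("email", "keyword", true) + " } }"
            : "\"emails\": { \"type\": \"object\", \"enabled\": false }";
        return "{ \"mappings\": { \"properties\": { "
            + leaf("status", "keyword", indexed) + ", "
            + leaf("long_12", "long", indexed) + ", "
            + emails
            + " } } }";
    }

    public static void main(String[] args) {
        System.out.println("Variation 1 (indexing enabled):");
        System.out.println(mapping(true));
        System.out.println("Variation 2 (indexing disabled):");
        System.out.println(mapping(false));
    }
}
```

Either printed body could be sent when creating the index, so that the same load test runs once against each variation.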
Document:
{
  "status": "open",
  "created_at": "2022-02-14",
  "long_12": 123456789,
  "division": {
    "prop_1": 112211,
    "prop_2": false,
    "currency": "a brief text"
  },
  "emails": {
    "email": "abc@gmail.com"
  }
}
Load test scenario: 10 Java threads running on an i7 laptop, each posting 100,000 documents. To keep the documents distinct, the value of the status field was randomly generated for each one.
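The load generator is roughly of the following shape. This is a minimal sketch, not the exact code: the class name, the use of java.net.http.HttpClient (Java 11+), the random UUID for the status value, and the index URL passed as the first argument are all assumptions.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LoadTestSketch {
    static final int THREADS = 10;
    static final int DOCS_PER_THREAD = 100_000;

    // Builds the sample document; only the status value varies between documents.
    static String buildDocument(String status) {
        return "{\"status\":\"" + status + "\","
            + "\"created_at\":\"2022-02-14\","
            + "\"long_12\":123456789,"
            + "\"division\":{\"prop_1\":112211,\"prop_2\":false,\"currency\":\"a brief text\"},"
            + "\"emails\":{\"email\":\"abc@gmail.com\"}}";
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            // Hypothetical usage; the index name is whatever the test index is called.
            System.out.println("usage: LoadTestSketch http://localhost:9200/<index-name>");
            return;
        }
        URI target = URI.create(args[0] + "/_doc");
        HttpClient client = HttpClient.newHttpClient();
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);

        for (int t = 0; t < THREADS; t++) {
            pool.submit(() -> {
                for (int i = 0; i < DOCS_PER_THREAD; i++) {
                    // Random status keeps each document distinct.
                    String body = buildDocument(UUID.randomUUID().toString());
                    HttpRequest request = HttpRequest.newBuilder(target)
                        .header("Content-Type", "application/json")
                        .POST(HttpRequest.BodyPublishers.ofString(body))
                        .build();
                    try {
                        client.send(request, HttpResponse.BodyHandlers.discarding());
                    } catch (Exception e) {
                        System.err.println("request failed: " + e);
                    }
                }
            });
        }

        // Wait for all worker threads to finish posting.
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```

Each document goes out as its own single-document POST to the index's _doc endpoint, so the cluster sees 1,000,000 separate indexing requests in total.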
More detail on why I am doing this:
My production Elasticsearch (ES) cluster is performing very badly, with reads taking upwards of 10 seconds. Apart from all the necessary read optimizations I can do, I am also noticing that the ES cluster is generally very busy. I also noticed that my ES index schema doesn't have indexing disabled for any attribute (and we have around 350 attributes).
So, my expectation was that if I disabled indexing for unnecessary attributes, I could get some wins. However, that's not happening.
Can you please shed some light on:
- Should setting index: false and enabled: false have improved performance?
- Am I disabling the index on attributes the right way?
- Is my benchmarking technique right?
NOTE: The document and schema are for reference purposes only; the actual schema and document in PROD are quite large. The result was consistent when benchmarked using a large document.