Improve Search Performance

Hi Team,

I ingested a huge amount of data (3-4 B records) with the below index configuration:

Index - Data Stream
Primaries - 5
Replicas - 0
Master and master-eligible nodes - 3
Voting-only node - 1
Coordinator node - 1
Data nodes + ingest nodes - 140
My data stream's current size is 17 TB, with 125+ backing indices of ~100 GB each (and it will grow every day).

The issue is that when ingestion is in progress, all the nodes consume more than 90% of RAM:

ip              heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
xxx            45          99   4    0.34    0.34     0.36 dhi       -      aaa
xxx            51          65   0    0.00    0.00     0.00 mv        -      aaa
xxx           23          79   0    0.00    0.28     1.85 dhi       -      aaa
xxx           28          99   5    1.98    2.06     2.21 dhi       -      aaa
xxx             48          64   4    0.34    0.32     0.29 m         -      aaa
xxx            15          99   5    2.33    2.24     2.35 dhi       -      aaa
xxx            3          99   6    0.49    0.64     0.79 dhi       -      aaa
xxx             30          99   0    0.06    0.04     0.08 dhi       -      aaa
xxx            43          99   0    0.00    0.03     0.16 dhi       -      aaa
xxx            16          99   6    0.83    0.75     0.71 dhi       -      aaa
xxx           72          99  82    9.07    8.21     6.46 dhi       -      aaa
xxx           36          99   3    0.37    0.30     0.37 dhi       -      aaa
xxx           45          99   6    0.61    0.55     0.60 dhi       -      aaa
xxx             9          99   3    0.20    0.30     0.34 dhi       -      aaa
xxx             4          99   4    0.79    0.52     0.79 dhi       -      aaa
xxx            19          99   0    0.03    0.03     0.25 dhi       -      aaa
xxx             2          98  16    1.25    1.48     1.51 dhi       -      aaa

Probably because of this, all of my search queries on a particular index take 40 seconds to over a minute to return results.

An example is given below.

GET Query

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "cn": {
              "query": "Biryani",
              "fuzziness": "AUTO"
            }
          }
        },
        {
          "match": {
            "cn": {
              "query": "Biryani",
              "operator": "and",
              "boost": 2
            }
          }
        }
      ]
    }
  },
  "highlight": {
    "number_of_fragments" : 2, 
    "fields": {
      "cn": {}
    }
  },
  "from": 0,
  "size": 10,
  "_source": ["wu.original", "wd", "tt", "fi", "url" , "wu.domain" ]
}

Result

{
    "took": 77959,
    "hits": {
        "hits": [
            {
                "_score": 154.0433,
                "_source": {
                    "wd": "2024-02-21T01:24:38.000Z",
                    "wu": {
                        "original": "http://op.salesautopilot.com/changing-partners-jkwju/veg-biryani-price-per-kg-b6617e",
                        "domain": "op.salesautopilot.com"
                    }
                },
                "highlight": {
                    "cn": [
                        "Manufacturer of <em>Biryani -</em> Frozen Egg <em>Biryani,</em> Frozen Mutton <em>Biryani,</em> Frozen Chicken <em>Biryani</em> and Frozen",
                        "Hyderabadi Veg <em>Biryani.</em> WHAT IS THE COST OF THE <em>BIRYANI?</em> <em>A Biryani</em> that was worth a war!"
                    ]
                }
            },
            {
                "_score": 153.18138,
                "_source": {
                    "wd": "2024-02-23T00:48:29.000Z",
                    "wu": {
                        "original": "http://ygb.net.br/is-attendance-usj/9abc0d-biryani-lover-biryani-quotes",
                        "domain": "ygb.net.br"
                    }
                },
                "highlight": {
                    "cn": [
                        "Malabar Meen <em>Biryani (</em> Malabar style Fish <em>Biryani)</em> <em>\\Biryani</em> with fish?",
                        "Malai chicken <em>biryani</em> over mutton <em>biryani we</em> have included chicken <em>biryani -</em> a delicious dish."
                    ]
                }
            },
            {
                "_score": 149.81018,
                "_source": {
                    "wd": "2024-02-23T23:59:57.000Z",
                    "wu": {
                        "original": "https://jaromirstetina.cz/fusion-pro-wtxdu/vellore-biryani-recipe-7b10e2",
                        "domain": "jaromirstetina.cz"
                    }
                },
                "highlight": {
                    "cn": [
                        "Recipe Source 3. <em># Biryani #</em> BiryaniInIndia # GourmetOnTheRoad. vellore chicken hotel <em>biryani.</em>",
                        "Ambur mutton <em>biryani is</em> a delicious <em>biryani</em> recipe. Adapted from Star <em>Briyani</em> Hotel."
                    ]
                }
            },
            {
                "_score": 149.61472,
                "_source": {
                    "wd": "2024-02-21T14:27:15.000Z",
                    "wu": {
                        "original": "https://themadscientistskitchen.com/vegetarian-methi-matar-biryani-recipe/",
                        "domain": "themadscientistskitchen.com"
                    }
                },
                "highlight": {
                    "cn": [
                        "What is Dum <em>Biryani?</em> Dum <em>Biryani is</em> the method of cooking a <em>biryani.</em>",
                        "However, <em>Biryani is</em> classified as Kacchi or raw <em>Biryani</em> and Pakki or cooked <em>Biryani!</em>"
                    ]
                }
            },
            {
                "_score": 147.06299,
                "_source": {
                    "wd": "2024-02-25T17:33:33.000Z",
                    "wu": {
                        "original": "https://themadscientistskitchen.com/vegetarian-methi-matar-biryani-recipe/",
                        "domain": "themadscientistskitchen.com"
                    }
                },
                "highlight": {
                    "cn": [
                        "What is Dum <em>Biryani?</em> Dum <em>Biryani is</em> the method of cooking a <em>biryani.</em>",
                        "However, <em>Biryani is</em> classified as Kacchi or raw <em>Biryani</em> and Pakki or cooked <em>Biryani!</em>"
                    ]
                }
            },
            {
                "_score": 144.04364,
                "_source": {
                    "wd": "2024-02-25T12:00:54.000Z",
                    "wu": {
                        "original": "https://www.desicookingrecipes.com/featured/biryani-recipes/page/2/",
                        "domain": "www.desicookingrecipes.com"
                    }
                },
                "highlight": {
                    "cn": [
                        "<em>0 Biryani</em> Recipes Chickpeas <em>Biryani |</em> Ventuno Home Cooking Home C. Likes!",
                        "<em>0 Biryani</em> Recipes Dindigul Thalappakatti <em>Biriyani</em> / Seeraga Samba Mutton <em>Biryani/</em> Thalapakattu <em>Biryani</em>"
                    ]
                }
            },
            {
                "_score": 142.32794,
                "_source": {
                    "wd": "2024-02-21T09:36:34.000Z",
                    "wu": {
                        "original": "https://themadscientistskitchen.com/6-much-loved-biryanis/",
                        "domain": "themadscientistskitchen.com"
                    }
                },
                "highlight": {
                    "cn": [
                        "2 shares Facebook Twitter Pinterest2 Yummly Mix <em>A Biryani is</em> also known as <em>biriyani</em> or <em>biriani.</em>",
                        "<em>Biryanis</em> Here is what I gathered… <em>Biryani</em> alternate names are <em>Biriyani,</em> <em>biriani,</em> buriyani, breyani,briani"
                    ]
                }
            },
            {
                "_score": 140.27979,
                "_source": {
                    "wd": "2024-02-25T18:21:45.000Z",
                    "wu": {
                        "original": "https://www.crazymasalafood.com/top-20-biryani-houses-in-mumbai-you-shouldnt-miss/",
                        "domain": "www.crazymasalafood.com"
                    }
                },
                "highlight": {
                    "cn": [
                        "A delivery kitchen located in Go <em>Biryan is</em> your perfect option for an ultimate <em>Biryani</em> feast.",
                        "All you have to do is try Behrouz <em>Biryani’s</em> range of <em>biryanis,</em> including Afghani <em>Biryani,</em> vegetable <em>biryani</em>"
                    ]
                }
            },
            {
                "_score": 140.16077,
                "_source": {
                    "wd": "2024-02-22T03:19:16.000Z",
                    "wu": {
                        "original": "https://www.captionsbyte.com/biryani-captions-for-instagram/",
                        "domain": "www.captionsbyte.com"
                    }
                },
                "highlight": {
                    "cn": [
                        "Food goals: <em>Biryani.</em> <em>Biryani:</em> my comfort food. Keep calm and <em>Biryani</em> on. <em>Biryani is</em> my happy place.",
                        "<em>#biryani</em> My one true love: <em>biryani!</em> <em>Biryani –</em> My heart and my tummy’s delight."
                    ]
                }
            },
            {
                "_score": 137.51003,
                "_source": {
                    "wd": "2024-02-24T02:23:43.000Z",
                    "wu": {
                        "original": "https://www.heritagetimes.in/the-allure-of-a-biryani-with-aloo-kolkata-biryani/",
                        "domain": "www.heritagetimes.in"
                    }
                },
                "highlight": {
                    "cn": [
                        "THE ALLURE OF <em>A BIRYANI</em> WITH ALOO: KOLKATA <em>BIRYANI</em> Skip to content Heritage Times Exploring lesser known",
                        "Although, many people shrug off the Kolkata <em>biryani as</em> merely a variant of the Awadhi/Lucknawi <em>biryani</em>"
                    ]
                }
            }
        ]
    }
}

When ingestion is stopped, we get the results in 3-5 seconds.

My questions here are:

  1. How can I optimize search response time while ingestion is in progress?

  2. Going forward, ingestion is a continuous process, so if ingestion is delaying query responses, how can I configure Elasticsearch so that ingestion and querying do not impact each other?

  3. A general doubt: since we can have only one elected master node per cluster, are all ingest and CRUD requests overseen by the master only, and do GET queries also go via the master node, or do the data nodes serve GET queries as well? If the data nodes also serve GET requests from API clients, would building a load balancer in front of all my data nodes greatly reduce response time? Please suggest the best way to handle this scenario.

Hello,

Your infrastructure is a little confusing; you didn't provide any information about the specs of your nodes.

Also, do you have 140 data + ingest nodes? You mentioned that your data stream has 17 TB, but how many of these do you have? Or do you have just 17 TB in your cluster? This seems like an abnormal number of nodes for just 17 TB.

Can you provide more context? For example, what are the specs of your nodes, what is the total size of your cluster, what is the return of GET _cluster/health, etc.?

To which nodes are you sending the requests for both indexing and search? You should not be sending any requests to the master nodes, but directly to your data nodes.

What is the full output of the cluster stats API?
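
That is, the output of a request like the following (the human flag is optional and just makes the sizes easier to read):

GET _cluster/stats?human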

What type of hardware is the cluster deployed on? What type of storage are you using?

You cannot separate this.

Master nodes manage the cluster and are not involved in or overseeing normal request processing.

Hi Leandro,

Thanks for your quick response. The following is _cluster/health:

{
  "cluster_name": "ELK-CLUSTER",
  "status": "red",
  "timed_out": false,
  "number_of_nodes": 148,
  "number_of_data_nodes": 144,
  "active_primary_shards": 1154,
  "active_shards": 1199,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 1,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 99.91666666666667
}

Currently one index has an unassigned shard, so the status is red; please ignore that.

Below is the configuration of elasticsearch.yml on the data nodes:


# ======================== Elasticsearch Configuration =========================
cluster.name: ELK-CLUSTER
# Use a descriptive name for the node:
node.name: aaa #aaa
node.roles: [ data , data_hot, ingest]
#node.roles: [ ]
# ----------------------------------- Paths ------------------------------------
# Path to directory where to store the data (separate multiple locations by comma):
path.data: /data/elasticsearch
# Path to log files:
path.logs: /var/log/elasticsearch
network.host: IP
http.port: 9200
http.max_content_length: 2147483647b  
#
# --------------------------------- Discovery ----------------------------------
discovery.seed_providers: file

# Enable security features
xpack.security.enabled: true
xpack.security.enrollment.enabled: true
# Enable encryption for HTTP API client connections, such as Kibana, Logstash, and Agents
xpack.security.http.ssl:
  enabled: false
    #keystore.path: certs/http.p12
xpack.security.transport.ssl:
  enabled: true
  verification_mode: certificate
  keystore.path: /etc/elasticsearch/certs/elastic-certificates.p12
  truststore.path: /etc/elasticsearch/certs/elastic-stack-ca.p12
#cluster.initial_master_nodes: ["servers2"]
http.host: 0.0.0.0
transport.host: 0.0.0.0
#----------------------- END SECURITY AUTO CONFIGURATION -------------------------

For the master nodes, the node roles are only master.

You still need to provide the other information that was asked for.

What are the specs of your nodes? How are they deployed? What is the storage type being used? What is the total data in your cluster?

Hi,

I deployed the nodes on Ubuntu VMs and installed Elasticsearch using the deb package.

SPECS

root@aaa-new:/etc/elasticsearch# lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         52 bits physical, 57 bits virtual
  Byte Order:            Little Endian
CPU(s):                  8
  On-line CPU(s) list:   0-7
Vendor ID:               AuthenticAMD
  Model name:            AMD EPYC 9554 64-Core Processor
    CPU family:          25
    Model:               17
    Thread(s) per core:  1
    Core(s) per socket:  4
    Socket(s):           2
    Stepping:            1
    BogoMIPS:            6190.69

root@aaa-new:/etc/elasticsearch# free -h
               total        used        free      shared  buff/cache   available
Mem:            62Gi        32Gi       368Mi       1.0Mi        29Gi        29Gi
Swap:             0B          0B          0B


root@aaa-new:/etc/elasticsearch# df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       3.4T  606G  2.8T  18% /
root@servers30-new:/etc/elasticsearch#

The storage type is SSD, with mounted volumes of 3.5 TB on each machine.

Total data in the cluster:
For now I have created a few indices used for testing; apart from that, the major data is web crawl data in one particular data stream. Below I have provided all indices from _cat/indices:

health status index                                                              uuid                   pri rep docs.count docs.deleted store.size pri.store.size dataset.size
green  open   .ds-logs-webcrawl-prod-2024.05.08-000012                           YQxLp4TxTJK5002680ErsA   5   0   10952000            0    137.8gb        137.8gb      137.8gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000013                           GcTMqnJNQ-S_tpGJq9xBhQ   5   0   11798400            0    154.8gb        154.8gb      154.8gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000014                           9zQiD9irQeGMec_cFho5-Q   5   0   10009600            0    130.8gb        130.8gb      130.8gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000015                           8BSxcpg-SLifRc_yzNFi0g   5   0   10068800            0    135.6gb        135.6gb      135.6gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000016                           NdXOHJEPRYaZHpRPPTioVQ   5   0    9800583            0    129.5gb        129.5gb      129.5gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000017                           QCKUwCeGRLa34OuO-gc9ZA   5   0    9547200            0    128.6gb        128.6gb      128.6gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000018                           6VLrnyYYQGa2gQCyQ71P3Q   5   0    9347200            0      124gb          124gb        124gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000019                           8UyURUiHRmaQcMm0e7n2Gg   5   0    9377600            0    132.9gb        132.9gb      132.9gb
green  open   my_index                                                           FH993Jq3SQibwluX-hcmQg   1   1          3            0     35.7kb         17.8kb       17.8kb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000020                           HHi3Faq6SjCh49H1VOQY3g   5   0    9176000            0    134.6gb        134.6gb      134.6gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000021                           vz-TUR3FTZSSRk-ccE3RYg   5   0    9662400            0    134.3gb        134.3gb      134.3gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000022                           bKKkgvWZRReRAPeBzyuSWg   5   0    9480000            0    134.2gb        134.2gb      134.2gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000023                           Ljbge9KbR92PrGs9f9N4Iw   5   0    9491200            0    126.4gb        126.4gb      126.4gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000024                           kJxawPjnR8G1mKKI5HyifA   5   0    9555200            0    131.1gb        131.1gb      131.1gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000025                           SG1p0q-tThGEZWi9xFebhQ   5   0    9653365            0    129.9gb        129.9gb      129.9gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000026                           YuaQ92WdRRa1OcJVeed7SQ   5   0   13057600            0    164.5gb        164.5gb      164.5gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000027                           josLhXxAReCxBAVKPBPIcg   5   0   12230846            0      160gb          160gb        160gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000028                           K5Fu224ASGGUAdWGRKt9kw   5   0   10803200            0      142gb          142gb        142gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000029                           a5KWyMjrRmyWl9cbluGTOw   5   0   10838400            0    140.8gb        140.8gb      140.8gb
green  open   .internal.alerts-observability.metrics.alerts-default-000001       9fqdXCsfQ-e1VIeS3L_7Zg   1   1          0            0       500b           250b         250b
green  open   .internal.alerts-security.alerts-default-000001                    5JNMkNz_SIi4ML5a0BnpfQ   1   1          0            0       500b           250b         250b
green  open   .ds-logs-webcrawl-prod-2024.05.09-000039                           tl8lPcdHSqWtIE5av0410A   5   0   10964800            0    144.6gb        144.6gb      144.6gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000030                           EDuijk5pQXS3Yt3yXS48HA   5   0   10544000            0    142.9gb        142.9gb      142.9gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000031                           OI5zkAopQVCd9K8IfyRycA   5   0   10808000            0    147.1gb        147.1gb      147.1gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000032                           NsBdHgLVQrGJbqlcnnnLkg   5   0    9907200            0    135.5gb        135.5gb      135.5gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000033                           WPNpbzUwTEaReYBJ6-ntSA   5   0   10339200            0    144.3gb        144.3gb      144.3gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000034                           V32oRIYJTuu_MAIu4vXxeg   5   0   10843200            0    146.5gb        146.5gb      146.5gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000035                           6ZThIdXPTgO4dj-DdojpYg   5   0   10835200            0      137gb          137gb        137gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000036                           WX-4E8BlTsWp1xqiU3BIaA   5   0   11382400            0    149.9gb        149.9gb      149.9gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000037                           y_5G1j2xR--5TxzyUr9tdQ   5   0   10067200            0    133.9gb        133.9gb      133.9gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000038                           G2jk1HviRY-DIYQXCz26tA   5   0   11107200            0    149.5gb        149.5gb      149.5gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000040                           PDHfetGGQN2L_nE_BA7MLw   5   0   10995200            0    141.9gb        141.9gb      141.9gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000041                           sbHQzEL3SUGiBrI0AZTFag   5   0   11089600            0    146.7gb        146.7gb      146.7gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000042                           RwK_5_dWQxiAJ8NrBfQNCw   5   0   10868800            0    139.4gb        139.4gb      139.4gb
green  open   .ds-webcrawl-filestream-prod-2024.05.06-000034                     m3NjSxVkSNenV2G1OeqBCQ  20   0   63813405            0    519.7gb        519.7gb      519.7gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000044                           E5Vn5crmTzSlZQkgyNhytA   5   0   10868800            0    141.9gb        141.9gb      141.9gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000043                           JHyusHAZSBeebm6yvVS2YQ   5   0   11337600            0      141gb          141gb        141gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000046                           Z9I02z2QTbu9wMUHR7pekA   5   0   11832000            0    150.3gb        150.3gb      150.3gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000045                           EIyMklHNQ3-bN5Ls1hNbIw   5   0   11094400            0    149.7gb        149.7gb      149.7gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000048                           CxCi2r5KRl6ZK9ZJJYBYug   5   0   10587936            0    145.2gb        145.2gb      145.2gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000049                           n1X3uiqqRcWJOQLMyqptsg   5   0    9474625            0    132.5gb        132.5gb      132.5gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000047                           zXBDFwPGTZCQpIx9TkuqAA   5   0   11676800            0    155.2gb        155.2gb      155.2gb
green  open   .internal.alerts-default.alerts-default-000001                     iXZgzIzAR6yDJ8DqerOg1Q   1   1          0            0       500b           250b         250b
green  open   my-data-stream                                                     yApLa7viQqKOOyVHrw1atA   1   1          4            0     12.1kb            6kb          6kb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000050                           1uT5I-RGSWiOKjWxAozhaA   5   0    9385600            0    126.5gb        126.5gb      126.5gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000051                           fitVHb5ySE2BihdPc4IJlQ   5   0    9678400            0    131.5gb        131.5gb      131.5gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000052                           UPQNa3ieRZ-MzD6um7ONjg   5   0   10184932            0    125.3gb        125.3gb      125.3gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000053                           0C4h0SyfQfKWHnqOrAnwwQ   5   0   10563200            0    130.9gb        130.9gb      130.9gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000054                           Ky_LffszSUS-Z2oRrop2XQ   5   0    9939200            0    131.3gb        131.3gb      131.3gb
green  open   .internal.alerts-observability.apm.alerts-default-000001           GWaZu1sNQ9Km3_U0hJjiUA   1   1          0            0       500b           250b         250b
green  open   .ds-logs-webcrawl-prod-2024.05.09-000055                           ExLWSQomTkaSVHxO0-IcNA   5   0    9422400            0    131.6gb        131.6gb      131.6gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000057                           _aOgOYR4TxeGRklOSYekog   5   0    9632000            0    124.2gb        124.2gb      124.2gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000056                           CG3HMheGTxKWXXYY9G4OUw   5   0    9217600            0    125.7gb        125.7gb      125.7gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000059                           YoYsMC-sRO6Kv1TbFwKogg   5   0   12289600            0    164.6gb        164.6gb      164.6gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000058                           _fN4ByBqRhuHrq-unRkkng   5   0    9504000            0    113.6gb        113.6gb      113.6gb
green  open   your_index                                                         ZCfZ7PjpS8-8bN_qOqKlWw   1   1          1            0      7.7kb          3.8kb        3.8kb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000060                           -6vKYNBZSNibznqTLEK-YQ   5   0    9480000            0      128gb          128gb        128gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000061                           Px0d8K_fQ3u_BXbSK_hS5g   5   0    9158400            0    124.9gb        124.9gb      124.9gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000062                           c3fwwkKTRkKVHIbmL1ALgA   5   0   11280186            0    151.8gb        151.8gb      151.8gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000063                           KUTjnR9QTIO57_XzTtvzSg   5   0   11124800            0    154.7gb        154.7gb      154.7gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000064                           Sx_rmIsmRR6CBMWK74g8RQ   5   0    9329600            0    118.8gb        118.8gb      118.8gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000065                           zxIDSIvzTWSmx0h5qyT0RQ   5   0   13020800            0    157.2gb        157.2gb      157.2gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000066                           yCSf5O7NSTiRKL9uzCGyhA   5   0   13516800            0    162.9gb        162.9gb      162.9gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000067                           7knt0YjPQvOPlV1-PdICWg   5   0   12766400            0    159.9gb        159.9gb      159.9gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000068                           LSpNG-ydSyedEX1GT1UP7w   5   0    8436800            0    112.6gb        112.6gb      112.6gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000069                           3nC-uE2rThC9inDZ0AbNxw   5   0    8804800            0    120.5gb        120.5gb      120.5gb
green  open   .internal.alerts-observability.uptime.alerts-default-000001        60rQh10kTtaRnbYPRY7XZw   1   1          0            0       500b           250b         250b
green  open   example_index                                                      vC5yoU4tTAqXUVls2u56DQ   1   1          0            0       499b           249b         249b
green  open   .ds-logs-webcrawl-prod-2024.05.09-000070                           90PyNGHxRjSPYVjzNtm2zw   5   0   12411200            0    164.5gb        164.5gb      164.5gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000071                           B2CeD7FjSYO3ESInOBrK9A   5   0   12481600            0    173.7gb        173.7gb      173.7gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000072                           LgrROjIFTdmYXB8KBo1JwQ   5   0   12022948            0    156.8gb        156.8gb      156.8gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000073                           45dDmV35TT62cj-gKArKtQ   5   0   12115200            0    161.6gb        161.6gb      161.6gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000074                           PeiHuzsMS5Cev7qAXg5X_Q   5   0   12542400            0    155.6gb        155.6gb      155.6gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000075                           PVoyvPyPT3CHDLMndmtDdw   5   0   12188800            0    145.7gb        145.7gb      145.7gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000076                           knLR-1lmTYu-CAt17jNvaA   5   0   12076800            0    147.4gb        147.4gb      147.4gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000077                           wns4ZM9tRFyPcFoC6DdBJw   5   0   11931200            0    149.6gb        149.6gb      149.6gb
green  open   .ds-logs-webcrawl-prod-2024.05.09-000078                           wUMe2L5CS2CYSMsiQSyvPA   5   0   11424000            0      145gb          145gb        145gb
green  open   idx_did_you_mean                                                   h3sqEkRbQfujcxA7Q0ao6g   1   1          0            0       500b           250b         250b
green  open   .ds-webcrawl-filestream-prod-2024.05.11-000036                     G000R-hjRRSoC7IMYQno4w  20   0   63982400            0    504.7gb        504.7gb      504.7gb
green  open   .ds-webcrawl-filestream-prod-2024.05.11-000037                     1yS1nYTiS-GCjZuXsddZOg  20   0   63772800            0    581.8gb        581.8gb      581.8gb
green  open   .ds-webcrawl-filestream-prod-2024.05.11-000038                     vbaZDvQfRiaaU2_-H_aoGw  20   0   11372464            0    114.3gb        114.3gb      114.3gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000079                           CwMfk9gKRAqj1yAQkSkBAw   5   0   12328000            0    157.9gb        157.9gb      157.9gb
green  open   .internal.alerts-ml.anomaly-detection.alerts-default-000001        RYwMI8yGRR629LemoslyXA   1   1          0            0       500b           250b         250b
green  open   .ds-webcrawl-filestream-prod-2024.04.27-000002                     LxN0ilaDS-yG4uAxIGNhBQ  20   0    6728570            0    103.4gb        103.4gb      103.4gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000080                           Q8i1YAOqTg6At-4DsbMSwg   5   0   12179200            0    155.7gb        155.7gb      155.7gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000081                           WcpKTyN9S3uGTpZYLpl8AA   5   0   12076800            0    154.7gb        154.7gb      154.7gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000082                           OyVlIdkMR3a5-25wc0tvFA   5   0   11916800            0    160.9gb        160.9gb      160.9gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000083                           NracqP78QUSSXIlAS4-kmg   5   0   12337600            0    159.4gb        159.4gb      159.4gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000084                           pqsIpckCQJGAEi0vZrYxDg   5   0   12012800            0    153.9gb        153.9gb      153.9gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000085                           F37WjUGhQwy9URm4gzzQJw   5   0   12500800            0    159.1gb        159.1gb      159.1gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000086                           S6QIaAGvQ6eClfxmt7kerw   5   0   11521600            0    153.1gb        153.1gb      153.1gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000087                           _eADjX-UTMi-gCHOQH8slg   5   0   11321600            0      152gb          152gb        152gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000088                           yA59262dSXSvDgXKQqOA8w   5   0   10695085            0    146.6gb        146.6gb      146.6gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000089                           NcPnGUC2SA6tMNH-GrC6dA   5   0   10686259            0    147.9gb        147.9gb      147.9gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000100                           wj-LMIzPTiu2J2aNWz6WNQ   5   0   11736000            0    144.7gb        144.7gb      144.7gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000101                           VYL-GNzDSNW4EY-i4FwS5A   5   0    9180800            0    121.1gb        121.1gb      121.1gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000102                           Scj-_GumS7yro5lqp62kog   5   0   11777600            0    153.8gb        153.8gb      153.8gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000103                           jM2zxAJiQ5aE0KE-7gC_kQ   5   0   12249600            0    156.9gb        156.9gb      156.9gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000104                           cjgTuz7CQlWTz7epDARDig   5   0   12148800            0    154.2gb        154.2gb      154.2gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000105                           l33WDKgTScO4Nn1bX-l_lQ   5   0    9073600            0    123.3gb        123.3gb      123.3gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000106                           8fwnHSiiSjam6fBaV6sQLQ   5   0    9333469            0    122.3gb        122.3gb      122.3gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000107                           ZEiF0MydRMWsLDSWi4vfmQ   5   0   13715200            0    155.6gb        155.6gb      155.6gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000108                           BKJVit_DQWa89xXErRNttQ   5   0   13213652            0    143.9gb        143.9gb      143.9gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000109                           qiIB_ZzVT3Ka7ZJPxEL0_g   5   0   12558400            0    157.9gb        157.9gb      157.9gb
green  open   .internal.alerts-observability.slo.alerts-default-000001           OFJmF-51ScKfsoDxGGYrjQ   1   1          0            0       500b           250b         250b
green  open   .ds-logs-webcrawl-prod-2024.05.10-000090                           JNhFkK40ThG3hMtyoVrnug   5   0    9997510            0    128.3gb        128.3gb      128.3gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000091                           mMOHjAbfTi6bx9mUqnn_sw   5   0    9833600            0    126.7gb        126.7gb      126.7gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000092                           CHNJ6EoITVmhxt9dEWppwA   5   0   12152000            0    154.1gb        154.1gb      154.1gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000093                           Ev2s_aI0RZ-cfwafmBrr2A   5   0    8836800            0    119.7gb        119.7gb      119.7gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000094                           cptGQIQRTZy2jXGFF6nV0g   5   0    9292800            0    122.7gb        122.7gb      122.7gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000095                           B8aku1LPTk6es2Pg_c6kAA   5   0    9582400            0    123.8gb        123.8gb      123.8gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000096                           uxzaVTMBSlCtAbSak4cDXA   5   0    9276800            0    123.2gb        123.2gb      123.2gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000097                           HybIiiCJS0mHpdhEG4PIow   5   0    8843200            0    115.4gb        115.4gb      115.4gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000098                           cqLayJDjRvmb324OqThB_w   5   0   10825600            0    137.8gb        137.8gb      137.8gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000099                           PuXD7IUcQyCFXqX6CYtxoQ   5   0   12275200            0    158.6gb        158.6gb      158.6gb
green  open   .ds-logs-webcrawl-test-2024.05.07-000001                           23xoiyLsQfuEIHXjLvzpCg   5   0      34199            0    644.5mb        644.5mb      644.5mb
green  open   .ds-webcrawl-filestream-prod-2024.04.28-000008                     YrEJzfdWQwC1Vi-dhKJR2Q  20   0    6443625            0    101.7gb        101.7gb      101.7gb
green  open   .ds-webcrawl-filestream-prod-2024.04.28-000005                     NBqLm04ESSyfCmxugkEpOA  20   0    6431706            0     99.9gb         99.9gb       99.9gb
green  open   .ds-webcrawl-filestream-prod-2024.04.28-000004                     TM9QJ5WhRUanimp5jOIn1g  20   0    6525570            0    102.4gb        102.4gb      102.4gb
green  open   .ds-webcrawl-filestream-prod-2024.04.28-000007                     GswwiGlxT7qe-bKnzs5s4A  20   0    6177311            0     98.8gb         98.8gb       98.8gb
green  open   .ds-webcrawl-filestream-prod-2024.04.28-000006                     SNxAL7qOQl-fnjbUoWGIpA  20   0    6323494            0       99gb           99gb         99gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000113                           o9Fq--nsS_25WsexbZbCiA   5   0    9038400            0    118.3gb        118.3gb      118.3gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000114                           EEaBS4hPTx6R_lErSspS7w   5   0    9016000            0      128gb          128gb        128gb
red    open   .ds-logs-webcrawl-prod-2024.05.10-000115                           lthhC0FoR9ioFRCYr53OWQ   5   0    6890335            0     85.1gb         85.1gb       85.1gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000112                           -SjGnIqlS-yPlY-McdxWWg   5   0    9118400            0      121gb          121gb        121gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000111                           MCt2uDzeSym1m6qPt2VGJg   5   0    8958400            0    117.2gb        117.2gb      117.2gb
green  open   .ds-logs-webcrawl-prod-2024.05.10-000110                           PQ6DhJYeTqK_0iqabQ2E7g   5   0   12969600            0      175gb          175gb        175gb
green  open   .internal.alerts-stack.alerts-default-000001                       6useGXHdQnKW4zxY4u4vWQ   1   1          0            0       500b           250b         250b
green  open   .kibana-observability-ai-assistant-conversations-000001            nMxDy_EoRC-xfDAJ051rXw   1   1          0            0       500b           250b         250b
green  open   .ds-logs-videos-test-2024.05.07-000001                             PsyFC2TXSge_ZdAupqtS4w   5   0   12032605            0      6.1gb          6.1gb        6.1gb
green  open   .ds-logs-webcrawl-prod-2024.05.07-000001                           x5c8namfQC-eexaM_DnGbA   5   0   10633600            0    135.7gb        135.7gb      135.7gb
green  open   .ds-logs-webcrawl-prod-2024.05.07-000002                           psTMEHubTPiiQsDXfDTy1Q   5   0    9284800            0    120.7gb        120.7gb      120.7gb
green  open   .ds-logs-webcrawl-prod-2024.05.07-000003                           DO_EnXwVR4CsYZdVINicFg   5   0    9270400            0    118.5gb        118.5gb      118.5gb
green  open   .ds-logs-webcrawl-prod-2024.05.07-000004                           QnGmIGg5TgapfBWAfGQ6Zw   5   0   10192000            0      139gb          139gb        139gb
green  open   .ds-logs-webcrawl-prod-2024.05.07-000005                           t2lOc3HWRAKtWuMehoepAQ   5   0    8950036            0    118.6gb        118.6gb      118.6gb
green  open   .ds-logs-webcrawl-prod-2024.05.07-000006                           lIwkP2ylQgiNd3nGd61fbQ   5   0    9771200            0    125.3gb        125.3gb      125.3gb
green  open   .internal.alerts-transform.health.alerts-default-000001            4bgZ9f83ROi9Nmhyg7LRJA   1   1          0            0       500b           250b         250b
green  open   .internal.alerts-ml.anomaly-detection-health.alerts-default-000001 NOvD4WcpQry1K6VXc2ZJUg   1   1          0            0       500b           250b         250b
green  open   .ds-logs-videos-prod-2024.05.07-000001                             49FnwjyHQVi1MVt4pAg5zA   5   0   15018015            0     41.3gb         41.3gb       41.3gb
green  open   .ds-logs-webcrawl-prod-2024.05.11-000116                           3kAcI8gATZaLeZf7DNzQmQ   5   0   10985680            0    140.1gb        140.1gb      140.1gb
green  open   .ds-logs-webcrawl-prod-2024.05.11-000117                           -d_QHUyTQvOHbHPlcQPdgQ   5   0   11885699            0    142.7gb        142.7gb      142.7gb
green  open   .ds-logs-webcrawl-prod-2024.05.11-000118                           lH50X98hQmStGWL7tDiXmQ   5   0   12968000            0    156.9gb        156.9gb      156.9gb
green  open   .ds-logs-webcrawl-prod-2024.05.11-000119                           -BAdLGYtRcCLQcuhhsqHOw   5   0   11664000            0    141.2gb        141.2gb      141.2gb
green  open   .ds-logs-videos-prod-2024.05.07-000002                             d1uhfmiwRLisO3XC0qD8Aw   5   0    3026545            0      7.9gb          7.9gb        7.9gb
green  open   .ds-webcrawl-filestream-prod-2024.04.29-000011                     AxFlSx4jSoWZlWbb7ljjqw  20   0    6279706            0     96.2gb         96.2gb       96.2gb
green  open   .ds-webcrawl-filestream-prod-2024.04.29-000012                     2S0t8qLKQx6n8sZYedTBlg  20   0    1445482            0     22.5gb         22.5gb       22.5gb
green  open   .ds-webcrawl-filestream-prod-2024.04.29-000013                     W4qUivEfTxa_GcqR-P_kMA  20   0    6318045            0       92gb           92gb         92gb
green  open   .ds-webcrawl-filestream-prod-2024.04.29-000014                     cP0RlpD3Rn2D7xqaNHxRdg  20   0    6547355            0     96.9gb         96.9gb       96.9gb
green  open   .ds-webcrawl-filestream-prod-2024.04.29-000015                     EhZQwbklT_mki0asId9CVA  20   0    6582742            0     97.2gb         97.2gb       97.2gb
green  open   .ds-webcrawl-filestream-prod-2024.04.29-000016                     kbs1zw1lTzWp4X4yreDH2Q  20   0    6416248            0     96.1gb         96.1gb       96.1gb
green  open   .ds-webcrawl-filestream-prod-2024.04.29-000017                     p_7EY5D7QRSMS4bW6ium1g  20   0    6314201            0     94.6gb         94.6gb       94.6gb
green  open   .internal.alerts-observability.logs.alerts-default-000001          9hGOwCgVSjmaIeY7rRnfqA   1   1          0            0       500b           250b         250b
green  open   .internal.alerts-observability.threshold.alerts-default-000001     STIkNMO0SRuOLpTnnK4cZw   1   1          0            0       500b           250b         250b
green  open   .kibana-observability-ai-assistant-kb-000001                       exLCF-EjQcSWYXiMM1RjHw   1   1          0            0       500b           250b         250b
green  open   .ds-webcrawl-filestream-prod-2024.05.01-000023                     cVNizrvPR8Wfx6HJ_5iJaw  20   0    6297258            0     95.3gb         95.3gb       95.3gb
green  open   .ds-webcrawl-filestream-prod-2024.04.30-000019                     MXgdBiT3Q6iR8L7ZVcu_3Q  20   0    6226191            0       92gb           92gb         92gb
green  open   .ds-webcrawl-filestream-prod-2024.05.01-000025                     jsfdGm8aRpi7tV2R0BG28g  20   0    1011067      2838029     33.7gb         33.7gb       33.7gb
green  open   .ds-webcrawl-filestream-prod-2024.05.01-000026                     uDvSabQNReqp3-GbHioX-Q  20   0          0            0      5.4kb          5.4kb        5.4kb
green  open   .ds-logs-webcrawl-prod-2024.05.11-000121                           XtnF58lqTj61oW9PPRzy1A   5   0   10217600            0    134.9gb        134.9gb      134.9gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000007                           t14n7GoEStiwwBFg1I8LqA   5   0    9769534            0    122.3gb        122.3gb      122.3gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000008                           -gakTZuqSwaCbO-bSnGU4g   5   0    9670400            0    118.1gb        118.1gb      118.1gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000009                           3CkBX7jyRRycnYBlnO4NSA   5   0   10761600            0    140.5gb        140.5gb      140.5gb
green  open   .ds-logs-webcrawl-prod-2024.05.11-000122                           8JgV8-Z3Rcm4HBBvv-2JUg   5   0   12832000            0    173.9gb        173.9gb      173.9gb
green  open   .ds-logs-webcrawl-prod-2024.05.11-000123                           qYzpakobQfK6_MnV99312Q   5   0    9811200            0    133.7gb        133.7gb      133.7gb
green  open   .ds-logs-webcrawl-prod-2024.05.11-000124                           9KqM4gsFSpa3znOX8CpKxA   5   0   11067200            0    150.9gb        150.9gb      150.9gb
green  open   .ds-logs-webcrawl-prod-2024.05.11-000125                           LpOAcquzTVeA_i7kCK0RHA   5   0   10308800            0    143.9gb        143.9gb      143.9gb
green  open   .ds-logs-webcrawl-prod-2024.05.11-000126                           lzgq3OvxRqqMSrbaUQOx7A   5   0   10678400            0    148.4gb        148.4gb      148.4gb
green  open   .ds-logs-webcrawl-prod-2024.05.11-000127                           BEu4bh3UR72YUR7EWE_FrA   5   0    7495070            0    141.9gb        141.9gb      141.9gb
green  open   .ds-logs-webcrawl-prod-2024.05.11-000120                           oCkKd8RFSrSSc-Onxk3nlg   5   0   10649600            0    137.2gb        137.2gb      137.2gb
green  open   .ds-twitter-filestream-test-2024.04.26-000001                      Y_7a2UGVTiyZSBU_po1VFw   1   1          0            0       500b           250b         250b
green  open   .ds-webcrawl-filestream-prod-2024.04.30-000020                     78raC9egRTWl9I5rqe-MDw  20   0    6364440            0     94.4gb         94.4gb       94.4gb
green  open   .ds-webcrawl-filestream-prod-2024.04.30-000021                     EMuuaCxHQxePerZW8Inztw  20   0    6414297            0     96.1gb         96.1gb       96.1gb
green  open   .ds-webcrawl-filestream-prod-2024.04.30-000022                     H0dJMb2aSbaQB8ohenj-PQ  20   0    6304382            0       95gb           95gb         95gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000010                           QTnjel69SLmB5Rb9rVoY_Q   5   0   11707200            0    151.2gb        151.2gb      151.2gb
green  open   .ds-logs-webcrawl-prod-2024.05.08-000011                           -q5Av7HDQ9SXeBBjIZi1Og   5   0   11043200            0      138gb          138gb        138gb

I am also providing the template I used for the webcrawl index, which is the main concern for me:

{
  "priority": 1500,
  "template": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "webcrawl-fs-policy"
        },
        "mapping": {
          "nested_fields": {
            "limit": "500"
          },
          "depth": {
            "limit": "50"
          },
          "field_name_length": {
            "limit": "1000"
          },
          "total_fields": {
            "limit": "20000"
          }
        },
        "refresh_interval": "60s",
        "number_of_shards": "5",
        "max_docvalue_fields_search": "500",
        "default_pipeline": "webcrawl-pipeline",
        "analysis": {
          "filter": {
            "my_shingle_filter": {
              "max_shingle_size": "2",
              "min_shingle_size": "2",
              "type": "shingle"
            },
            "my_standard_filter": {
              "type": "stop"
            }
          },
          "analyzer": {
            "my_shingle_analyzer": {
              "filter": [
                "lowercase",
                "my_shingle_filter",
                "trim",
                "stemmer"
              ],
              "tokenizer": "whitespace"
            },
            "my_standard_analyzer": {
              "filter": [
                "lowercase",
                "my_standard_filter",
                "trim",
                "stemmer"
              ],
              "tokenizer": "whitespace"
            }
          }
        },
        "number_of_replicas": "0"
      }
    },
    "mappings": {
      "_routing": {
        "required": false
      },
      "numeric_detection": false,
      "dynamic_date_formats": [
        "strict_date_optional_time",
        "yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"
      ],
      "_source": {
        "excludes": [],
        "includes": [],
        "enabled": true
      },
      "dynamic": true,
      "dynamic_templates": [],
      "date_detection": true,
      "properties": {
        "Envelope.Payload-Metadata.HTTP-Response-Metadata.Headers": {
          "type": "flattened"
        },
        "cn": {
          "eager_global_ordinals": false,
          "index_phrases": false,
          "search_quote_analyzer": "my_shingle_analyzer",
          "fielddata_frequency_filter": {
            "min": 0.01,
            "max": 1,
            "min_segment_size": 50
          },
          "fielddata": true,
          "norms": true,
          "analyzer": "my_shingle_analyzer",
          "index": true,
          "store": false,
          "type": "text",
          "index_options": "positions"
        },
        "ht": {
          "eager_global_ordinals": false,
          "index_phrases": false,
          "search_quote_analyzer": "my_standard_analyzer",
          "fielddata": false,
          "norms": true,
          "analyzer": "my_standard_analyzer",
          "index": true,
          "store": false,
          "type": "text",
          "index_options": "positions"
        },
        "wd": {
          "index": true,
          "ignore_malformed": false,
          "store": false,
          "type": "date",
          "doc_values": true
        },
        "wt": {
          "eager_global_ordinals": false,
          "index_phrases": false,
          "search_quote_analyzer": "my_standard_analyzer",
          "fielddata": false,
          "norms": true,
          "analyzer": "my_standard_analyzer",
          "index": true,
          "store": false,
          "type": "text",
          "index_options": "positions"
        },
        "wu": {
          "dynamic": true,
          "type": "object",
          "enabled": true,
          "properties": {
            "original": {
              "eager_global_ordinals": false,
              "index_phrases": false,
              "search_quote_analyzer": "my_standard_analyzer",
              "fielddata": false,
              "norms": true,
              "analyzer": "my_standard_analyzer",
              "index": true,
              "store": false,
              "type": "text",
              "index_options": "positions"
            },
            "domain": {
              "eager_global_ordinals": false,
              "index_phrases": false,
              "search_quote_analyzer": "my_standard_analyzer",
              "fielddata": false,
              "norms": true,
              "analyzer": "my_standard_analyzer",
              "index": true,
              "store": false,
              "type": "text",
              "index_options": "positions"
            }
          }
        }
      }
    }
  },
  "index_patterns": [
    "logs-webcrawl-*",
    "logs-webcrawl-prod*"
  ],
  "data_stream": {
    "hidden": false,
    "allow_custom_routing": false
  },
  "composed_of": [],
  "allow_auto_create": false
}

It looks like some nodes are experiencing significantly higher load than the others. Do these nodes hold more indices/shards being heavily indexed into than the others?

Do you have 125+ different data streams that you are indexing into? Do these all receive similar data volumes? Are these all configured with 5 primary shards?

Hi Christian,

Please find my responses to your questions:

  1. No. I checked the individual nodes and the shards allocated to each node; all are equally allocated, say 4-5 shards per node. I am not aware why the load is high on those particular servers. My assumption is that these nodes hold the index whose shards are currently being written to.

  2. No, I have only 2-3 data streams. One of them, logs-webcrawl-prod, is the main one and has a huge number of documents. Currently I have ingested 17 TB of data with 5 primaries, 0 replicas, and a 20 GB rollover condition in ILM, so each index ends up around 100 GB, as each of the 5 primaries holds 20 GB.

Use the cat shards API to identify which nodes hold shards that are actively being written to and see if this correlates with the increased load. When you run a search, the nodes with increased load also need to execute it, and if they hold the same number of shards as the other nodes, these could very well be your bottleneck.
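
For example, something along these lines lists where the shards of each backing index sit (the index pattern is taken from the data stream name you shared, and the column selection is just illustrative):

GET _cat/shards/.ds-logs-webcrawl-prod-*?v&h=index,shard,prirep,state,docs,store,node&s=index:desc

The newest generations appear first, and the node column shows which nodes are taking the write load.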

If there is a correlation, I would recommend increasing the number of primary shards, as this will spread out the indexing load better. The indices will roll over a bit less frequently, since rollover is based on primary shard size, but that is not a problem.
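
A minimal sketch of the rollover side, assuming your current condition is max_primary_shard_size: 20gb as you described; note that PUT replaces the whole policy, so keep any other phases and actions you already have. The primary shard count itself is the number_of_shards setting in the index template you shared:

PUT _ilm/policy/webcrawl-fs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "20gb"
          }
        }
      }
    }
  }
}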

Another approach could be to split the nodes into sets of hot and warm nodes where you move indices off the hot nodes as soon as they have rolled over. This would leave the hot nodes with a lot less data to serve queries for and may also help.
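
Continuing the sketch above, adding a warm phase with a min_age of 0d would move indices off the hot nodes as soon as they roll over. This assumes the warm nodes are given the data_warm role in node.roles; the values are illustrative:

PUT _ilm/policy/webcrawl-fs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "20gb"
          }
        }
      },
      "warm": {
        "min_age": "0d",
        "actions": {
          "migrate": {
            "enabled": true
          }
        }
      }
    }
  }
}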

Hi Christian,

I agree with your points, and I will look into the load hitting the shards when a query executes; it will take some time for me to analyze this. Also, the load does not persist for long; it changes on every refresh, and the other nodes do not carry that much load. How exactly will this resolve my response times (when indexing data and when not indexing data)?

As for the other points, I cannot maintain warm nodes, as my data is web crawl data; a user may hit any data at any time, so everything should stay on HOT nodes only.

As for the primary shard count, for 150 machines I am maintaining 5 primaries and 0 replicas for now. What is the best recommendation to improve search performance?
Please also advise on total index size: what is the maximum size we can maintain for the best search results?

I am aiming for a 1-2 second response time.

The hot and warm nodes would have the same hardware specification they have now; they would just handle different tasks and host different volumes of data. If a node in the warm tier does not perform any indexing, I would expect it to be able to serve queries against a larger data volume than you currently have. The warm tier does not necessarily need to be slower.

In many deployments of a hot/warm architecture, the warm nodes end up with inferior hardware, and then searches can be slower, but that is not what I am suggesting here.

This will depend on your data, mappings, and queries, so it is something you need to establish yourself through testing and benchmarking.

Hi Christian,

Thanks for the suggestion; it makes sense to have hot and warm nodes even if the data is frequently searched.

Please confirm one more speculation of mine.

In the meantime, I noticed a memory issue. On a 64 GB RAM machine I have not set any JVM heap options, so I assumed that Elasticsearch would consume half of the RAM, which is 32 GB. Below is the consumption on one of my data nodes:

xyz@abc:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:            62Gi        39Gi       537Mi       1.0Mi        23Gi        22Gi
Swap:             0B          0B          0B

The dedicated RAM usage of 39 GB is expected, as Elasticsearch will consume around 32 GB, but the buffer/cache here is consuming 23 GB, and this is caused by Elasticsearch alone: if I stop Elasticsearch, the buffer does not fill up.
I read some documentation and modified the GC configuration to collect garbage faster, but it made no difference.

My questions here are:

  1. Does Elasticsearch have any possible reason to cause the machine cache to fill up?
    If YES, how can we control or maintain it?

  2. As per our previous conversation, you know I am using all my VMs (147 VMs) as data nodes (not separated into HOT and WARM). If I segregate the VMs into HOT and WARM, the behavior of the cluster will change: write operations will be limited to HOT, and searches (phrase queries) will go to both HOT and WARM (where WARM is expected to have lower CPU and RAM consumption). So can we expect search queries to be faster?

@Christian_Dahlqvist, @leandrojmp, please help me with the above questions. Apologies for the delay in our conversation.

Sharing the elasticsearch.yml and JVM options:


# ======================== Elasticsearch Configuration =========================
cluster.name: ELK-CLUSTER
# Use a descriptive name for the node:
node.name: abc
node.roles: [ data , data_hot, ingest]
# ----------------------------------- Paths ------------------------------------
# Path to directory where to store the data (separate multiple locations by comma):
path.data: /data/elasticsearch
# Path to log files:
path.logs: /var/log/elasticsearch
network.host: IPOfServer
http.port: 9200
http.max_content_length: 2147483647b
#
# --------------------------------- Discovery ----------------------------------
discovery.seed_providers: file

# Enable security features
xpack.security.enabled: true
xpack.security.enrollment.enabled: true
# Enable encryption for HTTP API client connections, such as Kibana, Logstash, and Agents
xpack.security.http.ssl:
  enabled: false
    #keystore.path: certs/http.p12
xpack.security.transport.ssl:
  enabled: true
  verification_mode: certificate
  keystore.path: /etc/elasticsearch/certs/elastic-certificates.p12
  truststore.path: /etc/elasticsearch/certs/elastic-stack-ca.p12
http.host: 0.0.0.0
transport.host: 0.0.0.0
#----------------------- END SECURITY AUTO CONFIGURATION -------------------------

JVM options - only filecount=5,filesize=20m changed from the defaults, to clear the GC logs faster

################################################################

-XX:+UseG1GC

## JVM temporary directory
-Djava.io.tmpdir=${ES_TMPDIR}

# Leverages accelerated vector hardware instructions; removing this may
# result in less optimal vector performance
20-:--add-modules=jdk.incubator.vector

## heap dumps

# generate a heap dump when an allocation from the Java heap fails; heap dumps
# are created in the working directory of the JVM unless an alternative path is
# specified
-XX:+HeapDumpOnOutOfMemoryError

# exit right after heap dump on out of memory error
-XX:+ExitOnOutOfMemoryError

# specify an alternative path for heap dumps; ensure the directory exists and
# has sufficient space
-XX:HeapDumpPath=/var/lib/elasticsearch

# specify an alternative path for JVM fatal error logs
-XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log

## GC logging
-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,level,pid,tags:filecount=5,filesize=20m

Elasticsearch assumes that it has access to all available RAM on the host and the recommendation to set the heap to a maximum of 50% relies on this. In addition to the heap Elasticsearch also stores some data off-heap. It also heavily relies on the operating system page cache for performance, so on a node holding a good amount of data it is normal to see all memory being used. This is optimal and normal, so nothing to try and control.

If you are running other services on the same host (not recommended) you will need to reduce the heap size manually.
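
If you do end up capping the heap, the usual way with a package install is a custom file under /etc/elasticsearch/jvm.options.d/ rather than editing jvm.options itself; a minimal sketch (the file name and the 26g value are just an example, not a recommendation for your cluster):

# /etc/elasticsearch/jvm.options.d/heap.options (example)
# Pin min and max heap to the same value, at or below ~50% of RAM
# and below the ~32 GB compressed-oops threshold
-Xms26g
-Xmx26g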

Warm nodes may have less CPU and RAM, but that is not always the case. As they hold a large amount of data, you will need to see how many resources they require to be able to serve queries within your SLAs.

Thanks @Christian_Dahlqvist ,

I am stuck with this requirement and do not understand how to optimize the performance.

To summarize things:

Requirements and specifications:

1. Heavy ingestion all the time, using Filebeat for ingestion.
2. Currently 150 nodes: 3 machines dedicated to master / master-eligible / voting-only roles, and 147 data + ingest nodes.
3. Not using any other functionality such as transforms, APM, dashboards or rules.
4. Basic license, on-prem servers with 64 GB RAM, 8 cores and 3.5 TB of drive space in each machine.
5. Targeting 1-2 petabytes of ingested data in the coming days; we have just started and currently have 50 TB of data across all nodes combined.
6. The main purpose of the data is a search-engine-like platform, so the target is to serve match_phrase and match queries on all the content.
7. The number of primaries is 5 with a 20 GB rollover condition and 0 replicas, so each data stream backing index is 100 GB: 5 primaries of 20 GB each.

Current Issues

1. Current performance: the first hit takes 30-40 seconds and the second hit takes 1-2 seconds because I enabled request_cache=true in the GET query (see the example after this list); without that parameter every hit takes 30+ seconds.
2. The dedicated RAM usage is understandable, but the cache on every server fills up very fast (within 10-15 minutes of starting Elasticsearch). This unavailability of RAM is causing some issues that I cannot confirm are related to it, such as Python scripts getting killed mid-run and VMs getting disconnected.
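
For reference, a sketch of how I pass the cache flag as a query-string parameter (the index name and query body here are placeholders, not my real ones):

GET my-datastream/_search?request_cache=true
{
  "query": { "match": { "my_text_field": "example term" } },
  "size": 10
}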

Solutions from our conversation, what I am planning, and my doubts

**For the 1st item in Current Issues**

1. Since we do not hold replicas in our current design, I set the primaries to 5 instead of 1 to make searching faster.
2. Currently all search queries are pointed at the master / master-eligible nodes (3 nodes), but as per the suggestion I will change this to all data nodes, hoping for somewhat better results. I do have a concern here: if we point the phrase search queries at all data nodes, will that create a huge number of tasks internally on each node, and where a node does not hold the data, will that task keep running and consume more memory? Or, once the request is served by some node, will all the other search tasks terminate?
3. Is it a good choice to put a load balancer (Nginx or Apache web server) in front of Elasticsearch to serve the incoming search API requests (see the sketch after this list)? How much do you think it would help in this case?
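
Something along these lines is what I had in mind for the load balancer; a minimal Nginx sketch with placeholder data-node addresses:

# /etc/nginx/conf.d/es-search.conf (sketch, placeholder IPs)
upstream es_data_nodes {
    server 10.0.0.11:9200;
    server 10.0.0.12:9200;
    server 10.0.0.13:9200;
    # ...remaining data nodes
}

server {
    listen 9201;
    location / {
        # round-robin incoming search API requests across the data nodes
        proxy_pass http://es_data_nodes;
        proxy_set_header Host $host;
    }
}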



**For the 2nd item in Current Issues**

1. As per your suggestion, I will stop using the VMs for any services other than Elasticsearch. But even then, does the buff/cache RAM filling up cause any issue for searching?
2. I also tried reducing the JVM heap to 20 GB on a few servers instead of the default, but the buffer still fills up, just more slowly than before.
3. As per your suggestion, if Elasticsearch is expected to store some data off-heap, could increasing the RAM fix the issue? And if so, how much RAM should we add, and on what basis? At the rate it is filling (20 GB of RAM in about 15 minutes), it would not be enough even if we increased the machines to 100 GB of RAM.
4. At the end of the day I am not concerned about the size of the RAM and buffer as such, I am OK with it, but my searching is very slow, so I am also concerned about the VMs' RAM. If the search speed does not depend on the available RAM, then we can discuss point 1 of Current Issues alone.

Having read through the thread again I have a few comments.

If you always query the full data set and have no replicas in place it means that every query will need to query every shard. That is not going to be very efficient at scale and I am not sure you will be able to meet your latency targets. I would also not expect you to be able to serve very many concurrent queries as each query will be quite expensive.

Another side effect is that your queries are likely to experience partial failures as soon as there is any issue with a node or the cluster as a whole given that you do not have any replicas to fall back on.

If you now have 50TB in the cluster (less than 400GB per data node) and already are experiencing performance issues I do not see how filling the disk with data (over 500TB) and querying using this approach will work.

With just 50TB of data every query is targeting around 2500 shards (50TB at roughly 20GB per shard). With the cluster full it would be around 25000 shards.

I think you should look to have larger shards, perhaps 50GB or so, and make sure they have been optimised and forcemerged down to a single segment. I am not sure it will get you where you want to be, though.
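
For backing indices that are no longer being written to, this is a single API call per index; a sketch with a hypothetical backing index name:

POST /.ds-my-datastream-2024.06.01-000042/_forcemerge?max_num_segments=1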

As you are querying a large number of shards you may want to experiment with increasing max_concurrent_shard_requests for your requests in order to do more work in parallel, especially if you do not see CPU or disk I/O being a bottleneck. Increase this gradually and see what impact it has (if any). Do not go to a very large value immediately as that can cause problems.
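
This is a per-request query-string parameter (the default is 5), so you can experiment without changing any cluster settings; a sketch with a placeholder index name, leaving the request body exactly as it is today:

GET my-datastream/_search?max_concurrent_shard_requests=10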

It looks like you are using a lot of nested levels which can add overhead. You may want to review and possibly optimise how your data is structured.

It looks like you make use of fuzziness, which can add quite a bit of overhead and is fairly expensive. Try running queries without fuzziness and see what impact, if any, that has on latency.

If you have large documents and text fields, highlighting can also add a fair bit of overhead. Try not using highlighting and see what impact it has.
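
As a quick A/B test you can issue the same kind of search with the fuzziness option and the highlight section removed; a minimal sketch with placeholder index, field and term:

GET my-datastream/_search
{
  "query": {
    "match": {
      "my_text_field": {
        "query": "example term",
        "operator": "and"
      }
    }
  },
  "from": 0,
  "size": 10
}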

That is good. When operating optimally Elasticsearch should use all memory on the machine so this is not something to be concerned about. I would be much more concerned if this was not the case.

I do not think the RAM usage or heap size settings are the cause of your problems, and I do not think tinkering with them once you have followed my guidelines will make much difference to performance. I would instead recommend looking into disk stats while you are running a query, e.g. using iostat -x, to see how your storage is behaving. As all shards are involved in every query you can run it on any node.

Hi @Christian_Dahlqvist ,

Thanks for your valuable feedback and suggestions; this really helps me. Much appreciated.

I will follow your suggestions and make the below changes. Just letting you know; please add anything I have forgotten.

  1. To reduce the shard count, increase the rollover shard size to the suggested ~50 GB with 5 primaries, which makes each index about 250 GB (see the sketch after this list).
  2. Not run anything on any VM other than Elasticsearch.
  3. Point the search APIs at all data nodes instead of only the masters; placing a load balancer to handle incoming search queries is also being considered.
  4. Test the max_concurrent_shard_requests setting and increase it gradually.
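
For item 1, a sketch of the rollover condition I am planning, assuming it is driven by an ILM policy (the policy name here is a placeholder):

PUT _ilm/policy/webcrawl-rollover-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb"
          }
        }
      }
    }
  }
}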

To answer a few of your points:

  1. I set nested_fields.limit and depth.limit a bit high because the incoming data sometimes contains very large arrays of nested fields, which would otherwise cause documents to be truncated or ingestion issues.
  2. We do not use fuzzy search and highlighting for every query, only where they are needed for specific functionality, but I will follow your suggestion and use them only where absolutely necessary.
  3. Regarding your comments below:

What are your suggestions if the final goal is to serve full-text fuzzy phrase search queries, arriving at around 30-50 requests per minute, against 1 petabyte of data? I have enabled shingle and whitespace analyzers for the fields we target in the search queries.
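
For context, a simplified sketch of how such an analyzer can be defined, with placeholder index, analyzer and field names rather than my real mapping:

PUT my-index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3
        }
      },
      "analyzer": {
        "whitespace_shingle_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [ "lowercase", "my_shingle_filter" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_text_field": {
        "type": "text",
        "analyzer": "whitespace_shingle_analyzer"
      }
    }
  }
}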

And as for the point on partial failures:

I have already faced this issue: applying "search.default_allow_partial_results": "true" fixed Discover, and for the incoming APIs the allow_partial_search_results=true parameter fixed it. I know it is better to maintain at least 1 replica for any kind of data in Elasticsearch, but the problem is space: we are targeting 1 PB of data in Elasticsearch, and maintaining a replica doubles the storage cost. Unfortunately this is the configuration we are going with. I am planning to optimize as best I can within it, aiming to retrieve search results as fast as possible; the ideal case is a 1-2 second response, and I am trying my best here.
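
For reference, a sketch of the two settings mentioned above: the cluster-wide default I applied for Discover, plus the per-request parameter for the APIs (the index name is a placeholder):

PUT _cluster/settings
{
  "persistent": {
    "search.default_allow_partial_results": true
  }
}

GET my-datastream/_search?allow_partial_search_results=true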

As you asked, below is the iostat output for one of the machines in an idle state, with no ingestion and nothing else happening right now.

abc@xyz:~# iostat -x
Linux 5.15.0-97-generic (xyz)         06/10/2024      _x86_64_        (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.79    0.00    0.58    0.06    0.00   98.57

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
dm-0             7.74    675.97     0.00   0.00    1.12    87.35    5.73   1417.10     0.00   0.00    1.26   247.49    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.02   0.79
dm-1            27.50    214.00     0.00   0.00    0.28     7.78    6.13     31.31     0.00   0.00    0.79     5.11    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.01   0.32
loop0            0.00      0.01     0.00   0.00    0.86    37.24    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
loop1            0.00      0.01     0.00   0.00    0.57    34.36    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
loop2            0.00      0.00     0.00   0.00    0.41    11.11    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
loop3            0.00      0.02     0.00   0.00    0.79    40.99    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
loop4            0.00      0.00     0.00   0.00    0.62    14.66    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
loop5            0.00      0.01     0.00   0.00    0.18    39.10    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
loop6            0.00      0.00     0.00   0.00    0.39    12.69    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
loop7            0.00      0.00     0.00   0.00    0.50    16.40    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sda              7.30    214.12    20.20  73.47    0.24    29.35    4.82     31.44     1.32  21.49    0.44     6.53    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.32
sdb              7.74    675.98     0.03   0.39    1.12    87.34    3.47   1417.10     3.26  48.41    3.44   407.98    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.02   0.79
sr0              0.00      0.00     0.00   0.00    0.00     0.23    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


Once again, thanks for taking the time to support me on this, really appreciated @Christian_Dahlqvist. Looking forward to your comments on the above.

I would like to see iostat output from when you are running a query that is taking a long time to complete.

I do not have any further suggestions.

Sure, thanks for the confirmation. Let me make all the changes as per the suggestions. It may take a while; I will get back with the changes and the iostat output once everything is in place.

Thanks a lot @Christian_Dahlqvist