What are the search optimizations possible without scoring?

Hi guys,

I have 30 terabytes of Elasticsearch data spread across 11 nodes. The indices are split by month, and each has 5 primary shards and 5 replica shards. Some shards are big (over 200 GB) and some are small (less than 10 GB). I want to search quickly across all of the data, but right now a single 6-word search query takes more than a minute to finish and maxes out the load on every node. Running another search on top of it pushes the time to 2-3 minutes.

I was thinking of moving to 12 nodes arranged in pairs, where each pair holds one year's worth of data: one node for the primaries and the other for the replicas. I'm hoping that way the data can be cached better for searches and less communication will be needed between nodes. By default it searches all data, but it is also possible to search just the last year or the last two years.
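To illustrate the "last year or last two years" idea, here is a minimal Python sketch that builds the list of monthly index names for a time window, so a search can target only those indices instead of everything. The `a-YYYY.MM` naming pattern and the `monthly_indices` helper are assumptions made for illustration; the real index names are not given in this post.

```python
from datetime import date
from typing import List


def monthly_indices(prefix: str, end: date, months_back: int) -> List[str]:
    """Return monthly index names, newest first, for `months_back` months
    ending at the month of `end` (inclusive).

    The "<prefix>-YYYY.MM" pattern is a hypothetical naming scheme.
    """
    names = []
    year, month = end.year, end.month
    for _ in range(months_back):
        names.append(f"{prefix}-{year:04d}.{month:02d}")
        # Step back one month, rolling over the year boundary.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return names


# Twelve months ending in March 2017, joined into a comma-separated
# target that can be used as the index part of a search request path.
last_year = monthly_indices("a", date(2017, 3, 1), 12)
target = ",".join(last_year)
```

Passing 24 instead of 12 would cover the two-year case. Whether this actually reduces search time depends on how the shards are laid out across the nodes.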

However, my nodes don't have much memory (around 32 GB per node, half of it for heap).
Do you guys think this is a good approach? If not, what else can I do to improve the search time?

Thanks in advance!

What kind of queries are you running? Have these been optimised? Are there any patterns in your queries, e.g. they typically search per user, that can be used to optimise access patterns? Given the amount of data per node, storage performance is likely going to be very important. Are you using locally attached SSDs?

Hi Christian, thanks for the response!

I'm running an indices query. Its two inner queries are both query_string queries, one for each of two groups of indices, with about 12 and 15 fields respectively.

They search for keywords that would appear in those fields. The nodes run on VMware on SSDs (dedicated cloud servers). Each node holds a little more than 3 TB of data.

I'm not sure if the queries have been optimized, but generally they look like this:

{
  "query": {
    "indices": {
      "indices": [
        "a*"
      ],
      "query": {
        "bool": {
          "filter": [
            {
              "range": {
                "timestamp": {
                  "gte": "1990-01-01T00:00:00"
                }
              }
            },
            {
              "query": {
                "query_string": {
                  "query": "some random query by user",
                  "lenient": true,
                  "analyze_wildcard": false,
                  "fields": [
                    "o",
                    "p",
                    "q",
                    "r",
                    "s",
                    "t",
                    "u",
                    "v",
                    "w"
                  ]
                }
              }
            }
          ]
        }
      },
      "no_match_query": {
        "bool": {
          "filter": [
            {
              "range": {
                "created_at": {
                  "gte": 631152000
                }
              }
            },
            {
              "query": {
                "query_string": {
                  "query": "some random query by user",
                  "lenient": true,
                  "analyze_wildcard": true,
                  "fields": [
                    "a",
                    "b",
                    "c",
                    "d",
                    "e",
                    "f",
                    "g",
                    "h",
                    "i",
                    "j",
                    "k",
                    "l",
                    "m",
                    "n"
                  ],
                  "analyzer": "standard"
                }
              }
            }
          ]
        }
      }
    }
  }
}

There is no pattern in the search. We just have to find keywords within the documents' fields.
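For reference, the request body above can be assembled programmatically. This is only a sketch in Python that reproduces the same structure (a range filter plus a query_string inside a bool filter, for each branch of the indices query). The helper names `filtered_keywords` and `build_body` are made up for illustration, and the single-letter field names are the same placeholders used in the query above.

```python
import json
from typing import Any, Dict, List, Optional


def filtered_keywords(time_field: str, gte: Any, text: str, fields: List[str],
                      analyze_wildcard: bool,
                      analyzer: Optional[str] = None) -> Dict[str, Any]:
    """Build one branch: a bool query whose clauses all sit in filter context."""
    query_string: Dict[str, Any] = {
        "query": text,
        "lenient": True,
        "analyze_wildcard": analyze_wildcard,
        "fields": fields,
    }
    if analyzer is not None:
        query_string["analyzer"] = analyzer
    return {
        "bool": {
            "filter": [
                {"range": {time_field: {"gte": gte}}},
                # Same wrapping as in the posted query.
                {"query": {"query_string": query_string}},
            ]
        }
    }


def build_body(text: str) -> Dict[str, Any]:
    """Assemble the full indices query for one user search string."""
    return {
        "query": {
            "indices": {
                "indices": ["a*"],
                "query": filtered_keywords(
                    "timestamp", "1990-01-01T00:00:00", text,
                    list("opqrstuvw"), analyze_wildcard=False),
                "no_match_query": filtered_keywords(
                    "created_at", 631152000, text,
                    list("abcdefghijklmn"), analyze_wildcard=True,
                    analyzer="standard"),
            }
        }
    }


body = build_body("some random query by user")
payload = json.dumps(body, indent=2)
```

This does not change what the query does; it just makes the two branches easier to compare and to adjust one setting at a time.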
