Search is very slow in multilingual setup

Hello,

We currently support English, French, Spanish, and 7 other languages in search, and search has become too slow; sometimes it even times out. I think one of the bottlenecks is the multilingual support: every document we store has five text fields, and each of these fields is indexed in 10 languages using language analyzers.

When a search request is sent, every document now exposes 50+ fields just for matching the query string against the analyzed fields.

How do I improve or fix this situation?

Welcome!

What is the version?
Could you share a typical document, query and the mapping?

What is the size of the index and how many shards does it have?

What is the hardware specification of the node/cluster?

{
  "_index": "tasks_v1",
  "_id": "42",
  "_routing": "101",
  "_source": {
    "title": "Implement payment gateway",
    "explanation": "Add Stripe integration for subscription payments",
    "display_id": "TASK-42",
    "remarks": [
      {
        "data": "This is high priority"
      }
    ],
    "todo_list": [
      {
        "data": "Research payment providers"
      },
      {
        "data": "Setup test environment"
      }
    ]
  }
}

That is the sample document structure. It has 10-12 other fields, but I am including only the relevant ones here.

And the mapping looks like this:

{
  "mappings": {
    "_routing": {
      "required": true
    },
    "dynamic": "strict",
    "properties": {
      "title": {
        "type": "text",
        "term_vector": "with_positions_offsets",
        "fields": {
          "english": {
            "type": "text",
            "analyzer": "autocomplete_english",
            "search_analyzer": "english",
            "term_vector": "with_positions_offsets"
          },
          "english_exact": {
            "type": "text",
            "analyzer": "exact",
            "term_vector": "with_positions_offsets"
          }
          // ... dutch, swedish, norwegian, danish, and the other languages follow the same pattern
        }
      },
      "explanation": {
        // Same multilingual structure as "title"
      },
      "remarks": {
        "type": "nested",
        "properties": {
          "text": {
            // Same multilingual structure as "title"
          }
        }
      },
      "todo_list": {
        "type": "nested",
        "properties": {
          "text": {
            // Same multilingual structure as "title"
          }
        }
      },
      "doc_id": { "type": "keyword" }
    }
  }
}

And a typical search query:
{
  "query": {
    "bool": {
      "must": {
        "simple_query_string": {
          "query": "le projet",
          "fields": [
            "title.*"  // Searches ALL 16 fields (8 languages × 2 variants)
          ],
          "quote_field_suffix": "_exact"
        }
      },
      "filter": [
        {"terms": {"teams.id": [101, 102, 103]}}
      ]
    }
  }
}

Which version of Elasticsearch are you using?

The version is 8.0.

Can't you try to detect the language the user is typing in when running the query? Something like langdetect?
Then search only the matching language fields?
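
For example, if langdetect decides the query is French, the request could target just the French sub-field instead of title.* (the field names here are assumed to follow the pattern in your mapping); quoted phrases would still be routed to title.french_exact through quote_field_suffix:

{
  "query": {
    "bool": {
      "must": {
        "simple_query_string": {
          "query": "le projet",
          "fields": [
            "title.french"
          ],
          "quote_field_suffix": "_exact"
        }
      },
      "filter": [
        { "terms": { "teams.id": [101, 102, 103] } }
      ]
    }
  }
}

That way each attribute is matched against one or two sub-fields instead of 20, which should cut the per-shard work considerably.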

Could you also upgrade your cluster if you are running 8.0? 8.19.7 is the latest 8.x version.
Or better, maybe to 9.2.1.

You can also try to profile your query with "profile": true, to get a better idea of what is happening.
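
For example, just adding the flag at the top level of the search body:

{
  "profile": true,
  "query": {
    "simple_query_string": {
      "query": "le projet",
      "fields": [ "title.*" ],
      "quote_field_suffix": "_exact"
    }
  }
}

The response will then contain a profile section with a per-shard, per-query timing breakdown.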

Could you also share the very first 10 lines of the search response? Specifically, what is the took value?
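
For reference, that is the header of the response body, something like this (the numbers here are made up):

{
  "took": 2500,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  ...
}

took is the time in milliseconds the request spent inside the cluster.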

I am already doing a PoC with langdetect, but I have read online that langdetect might not work well when the query string is short, like one or two words, and most of our queries are one or two words.
What do you think of using langdetect? Are there any other battle-tested solutions or options I could look into?

I will also share the took value from a query run with "profile": true.

You never answered my questions about index size and shard count. The reason I asked is that in older versions search requests were run single-threaded against each shard, although multiple shards can be processed in parallel as long as the host has enough resources. If you have a single primary shard and it is getting reasonably large, it might be worthwhile to increase the number of primary shards using the split index API and see if this makes a difference. This allows greater concurrency, but potentially adds more overhead at the same time, so it is not a given that it will help.
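
A sketch of what that would look like (the target index name and shard count are just examples; the source index must be made read-only before it can be split, and the new shard count must be a multiple of the old one):

PUT /tasks_v1/_settings
{
  "index.blocks.write": true
}

POST /tasks_v1/_split/tasks_v1_split
{
  "settings": {
    "index.number_of_shards": 4
  }
}

Afterwards you would switch your application (or an alias) over to the new index.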

Please share the took value when profile is not set; profiling adds overhead, so it inflates the timings.