Search is very slow in multilingual setup

Hello,

We currently support English, French, Spanish, and 7 other languages in search, and search has become too slow; sometimes it even times out. I think one of the bottlenecks is the multilingual support: every document we store has five text fields, and each of these fields is indexed in 10 languages using language analyzers.

When a search request is sent, every document now exposes 50+ fields just for matching the query string against the analyzed fields.

How do I improve or fix this situation?

Welcome!

What is the version?
Could you share a typical document, query and the mapping?

What is the size of the index and how many shards does it have?

What is the hardware specification of the node/cluster?

{
  "_index": "tasks_v1",
  "_id": "42",
  "_routing": "101",
  "_source": {
    "title": "Implement payment gateway",
    "explanation": "Add Stripe integration for subscription payments",
    "display_id": "TASK-42",
    "remarks": [
      {
        "data": "This is high priority"
      }
    ],
    "todo_list": [
      {
        "data": "Research payment providers"
      },
      {
        "data": "Setup test environment"
      }
    ]
  }
}

That is the sample document structure. It has 10-12 other fields, but I am including only the relevant ones here.

And the mapping looks like this:

{
  "mappings": {
    "_routing": {
      "required": true
    },
    "dynamic": "strict",
    "properties": {
      "title": {
        "type": "text",
        "term_vector": "with_positions_offsets",
        "fields": {
          "english": {
            "type": "text",
            "analyzer": "autocomplete_english",
            "search_analyzer": "english",
            "term_vector": "with_positions_offsets"
          },
          "english_exact": {
            "type": "text",
            "analyzer": "exact",
            "term_vector": "with_positions_offsets"
          }
          // ... dutch, swedish, norwegian, danish, and the other languages follow the same pattern
        }
      },
      "explanation": {
        // Same multilingual structure as "title"
      },
      "remarks": {
        "type": "nested",
        "properties": {
          "text": {
            // Same multilingual structure as "title"
          }
        }
      },
      "todo_list": {
        "type": "nested",
        "properties": {
          "text": {
            // Same multilingual structure as "title"
          }
        }
      },
      "doc_id": { "type": "keyword" }
    }
  }
}

And a typical search query:
{
  "query": {
    "bool": {
      "must": {
        "simple_query_string": {
          "query": "le projet",
          "fields": [
            "title.*"  // Searches ALL 16 fields (8 languages × 2 variants)
          ],
          "quote_field_suffix": "_exact"
        }
      },
      "filter": [
        {"terms": {"teams.id": [101, 102, 103]}}
      ]
    }
  }
}

Which version of Elasticsearch are you using?

The version is 8.0.

Can't you try to detect the language the user is typing in when running the query? Something like langdetect?
Then search only the matching language fields?
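
For example, if langdetect decides the query is French, the request could target just the French sub-field instead of title.* (the field names here are assumed to follow the pattern in your mapping); quoted phrases would still be routed to title.french_exact through quote_field_suffix:

{
  "query": {
    "bool": {
      "must": {
        "simple_query_string": {
          "query": "le projet",
          "fields": [
            "title.french"
          ],
          "quote_field_suffix": "_exact"
        }
      },
      "filter": [
        { "terms": { "teams.id": [101, 102, 103] } }
      ]
    }
  }
}

That way each attribute is matched against one or two sub-fields instead of 20, which should cut the per-shard work considerably.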

Could you also upgrade your cluster if you are running 8.0? 8.19.7 is the latest 8.x version.
Or better, maybe to 9.2.1.

You can also try to profile your query with "profile": true, to get a better idea of what is happening.
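
For example, just adding the flag at the top level of the search body:

{
  "profile": true,
  "query": {
    "simple_query_string": {
      "query": "le projet",
      "fields": [ "title.*" ],
      "quote_field_suffix": "_exact"
    }
  }
}

The response will then contain a profile section with a per-shard, per-query timing breakdown.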

Could you also share the very first 10 lines of the search response? Specifically, what is the took value?
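
For reference, that is the header of the response body, something like this (the numbers here are made up):

{
  "took": 2500,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  ...
}

took is the time in milliseconds the request spent inside the cluster.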

I am already doing a PoC with langdetect, but I have read online that langdetect might not work well when the query string is short, like one or two words, and most of our queries are one or two words.
What do you think of using langdetect? Are there any other battle-tested solutions or options I could look into?

I will also share the took value from a query run with "profile": true.

You never answered my questions about index size and shard count. The reason I asked is that in older versions search requests were run single-threaded against each shard, although multiple shards can be processed in parallel as long as the host has enough resources. If you have a single primary shard and it is getting reasonably large, it might be worthwhile to increase the number of primary shards using the split index API and see if this makes a difference. This allows greater concurrency, but potentially adds more overhead at the same time, so it is not a given that it will help.
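
A sketch of what that would look like (the target index name and shard count are just examples; the source index must be made read-only before it can be split, and the new shard count must be a multiple of the old one):

PUT /tasks_v1/_settings
{
  "index.blocks.write": true
}

POST /tasks_v1/_split/tasks_v1_split
{
  "settings": {
    "index.number_of_shards": 4
  }
}

Afterwards you would switch your application (or an alias) over to the new index.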

Please share the took value when profile is not set; profiling adds overhead, so it inflates the timings.