Compound words handling

How can I get ES to search for the words "data base" and "database" in the index when the query word is "database" or "data base"?
Another example: when the user queries "clean up" or "cleanup", ES should search for both "clean up" and "cleanup".

I tried decompounding using the following code:

  index="zentest",
  body={
    "settings": {
      "analysis": {
        "analyzer": {
          "standard_dictionary_decompound": {
            "tokenizer": "standard",
            "filter": [ "dictionary_decompound" ]
          }
        },
        "filter": {
          "dictionary_decompound": {
            "type": "dictionary_decompounder",
            "word_list_path": "decompound_words.txt",
            "max_subword_size": 22
          }
        }
      }
    }
  }

The .txt file has the following words, one on each line

It didn't work. The text contains "clean up". When I query "clean up" I get a hit, but I get nothing when I query "cleanup".

Do I need to filter using synonyms?

Synonyms would indeed be an alternative here.
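
To make the synonym route concrete, here is a minimal sketch of an index body that expands both spellings at search time. It assumes a recent Elasticsearch version (typeless mappings) and uses the `synonym_graph` token filter, which is the variant intended for multi-word synonyms in a search analyzer. The filter name `compound_synonyms`, the analyzer name `compound_search`, and the field name `content` are placeholders that do not come from the original post.

```python
import json

# Hypothetical index body: filter, analyzer and field names are placeholders.
body = {
    "settings": {
        "analysis": {
            "filter": {
                "compound_synonyms": {
                    "type": "synonym_graph",
                    # Each line lists spellings that should match each other.
                    "synonyms": [
                        "database, data base",
                        "cleanup, clean up",
                    ],
                }
            },
            "analyzer": {
                "compound_search": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "compound_synonyms"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                # Index with the plain analyzer, expand synonyms only at query time.
                "analyzer": "standard",
                "search_analyzer": "compound_search",
            }
        }
    },
}

print(json.dumps(body, indent=2))
```

Note that a custom analyzer only has an effect once a field in the mapping refers to it. Defining it under "settings" alone does not change how an existing field is analyzed.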

Maybe you can explain why you went with the decompounder in the first place, to avoid any confusion.


"cleanup" and "database" could be considered closed compound words, just like "basketball". I also wanted to know what ES considers compound words in English.
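
On the question of what counts as a compound: as far as I understand it, the dictionary decompounder has no built-in notion of English compounds. It only emits subwords that appear in the word list you supply. The following is a rough Python approximation of that behaviour, not the actual Lucene implementation. The function name `decompound` and the sample word list are made up for illustration, and the default sizes mirror what I believe are the documented defaults (`min_word_size` 5, `min_subword_size` 2, `max_subword_size` 15).

```python
def decompound(token, dictionary, min_word_size=5, min_subword_size=2,
               max_subword_size=15):
    """Return the original token followed by every dictionary word found in it."""
    tokens = [token]
    # Tokens shorter than min_word_size are passed through untouched.
    if len(token) < min_word_size:
        return tokens
    # Try every substring whose length lies within the allowed subword range.
    for start in range(len(token) - min_subword_size + 1):
        for size in range(min_subword_size, max_subword_size + 1):
            if start + size > len(token):
                break
            candidate = token[start:start + size]
            if candidate in dictionary:
                tokens.append(candidate)
    return tokens


# Hypothetical word list; the real one lives in decompound_words.txt.
word_list = {"clean", "up", "data", "base"}

print(decompound("cleanup", word_list))     # ['cleanup', 'clean', 'up']
print(decompound("database", word_list))    # ['database', 'data', 'base']
print(decompound("basketball", word_list))  # ['basketball']
```

With this word list "basketball" stays whole, because neither "basket" nor "ball" is listed. Also note that the filter only splits tokens. It never joins "clean" and "up" back into "cleanup".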

I suppose I could use (cleanup, clean) and (database, data) as synonyms, though it seems hacky.

Using cleanup and clean as synonyms didn't work.
I get a hit when I query "clean" but none when I query "cleanup".
