Synonyms with FSCrawler

Hi,
I'd like to use synonym_path to assign a couple of synonyms to improve my search.
My environment is set up with Elasticsearch 8.15.1 and FSCrawler. After some study I figured out that I can't add a mapping the way common tutorials do, because a mapping is already assigned (by FSCrawler, I think).

So when I try

PUT /my_index/_mapping
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "my_synonym_analyzer"
    }
  }
}

I get an error

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_exception",
        "reason": "analyzer [my_synonym_analyzer] contains filters [my_synonym_filter] that are not allowed to run in index time mode."
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping: analyzer [my_synonym_analyzer] contains filters [my_synonym_filter] that are not allowed to run in index time mode.",
    "caused_by": {
      "type": "mapper_exception",
      "reason": "analyzer [my_synonym_analyzer] contains filters [my_synonym_filter] that are not allowed to run in index time mode."
    }
  },
  "status": 400
}

My problem now is understanding how I can use my own synonyms.
Tutorials are rare and/or outdated, and I'm a novice with Elastic/FSCrawler.

So what would you recommend as the best and simplest way to use my own synonyms with Elasticsearch 8.15.1 and FSCrawler?

Thank you,
Andre

You need to update the component templates.

Have a look at Elasticsearch settings — FSCrawler 2.10-SNAPSHOT documentation.

Note that you must then reindex.

Also make sure that you are using the latest 2.10-SNAPSHOT.

Thank you, I read this page but I don't get it. It says:

"If you want to define your own index settings and mapping to set analyzers for example, you can update the needed component template before starting the FSCrawler."

What does "before starting the FSCrawler" mean?
Do I have to delete the index and apply the template before the first run of FSCrawler?
But in that case PUT _component_template/fscrawler_mapping_content_semantic won't work because the FSCrawler settings are missing, will it?

I've never come across the setting "copy_to": "content_semantic" in any sample before.
Is it required in every case?

Here are the steps (please note that this will remove any existing data).
If you don't have a trial or commercial license and are not running on cloud, then you don't have access to semantic search. I will assume in the following instructions that you are not doing semantic search.

I assume here that you have already run FSCrawler at least once and that the templates have already been installed. You can check this by running:

GET _component_template/fscrawler*

This should give you a non-empty list of component templates.

So, you want to add synonyms. The first thing to do is to add them to the existing "content" component template, named fscrawler_mapping_content:

PUT _component_template/fscrawler_mapping_content
{
  "template": {
    "settings": {
      "analysis": {
        "filter": {
          "my_synonym_filter": {
            "type": "synonym",
            "synonyms_set": "my-synonym-set",
            "updateable": true
          }
        },
        "analyzer": {
          "my_synonym_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_synonym_filter"
            ]
          }
        }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_synonym_analyzer"
      }
      }
    }
  }
}
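The filter above points at a synonyms set named my-synonym-set. As far as I know, that set has to exist before an index that references it is created, so it is worth creating it first with the synonyms API (available since Elasticsearch 8.10). A minimal sketch, where the rule ids and the synonyms themselves are made-up placeholders to replace with your own:

```
# Hypothetical rules, only the set name comes from the template above
PUT _synonyms/my-synonym-set
{
  "synonyms_set": [
    {
      "id": "rule-1",
      "synonyms": "car, automobile, vehicle"
    },
    {
      "id": "rule-2",
      "synonyms": "laptop => notebook"
    }
  ]
}
```

Running GET _synonyms/my-synonym-set afterwards should return the rules you stored.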

Then you need to tell FSCrawler not to push its internal templates anymore, otherwise your change will be overwritten. In the job settings:

elasticsearch:
  push_templates: false

Then you need to remove the existing data. Let's assume that you used the default settings and that the index name is fscrawler:

DELETE fscrawler

Then you can start FSCrawler again with the --restart option.
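Once FSCrawler has recreated the index, you can check that the customized template was actually picked up. A small sketch, assuming the default index name fscrawler used above:

```
# The analysis section should list my_synonym_filter and my_synonym_analyzer
GET fscrawler/_settings

# The content field should reference my_synonym_analyzer
GET fscrawler/_mapping/field/content
```

If neither shows your analyzer, the template was most likely overwritten or the index was created before the template was updated.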

I think this should work. I have not tested it, so let me know if you run into any trouble.

Amazing, thank you very much.
I understand now that the template is sent from FSCrawler to ES, and that it is changed there.

There's one last issue with the request above, as it returns

{
  "error": {
    "root_cause": [
      {
        "type": "x_content_parse_exception",
        "reason": "[2:3] [component_template] unknown field [mappings]"
      }
    ],
    "type": "x_content_parse_exception",
    "reason": "[2:3] [component_template] unknown field [mappings]"
  },
  "status": 400
}

Without the "mappings" part it runs fine, but I expect the analyzer has to be mapped in the template too?
Is something missing?

Ah, sorry. That was wrong. Try this one.

PUT _component_template/fscrawler_mapping_content
{
  "template": {
    "settings": {
      "analysis": {
        "filter": {
          "my_synonym_filter": {
            "type": "synonym",
            "synonyms_set": "my-synonym-set",
            "updateable": true
          }
        },
        "analyzer": {
          "my_synonym_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_synonym_filter"
            ]
          }
        }
      }
    },
    "mappings": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "my_synonym_analyzer"
        }
      }
    }
  }
}
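One caveat, untested here: a synonym filter with "updateable": true is meant to run at search time only, which is what the "not allowed to run in index time mode" error from the original question points at. If that error comes back when the index is created, a possible variant of the "mappings" part keeps a regular analyzer for indexing and applies the synonym analyzer only to queries. The choice of "standard" as the index-time analyzer is an assumption:

```
"mappings": {
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "standard",
      "search_analyzer": "my_synonym_analyzer"
    }
  }
}
```

With this layout the synonyms are expanded when you search rather than when documents are indexed, so later changes to the synonyms set should not require a reindex.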

Yes, that's it, thank you.
It's really great to get it working after such a long time, and I learned how it works in general.
Still, I can see there's much more to study for a deeper understanding ...