Recommendation for large synonym file

timpb · February 25, 2016, 4:31pm

Hi all,

I am looking into the synonym token filter, and the Elasticsearch documentation recommends that when you work with large synonym datasets, you should point synonyms_path at a file rather than inserting the synonyms directly into the configuration.
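For anyone less familiar with the two options, here is a minimal Python sketch (standard library only) of the two index-settings bodies being compared. Only the synonym filter type and its synonyms / synonyms_path parameters are real Elasticsearch settings. The filter and analyzer names, the file path and the sample rules are hypothetical placeholders.

```python
import json

def _index_body(synonym_filter):
    """Wrap one synonym filter definition in a create-index request body."""
    return {
        "settings": {
            "analysis": {
                # "my_synonyms" and "my_analyzer" are hypothetical names.
                "filter": {"my_synonyms": synonym_filter},
                "analyzer": {
                    "my_analyzer": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "my_synonyms"],
                    }
                },
            }
        }
    }

def inline_settings(rules):
    """Every rule is embedded in the index settings themselves."""
    return _index_body({"type": "synonym", "synonyms": list(rules)})

def file_settings(path):
    """Only a path is stored in the settings; each node reads the file.
    A relative path is resolved against the node's config directory."""
    return _index_body({"type": "synonym", "synonyms_path": path})

if __name__ == "__main__":
    rules = ["universe, cosmos", "i-pod, i pod => ipod"]
    print(json.dumps(inline_settings(rules), indent=2))
    print(json.dumps(file_settings("analysis/synonym.txt"), indent=2))
```

Either body would be sent when creating the index; the difference is that the first one carries the whole rule list inside the settings.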

But the documentation does not say why. Is it just for maintainability, so that you do not have to scroll through thousands of synonyms to check your configuration/mapping?

And how does Elasticsearch handle this synonym file? Are the contents of the file loaded into memory after a configuration update?

Thanks in advance!

Regards,

Tim

dadoonet · February 25, 2016, 4:47pm

Probably because the cluster state, which contains the index settings, would become too big?

frankkoornstra · February 26, 2016, 8:45am

I'm keen to know the answer to this as well. Provisioning synonyms through index settings is much easier (it can be done through the REST API) than provisioning them through files.

@dadoonet: are you sure that all synonyms get sent along with the cluster state? That doesn't seem logical to me, since synonyms are part of the index settings, not the cluster state. Sending all filters along with the cluster state seems weird.

dadoonet · February 26, 2016, 3:53pm

Index settings are part of the index metadata and index metadata is part of the cluster state.
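To get a feel for the scale involved, the sketch below serializes just the synonym filter definition both ways and compares byte counts. JSON size is only a rough proxy, since Elasticsearch keeps and publishes the cluster state in its own internal format, and the 10,000 generated rules and the filter name are made up for illustration.

```python
import json

def settings_size(filter_def):
    """Bytes of the JSON-encoded analysis settings for one synonym filter."""
    settings = {"analysis": {"filter": {"my_synonyms": filter_def}}}
    return len(json.dumps(settings).encode("utf-8"))

# Hypothetical generated rules, only to produce a realistic volume of text.
rules = [f"term{i}, alias{i}" for i in range(10_000)]

inline = settings_size({"type": "synonym", "synonyms": rules})
by_path = settings_size({"type": "synonym", "synonyms_path": "analysis/synonym.txt"})

print(f"inline: {inline} bytes, synonyms_path: {by_path} bytes")
```

With inline rules, that whole payload becomes part of the index metadata; with synonyms_path, only the short path string does.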

frankkoornstra · February 26, 2016, 4:03pm

Alright, thanks for that! Putting them in a file seems better, then.

timpb · February 26, 2016, 4:08pm

Thanks for the clarification!

dadoonet · February 26, 2016, 4:31pm

I'm not sure it's better. TBH, I'd really love for synonyms to be loaded from a document stored in an Elasticsearch index rather than from the file system, because it's harder to maintain files on the FS and to distribute a consistent file to all nodes.
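One way to catch the "inconsistent file across nodes" problem with the file-based approach is to compare checksums of every node's copy. This is a minimal sketch with hypothetical helper names; it assumes you have already gathered each node's synonym file locally (for example with scp or your configuration management tool), which it does not do itself.

```python
import hashlib
from collections import Counter
from pathlib import Path

def file_digest(path):
    """SHA-256 of a file's raw bytes, as a hex string."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def find_inconsistent(copies):
    """copies maps a node name to a local copy of that node's synonym file.

    Returns {node: digest} for every node whose file differs from the most
    common version; an empty dict means all copies are identical.
    """
    if not copies:
        return {}
    digests = {node: file_digest(path) for node, path in copies.items()}
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return {node: d for node, d in digests.items() if d != majority}
```

Treating the most common version as the reference is a simplification: with a tie (say, two nodes that disagree), the choice of "majority" is arbitrary.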

I opened this feature request; we'll see where it goes: https://github.com/elastic/elasticsearch/issues/16824

Ivan · February 26, 2016, 5:09pm

Years ago I wrote a collection of token filters that read their values from a
database. They have been in production all this time. I always wanted to reboot
that project into a public release, but it was held up by a change in the
way analyzers are created/stored in Elasticsearch. The change was pushed
into the 3.x (now 5.x) branch. Since 5.0 will be released soon (alpha at
least), I should revisit the project.

Ivan
