Performing analysis on synonym preprocessing

aclowkey · May 2, 2018, 6:23am

We have a bunch of SQL tables that define the business's synonyms, and I'm working on migrating these synonyms to Elasticsearch.

Often these synonyms are not tokenized, and expanded forms of the same term might appear twice, e.g. "apple" is a synonym of "apples". Some entries also contain stopwords, e.g. "Off The Hinge" is a synonym of "OTH".

Now when I try to load these synonyms as is, Elasticsearch has a lot of problems with the entries in the dictionary. In particular, the entries that contain stopwords cause position increment errors.

My first attempt at solving this was to send the terms to a live node via HTTP to be analyzed. It was way too slow: there were around 20k different terms and it took well over a few hours.

My second attempt was to use NLTK, but its output was also inconsistent with Elasticsearch's analysis.

I'd like to know whether I can apply the same analysis to the synonyms before storing them in the file. I'm happy to hear any suggestions on this topic. Thanks.
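
For illustration, a rough sketch of what such a local preprocessing step could look like in Python. It is only an approximation, not Elasticsearch's actual analysis: it assumes the target analyzer is roughly "standard tokenizer + lowercase + English stop filter" with no stemming, it uses a simple `\w+` regex as a stand-in for the tokenizer, and it assumes the stop list matches Lucene's default English stop words. The function names `normalize` and `to_synonym_lines` are made up for this example.

```python
import re

# Assumption: Lucene's default English stop word set.
ENGLISH_STOPWORDS = frozenset(
    "a an and are as at be but by for if in into is it no not of on or "
    "such that the their then there these they this to was will with".split()
)

# Assumption: a rough stand-in for the standard tokenizer; the real
# tokenizer treats apostrophes, underscores, etc. differently.
TOKEN_RE = re.compile(r"\w+")


def normalize(term, stopwords=ENGLISH_STOPWORDS):
    """Lowercase a term, split it into tokens, and drop stopwords."""
    tokens = TOKEN_RE.findall(term.lower())
    return " ".join(t for t in tokens if t not in stopwords)


def to_synonym_lines(groups):
    """Turn groups of equivalent terms into comma-separated synonym lines.

    Terms that normalize to the same string are collapsed, and groups
    left with fewer than two distinct terms are skipped.
    """
    lines = []
    for group in groups:
        seen = []
        for term in group:
            norm = normalize(term)
            if norm and norm not in seen:
                seen.append(norm)
        if len(seen) > 1:
            lines.append(", ".join(seen))
    return lines


if __name__ == "__main__":
    groups = [["Off The Hinge", "OTH"], ["apple", "apples"]]
    for line in to_synonym_lines(groups):
        print(line)
```

Since no stemming is applied here, "apple" and "apples" stay distinct terms; an analyzer with a stemmer would collapse them, which is exactly the kind of mismatch that would still need checking against a live node.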

system · May 30, 2018, 6:24am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
