I saw that App Search has a default limit of 256 synonym sets, with 32 terms per set. However, these limits can be increased.
I was wondering what the upper limits are and, more importantly, what kind of impact raising them will have on App Search's performance.
What is the maximum we can increase this to?
What would the performance impact be for, let's say, 1,000 synonyms, or even 40,000 synonyms?
Is this doable?
There will definitely be a performance impact when increasing the number of synonyms in App Search.
However, there is another way!
You could look at Elasticsearch synonyms instead.
Recently, Elasticsearch synonyms have become easier to manage, since there is now a Synonyms API for them.
You could create your synonym set in Elasticsearch and define an Elasticsearch index whose text fields use that synonym set. Then, using this index, you can create an Elasticsearch index engine in App Search.
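As a rough sketch of that setup, the Python below only builds the two JSON request bodies; it sends nothing. The set ID, index name, field name, and synonym rules are hypothetical. The structure follows the Elasticsearch Synonyms API (PUT _synonyms/<set_id>, available from 8.10) and a synonym_graph filter that references the set through synonyms_set. Because updateable filters are only allowed in search-time analyzers, the filter is placed in the search analyzer.

```python
import json

# Hypothetical names, chosen for illustration only.
SYNONYM_SET_ID = "my-synonyms"
INDEX_NAME = "my-index"

# Body for: PUT _synonyms/my-synonyms
# Each rule is written in the Solr synonym format.
synonym_set_body = {
    "synonyms_set": [
        {"id": "tv", "synonyms": "tv, television, telly"},
        {"id": "laptop", "synonyms": "laptop, notebook"},
    ]
}

# Body for: PUT my-index
# The synonym_graph filter points at the synonym set by ID. It is marked
# "updateable" so changes made through the Synonyms API can be picked up,
# which restricts it to a search analyzer.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "my_synonyms_filter": {
                    "type": "synonym_graph",
                    "synonyms_set": SYNONYM_SET_ID,
                    "updateable": True,
                }
            },
            "analyzer": {
                "my_search_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_synonyms_filter"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "standard",  # index time: no synonyms
                "search_analyzer": "my_search_analyzer",  # query time: synonyms
            }
        }
    },
}

if __name__ == "__main__":
    print(f"PUT _synonyms/{SYNONYM_SET_ID}")
    print(json.dumps(synonym_set_body, indent=2))
    print(f"PUT {INDEX_NAME}")
    print(json.dumps(index_body, indent=2))
```

The App Search engine would then be created on top of my-index, so the synonym expansion happens inside Elasticsearch's analysis chain rather than in App Search.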
Out of curiosity, what is the use case for creating so many synonym sets?
Yes, I love that the Synonyms API has been added. However, our use case is that the client currently has a lot of synonyms in Elasticsearch, and they want to switch to App Search for its beautiful GUI and the ability to edit synonyms and curations and do relevance tuning through that GUI.
So unfortunately that won't work, unless I can use Elasticsearch synonyms with App Search.
Say, when using an Elasticsearch index engine, but querying through the App Search /search endpoint rather than the Elasticsearch _search endpoint?
Would that be possible?
Otherwise, what kind of performance impact are we talking about?
You can still define synonyms for this engine in App Search. However, these are App Search synonyms, which have a different implementation from Elasticsearch synonyms.
Other features that you mentioned, like curations and relevance tuning, are also available.
However, there are some limitations with Elasticsearch index engines (the linked docs from above should list them); for example, you cannot use the App Search Documents API.
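On the question of querying through /search: below is a minimal sketch of a request to an engine's search endpoint, built with Python's standard library but not sent. The base URL, engine name, and API key are hypothetical placeholders. The path follows the App Search Search API (POST /api/as/v1/engines/{engine}/search); which search parameters are supported for an Elasticsearch index engine should be checked against the limitations in the docs.

```python
import json
import urllib.request


def build_search_request(base_url, engine, api_key, query):
    """Build (but do not send) an App Search search request."""
    # App Search Search API path for a single engine.
    url = f"{base_url.rstrip('/')}/api/as/v1/engines/{engine}/search"
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # A search key or private key, passed as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
    )


if __name__ == "__main__":
    # Hypothetical host, engine, and key.
    req = build_search_request("http://localhost:3002", "my-engine", "search-xxxx", "cheap tv")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it to a running deployment.
```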
In terms of the performance impact of synonyms, Elasticsearch synonyms would perform much better, because they work at a lower level (they are part of the index analyzers).
App Search synonyms work differently: at query time, we retrieve the synonym sets from an internal system index and decide which synonyms apply to the query string. App Search then includes those synonyms in the Elasticsearch query it generates to retrieve results.
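To make that query-time flow concrete, here is a deliberately simplified Python model of it. It is not App Search's actual code: the in-memory list standing in for the internal system index, the whitespace tokenisation, and the bool/should query shape are all assumptions for illustration. What it does show is that every synonym set is loaded on each query before the applicable ones are picked.

```python
from dataclasses import dataclass


@dataclass
class ExpansionResult:
    terms: list  # query terms plus any synonyms that applied
    sets_loaded: int  # how many synonym sets were read for this query


def load_all_synonym_sets(store):
    # Stand-in (assumption) for reading every synonym set from the internal
    # system index; nothing is filtered by the query at this stage.
    return [list(s) for s in store]


def expand_query(query, store):
    synonym_sets = load_all_synonym_sets(store)
    expanded = []
    for term in query.lower().split():
        # Keep the original term, then add members of any set containing it.
        candidates = [term]
        for syn_set in synonym_sets:
            if term in syn_set:
                candidates.extend(t for t in syn_set if t not in candidates)
        expanded.extend(t for t in candidates if t not in expanded)
    return ExpansionResult(terms=expanded, sets_loaded=len(synonym_sets))


def to_es_query(result, field="title"):
    # Simplified (assumed) query shape: one optional match clause per term.
    return {"bool": {"should": [{"match": {field: t}} for t in result.terms]}}


if __name__ == "__main__":
    store = [["tv", "television"], ["laptop", "notebook"]]
    result = expand_query("Cheap TV", store)
    print(result.terms, result.sets_loaded)  # ['cheap', 'tv', 'television'] 2
    print(to_es_query(result))
```

Only one of the two sets applied to "Cheap TV", yet both were loaded; that loading step is the part that grows with the number of synonym sets.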
The part that has a performance impact is retrieving the synonym sets: we haven't optimised it so that only the minimum set of synonyms that can match the query string is retrieved from Elasticsearch.
So the more synonym sets there are in App Search, the bigger the impact at query time.
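The scaling point can be illustrated by counting how many synonym sets are read per query as the total grows, comparing full retrieval (as described above) with a hypothetical lookup that only fetches sets containing a query term. The synthetic data and the term index are assumptions; per the explanation above, App Search does not do the indexed lookup today. These are counts of sets read, not latency measurements.

```python
from collections import defaultdict


def make_synonym_sets(n):
    # Hypothetical synthetic data: n sets of three unique terms each.
    return [[f"term{i}a", f"term{i}b", f"term{i}c"] for i in range(n)]


def sets_read_full_scan(query_terms, synonym_sets):
    # Behaviour described in the thread: every set is retrieved on every
    # query, then the matching ones are picked in memory.
    retrieved = list(synonym_sets)
    matching = [s for s in retrieved if any(t in s for t in query_terms)]
    return len(retrieved), matching


def build_term_index(synonym_sets):
    # Hypothetical inverted index: term -> IDs of sets containing it.
    index = defaultdict(list)
    for set_id, syn_set in enumerate(synonym_sets):
        for term in syn_set:
            index[term].append(set_id)
    return index


def sets_read_indexed(query_terms, synonym_sets, term_index):
    # Hypothetical optimisation (not current App Search behaviour):
    # retrieve only the sets that contain at least one query term.
    ids = sorted({i for t in query_terms for i in term_index.get(t, [])})
    return len(ids), [synonym_sets[i] for i in ids]


if __name__ == "__main__":
    query = ["term7a", "shoes"]
    for n in (256, 1_000, 40_000):
        sets = make_synonym_sets(n)
        full, _ = sets_read_full_scan(query, sets)
        indexed, _ = sets_read_indexed(query, sets, build_term_index(sets))
        print(f"{n:>6} sets: full scan reads {full}, indexed lookup reads {indexed}")
```

For every size, the full scan reads all n sets while the indexed lookup reads one, which is why the cost of App Search synonyms tracks the total number of sets rather than how many actually apply.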
Hope this makes sense, even if it goes into App Search implementation details.
Also, just as a note: while I haven't tested this closely, I would expect App Search and Elasticsearch synonyms to behave differently, because they are different implementations.
Based on this thread, we will make a note of the need for an Elasticsearch synonyms UI in Kibana.
Yes, the documentation lists all the differences between App Search managed engines and Elasticsearch index engines on the "Elasticsearch index engines" page of the App Search documentation (version 8.12).
As an example, for the Elasticsearch index engines it is not possible to use the App Search Documents API to index/modify documents. To add documents to an Elasticsearch index engine, you actually have to use the Elasticsearch documents API and index documents directly in the index.
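Since documents for an Elasticsearch index engine go straight into the index, here is a small sketch of building a request body for the Elasticsearch _bulk API with only the standard library. The index name and documents are hypothetical; the body would be sent as POST _bulk with the Content-Type application/x-ndjson, or through one of the official Elasticsearch clients.

```python
import json


def build_bulk_body(index_name, docs):
    """Build an NDJSON _bulk body that indexes each doc under its ID."""
    lines = []
    for doc_id, source in docs.items():
        # Action/metadata line, then the document source on the next line.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_bulk_body(
        "my-index",  # hypothetical index backing the App Search engine
        {"1": {"title": "Cheap TV"}, "2": {"title": "Laptop bag"}},
    )
    print(body, end="")
```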
Got it. Regarding the point that more synonym sets in App Search mean a bigger impact at query time: do you have any numbers on how drastic that impact is?
Unfortunately, I don't have any numbers, but I do know this comes up as one of the causes of performance issues in App Search. If you notice this too, please let us know.