I'm new to ES and I'm trying to sort on strings, but I always get an
error if the string contains more than one word.
Another question is about dynamic deduplication/distinct/unique based
on a field. I've searched the ES wiki and this group without success.
Does ES provide a "unique" feature, or something equivalent, that
removes duplicate results for a given field?
Something like:
..?q=some+key+words&unique=reference
which would remove any duplicates from the result set based on the
"reference" field.
And the last one: my test data contains accents. I tried various
analyzer configurations, installed the ICU plugin and set it as a
filter, and set the language to French, but accents do not seem to be
removed from the tokenized terms.
I'm currently using sphinxsearch, where accents need to be mapped
manually in a table in the configuration file. Is there an equivalent
in ES?
I'm new to ES and I'm trying to sort on strings, but I always get an
error if the string contains more than one word.
You'll need to index them with the keyword analyzer, which keeps the
whole string as a single token.
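To illustrate why the keyword analyzer helps, here is a minimal Python
sketch. It is not Elasticsearch code: `analyzed_tokens` and
`keyword_tokens` are hypothetical helpers that only mimic, roughly, a
word-splitting analyzer and a keyword-style analyzer. The point is that
an analyzed multi-word string yields several tokens, so there is no
single value to sort on, while the keyword form yields exactly one.

```python
def analyzed_tokens(value):
    """Rough stand-in for a word-splitting analyzer: lowercase, split on whitespace."""
    return value.lower().split()


def keyword_tokens(value):
    """Rough stand-in for a keyword-style analyzer: the whole input is one token."""
    return [value]


titles = ["red shoes", "Blue hat", "apple pie"]

for title in titles:
    # Multi-word titles produce more than one analyzed token,
    # but always exactly one keyword token.
    print(title, analyzed_tokens(title), keyword_tokens(title))

# With one token per document, sorting is well defined.
# Note that this plain comparison is case-sensitive.
sorted_titles = sorted(titles, key=lambda t: keyword_tokens(t)[0])
print(sorted_titles)
```

If case-insensitive ordering is wanted, the sort value would have to be
lowercased at index time as well; that is an extra assumption beyond
what is suggested above.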
Another question is about dynamic deduplication/distinct/unique based
on a field
Issue 256, which covers a group-by feature, is not yet implemented.
You'll need to do it on the client side.
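As a rough idea of what the client-side version could look like, here
is a small Python sketch. It assumes the hits have already been fetched
and are ordered best-first, and that each hit is a dict carrying the
"reference" field from the question. The function name
`dedupe_by_field` and the sample data are made up for illustration.

```python
def dedupe_by_field(hits, field):
    """Keep only the first hit for each distinct value of `field`.

    Hits are assumed to be sorted best-first, so the first one seen
    for a given value is the one that is kept. Hits that do not have
    the field at all are kept as they are.
    """
    seen = set()
    unique_hits = []
    for hit in hits:
        if field not in hit:
            unique_hits.append(hit)
            continue
        value = hit[field]
        if value in seen:
            continue  # duplicate of a better-ranked hit
        seen.add(value)
        unique_hits.append(hit)
    return unique_hits


hits = [
    {"reference": "A1", "portal": "music", "score": 3.2},
    {"reference": "B7", "portal": "video", "score": 2.9},
    {"reference": "A1", "portal": "video", "score": 2.5},
]
unique = dedupe_by_field(hits, "reference")
print([h["reference"] for h in unique])
```

The obvious cost is that the client has to fetch more hits than it
shows, since it cannot know in advance how many are duplicates.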
And the last one: my test data contains accents. I tried various
analyzer configurations, installed the ICU plugin and set it as a
filter, and set the language to French, but accents do not seem to be
removed from the tokenized terms.
Did you try the custom rules of the ICU plugin? I read an article
saying that it should somehow be possible ... I'll check.
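For checking what the tokens should look like once accents are folded,
a small Python sketch may help. This is not the ICU plugin's
implementation, only the general technique (Unicode decomposition, then
dropping combining marks), and `fold_accents` is a hypothetical helper
name.

```python
import unicodedata


def fold_accents(text):
    """Strip accents by decomposing characters and dropping combining marks."""
    # NFD splits "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() is non-zero for combining marks only.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_accents("éléphant français"))
```

Characters that have no decomposition, such as the "œ" ligature, pass
through unchanged with this approach, so a real analyzer chain may
still need an explicit mapping for those.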
Thanks, I'll try your suggestions and give feedback.
As for uniqueness (or grouping, as it is called in sphinxsearch), we
need it because we have products and videos duplicated across various
sub-catalogs / categories.
We don't merge similar entries, because each has its own keywords,
target URL and so on, depending on the portal it belongs to, and a
search can run within a given portal or across portals.
In a cross-universe (cross-portal) search only one result is
displayed, chosen by various factors (freshness, relevance, ...); the
other similar results are thrown away. Today everything is done by the
search engine.
Throwing data away on the client side is complicated, because we have
to fetch many results to build the navigation bar by removing
duplicates and counting. We may have thousands of results, so it would
be a pain to paginate them. We're using Ajax, and we prefer search
engine powered pagination to limit data transfer and platform load.
Not sure I completely followed your use case, but IMO one option in
your case would be to index only one product, with an array for the
URLs + categories, and decide (via the middle layer or the client)
which one to display.
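A minimal Python sketch of that idea, under loud assumptions: the
document layout, the "listings" field and the `pick_listing` helper are
all invented here for illustration, and the "first listing wins" rule
for cross-portal search is just a placeholder for whatever ranking is
actually wanted.

```python
# One indexed product, carrying every portal-specific variant in an array.
product = {
    "reference": "A1",
    "title": "Some product",
    "listings": [
        {"portal": "music", "url": "/music/a1", "category": "cd"},
        {"portal": "video", "url": "/video/a1", "category": "dvd"},
    ],
}


def pick_listing(product, portal=None):
    """Choose which listing of a merged product to display.

    With a portal given, return that portal's listing, or None if the
    product is not listed there. Without a portal (cross-portal
    search), return the first listing, or None if there are none.
    """
    listings = product.get("listings", [])
    if portal is None:
        return listings[0] if listings else None
    for listing in listings:
        if listing["portal"] == portal:
            return listing
    return None


print(pick_listing(product, "video")["url"])
print(pick_listing(product)["url"])
```

Since there is only one document per product, the engine's own
pagination and hit counts stay correct without any deduplication step.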
Throwing data away on the client side is complicated, because we have
to fetch many results to build the navigation bar by removing
duplicates and counting. We may have thousands of results, so it would
be a pain to paginate them. We're using Ajax, and we prefer search
engine powered pagination to limit data transfer and platform load.
OK, by 'client side' I meant more the middle layer (if there is
any).