I have a requirement to find distinct company names. I was using the
"keyword" tokenizer for that field, and with a terms facet I was able to
get distinct company names. However, the terms facet treated company names
like "ibm suisse", "ibm corporation" and "ibm" as different companies.
The online documentation suggested using a synonym filter to solve this. My
settings are:
If I run a terms facet:
{
  "facets": {
    "loc_facet": {
      "terms": {
        "field": "company"
      }
    }
  }
}
I get 3 terms, i.e. {term: ibm corp ltd, count: 3} {term: suisse, count: 1}
{term: corporation, count: 1}
I want the facet result to return only one term: ibm corp ltd with count=3.
This way I will get distinct company names and also map synonym names onto
a single company name.
Please correct me if I am using the wrong tokenizer or if my approach is
not correct.
Your approach is wrong.
When you use a synonym filter, it indexes all synonyms of that token, so
any synonym will match against that term.
So when you do a facet, you will get an aggregation over all the indexed
synonym tokens rather than just one term.
A better approach would be to store the unique name in some other field and
take a facet of that field.
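As a rough illustration of that suggestion, here is a minimal Python sketch of resolving each company name to one canonical name before indexing and storing it in a separate field. The mapping table, the helper names and the field name "company_canonical" are all hypothetical, and the sketch assumes the new field is indexed as a single untokenized term (for example with the keyword tokenizer) so that the facet sees one term per document.

```python
from collections import Counter

# Hypothetical lookup table: known name variants -> one canonical company name.
CANONICAL_NAMES = {
    "ibm": "ibm corp ltd",
    "ibm suisse": "ibm corp ltd",
    "ibm corporation": "ibm corp ltd",
}


def canonical_company(name):
    """Lowercase, collapse whitespace, then look up the canonical name.

    Unknown names fall back to their normalized form.
    """
    key = " ".join(name.lower().split())
    return CANONICAL_NAMES.get(key, key)


def prepare_document(doc):
    """Return a copy of doc with a hypothetical 'company_canonical' field added."""
    enriched = dict(doc)
    enriched["company_canonical"] = canonical_company(doc["company"])
    return enriched


docs = [
    {"company": "IBM Suisse"},
    {"company": "ibm corporation"},
    {"company": "ibm"},
]
prepared = [prepare_document(d) for d in docs]

# What a terms facet over the new field would count, simulated locally.
facet_counts = Counter(d["company_canonical"] for d in prepared)

# Same facet request shape as in the question, pointed at the new field.
facet_request = {
    "facets": {
        "company_facet": {
            "terms": {"field": "company_canonical"}
        }
    }
}
```

The original "company" field can keep its synonym analysis for searching, while the facet runs only against the canonical field.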