We recently started our first project with Elasticsearch to replace our
current search function. The experience so far has been great and we were
able to build a small prototype within just a few days (also thanks to this
group which provided tremendously helpful information).
- We have about 200 industry sectors (we refer to those as 'master' in the
  config and query files). Each sector is defined as a parent for the other
  two entities.
- Then we have companies that are assigned to those industry sectors. Each
  company has a name, a list of categories, and a full-text description.
- Additionally, we have so-called verticals for each industry sector. These
  are basically information sites with articles. The articles are not shown
  in the results, but they give us more content for matching search terms to
  industry sectors. Each vertical text has a name, a title, a description,
  and a bunch of keywords.
- This example shows the input "Wespennest" (wasp nest).
- We are now sending two queries to ES: one for companies and one for
  verticals.
- It correctly returns the master category "Schädlingsbekämpfung" (pest
  control) for it.
- Relevant pest control companies are shown as well.
So basically everything looks fine. But here is the deal: since it is
difficult to find best practices for ES, we were wondering whether we did
things "the right way" or whether anything about the config and queries is
unnecessarily complex or even wrong. If anyone has tips on how to improve
the results further, we would be more than glad to hear them.
In the end, depending on your feedback, we hope that this provides another
good/bad example of a slightly more complex mapping.
It's hard to recommend something without a complete understanding of your
use case, but I noticed a couple of things: 1) the decompounder_de filter
doesn't seem to be used in any of the declared analyzers, and 2) I would
suggest switching from index-time boosting to query-time boosting.
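
To illustrate both points, here is a minimal sketch in Python that only
builds the JSON bodies. It assumes the company fields are called name,
categories and description (taken from the description of the data, the real
mapping may differ). The boost values, the analyzer name german_compound,
the dictionary_decompounder filter type and the word list are placeholders
chosen for the example, not what is in the gist.

```python
import json


def company_query(term, boosts=None):
    """Build a multi_match query whose field boosts are applied at query time.

    Field names and boost values are assumptions for illustration only.
    """
    boosts = boosts or {"name": 3, "categories": 2, "description": 1}
    # "field^boost" is the per-field boost syntax accepted by multi_match.
    fields = [f"{field}^{boost}" for field, boost in boosts.items()]
    return {"query": {"multi_match": {"query": term, "fields": fields}}}


def analysis_settings(word_list):
    """Index settings in which an analyzer actually references decompounder_de.

    A token filter that is declared but not listed in any analyzer's
    "filter" chain is never applied.
    """
    return {
        "analysis": {
            "filter": {
                "decompounder_de": {
                    "type": "dictionary_decompounder",  # assumed filter type
                    "word_list": list(word_list),       # placeholder dictionary
                }
            },
            "analyzer": {
                "german_compound": {  # hypothetical analyzer name
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "decompounder_de"],
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(company_query("Wespennest"), indent=2))
    print(json.dumps(analysis_settings(["wespe", "nest"]), indent=2))
```

With the boosts in the query, relevance can be tuned by changing numbers in
company_query and re-running the search, without reindexing the documents.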
On Thursday, November 15, 2012 11:13:42 AM UTC-5, Christian wrote:
Hello everyone,
Right now we have a working config and two queries which already provide
pretty stunning results for our production data set (just about 100,000
entities):
config => gist:4079257 · GitHub
query for companies => gist:4079267 · GitHub
query for verticals => gist:4079269 · GitHub