First ElasticSearch project (config and queries inside) - Feedback?

Hello everyone,

we recently started our first project with ElasticSearch to replace our
current search function. The experience so far has been great and we were
able to build a small prototype within just a few days (also thanks to this
group which provided tremendously helpful information).

Right now we have a working config and two queries which already provide
pretty stunning results for our production data set (just about 100,000
entities):
config => https://gist.github.com/4079257
query for companies => https://gist.github.com/4079267
query for verticals => https://gist.github.com/4079269

Some words about the actual data:

  • we have about 200 industry sectors (we refer to those as 'master'
    in the files above), each sector is defined as a parent for the other two
    entities
  • then we have companies that are assigned to those industry sectors. Each
    company has a name, a list of categories and a full text description
  • additionally we have so-called verticals for each industry sector. These
    are basically information sites with articles. Those articles are not shown
    in the results, but help us to provide more content that allows us to match
    search terms to those industry sectors. Each vertical text has a name, a
    title, description and a bunch of keywords
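
For readers who do not want to open the gist, the parent/child layout described above could look roughly like the sketch below. This is not the actual config: the type names "company" and "vertical" and all field names are assumptions derived from the description, and the syntax (`_parent`, `string` fields) is the one used by Elasticsearch versions of that period.

```python
import json

# Hypothetical names; the real mapping lives in the linked gist.
PARENT_TYPE = "master"  # the industry sector, called 'master' in the post


def build_mappings():
    """Return mappings in which both child types point at the sector type."""
    return {
        PARENT_TYPE: {
            "properties": {"name": {"type": "string"}},
        },
        "company": {
            # Each company is stored as a child of one industry sector.
            "_parent": {"type": PARENT_TYPE},
            "properties": {
                "name": {"type": "string"},
                "categories": {"type": "string"},
                "description": {"type": "string"},
            },
        },
        "vertical": {
            # Vertical texts are never shown; they only help match sectors.
            "_parent": {"type": PARENT_TYPE},
            "properties": {
                "name": {"type": "string"},
                "title": {"type": "string"},
                "description": {"type": "string"},
                "keywords": {"type": "string"},
            },
        },
    }


mappings = build_mappings()
print(json.dumps(mappings, indent=2))
```

With a layout like this, a match on a company or a vertical can be traced back to its sector through the parent relation.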

To provide a real world example:

  • https://dl.dropbox.com/u/84641/Screenshots/result.png
  • this example shows the input of "Wespennest" (wasp's nest)
  • we are now sending two queries to ES, one that queries for companies and
    one that queries for verticals
  • it correctly returns the master category "Schädlingsbekämpfung" (pest
    control) for it
  • relevant pest control companies are shown as well

So basically everything looks fine :) But here is the deal: since it is
difficult to find out about best practices with ES, we were wondering if we
did things "the right way" or if there is anything unnecessarily complex
or even wrong about the config and queries. If anyone has tips on how to
improve the results even more, we would be more than glad to hear them.

In the end, depending on your feedback, we hope that this provides
another good/bad example of a slightly more complex mapping :)

Thanks,
Chris

--

It's hard to recommend something without a complete understanding of your use
case, but I noticed a couple of things: 1) the decompounder_de filter
doesn't seem to be used in any of the declared analyzers, and 2) I would
suggest switching from index-time boosting to query-time boosting.
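
To make both points concrete, here is a rough sketch rather than a drop-in replacement for the gists. The analyzer name "german_text", the word list, the boost values and the field names are assumptions, and I am guessing that decompounder_de could be a `dictionary_decompounder` filter; the real definition is in the config gist.

```python
def build_analysis_settings():
    """Analysis settings in which the decompounder filter is actually used."""
    return {
        "analysis": {
            "filter": {
                # Assumed definition; the gist may use a different filter type.
                "decompounder_de": {
                    "type": "dictionary_decompounder",
                    "word_list": ["wespe", "nest"],  # illustrative words only
                }
            },
            "analyzer": {
                # Hypothetical analyzer name. A filter has no effect until
                # some analyzer lists it in its filter chain.
                "german_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "decompounder_de"],
                }
            },
        }
    }


def build_company_query(text, boosts=None):
    """Build a multi_match query that applies per-field boosts at query time."""
    boosts = boosts or {"name": 3.0, "categories": 2.0, "description": 1.0}
    # "field^boost" weights a field for this query only, so the weights
    # can be tuned without reindexing any documents.
    fields = ["%s^%g" % (field, boost) for field, boost in sorted(boosts.items())]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


settings = build_analysis_settings()
query = build_company_query("Wespennest")
```

The main benefit of query-time boosts is that you can experiment with the weights per request, whereas index-time boosts are baked into the index and need a reindex to change.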

On Thursday, November 15, 2012 11:13:42 AM UTC-5, Christian wrote:

--