Interactive "cleaning" of search input before actual search

Hi all,

First off, I apologize if this question has already been asked or if it
doesn't belong here. I spent the last 2 days searching on Google, reading
the Elasticsearch documentation and generally experimenting with various
queries/filters/analyzers, to no avail. Any pointers, or even appropriate
search terms, would be welcome answers.

I am trying to build a "smart" search for a machine parts database which is
organized along 4 axes: the item, the quality specifications, the machine
it can be used on, and the vendor it can be found at.

A slightly simplified "part" looks like this:

{
"name":"conveyor belt",
"ref": "A3FD45GH56S-24FRG4",
"id": 1,
"quality":{ "range":"A", "value": 19},
"properties":{ "length":180,"width": 69,"ratio": 14},
"machine":{"maker":"Kity","product": "Bestcombi 2000","variation": "Bestcombi 2000 +"},
"vendor":{"name": "ADS","city":"york"}
}

A larger sample is available in gist
https://gist.github.com/jeantil/612fb395e2bcbcf07e26

A typical user input will almost never be good enough to find a proper
match in the database, since we need information on all the axes to be able
to reduce the number of results to a manageable level.

Therefore we don't want actual part results for a user input. Instead, we
want to identify the terms in the user's query which are most probably
specific enough to restrict one of the axes.

For instance, suppose the user searches for the following (more sample inputs,
including typos and expected results, are available in the gist
https://gist.github.com/jeantil/612fb395e2bcbcf07e26 ):
"conveyor belt for bestcomby 2000 in york"

I would like to know that it matches the following information in the parts
database, so I can ask the user for more information on the correct axis to
further restrict the results:

"machine":{"product":"Bestcombi","variation":"Bestcombi 2000"}
"vendor":{ "city":"york" }

If the product/variation combination is restrictive enough that it can
only match one maker (as in the above example), it would be nice to know that
too, so a nice-to-have result for the provided example would be:

"machine":{"maker":"Kity","product":"Bestcombi","variation":"Bestcombi 2000"}
"vendor":{ "name":"ADS","city":"york" }
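
To make the expected behaviour concrete, here is a minimal Python sketch of the
basic case (without the maker/vendor-name inference). It is not an
Elasticsearch solution, only an illustration of the matching I am after. The
AXIS_VALUES vocabulary, the identify_axes helper and the 0.8 threshold are all
assumptions made up for this example; the fuzzy comparison uses difflib from
the standard library.

```python
from difflib import SequenceMatcher

# Hypothetical per-axis vocabulary; in practice it would come from the index.
AXIS_VALUES = {
    ("machine", "maker"): ["Kity"],
    ("machine", "product"): ["Bestcombi"],
    ("machine", "variation"): ["Bestcombi 2000", "Bestcombi 2000 +"],
    ("vendor", "name"): ["ADS"],
    ("vendor", "city"): ["york"],
}


def identify_axes(user_input, axis_values=AXIS_VALUES, threshold=0.8):
    """Return the best fuzzy match per axis field, ignoring unmatched terms."""
    tokens = user_input.lower().split()
    result = {}
    for (axis, field), values in axis_values.items():
        best_score, best_value = 0.0, None
        for value in values:
            target = value.lower()
            size = len(target.split())
            # Compare the value against every window of `size` consecutive tokens.
            for start in range(len(tokens) - size + 1):
                window = " ".join(tokens[start:start + size])
                score = SequenceMatcher(None, window, target).ratio()
                if score > best_score:
                    best_score, best_value = score, value
        # Keep the field only if its best candidate is similar enough (assumed cutoff).
        if best_value is not None and best_score >= threshold:
            result.setdefault(axis, {})[field] = best_value
    return result


matches = identify_axes("conveyor belt for bestcomby 2000 in york")
```

With this vocabulary, the sample input yields the product "Bestcombi", the
variation "Bestcombi 2000" and the city "york", while "conveyor", "belt",
"for" and "in" are ignored because they match nothing on these axes.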

I have tried the term suggester, the phrase suggester, and a multi_match query
with aggregations, but all of these fall short in one way or another:

  • suggesters don't ignore terms which don't belong to the "axis"
  • the multi_match query will not always select the "correct" result (it returns
    Bestcombi 2000+ when searching for Bestcombi), and the associated aggregations
    will rank Bestcombi 2000+ above Bestcombi, since in the complete database I
    have many more parts for the Bestcombi 2000+ than for the Bestcombi.
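
For reference, a multi_match query with one terms aggregation per axis field
could be built roughly as follows. This is a sketch rather than the exact
request: the field list, the "AUTO" fuzziness and the ".raw" sub-fields
(hypothetical not-analyzed copies of each field, so that buckets hold whole
values) are assumptions about the mapping. The body is a plain dict, so it can
be sent with any client.

```python
import json

# Assumed axis fields, following the structure of the sample part.
AXIS_FIELDS = [
    "machine.maker",
    "machine.product",
    "machine.variation",
    "vendor.name",
    "vendor.city",
]


def build_search_body(user_input, fields=AXIS_FIELDS):
    """Build a multi_match query with one terms aggregation per axis field."""
    return {
        "size": 0,  # only the aggregation buckets are of interest, not the hits
        "query": {
            "multi_match": {
                "query": user_input,
                "fields": fields,
                "fuzziness": "AUTO",  # tolerate typos such as "bestcomby"
            }
        },
        "aggs": {
            # Aggregation names use "_" instead of "." to stay on the safe side.
            field.replace(".", "_"): {"terms": {"field": field + ".raw"}}
            for field in fields
        },
    }


body = build_search_body("conveyor belt for bestcomby 2000 in york")
print(json.dumps(body, indent=2))
```

Terms aggregations order their buckets by document count by default, which is
why the value with the most parts ends up ranked first.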

Any pointer on how to tackle this problem would be very very welcome.

Thanks
Jean
