Does search really work in ES?


(Gosforth) #1

I tested different queries and the results are terrible.
I ran this query:
GET /printers/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "product_description": {
            "query": "LaserJet Pro M275",
            "minimum_should_match": "95%"
          }
        }
      }
    }
  }
}

The query returns "HP TopShot LaserJet Pro M275" but also "HP LaserJet Pro P1606", which has no similarity at all! The best part comes when the query is "query": "Laser Jet Pro M275". The result is "HP Laser Jet Pro M501", a totally different string.
I also tested:
POST /printers/_search
{
  "query": {
    "match_phrase": {
      "product_description": {
        "query": "LaserJet Pro M275",
        "slop": <any value>
      }
    }
  }
}
No good results here either. I also tried other types of query, with the same poor results. The results look like a roll of the dice. The only good results I get are for single words. ES is completely lost when it comes to phrases.
Yes, I have one shard, one replica, one whatever...


(Christian Dahlqvist) #2

Looking at the documentation, it states that percentage-based minimum_should_match values are rounded down. If you provide 3 terms to match, as in the example, 95% of 3 is 2.85, which is then rounded down to 2. It does seem your matches contain both LaserJet and Pro, which sounds like the expected outcome to me.
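
To make the arithmetic concrete, here is a minimal Python sketch of that rounding. The function name `required_matches` is made up for illustration and is not part of any Elasticsearch API; it only models the documented rule that a positive percentage is rounded down.

```python
def required_matches(num_terms: int, percent: int) -> int:
    """Hypothetical helper: number of query terms that must match
    when minimum_should_match is a positive percentage (rounded down)."""
    # Integer arithmetic avoids floating-point surprises: 3 * 95 // 100 == 2
    return num_terms * percent // 100


print(required_matches(3, 95))   # three query terms at 95%
print(required_matches(4, 95))   # four query terms at 95%
print(required_matches(3, 100))  # three query terms at 100%
```

Assuming the query text is analyzed into one token per word, "LaserJet Pro M275" has three terms, so only two of them have to match at 95%. "Laser Jet Pro M275" has four terms, so three have to match, and a document containing "Laser", "Jet" and "Pro" satisfies that.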


(Gosforth) #3

The results are absolutely out of scope. I search for "Laser Jet Pro M275" and receive "HP Laser Jet Pro M501". If I lower "minimum_should_match" to 50%, I get "LaserJet Pro M275" (and still "Laser Jet Pro M501") on the result list, but with a lower ranking (a higher max_score is better, right?). Anyway, there should be no "Laser Jet Pro M501" result on the list; this string does not match the query.


(Christian Dahlqvist) #4

If you require all parts of the string you are searching for to match, which seems to be the case based on your description, set minimum_should_match to 100%. If you set it lower, not all terms in the string you supply need to match, which gives the results you are seeing. Please have a look at the documentation I linked to for a more detailed description of how the minimum_should_match parameter behaves.


(Gosforth) #5

No. That's not the solution. What kind of search engine would it be if the user had to type exactly what he is searching for? :slight_smile: Effective search should find similar phrases. Here is an example that ES does not handle correctly. I can search for single words, but ES is completely lost when it has to find similar phrases.


(xeraa) #6
  1. It sounds like you actually want a phrase search? https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase.html
  2. Be sure that you are using the right tokenizer. M275 might be tokenized differently than you are expecting. I'd check it with the _analyze endpoint against your mapping.
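
As a rough illustration of why tokenization matters here, the sketch below approximates analysis in plain Python by lowercasing and splitting on non-alphanumeric characters. That is an assumption for illustration only, not the analyzer actually configured on your index; the _analyze endpoint is the way to see the real tokens.

```python
import re
from typing import List


def rough_tokens(text: str) -> List[str]:
    """Crude stand-in for an analyzer (an assumption, not Elasticsearch code):
    lowercase the text, then keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


print(rough_tokens("LaserJet Pro M275"))   # "LaserJet" stays one token
print(rough_tokens("Laser Jet Pro M275"))  # "Laser" and "Jet" become two tokens
```

Under this simplification the two spellings share only the "pro" and "m275" tokens, so a query written one way does not match the product name written the other way on that word.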

(Gosforth) #7

Thanks, but I tried that as well. It does not give the expected results. You know, ES just does not search. It is only effective at finding single words.


(Mark Harwood) #8

It doesn't help that with your particular examples the same product is described as both "laser jet" and "laserjet".
Elasticsearch doesn't automatically assume that these words should be combined, for the same reason it doesn't automatically assume that the words "use less" should be thought of as "useless".

Custom synonyms should be used to educate Elasticsearch (or any other search engine, for that matter) about these sorts of cases.
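
Here is a minimal Python sketch of what such a rule does to a token stream. In Elasticsearch this would be configured with a synonym token filter in the index analysis settings; the code below only mimics the effect, and the `SYNONYMS` table and `apply_synonyms` function are hypothetical names made up for this example.

```python
# Hypothetical synonym rules: a two-word form mapped to one canonical token.
SYNONYMS = {("laser", "jet"): "laserjet"}


def apply_synonyms(tokens):
    """Replace each adjacent token pair listed in SYNONYMS with its canonical form."""
    out = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in SYNONYMS:
            out.append(SYNONYMS[pair])
            i += 2  # consume both tokens of the pair
        else:
            out.append(tokens[i])
            i += 1
    return out


print(apply_synonyms(["hp", "laser", "jet", "pro", "m501"]))
```

After this step both spellings produce the same "laserjet" token, so they can match each other at query time.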


(Gosforth) #9

I absolutely do not agree with you. Laser Jet and LaserJet are very similar expressions. The distance is very short. Now if I search for the phrase 'I want to find something', ES will display 'find one day something', 'something to find', 'I want ES to search correctly'... This is not searching. It just performs Select * from X where p LIKE '%%'. The order and the distance matter.
Anyway, I built my own function that works way better. ES is a great product, but searching does not work there. It correctly finds only single words.


(Mark Harwood) #10

Demonstrably untrue. A "phrase" search requires that you use a "phrase" type of query expression.

DELETE test
POST test/doc
{ "text": "I want to find something" }
POST test/doc
{ "text": "find one day something" }
POST test/doc
{ "text": "something to find" }
POST test/_search
{
  "query": {
    "match_phrase": {
      "text": "I want to find something"
    }
  }
}
// returns
"hits" : [
  {
    "_source" : {
      "text" : "I want to find something"
    }
  }
]

Shingles and synonyms are ways to index your content such that clients don't need to use phrase queries to match what you consider to be phrases, but if you've already independently developed your own matching functions that work at scale, more power to you.
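
For anyone unfamiliar with shingles, here is a minimal Python sketch of the idea: adjacent tokens are indexed together as word n-grams, so word order and adjacency become part of what is matched. The `shingles` function is a hypothetical illustration, not the Elasticsearch shingle token filter itself, which has more options.

```python
def shingles(tokens, size=2):
    """Return word n-grams ("shingles") of the given size from a token list."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


query = shingles(["i", "want", "to", "find", "something"])
doc_a = shingles(["something", "to", "find"])
doc_b = shingles(["find", "one", "day", "something"])

# Shared shingles show which documents preserve adjacent word pairs from the query.
print(sorted(set(query) & set(doc_a)))
print(sorted(set(query) & set(doc_b)))
```

In this toy example "something to find" shares one word pair with the query, while "find one day something" shares none, even though both contain the individual words "find" and "something".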


(Gosforth) #11

Thanks for the example, yet in my first post I clearly demonstrated that ES is not able to find similar phrases (even if they are returned, max_score is wrong). Google can be some inspiration for how it should work.


(Mark Harwood) #12

I'd look up an article on how Google leverages trillions of historical user searches, not just software, to deliver its experience, but I suspect that would be wasted effort.
I'm done here.


(Mark Walkom) #13

I think you're being extremely uncooperative here despite everyone pointing out options and how Elasticsearch does work.

I am going to lock this thread. If you'd like to ask the question again, please reconsider your approach.

