Elastic does not see the difference between phrases in search

Gosforth · July 28, 2018, 2:37pm

I search for "akg b 6200" and it returns records such as "ghj b 7876", "nma b987"... and so on.
Does the search engine in this product actually work?

dadoonet · July 28, 2018, 4:25pm

Could you provide a full recreation script, as described in About the Elasticsearch category? It will help us better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.

Gosforth · July 28, 2018, 7:25pm

I create the index (all in Kibana):

PUT products
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2
    }
  }
}

I create the mapping:

PUT products/_mapping/p
{
  "properties": {
    "product_name": { "type": "text" }
  }
}

I insert several products:

POST /products/p/
{
  "product_name": "akg b 986"
}
...

Then I search:

GET /products/_search
{
  "query": { "match": { "product_name": "whatever here" } }
}

It returns all documents...
I also tried 'multi_match' and 'match_all'...

If it finds even a single matching letter, it returns the document as a search result. Moreover, the scoring is absolutely wrong; there is no logic to it.

dadoonet · July 28, 2018, 9:00pm

Can you tell us what you put in "whatever here"?
And why do you think scoring is wrong?

Gosforth · July 29, 2018, 9:39am

I can type 'dresg b 1234' or 'jkod b9854' and Elastic will return documents. It should not. The ONLY thing they have in common is the letter 'b', which is not enough for a positive search result. The scoring is completely wrong, too. For instance, if the product_name is 'akg b 986' and you search for 'akg b986' and for 'akg b 986', the first one ('akg b986') can receive... a better score.

It looks like Elastic uses a primitive '%X%' search.

Christian_Dahlqvist · July 29, 2018, 10:00am

I would recommend using the analyze API to see exactly what Elasticsearch indexes behind the scenes. Also be aware that scoring will depend on the contents and distribution of data in the shard where the document is found. If you have more than one shard for an index, scoring for the same match may therefore differ depending on which shard the document is located in.

Gosforth · July 29, 2018, 10:09am

Thanks, but that gives me nothing. I do not want to analyze; I just want to search. Should I create some special index where search really works (by default it does not; see the examples above)? Or does the search query need some magic commands?
I think my example is really basic (ABC), and everyone is happy with such a 'search'?

dadoonet · July 29, 2018, 10:12am

With the default analyzer, b9854 is probably indexed as b and 9854.

That's why these documents match. Change the analyzer depending on your use case.
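The "one shared letter is enough" behavior has two parts: the analyzer turns the text into terms, and a `match` query combines the query's terms with OR by default (its `operator` parameter defaults to `or`). The sketch below is a plain-Python toy, not Elasticsearch code. `analyze` is a hypothetical, simplified stand-in that only lowercases and splits on non-alphanumeric characters; it does not reproduce the real standard tokenizer's Unicode word-boundary rules.

```python
import re


def analyze(text: str) -> list[str]:
    """Hypothetical, simplified stand-in for an analyzer: lowercase the text and
    split it on any run of characters that are not ASCII letters or digits."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def match(query: str, doc: str, operator: str = "or") -> bool:
    """Toy model of a `match` query against one text field.

    operator="or"  -> the doc matches if ANY query term is among its terms
    operator="and" -> the doc matches only if ALL query terms are present
    """
    doc_terms = set(analyze(doc))
    hits = [term in doc_terms for term in analyze(query)]
    return all(hits) if operator == "and" else any(hits)


docs = ["akg b 986", "ghj b 7876", "mkpl c890"]

# Default OR semantics: the shared term "b" is enough to match.
print([d for d in docs if match("dresg b 1234", d)])  # ['akg b 986', 'ghj b 7876']
# AND semantics: every query term must be present.
print([d for d in docs if match("dresg b 1234", d, operator="and")])  # []
print([d for d in docs if match("akg b 986", d, operator="and")])  # ['akg b 986']
```

In the Query DSL the stricter form is written as `{"match": {"product_name": {"query": "akg b 986", "operator": "and"}}}`; `minimum_should_match` and `match_phrase` are other ways to require more than a single shared term.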

Gosforth · July 29, 2018, 1:16pm

So by default Elasticsearch search does not work?
It is hard to say that search is doing its job if it returns a match for any query string...

I do not know what you are talking about. Any example?
When I created this mapping:

{
  "mappings": {
    "products": {
      "_all": {
        "analyzer": "edge_ngram_analyzer",
        "search_analyzer": "keyword_analyzer"
      },
      "properties": {
        "product_name": {
          "type": "text",
          "search_analyzer": "keyword_analyzer",
          "analyzer": "edge_ngram_analyzer"
        }
      }
    }
  }
}
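The mapping above refers to analyzers named edge_ngram_analyzer and keyword_analyzer. These are not built-in analyzer names, so they would have to be defined in the index's analysis settings, which the snippet does not show. Assuming, hypothetically, that edge_ngram_analyzer splits on whitespace and emits edge n-grams with min_gram 1 (the default min_gram of Elasticsearch's edge n-gram tokenizer and filter), every leading single letter becomes an indexed term. The plain-Python toy below (not Elasticsearch code; both functions are hypothetical) lists the terms one product name would produce.

```python
def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 10) -> list[str]:
    """Leading prefixes of `token`, from min_gram up to max_gram characters long.
    In this toy, tokens shorter than min_gram yield nothing."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def index_terms(text: str, min_gram: int = 1, max_gram: int = 10) -> set[str]:
    """Hypothetical index-time analyzer: lowercase, split on whitespace,
    then emit the edge n-grams of every token."""
    terms: set[str] = set()
    for token in text.lower().split():
        terms.update(edge_ngrams(token, min_gram, max_gram))
    return terms


indexed = index_terms("akg b 986")
print(sorted(indexed))  # ['9', '98', '986', 'a', 'ak', 'akg', 'b']

# If the search side also splits the query into words, the lone "b" hits.
query_terms = "jkod b 9854".lower().split()
print([t for t in query_terms if t in indexed])  # ['b']

# Raising min_gram drops the single-letter grams (and, in this toy, the token "b").
print(sorted(index_terms("akg b 986", min_gram=2)))  # ['98', '986', 'ak', 'akg']
```

Whether a query actually hits these grams depends on how the search analyzer is defined; the toy only shows that short prefixes end up in the index.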

The search results were also bad: it returned all documents where only one single letter matched.

dadoonet · July 29, 2018, 5:19pm

It really depends on your use case... I mean... Really.

So you need to adapt the search engine to your needs.
If you don't, Elasticsearch tries to guess what you want to do, but that's just a guess.

I mean: are 'jkod b9854' and 'dresg b 1234' real terms you want to index?
Do you want an exact match, "à la SQL"?

If so, index your documents with this mapping:

{
  "properties": {
    "product_name": {"type": "keyword"}
  }
}
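With a `keyword` field the whole value is stored as a single term, so partial overlaps such as a shared 'b' can no longer produce a match, but neither can a query that differs only in spacing or case. A plain-Python toy (not Elasticsearch code; the function names are hypothetical) makes the trade-off concrete:

```python
def keyword_terms(value: str) -> set[str]:
    """A `keyword` field is not analyzed: the entire value is one term."""
    return {value}


def term_query(query: str, doc_value: str) -> bool:
    """Toy exact lookup: the query must equal the single indexed term."""
    return query in keyword_terms(doc_value)


docs = ["akg b 986", "ghj b 7876", "mkpl c 890", "hasf c 4367p"]

for q in ["mkpl c 890", "dresg b 1234", "akg b986", "AKG B 986"]:
    # Only the first query finds a document; the others differ from every value.
    print(repr(q), "->", [d for d in docs if term_query(q, d)])
```

Keyword fields can take a normalizer (for example one with a lowercase filter) to ignore case; spacing variants such as 'akg b986' versus 'akg b 986' would still need further custom normalization.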

As @Christian_Dahlqvist says (and you should always follow Christian's advice), the _analyze API is super useful for understanding what happens at index time and at search time.

Gosforth · July 29, 2018, 5:40pm

Thanks, but we're going in circles. Can you help me? If you know the subject, please give me an example of a working solution. Christian's advice to analyze what the software is doing is useless for me. Maybe one day, when I go deeper. Moreover, I do not understand why search results should depend on the shard they come from; this is still the same data set, no matter where it is logically or physically stored. The number of shards should not have any influence on scoring. This is still one database.

I do not want an exact match. I expected that this product has an advanced search engine (as they promote it: 'it's just search'). What I see is that it finds... anything, regardless of accuracy.

"Are 'jkod b9854' and 'dresg b 1234' real terms you want to index?"

YES, they are. Imagine 'mkpl c890' and 'hasf c 4367p'. I need software that can tell that sharing a 'c' is not enough to return a document as a search result: if I search for 'mkpl c 890', do not return the second doc, because it is a completely different phrase. By default this software returns everything. It just does not search.

dadoonet · July 29, 2018, 6:19pm

If you don't like that the standard analyzer generates mkpl, c and 890 because that does not fit your use case, you need to use another analyzer.

That's exactly where the _analyze API plays a great role again: it helps you build a custom analyzer that suits your needs.

You have an example on the page I linked to.

Note that one of the reasons the standard analyzer splits your text c980 into c and 980 is indexing things like 180EUR, for example, which should be split into 180 and EUR.

"More, I do not understand why search result should depend the shard they are coming from.
No of shards should not have any influence on scoring. This is still one database."

It's because you don't understand how Elasticsearch works. Each shard is a Lucene instance. You can think of a Lucene instance as one database if you wish. So when you have 5 shards, you actually have 5 databases.

You then have 2 options:

- Use only one shard
- Use the dfs_query_then_fetch search type
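The per-shard effect comes from term statistics. Elasticsearch's default similarity (BM25, from Lucene) weights each term by an inverse document frequency computed from a document count and the term's document frequency, and by default those counts come from the shard holding the document. The toy calculation below uses made-up data spread over two hypothetical shards; it is not Elasticsearch code and covers only the idf part of the score (the real BM25 score also involves term frequency and field length). It shows the identical document 'akg b 986' getting a different idf for the term 'b' depending on its shard, and the single global value that one shard, or dfs_query_then_fetch's up-front collection of statistics from all shards, would use instead.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """Lucene BM25 idf: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def doc_freq(term: str, docs: list[str]) -> int:
    """Number of docs containing `term` (toy whitespace tokenization)."""
    return sum(term in doc.split() for doc in docs)


# Hypothetical data: one index whose documents ended up on two shards.
shards = {
    "shard_0": ["akg b 986", "mkpl c 890", "hasf c 4367p"],
    "shard_1": ["akg b 986", "ghj b 7876", "nma b 987"],
}

term = "b"
for name, docs in shards.items():
    # Default: statistics come only from the document's own shard.
    print(name, round(bm25_idf(len(docs), doc_freq(term, docs)), 3))
# shard_0 0.981
# shard_1 0.134

# Global statistics, gathered across all documents.
all_docs = [doc for docs in shards.values() for doc in docs]
print("global", round(bm25_idf(len(all_docs), doc_freq(term, all_docs)), 3))
# global 0.442
```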

system · August 26, 2018, 6:20pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

_score not as I'd expect (Elasticsearch): 3 replies, 597 views, December 1, 2017
ES gives very different scores, in match_phrase_prefix, for similar documents even I use DfsQueryThenFetch (Elasticsearch): 1 reply, 417 views, July 6, 2017
Why did the same text get different _score? (Elasticsearch): 1 reply, 520 views, April 8, 2017
Not getting perfect results (Am using elasticsearch dsl for django ) and please help me (Elasticsearch): 2 replies, 233 views, April 27, 2022
Different score for exact same keyword (Elasticsearch): 5 replies, 4084 views, July 6, 2017