Elasticsearch: Understanding Match query

rahulnama · December 12, 2018, 5:27am

Hi Team,

Can you please explain the difference between the two queries below?

GET /_search
{
    "query": {
        "match" : {
            "message" : "this is a test"
        }
    }
}

GET /_search
{
    "query": {
        "match" : {
            "message" : {
                "query" : "this is a test",
                "operator" : "and"
            }
        }
    }
}

I've indexed a few PDF files, and when I use the second query I get more relevant results.

Can someone explain the difference?

dadoonet · December 12, 2018, 5:55am

The first one is equivalent to

GET /_search
{
    "query": {
        "match" : {
            "message" : {
                "query" : "this is a test",
                "operator" : "or"
            }
        }
    }
}
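
To make the "or" versus "and" difference concrete, here is a minimal Python sketch. It is not Elasticsearch code: `tokenize` is a rough stand-in for the analyzer (assumed here to lowercase and split on whitespace), and `matches` and the sample documents are hypothetical names and data made up for illustration.

```python
def tokenize(text):
    """Rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def matches(doc_text, query_text, operator="or"):
    """Return True if the document matches the query under the given operator."""
    doc_terms = set(tokenize(doc_text))
    query_terms = tokenize(query_text)
    if operator == "and":
        # Every query term must be present in the document.
        return all(term in doc_terms for term in query_terms)
    # Default "or": at least one query term must be present.
    return any(term in doc_terms for term in query_terms)


# Hypothetical documents, keyed by id.
docs = {
    1: "This is a test",
    2: "This is a manual",
    3: "Nothing relevant here",
}

query = "this is a test"
or_hits = [doc_id for doc_id, text in docs.items() if matches(text, query, "or")]
and_hits = [doc_id for doc_id, text in docs.items() if matches(text, query, "and")]
print("or :", or_hits)
print("and:", and_hits)
```

Under "or", a document that shares only common words such as "this" or "is" with the query still counts as a hit, which is one reason the "and" form can feel more relevant.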

rahulnama · December 12, 2018, 7:02am

Hi @dadoonet,

Got it.

Is there any difference if we don't specify the operator, i.e. between these two forms?

1. "message" : "this is test"

2. "message" : { "query" : "this is a test" }

I ask because when I use forms 1 and 2 in a match query, the results vary a lot.

Thanks
Rahul

dadoonet · December 12, 2018, 7:23am

No, I don't think it makes any difference.

rahulnama · December 12, 2018, 8:21am

Hi @dadoonet,

I'm facing an odd issue.

I have deployed Elasticsearch on Windows and on Linux with the same settings and mappings. Both nodes hold the same set of documents, but they are independent of each other.

But when I run the same query, the results on Windows and the results on Linux are completely different.

Any idea what causes this behavior?

Thanks
rahul

dadoonet · December 12, 2018, 8:34am

No. You need to share both results from both systems.

Please format your code, logs or configuration files using the </> icon as explained in this guide, not the citation button. It will make your post more readable.

Or use markdown style like:

```
CODE
```

There's a live preview panel for exactly this reason.

Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.
If your goal is to get an answer to your questions, it's in your interest to make it as easy to read and understand as possible.

rahulnama · December 12, 2018, 8:49am

Okay @dadoonet, I will follow the instructions.

Issue: I indexed 14 PDF files with the same settings and mappings on 2 ES nodes, but I get different results when I query them.

Case-1: Elasticsearch deployed on Amazon EC2 (Windows)

Indexed 14 PDF files

Index name: testindex

Query:

{
    "_source" : "url",
    "query": {
        "match" : {
            "content" : {
                "query" : "windows install",
                "operator": "and"
            }
        }
    }
}

Response (the last segment of each URL is the file name):

"hits": [
      {
        "_index": "testindex",
        "_type": "_doc",
        "_id": "5",
        "_score": 2.230532,
        "_source": {
          "url": "http://127.0.0.1:5000/js/Linux/linux _faq_3_manual.pdf"
        }
      },
      {
        "_index": "testindex",
        "_type": "_doc",
        "_id": "8",
        "_score": 2.084747,
        "_source": {
          "url": "http://127.0.0.1:5000/js/Linux/the-linux-faq.pdf"
        }
      }
]

Case-2: Elasticsearch deployed on Red Hat Linux

Indexed the same 14 PDF files

Index name: testindex

Query:

{
    "_source" : "url",
    "query": {
        "match" : {
            "content" : {
                "query" : "windows install",
                "operator": "and"
            }
        }
    }
}

results:

"hits": [
            {  
                "_index": "testindex",
                "_type": "_doc",
                "_id": "11",
                "_score": 2.6487362,
                "_source": {
                    "url": "http://filesystemwef.com/Windows_Issues/31831392.pdf"
                }
            },
            {
                "_index": "testindex",
                "_type": "_doc",
                "_id": "12",
                "_score": 1.2416239,
                "_source": {
                    "url": "http://http://filesystemwef.com/Windows_Issues/357786482.pdf"
                }
            }
]

dadoonet · December 12, 2018, 9:52am

We can see that the computed _score is different. By default, Elasticsearch sorts by _score, so the ordering within each response looks correct.

Sadly I don't have the full response object, so I'm just guessing here.
Maybe you have more than one shard and the documents are distributed across the shards differently in one case than in the other. The total number of documents may also differ between the two systems.

Some ideas:

- Run the same test with only one shard
- Or use DFS: ?search_type=dfs_query_then_fetch
- Check that you have exactly the same documents in both systems
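
As a rough illustration of why the shard layout can matter: by default each shard scores with its own local term statistics, while dfs_query_then_fetch first gathers the statistics across all shards. The Python sketch below applies the BM25 idf formula to two shard layouts. The layouts are invented numbers (14 documents, 4 of which contain the term), not data from this thread, and the helper names are hypothetical.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """BM25 inverse document frequency for a term."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Each tuple is (documents in the shard, documents in the shard containing the term).
# Both layouts hold the same 14 documents overall; only the distribution differs.
layout_a = [(3, 2), (3, 1), (3, 1), (3, 0), (2, 0)]
layout_b = [(5, 0), (2, 2), (4, 1), (2, 1), (1, 0)]


def per_shard_idf(layout):
    """idf as each shard would compute it locally (None where the term is absent)."""
    return [round(bm25_idf(n, df), 3) if df else None for n, df in layout]


def global_idf(layout):
    """idf computed from statistics summed over all shards (the DFS idea)."""
    total_docs = sum(n for n, _ in layout)
    total_freq = sum(df for _, df in layout)
    return round(bm25_idf(total_docs, total_freq), 3)


print("local idf, layout A:", per_shard_idf(layout_a))
print("local idf, layout B:", per_shard_idf(layout_b))
print("global idf:", global_idf(layout_a), global_idf(layout_b))
```

The local values differ between the two layouts while the global value is identical, which is consistent with both the single-shard and the DFS suggestions above.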

rahulnama · December 12, 2018, 2:24pm

@dadoonet

You are right. Both indices have 5 shards.

Is there an API that shows how many documents are in each shard?

Using dfs_query_then_fetch gives the same results on both nodes, and the results are also more relevant. But is it recommended in production?

Thanks for pointing out dfs_query_then_fetch; I hadn't seen it before. The Elasticsearch concepts really make sense. Elasticsearch offers a lot of flexibility: the more you understand it and the more of its features you use, the more relevant your search becomes.

There is still a lot to learn. Thanks to the whole Elastic team for such sensible features.

-Rahul

dadoonet · December 12, 2018, 3:45pm

You can, but if you don't have much data, it's always better to use a single shard.

rahulnama · December 14, 2018, 2:01pm

Hi @dadoonet,

Using one shard gives more relevant results. Thank you for that.

If possible, can you also suggest a solution to the problem below?

All the indexed documents relate only to Windows and Linux issues. Now, whenever a user searches for "mobile issues", Elasticsearch still returns results because the query matches "issues", and it might also match "mobile" somewhere in the documents.

Reference:

search query:

GET pdfminerone/_search
{
  "size": 10,
  "_source": "url",
  "query": {
    "match": {
      "content": "mobile issues"
    }
  }
}

Response:

"hits": [
      {
        "_index": "pdfminerone",
        "_type": "_doc",
        "_id": "12",
        "_score": 4.144372,
        "_source": {
          "url": "http://127.0.0.1:5000/js/Windows_Issues/357786482.pdf"
        }
      },
      {
        "_index": "pdfminerone",
        "_type": "_doc",
        "_id": "10",
        "_score": 2.7226787,
        "_source": {
          "url": "http://127.0.0.1:5000/js/Linux/linux _faq_2_manual.pdf"
        }
      }]

The first document, with a score of about 4.14, is about Windows issues and has nothing to say about mobile issues.

If we recommend that URL to the user, they will waste time looking for mobile issues in it.

How can we avoid such scenarios?

dadoonet · December 14, 2018, 2:25pm

By default Elasticsearch uses the "or" operator, but you can change it to "and" with something like:

GET /_search
{
    "query": {
        "match" : {
            "Field" : {
                "query" : "text",
                "operator" : "and"
            }
        }
    }
}

rahulnama · December 14, 2018, 3:04pm

Makes sense, but I still see similar results.

Query-1

GET testbooks/_search
{
  "size": 10,
  "_source": "url",
  "query": {
    "match": {
      "content": "mobile issues"
    }
  }
}

response:

"hits": {
    "total": 9,
    "max_score": 4.144372,
    "hits": [
      {
        "_index": "testbooks",
        "_type": "_doc",
        "_id": "3",
        "_score": 4.144372,
        "_source": {
          "url": "/Windows_Issues/357786482.pdf"
        }
      },
      {
        "_index": "testbooks",
        "_type": "_doc",
        "_id": "8",
        "_score": 2.7226787,
        "_source": {
          "url": "/Linux/linux _faq_2_manual.pdf"
        }
      }]

Query-2:

GET testbooks/_search
{
    "_source": "url",
    "query": {
        "match" : {
            "content" : {
                "query" : "mobile issues",
                "operator" : "and"
            }
        }
    }
}

Response:

"hits": {
    "total": 2,
    "max_score": 4.144372,
    "hits": [
      {
        "_index": "testbooks",
        "_type": "_doc",
        "_id": "3",
        "_score": 4.144372,
        "_source": {
          "url": "/Windows_Issues/357786482.pdf"
        }
      },
      {
        "_index": "testbooks",
        "_type": "_doc",
        "_id": "8",
        "_score": 2.7226787,
        "_source": {
          "url": "/Linux/linux _faq_2_manual.pdf"
        }
      }
    ]

dadoonet · December 14, 2018, 5:27pm

There is no content field in your example so I don't see how this works.

rahulnama · December 17, 2018, 5:19am

hi @dadoonet

Yes, I agree. Will both queries return the same score for a document if both keywords (mobile, issues) appear in it at least once?

-Rahul
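
The responses earlier in the thread already hint at the answer: document 3 has the same score (4.144372) under both queries, and only the total hit count changes (9 versus 2). Here is a minimal Python sketch of why, assuming the score of a match query is the sum of the scores of the terms that matched. The term weights are invented and `score` is a hypothetical helper, not an Elasticsearch API.

```python
def score(doc_terms, query_terms, weights, operator="or"):
    """Sum the weights of matching terms; return None if the document is not a hit."""
    matched = [term for term in query_terms if term in doc_terms]
    if operator == "and" and len(matched) != len(query_terms):
        return None  # "and" requires every query term
    if not matched:
        return None  # "or" requires at least one query term
    return sum(weights[term] for term in matched)


query_terms = ["mobile", "issues"]
weights = {"mobile": 3.0, "issues": 1.1}  # invented per-term scores

doc_with_both = {"mobile", "issues", "windows"}
doc_with_one = {"issues", "linux"}

print(score(doc_with_both, query_terms, weights, "or"),
      score(doc_with_both, query_terms, weights, "and"))
print(score(doc_with_one, query_terms, weights, "or"),
      score(doc_with_one, query_terms, weights, "and"))
```

In this sketch, a document containing both terms gets the same score either way; the "and" operator only removes the documents that contain just one of the terms.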

rahulnama · December 19, 2018, 11:45am

Hi @dadoonet,

I've indexed 14 books, of which two or three talk about the internet.

Search query:

GET testbooks/_search
{
    "_source": "url",
    "explain": true,
    "query": {
        "match" : {
            "content" : {
                "query" : "unable to connect to the internet ",
                "operator" : "and"
            }
        }
    }
}

When I run this query, the top hit is a document that is not relevant to the internet, even though documents that are more relevant to the internet are available in ES.

In the document ES returned, the keyword "unable" appears 60 times and the word "connected" appears 100 times, but "internet" appears only 2 times.

Still, it came first in the results. How can we avoid such scenarios?

Please suggest.

Note: I could post the results, but they contain a lot of text, so I didn't.
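
One possible contributor, offered as a hedged sketch rather than a diagnosis: with the default BM25 similarity, a document's score is the sum of a contribution from every matching query term, so a document that repeats the other query terms heavily can outrank one that focuses on "internet". The Python below uses the BM25 term formula with its default k1 and b. The idf values, document lengths and the second document are invented, the term frequencies of the first document are taken from the post above, and "to" and "the" are left out for brevity.

```python
K1 = 1.2   # BM25 default k1
B = 0.75   # BM25 default b


def bm25_term_score(tf, idf, doc_len, avg_doc_len):
    """BM25 contribution of a single term to a document's score."""
    length_norm = K1 * (1 - B + B * doc_len / avg_doc_len)
    return idf * tf * (K1 + 1) / (tf + length_norm)


# Invented idf values: "internet" is assumed to be the rarest term.
IDF = {"unable": 1.0, "connect": 0.8, "internet": 1.5}

# Term frequencies: doc_a mirrors the counts reported above, doc_b is hypothetical.
doc_a_tf = {"unable": 60, "connect": 100, "internet": 2}
doc_b_tf = {"unable": 1, "connect": 1, "internet": 40}


def score_doc(tf_by_term, doc_len=1000, avg_doc_len=1000):
    """Sum the per-term BM25 contributions (document lengths are assumptions)."""
    return sum(
        bm25_term_score(tf, IDF[term], doc_len, avg_doc_len)
        for term, tf in tf_by_term.items()
    )


score_a = score_doc(doc_a_tf)
score_b = score_doc(doc_b_tf)
print(round(score_a, 3), round(score_b, 3))
```

With these made-up numbers the first document scores higher even though it barely mentions "internet", because the contributions of "unable" and "connect" add up. The `explain` output requested in the query above shows the real per-term breakdown for each hit.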

system · January 16, 2019, 11:45am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
