Weird Search behavior for a beginner


(Sasi ) #1

Hi Team,
I'm just starting out with a project on Elasticsearch. I landed on a solution that actually
works for a problem, but I am frustrated that I don't understand why it works. I would like to
understand what's going on behind the scenes when the data is indexed.

Consider the test data

DELETE vinsearch

PUT vinsearch
{
  "mappings": {
    "vins" : {
      "properties": {
        "vin": {
          "type": "text"
        }
      }
    }
  }
}

PUT vinsearch/vins/1
{
  "vin" : "4JGDA5HB8HA89799"
}

PUT vinsearch/vins/2
{
  "vin" : "4JGDA5HB8HA89222"
}

PUT vinsearch/vins/3
{
  "vin" : "4JGDA5HB8HA89333"
}

Now, when I search on vin with the following query, it returns no results:

POST vinsearch/_search
{
  "from" : 0,
  "size" : 10,
  "query" : {
    "bool" : {
      "must" : [
        {
          "match" : {
            "vin" : {
              "query" : "4jgda5hb8ha89799,4JGDA5HB8HA89222,4JGDA5HB8HA89333"
            }
          }
        }
      ],
      "boost" : 1.0
    }
  }
} 

But if I add an empty entry (,,) between the VINs (see below), the search results come back.
Why is that? Can someone enlighten me, please?

POST vinsearch/_search
{
  "from" : 0,
  "size" : 10,
  "query" : {
    "bool" : {
      "must" : [
        {
          "match" : {
            "vin" : {
              "query" : "4jgda5hb8ha89799,,4JGDA5HB8HA89222,,4JGDA5HB8HA89333"
            }
          }
        }
      ],
      "boost" : 1.0
    }
  }
}

(David Pilato) #2

Try the _analyze API to understand what's happening behind the scenes. That should help a lot.


(Sasi ) #3

Thanks David.
I have tried this already, but I'm not sure it shows anything useful.
For example, for this _analyze API request,

GET _analyze
{
  "explain": true, 
  "analyzer": "standard", 
  "text" : "4JGDA5HB8HA89799"
}

I see the response below. Do you see anything useful for the problem I described?

{
  "detail": {
    "custom_analyzer": false,
    "analyzer": {
      "name": "standard",
      "tokens": [
        {
          "token": "4jgda5hb8ha89799",
          "start_offset": 0,
          "end_offset": 16,
          "type": "<ALPHANUM>",
          "position": 0,
          "bytes": "[34 6a 67 64 61 35 68 62 38 68 61 38 39 37 39 39]",
          "positionLength": 1,
          "termFrequency": 1
        }
      ]
    }
  }
} 

Why does searching for multiple VINs in the query string not work without
a dummy comma (,,)?
"query" : "4JGDA5HB8HA89222,4JGDA5HB8HA89333" -> Fails
"query" : "4JGDA5HB8HA89222,,4JGDA5HB8HA89333" -> Works


(Sasi ) #4

@dadoonet any thoughts??


(David Pilato) #5

Here is how your query is transformed:

GET _analyze
{
  "analyzer": "standard", 
  "text" : "4JGDA5HB8HA89222,4JGDA5HB8HA89333"
}

Gives:

{
  "tokens": [
    {
      "token": "4jgda5hb8ha89222,4jgda5hb8ha89333",
      "start_offset": 0,
      "end_offset": 33,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}

Which obviously does not match 4jgda5hb8ha89222 or 4jgda5hb8ha89333.
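
A likely explanation for the single token: the standard tokenizer follows the Unicode word-break rules (UAX #29), which keep a comma inside a token when it sits directly between two digits, as in "1,000". The VINs here end in a digit and start with a digit ("...89222,4JGDA..."), so the comma does not split them. A small sketch to check this on your own cluster (the sample text is only an illustration, not from the original data):

```
GET _analyze
{
  "analyzer": "standard",
  "text" : "1,000 abc,def"
}
```

I would expect "1,000" to stay together as one token while "abc,def" is split at the comma, since the comma there is not between digits.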

And:

GET _analyze
{
  "analyzer": "standard", 
  "text" : "4JGDA5HB8HA89222,,4JGDA5HB8HA89333"
}

gives:

{
  "tokens": [
    {
      "token": "4jgda5hb8ha89222",
      "start_offset": 0,
      "end_offset": 16,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "4jgda5hb8ha89333",
      "start_offset": 18,
      "end_offset": 34,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

Which matches 4jgda5hb8ha89333 or 4jgda5hb8ha89222.
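
Rather than relying on the double comma, one option is to separate the VINs with whitespace, which the standard analyzer always splits on. A minimal sketch, assuming the vinsearch index and test documents from the first post:

```
POST vinsearch/_search
{
  "query" : {
    "match" : {
      "vin" : {
        "query" : "4JGDA5HB8HA89799 4JGDA5HB8HA89222 4JGDA5HB8HA89333"
      }
    }
  }
}
```

Because the match query combines the analyzed terms with OR by default, this should return every document whose vin matches any one of the three values.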

