Exact match now works but why is my not_analyzed field split into tokens?


(Simon Taylor) #1

Continuing the discussion from Exact match not working:

printf "\nDelete the index to start from scratch\n";
curl -XDELETE 'http://192.168.134.179:9200/testnames2'
printf "\nCreate the index with a single type, and one field which is not_analyzed\n";
curl -XPOST http://192.168.134.179:9200/testnames2 -d '{
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "hrname" : {
            "_all" : { "enabled" : false },
            "properties" : {
                "rawLookup" : { "type" : "string", "index" : "not_analyzed" }
            }
        }
    }
}'
printf "\nCheck mappings have persisted\n";
curl -XGET 'http://192.168.134.179:9200/testnames2/_mappings/?pretty=true'

printf "\nAdd one record into the index of the right type\n";
curl -XPUT 'http://192.168.134.179:9200/testnames2/hrname/1' -d '{
    "rawLookup" : "Simon Taylor"
}'
printf "\nRefresh index\n";
curl -XPOST 'http://192.168.134.179:9200/testnames2/_refresh'

printf "\nAttempt to get the field back using a filtered query using one term with the exact text i inputted\n";
curl -XGET 'http://192.168.134.179:9200/testnames2/_search/?pretty=true' -d '
{
      "query" : {
         "filtered" : {
            "filter" : {
               "term" : {
                  "rawLookup" : "Simon Taylor"
               }
            }
         }
      }
}
'
printf "Now just using Query Term RawLookup\n";
curl -XGET 'http://192.168.134.179:9200/testnames2/_search/?pretty=true' -d '
{
      "query" : {
               "term" : {
                  "rawLookup" : "Simon Taylor"
                }
        }
}
'
printf "Output of analyze"
curl -XGET http://192.168.134.179:9200/testnames2/_analyze?pretty=1,field=rawLookup -d'
Simon Taylor'

This generates the following output:

Delete the index to start from scratch
{"acknowledged":true}
Create the index with a single type, and one field which is not_analyzed
{"acknowledged":true}
Check mappings have persisted
{
  "testnames2" : {
    "mappings" : {
      "hrname" : {
        "_all" : {
          "enabled" : false
        },
        "properties" : {
          "rawLookup" : {
            "type" : "string",
            "index" : "not_analyzed"
          }
        }
      }
    }
  }
}

Add one record into the index of the right type
{"_index":"testnames2","_type":"hrname","_id":"1","_version":1,"_shards":{"total":2,"successful":1,"failed":0},"created":true}
Refresh index
{"_shards":{"total":2,"successful":1,"failed":0}}
Attempt to get the field back using a filtered query using one term with the exact text i inputted
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "testnames2",
      "_type" : "hrname",
      "_id" : "1",
      "_score" : 1.0,
      "_source":{
    "rawLookup" : "Simon Taylor"
}
    } ]
  }
}
Now just using Query Term RawLookup
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.30685282,
    "hits" : [ {
      "_index" : "testnames2",
      "_type" : "hrname",
      "_id" : "1",
      "_score" : 0.30685282,
      "_source":{
    "rawLookup" : "Simon Taylor"
}
    } ]
  }
}
Output of analyze{
  "tokens" : [ {
    "token" : "simon",
    "start_offset" : 1,
    "end_offset" : 6,
    "type" : "<ALPHANUM>",
    "position" : 0
  }, {
    "token" : "taylor",
    "start_offset" : 7,
    "end_offset" : 13,
    "type" : "<ALPHANUM>",
    "position" : 1
  } ]
}



(Simon Taylor) #2

What should I expect the output of analyze to look like for a not_analyzed field?


(David Pilato) #3

I guess it should be something like:

$ curl "localhost:9200/_analyze?pretty&analyzer=keyword" -d "Simon Taylor"
{
  "tokens" : [ {
    "token" : "Simon Taylor",
    "start_offset" : 0,
    "end_offset" : 12,
    "type" : "word",
    "position" : 0
  } ]
}
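
For intuition, the difference between this single-token output and the two lowercase tokens in the first post can be imitated in plain shell. This is only a rough sketch and not how Elasticsearch implements its analyzers: the function names are hypothetical, and the "standard-like" version simply lowercases and splits on whitespace, whereas the real standard analyzer uses Unicode text segmentation.

```shell
#!/bin/sh
# Hypothetical helper: imitates a keyword-style analyzer by emitting
# the whole input unchanged as a single token.
keyword_tokens() {
  printf '%s\n' "$1"
}

# Hypothetical helper: crude stand-in for a standard-style analyzer.
# It lowercases the input and prints one whitespace-separated token per line.
standard_like_tokens() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -s '[:space:]' '\n'
}

echo "keyword-style:"
keyword_tokens "Simon Taylor"

echo "standard-like:"
standard_like_tokens "Simon Taylor"
```

A term query for "Simon Taylor" can only match when the whole string was indexed as one token, which is the first case.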

(Simon Taylor) #4

Thanks.
I note that your version makes no reference to the index or the field I am interested in.
Is the right way to interpret what you have done this: take some text, run it through an analyzer, and see how that analyzer tokenizes it?
That may bear no relation to how a specific mapping for a specific index has tokenized data that is already there.

Is there a way to test how something has already been analyzed or not analyzed (as the case may be)?

I was hoping that something like the following would do it:-

printf "Output of analyze"
curl -XGET "http://192.168.134.179:9200/testnames2/_analyze?pretty=1&analyzer=keyword&field=rawLookup" -d "Simon Taylor"

(David Pilato) #5

Can you try with a JSON request? Maybe field is not supported as a URL parameter?

See https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html
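
One likely reason the very first attempt ignored the field is the query string itself: parameters in a URL are separated by &, not by commas, so pretty=1,field=rawLookup reads as a single pretty parameter and no field parameter at all. The sketch below only illustrates that splitting rule in plain shell; split_query is a hypothetical helper, not anything Elasticsearch or curl provides.

```shell
#!/bin/sh
# Hypothetical helper: prints one "name -> value" line per parameter,
# splitting the query string only on '&', as URL query strings do.
split_query() {
  printf '%s\n' "$1" | tr '&' '\n' | while IFS= read -r pair; do
    # Name is everything before the first '=', value is everything after it.
    printf '%s -> %s\n' "${pair%%=*}" "${pair#*=}"
  done
}

# Query string from the first script: the comma does not start a new parameter.
split_query 'pretty=1,field=rawLookup'

# Same parameters separated with '&': two distinct parameters.
split_query 'pretty=1&field=rawLookup'
```

When passing such a URL to curl, keep it in quotes, because an unquoted & makes the shell run the command in the background and drop the rest of the URL.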


(Simon Taylor) #6

I found this syntax, which does the job:

printf "Output of analyze"
curl -XGET "http://192.168.134.179:9200/testnames2/_analyze?pretty=1" -d '
{
    "field" : "rawLookup",
    "text"  : "Simon Taylor"
}
'

(Vincent Tran) #7

That is working for the wrong reason. You are not actually testing whether the text "Simon Taylor" stored in your rawLookup field has been analyzed. Technically, _analyze here takes the analyzer configured for the rawLookup field of this index and applies it to whatever text you pass in the request (which you happened to pick to be "Simon Taylor" in this case; you could put "I love Elasticsearch" for all it cares).

Overall, I think the test is still a reasonable approximation of what you're trying to determine: whether the field rawLookup on index testnames2 is "not_analyzed" (we know the field is "not_analyzed" because its mapping applies a keyword analyzer to the text you provide in the request).

Cheers.


(Simon Taylor) #8

OK, makes sense.

