Matching on 'raw' fields


(Peter Trei) #1

TL;DR: I'm having trouble matching on complete fields. If there is
no exact match, I get partial matches on the analyzed field, when what
I want is no result at all.

Using '.raw' fieldnames doesn't help.

Long version:
I'm searching for matching records in an index I created.

I know that one field of each entry (the client name) is unique:
for each client name, there is exactly one entry.

Here's a cut-down example, taken from the output of the elasticsearch-head plugin:

{
    "_index": "test_clients",
    "_type": "logs",
    "_id": "AU4nhEKF0LgRYwV_2tQB",
    "_version": 1,
    "_score": 1,
    "_source": {
        "doc": {
            "client": "e123456789.us.foo.com",
            "first_seen": "2015-06-20T13:42:38.000Z",
            "last_seen": "2015-06-20T22:16:59.000Z"
        }
    }
}

The client field is unique in that index.

I'm using the elasticsearch-py module to retrieve messages in Python,
using this code:


def get_msg(indexname, fieldname, fieldvalue):
  global es  # Elasticsearch client; this has been set up elsewhere

  # Request body as a JSON string: a match query returning at most one hit
  query = '{"size":1,"query":{"match":{"' + fieldname + '":"' + fieldvalue + '"}}}'
  resp = es.search(index=indexname, doc_type="logs", body=query)
  return resp
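
Concatenating strings into JSON breaks as soon as a field value contains a quote or a backslash. A minimal sketch of the same request body built as a dict and serialized with the standard json module; build_match_body is a hypothetical helper name, not part of elasticsearch-py:

```python
import json

def build_match_body(fieldname, fieldvalue, size=1):
    """Return the JSON text of a match query on one field (illustrative helper)."""
    body = {"size": size, "query": {"match": {fieldname: fieldvalue}}}
    # json.dumps escapes quotes and backslashes in the field name and value
    return json.dumps(body)

print(build_match_body("client", "e123456789.us.foo.com"))
```

The returned string could be used in place of the hand-built query string.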

This generates a query that looks like this:

{
  "size": 1,
  "query": {
    "match": {
      "client": "e123456789.us.foo.com"
    }
  }
}

The code above works just fine if I have an exact match on the client
name, returning the matching record.

The problem arises when I don't have a match. In that case,
it uses the analyzed client name, and matches on anything with 'us.foo.com'
in the client name. I want an exact match, or nothing.
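
A rough pure-Python sketch of why this happens. It assumes the analyzer splits each name into the two tokens shown (the token lists are an assumption inferred from the behaviour described above, not output from Elasticsearch), and relies on the fact that a match query by default accepts a document if any one of the query tokens matches:

```python
def match_or(query_tokens, doc_tokens):
    """Mimic a match query with the default OR operator: any shared token is a hit."""
    return any(token in doc_tokens for token in query_tokens)

# Assumed tokens for an indexed client and for a client that is not in the index
indexed_doc = ["e123456789", "us.foo.com"]
absent_client = ["e999999999", "us.foo.com"]

# The shared "us.foo.com" token is enough to produce a hit
print(match_or(absent_client, indexed_doc))  # True
```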

I can input 'doc.client' as the fieldname, along with an existing client, and get the same result.

BUT: If I input 'client.raw' or 'doc.client.raw' as the fieldname, and a real client name, I get

{
  "hits": {
    "hits": [],
    "total": 0,
    "max_score": null
  },
  "_shards": {
    "successful": 5,
    "failed": 0,
    "total": 5
  },
  "took": 1,
  "timed_out": false
}

What am I doing wrong? Why can't I match on *.raw fields?
It appears that I could add a min_score:1 to the query, but that
doesn't seem quite right.
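
For comparison, a minimal sketch of an exact-match request body using a term query, which does not analyze the search value. It assumes the index mapping really defines a not_analyzed sub-field named raw under doc.client; a query against a field that is absent from the mapping simply returns no hits. build_term_body is a hypothetical helper name:

```python
import json

def build_term_body(fieldname, fieldvalue, size=1):
    """Return the JSON text of a term query on one field (illustrative helper)."""
    # A term query compares the value as-is against the indexed terms
    body = {"size": size, "query": {"term": {fieldname: fieldvalue}}}
    return json.dumps(body)

# "doc.client.raw" is an assumed field name; it must exist in the mapping
print(build_term_body("doc.client.raw", "e123456789.us.foo.com"))
```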

thanks!

