Best way to do compound bool query with exact matching strings on analyzed fields?

I'm sure this has come up before but I can't seem to find a definitive answer.

I effectively want to do this type of query in ES:

SELECT * FROM my_doc WHERE account_id = 1 AND name = "first doc"

Here's what my index/mappings look like:

curl -XPUT 'localhost:9200/my_index?pretty' -d'
{
  "settings" : {
      "number_of_shards" : 1
  },
  "mappings" : {
    "my_doc" : {
      "dynamic" : "false",
      "include_in_all" : false,
      "date_detection" : false,
      "properties" : {
        "account_id" : {
          "type" : "string",
          "index" : "not_analyzed",
          "norms" : {
            "enabled" : true
          },
          "include_in_all" : true
        },
        "name" : {
          "type" : "string",
          "fields" : {
            "cjk" : {
              "type" : "string",
              "analyzer" : "cjk"
            },
            "english" : {
              "type" : "string",
              "analyzer" : "english"
            }
          },
          "include_in_all" : true
        }
      }
    }
  }
}'

And some data to search against:

curl -XPOST 'localhost:9200/_bulk?pretty' -d'
{ "index" : { "_index" : "my_index", "_type" : "my_doc", "_id" : "1" } }
{ "account_id" : "1", "name": "first doc" }
{ "index" : { "_index" : "my_index", "_type" : "my_doc", "_id" : "2" } }
{ "account_id" : "1", "name": "second doc" }
{ "index" : { "_index" : "my_index", "_type" : "my_doc", "_id" : "3" } }
{ "account_id" : "1", "name": "third doc" }
{ "index" : { "_index" : "my_index", "_type" : "my_doc", "_id" : "4" } }
{ "account_id" : "2", "name": "first doc" }
{ "index" : { "_index" : "my_index", "_type" : "my_doc", "_id" : "5" } }
{ "account_id" : "2", "name": "second doc" }

This ES query gets me the answer I want. Wondering if this is the best approach for finding the exact value of a string against an analyzed field or is there a better way.

curl -XGET 'localhost:9200/my_index/_search?pretty' -d'
{
  "query": {
    "bool": { 
      "must": [{ "term": { "account_id": 1 } }, { "match": { "name": { "query": "\"first doc\"", "operator" : "and" } } }]
    }
  }
}'

If you want exact matches on "name" (like the SQL query does) you can add another subfield to your mapping for "name" that leaves it not analyzed. In 5.x you can do this e.g. with:

"fields" : {
            ...
            "raw" : {
              "type": "keyword"
            }
          }

I think in earlier versions you would need {"type": "string", "index": "not_analyzed"}. Now you can use another term query on the name field:

GET /my_index/_search?pretty
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "account_id": 1
          }
        },
        {
          "term": {
            "name.raw": "first doc"
          }
        }
      ]
    }
  }
}

Note that if you don't need scores you can put the two term queries in the filter section.

1 Like

Thanks for pointing me in the right direction, Christoph! Here's how I got the example working on 5.x:

curl -XPUT 'localhost:9200/my_index?pretty' -d'
{
  "mappings": {
    "my_doc": {
      "properties": {
        "account_id" : {
          "type": "keyword",
          "include_in_all" : true
        },
        "name": {
          "type": "text",
          "fields": {
            "cjk" : {
              "type" : "string",
              "analyzer" : "cjk"
            },
            "raw": { 
              "type":  "keyword"
            }
          },
          "include_in_all" : true
        }
      }
    }
  }
}
'

curl -XPOST 'localhost:9200/_bulk?pretty' -d'
{ "index" : { "_index" : "my_index", "_type" : "my_doc", "_id" : "1" } }
{ "account_id" : "1", "name": "first doc" }
{ "index" : { "_index" : "my_index", "_type" : "my_doc", "_id" : "2" } }
{ "account_id" : "1", "name": "second doc" }
{ "index" : { "_index" : "my_index", "_type" : "my_doc", "_id" : "3" } }
{ "account_id" : "1", "name": "third doc" }
{ "index" : { "_index" : "my_index", "_type" : "my_doc", "_id" : "4" } }
{ "account_id" : "2", "name": "first doc" }
{ "index" : { "_index" : "my_index", "_type" : "my_doc", "_id" : "5" } }
{ "account_id" : "2", "name": "second doc" }


curl -XGET 'localhost:9200/my_index/_search?pretty' -d'
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "account_id": "1"
          }
        },
        {
          "term": {
            "name.raw": "first doc"
          }
        }
      ]
    }
  }
}
'

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.