How to access _id from Painless in Query context?

I would like to sanity-check indexed data.

Is it possible to access the _id field in a Painless script condition, in query context?

That is, something like (ES 5.3.x):

{
  "query": {
     "bool": {
        "must": [
           { "term": { ... other conditions ... } },
           {
              "script" : {
                 "script" : {
                    "inline": "doc['_id'].value.length()>10",
                    "lang": "painless"
                 }
              }
           }
        ]
     }
  }
}

This yields

    "type": "illegal_argument_exception",
    "reason": "Fielddata is not supported on field [_id] of type [_id]"

Hey,

I am not sure this works in ES 5.x without mapping changes, but it does in ES 6.4:

GET foo/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "script": {
            "script": "doc['_id'][0].length() > 1"
          }
        }
      ]
    }
  }
}

Still, I think the better approach here would be to use an ingest pipeline with a script processor that stores the length of the _id at index time (I only tested this on 6.x as well):

POST _ingest/pipeline/_simulate
{
  "pipeline" :
  {
    "description": "_description",
    "processors": [
      {
        "script" : {
          "source" : "ctx.len = ctx._id.length()"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_type": "doc",
      "_id": "1",
      "_source": { "foo": "bar" }
    },
    {
      "_index": "index",
      "_type": "_doc",
      "_id": "second",
      "_source": { "foo": "rab" }
    }
  ]
}

# returns
{
  "docs": [
    {
      "doc": {
        "_index": "index",
        "_type": "doc",
        "_id": "1",
        "_source": {
          "len": 1,
          "foo": "bar"
        },
        "_ingest": {
          "timestamp": "2018-11-06T09:12:12.310133Z"
        }
      }
    },
    {
      "doc": {
        "_index": "index",
        "_type": "_doc",
        "_id": "second",
        "_source": {
          "len": 6,
          "foo": "rab"
        },
        "_ingest": {
          "timestamp": "2018-11-06T09:12:12.310166Z"
        }
      }
    }
  ]
}

The advantage is that your queries will be much faster, because no script has to run for each candidate document at search time.
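Once documents have actually been indexed through such a pipeline (the _simulate call above does not index anything), the check from the original question becomes a plain range query. A minimal sketch, assuming the index is called "index" as in the simulated docs and that the new len field was dynamically mapped as a number:

```
# assumes documents were indexed through the pipeline, so "len" exists
GET index/_search
{
  "query": {
    "range": {
      "len": { "gt": 10 }
    }
  }
}
```

This replaces the per-document script condition doc['_id'].value.length() > 10 with an ordinary numeric range lookup.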

Hope this helps!

--Alex

Hello Alexander, thanks for checking this and for your detailed reply!

Unfortunately, in ES 5.3.x the doc['_id'] part alone already produces the error "Fielddata is not supported on field [_id] of type [_id]", no matter what follows it. (One more reason to upgrade...)

The idea of storing the length right away is nice. Alas, to check several million existing documents, one would have to run "_update_by_query" or "_reindex", both of which again run a script once per document, even if a very simple one.
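For reference, the _update_by_query route for existing documents could look roughly like this. It is a sketch, not tested on 5.3.x: the pipeline name "id-length" is hypothetical, the index name "index" is taken from the simulated docs, and on 5.x releases the script processor's "source" parameter may need to be written as "inline" instead.

```
# "id-length" is a hypothetical pipeline name
PUT _ingest/pipeline/id-length
{
  "description": "store the length of _id in the len field",
  "processors": [
    {
      "script": {
        "source": "ctx.len = ctx._id.length()"
      }
    }
  ]
}

# rewrite every document in place through the pipeline;
# conflicts=proceed skips version conflicts instead of aborting
POST index/_update_by_query?pipeline=id-length&conflicts=proceed
```

As noted, this still executes the script once per document, but only once, rather than on every search.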

We'll solve the issue by looking at the upstream data from which the _id is generated.

