Accessing doc value fields from painless in update

Hi,

I can do this:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html#_scripted_updates
which increments the previously stored counter field by 4.
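For reference, here is a minimal sketch of the request body that the linked docs section describes, built as a plain Python dict. The helper name `build_increment_update` is made up for illustration, and the `source` key follows the current docs (older releases may expect a different key name). No cluster is contacted; the snippet only constructs and prints the body.

```python
import json


def build_increment_update(field, amount):
    """Build a scripted-update body that adds `amount` to `field` in _source.

    Hypothetical helper, not part of any Elasticsearch client.
    """
    return {
        "script": {
            "lang": "painless",
            # The script reads and writes the field through ctx._source
            "source": "ctx._source.%s += params.count" % field,
            # Passing the amount as a parameter keeps the script text constant
            "params": {"count": amount},
        }
    }


body = build_increment_update("counter", 4)
print(json.dumps(body, indent=2))
```

A body like this would be sent to the update endpoint of the document in question. It only works as long as the counter field is present in _source.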

But what if I want to exclude the counter field from _source (and keep it only in doc values to save a massive amount of space) and still increment it in the same way?

I couldn't make it work. It seems Elasticsearch can fetch the current version of the doc (and make it accessible to Painless), but it can't do the same with doc values.

Is there a way to do this, or does this need to be implemented?

Can you please give a concrete example of a scripted update you would like to run, and the behavior you expect of that update?

Sure.
I have the following mapping (it's used just for demonstration):
{
  "_source": {
    "excludes": ["counter"]
  },
  "dynamic": "strict",
  "properties": {
    "counter": {"type": "long"}
  }
}

And a script:
ctx._source.counter=doc['counter']+1;

When I run this script on an existing doc with an integer in its counter field, it fails with:
RequestError(400, u'illegal_argument_exception', {u'status': 400, u'error': {u'caused_by': {u'lang': u'painless', u'script': u'counter', u'script_stack': [u"ctx._source.counter=doc['counter']+1;\n", u' ^---- HERE'], u'caused_by': {u'reason': None, u'type': u'null_pointer_exception'}, u'reason': u'runtime error', u'type': u'script_exception'}, u'root_cause': [{u'reason': u'[wdxwsRv][127.0.0.1:9300][indices:data/write/update[s]]', u'type': u'remote_transport_exception'}], u'type': u'illegal_argument_exception', u'reason': u'failed to execute script'}})

I would expect to be able to access doc values in update scripts, just as I can access them in queries and just as I can access the source fields.
The rationale for this is that I don't want to waste storage on keeping this data in _source, but I still want to update the doc, and doc values could be used to carry over the previous values (either unmodified or modified, depending on the task).

doc is how doc values are accessed in scoring/search scripts. They are the columnar store used for fast lookup at search time. Update scripts (which I think is the type of script you are running) only have access to the _source. Their purpose is to update the source of a document before being indexed. If you want to have a counter, you'll need to leave the field in _source.
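The behavior described above can be pictured with a small toy model in plain Python. This is only an illustration under simplifying assumptions, not how Elasticsearch is implemented: `scripted_update` and `increment_counter` are hypothetical names, and the Python `KeyError` stands in for the null pointer exception that Painless reports.

```python
def scripted_update(stored_source, script):
    """Toy model of an update: load _source, run the script, return what is reindexed."""
    # The script only ever sees this copy of the stored _source
    ctx = {"_source": dict(stored_source)}
    script(ctx)
    return ctx["_source"]


def increment_counter(ctx):
    # Rough analogue of `ctx._source.counter += 1` in Painless
    ctx["_source"]["counter"] += 1


# Field kept in _source: the previous value is available to the script
with_counter = scripted_update({"counter": 41}, increment_counter)

# Field excluded from _source: the previous value lives only in doc values,
# which this model (like an update script) has no way to read
try:
    scripted_update({}, increment_counter)
    excluded_result = "updated"
except KeyError:
    excluded_result = "no previous value in _source"

print(with_counter, excluded_result)
```

In this model the only input to the script is the stored _source, so excluding a field from it leaves the script with nothing to increment.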

Additionally, it is highly discouraged to use source excludes/includes. Doing so prevents other features like reindex from working.

Reindex could use this feature too (personally, I don't need it); you could do a scripted reindex this way.
Storing data in _source is a waste (memory-, CPU-, and storage-wise) in many cases, and it would be nice if Elasticsearch could handle this.

How hard would it be to implement this?

I'm not sure exactly how difficult it would be (but my hunch is, quite difficult). It would also be much less performant for reindex, because every document would require N seeks (where N is the number of fields), whereas accessing _source requires just 1 seek to gather the entire document.

For me, requiring some extra seeks is worth the 1/3 size of the on-disk representation of the database... :slight_smile:
