Problem looping through array in each doc with Painless

Mapping:

"mappings": {
    "alert": {
        "properties": {
            "history": {
                "properties": {
                    "category": {
                        "type": "text",
                        "fields": {
                            "keyword": { "type": "keyword", "ignore_above": 256 }
                        }
                    },
                    "description": {
                        "type": "text",
                        "fields": {
                            "keyword": { "type": "keyword", "ignore_above": 256 }
                        }
                    },
                    "timestamp": {
                        "type": "date"
                    },
                    "type": {
                        "type": "text",
                        "fields": {
                            "keyword": { "type": "keyword", "ignore_above": 256 }
                        }
                    },
                    "user": {
                        "type": "text",
                        "fields": {
                            "keyword": { "type": "keyword", "ignore_above": 256 }
                        }
                    }
                }
            }
        }
    }
}

Example _source:

"_source": {
    "history": [
        {
        "timestamp": "2017-06-23T10:44:31.974Z",
        "user": "dave",
        "type": "read",
        "category": "true"
        },
        {
        "timestamp": "2017-06-23T10:44:33.951Z",
        "user": "dave",
        "type": "status",
        "category": "closed",
        "description": "Closure reason: weather"
        }
    ]
}

My query:

{
    "query": {
        "bool": {
            "must": {
                "script": {
                    "script": {
                        "inline": "int isClosed = 0; long latestTimestamp = 0; for(int i=0; i<doc['history.type.keyword'].values.length; i++) { if(doc['history.type.keyword'][i] == 'status'){ if(doc['history.timestamp'][i] > latestTimestamp){latestTimestamp = doc['history.timestamp'][i]; if(doc['history.category.keyword'][i] == 'closed') { isClosed = 1} else { isClosed = 0} }}} return isClosed;",
                        "lang": "painless"
                    }
                }
            }
        }
    }
}

In short, my query is trying to filter in (or out) the documents whose history array shows that the most recent entry of type 'status' had category 'closed' (the status can open and close many times over the entity's lifetime).

Nothing I do seems to make this script operate as expected. I've stripped it down section by section. Is there anything I'm missing?

My gut says this might be related to this thread on date parsing: 5.0 - Comparing dates in painless. However, in my stripped-down tests, ES seems perfectly capable of correctly comparing a long against doc['history.timestamp'][i].

Any help appreciated.

I'm not at a machine with Elastic in front of me at the moment, but I think your mapping may be the cause. The history type probably needs to be nested. (I've not tried scripted queries on non-nested 'children' like this, but I know that normal filters don't behave as you might expect; presumably the same applies to scripts.)

https://www.elastic.co/guide/en/elasticsearch/reference/5.4/nested.html
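For reference, mapping history as nested would look something like the sketch below (one sub-field shown; the others follow the same pattern). Note that nested fields also require nested queries to match per-entry:

```json
{
    "mappings": {
        "alert": {
            "properties": {
                "history": {
                    "type": "nested",
                    "properties": {
                        "timestamp": { "type": "date" },
                        "type": {
                            "type": "text",
                            "fields": {
                                "keyword": { "type": "keyword", "ignore_above": 256 }
                            }
                        }
                    }
                }
            }
        }
    }
}
```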

Update: found the root problem and now have a working script!

The core problem was that the script loops over doc['history.type.keyword'].values.length and, critically, those values are the indexed doc values, which Lucene deduplicates. The length is therefore not the length of the history array, it's the number of unique keywords found.

So if your history contains 20 open and closed entries only, then doc['history.type.keyword'].values.length equals 2, not 20: there are only 2 unique keyword values.
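To illustrate the mismatch: doc values for a keyword field behave like a sorted set of the document's unique terms, not like the raw array. This standalone Java sketch (not Painless; the class and variable names are illustrative) simulates that with a TreeSet:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class DocValuesDedup {
    public static void main(String[] args) {
        // A history array with 20 entries, alternating between two types,
        // standing in for the _source array in the example document.
        List<String> history = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            history.add(i % 2 == 0 ? "status" : "read");
        }
        // doc['history.type.keyword'].values behaves like a sorted set of
        // the document's UNIQUE indexed terms, which a TreeSet approximates.
        TreeSet<String> docValues = new TreeSet<>(history);
        System.out.println(history.size());   // prints 20: entries in _source
        System.out.println(docValues.size()); // prints 2: unique doc values
    }
}
```

Looping to docValues.size() while indexing into the 20-element _source array is exactly the bug: the loop stops after 2 iterations.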

Instead of using doc['foo'] we needed to access the raw document, not its indexed version: every reference to doc['foo'] needs replacing with params._source.foo.

Final working painless script:

int isClosed = 0;
long latestTimestamp = 0;

for (int i = 0; i < params._source.history.length; i++) {
    if (params._source.history[i].type == 'status') {
        long entryMillis = OffsetDateTime.parse(params._source.history[i].timestamp).toInstant().toEpochMilli();
        if (entryMillis > latestTimestamp) {
            latestTimestamp = entryMillis;
            if (params._source.history[i].category == 'closed') {
                isClosed = 1;
            } else {
                isClosed = 0;
            }
        }
    }
}

return isClosed;

OffsetDateTime.parse().toInstant().toEpochMilli() is necessary because doc['history.timestamp'] returns a value ES has already indexed and parsed as a date, while the same value accessed through params._source is still the raw string and needs parsing manually.
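That parsing chain is plain java.time, so it can be checked outside Elasticsearch. A small Java sketch using the two timestamps from the example _source (class and variable names here are illustrative):

```java
import java.time.OffsetDateTime;

public class ParseTimestamps {
    public static void main(String[] args) {
        // The two ISO-8601 timestamps from the example _source.
        long first = OffsetDateTime.parse("2017-06-23T10:44:31.974Z")
                .toInstant().toEpochMilli();
        long second = OffsetDateTime.parse("2017-06-23T10:44:33.951Z")
                .toInstant().toEpochMilli();
        // Once parsed to epoch millis, plain long comparison works,
        // mirroring the latestTimestamp check in the Painless script.
        System.out.println(second > first);  // prints true
        System.out.println(second - first);  // prints 1977 (ms apart)
    }
}
```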


Thanks for the suggestion, Paul. These documents are already great-grandchildren, and we were loath to extend the family further. There were some other reasons too that aren't worth going into. We've got a solution for now, though.
