Firing aggregation on field having String as its data type

I have a use case where I have to fire aggregation on a field having string as its data type. But that field contains numeric as well as alpha-numeric values.

{
            "_index": "website",
            "_type": "blog1",
            "_id": "2",
            "_score": 1,
            "_source": {
               "title": "Title",
               "text": "Just trying this out...",
               **"linenum": "20"**
            }
         },
         {
            "_index": "website",
            "_type": "blog1",
            "_id": "4",
            "_score": 1,
            "_source": {
               "title": "Title",
               "text": "Just trying this out...",
               **"linenum": "10892AC"**
            }
         },
         {
            "_index": "website",
            "_type": "blog1",
            "_id": "1",
            "_score": 1,
            "_source": {
               "title": "Title",
               "text": "Just trying this out...",
               **"linenum": "10"**
            }
         }

Is there any way where I can get sum of linenum field ignoring the alphanumeric value.
I have tried using script in the query as well.

{
    "query": {
        "match_all": {}
    },
    "fields": "*",
     "aggs" : {
        "linenum_Sum" : { 
            "sum" : { 
                "script" : "Float.parseFloat(doc['linenum'].value)" }
        }
    }
    
}

But even this query will give me only 10 in the result including script exception.

{
            "shard": 2,
            "index": "website",
            "node": "pHlqkKB5QMi8vy4TUY-Cog",
            "reason": {
               "type": "script_exception",
               "reason": "failed to run inline script [Float.parseFloat(doc['linenum'].value)] using lang [groovy]",
               "caused_by": {
                  "type": "number_format_exception",
                  "reason": "For input string: \"ac\""
               }
            }
         }

Your script needs to deal with the exception itself. If a script does not catch a thrown exception the aggregation will fail. You'll need to do something like the following code (disclaimer: this snippet is completely untested and I do not know a lot of groovy, I wrote this after doing a quick google search for groovy exception handling)

Also note that you should be careful with allowing inline groovy scripts to run on your cluster (see https://www.elastic.co/blog/scripting-security and https://www.elastic.co/guide/en/elasticsearch/reference/2.3/modules-scripting.html)

{
    "query": {
        "match_all": {}
    },
    "fields": "*",
     "aggs" : {
        "linenum_Sum" : { 
            "sum" : { 
                "script" : "try { return Float.parseFloat(doc['linenum'].value); } catch (NumberFormatException e) { return 0; }" }
        }
    }
    
}

The script should catch the number format exception and return 0 as the value for this document. Note that it does not look to see if the number format exception was thrown on an alphanumeric string or a malformed number string so if it encounters badly formatted numbers it will not fail the aggregation.

2 Likes

It worked. Many Thanx