Need help on "sum by group" aggregations on simple doc structure

ElasBon · November 8, 2020, 1:45pm

I'm using es 7.8.1, my data are pojo records like this:

    {
        "timestamp": "2020-11-07 08:00:00",
        "lineId": 532,
        "sn": "A2600101000005DF",
        "upBytes": 5927,
        "dlBytes": 13706
    }

I was trying to sum my records like the sql below:

    select sum(upBytes) as totalUpBytes, sum(dlBytes) as totalDlBytes, lineId from traffic
    where timestamp between 11111 and 2222
    group by sn

I'm using spring, here's the actual request that spring elasticsearch generates:

    {
        "size": 0,
        "query": {
            "bool": {
                "must": [
                    {
                        "range": {
                            "timestamp": {
                                "from": 1603260000,
                                "to": 1604838407,
                                "include_lower": true,
                                "include_upper": false,
                                "boost": 1
                            }
                        }
                    }
                ],
                "adjust_pure_negative": true,
                "boost": 1
            }
        },
        "sort": [
            {
                "_seq_no": {
                    "order": "desc"
                }
            }
        ],
        "aggregations": {
            "by_sn": {
                "terms": {
                    "field": "sn",
                    "size": 10,
                    "min_doc_count": 1,
                    "shard_min_doc_count": 0,
                    "show_term_doc_count_error": false,
                    "order": [
                        {
                            "_count": "desc"
                        },
                        {
                            "_key": "asc"
                        }
                    ]
                },
                "aggregations": {
                    "totalUpBytes": {
                        "sum": {
                            "field": "upBytes"
                        }
                    },
                    "totalDlBytes": {
                        "sum": {
                            "field": "dlBytes"
                        }
                    }
                }
            }
        }
    }

it gives me error:

Elasticsearch exception [type=illegal_argument_exception, reason=Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [sn] in order to load field data by uninverting the inverted index. Note that this can use significant memory.]

I checked the index mapping, the "sn" field was auto mapped as "text" instead of "keyword". (actually that index has no specific mapping during creation). so I tried changing "sn" to "sn.keyword", another exception came up:

class org.elasticsearch.search.aggregations.bucket.terms.ParsedStringTerms cannot be cast to class org.elasticsearch.search.aggregations.metrics.Sum (org.elasticsearch.search.aggregations.bucket.terms.ParsedStringTerms and org.elasticsearch.search.aggregations.metrics.Sum are in unnamed module of loader 'app')

which is more obscure, and I don't quite understand what happening...

any suggestion?
And, what's the better practice for such requirement?

=====================

here's the mapping:

    {
        "traffic": {
            "mappings": {
                "properties": {
                    "dlBytes": {
                        "type": "long"
                    },
                    "lineId": {
                        "type": "long"
                    },
                    "session": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "sn": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "timestamp": {
                        "type": "long"
                    },
                    "upBytes": {
                        "type": "long"
                    }
                }
            }
        }
    }

wangqinghuan · November 8, 2020, 4:24pm

This happened when there are two data type(string and long) on your sum field(upBytes or dlBytes). Did you search on multiple indices which contain same field name but diferent data types?

ElasBon · November 9, 2020, 3:51am

I dumped the records without aggregation, then checked them against index mapping. all records were correctly mapped.

so i guess maybe the 2nd exception is about the incorrect usage of embedded aggregations? i'm not quite sure about this part, i copied it from another thread here:

 "aggregations": {
            "by_sn": {
                "terms": {
                    "field": "sn",
                    "size": 10,
                    "min_doc_count": 1,
                    "shard_min_doc_count": 0,
                    "show_term_doc_count_error": false,
                    "order": [
                        {
                            "_count": "desc"
                        },
                        {
                            "_key": "asc"
                        }
                    ]
                },
                "aggregations": {
                    "totalUpBytes": {
                        "sum": {
                            "field": "upBytes"
                        }
                    },
                    "totalDlBytes": {
                        "sum": {
                            "field": "dlBytes"
                        }
                    }
                }
            }
        }

wangqinghuan · November 9, 2020, 1:32pm

No, this aggregation can work in my test.

ElasBon · November 9, 2020, 2:46pm

I really appreciate your verification, it confirmed my aggregation plan was basically correct.
So I debug my code line by line, finally I found the trap.
It wasn't the query or aggregation that causes the exception, but the incorrect response parsing.
I was populating the term aggregation result as a sum result. (What?!!)

Finally I made it work! Thanks again!

system · December 7, 2020, 2:46pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
How to aggregate and sum the data grouped by some fields Elasticsearch	1	419	February 12, 2020
Need help with Sum Elasticsearch	4	303	July 6, 2017
Aggregation on most recent document in a group Elasticsearch	5	900	May 22, 2018
Can I group by a field and ignore the buckets? Elasticsearch	1	376	July 4, 2018
Group by and sum operation for fields Elasticsearch	11	2199	March 13, 2018

Need help on "sum by group" aggregations on simple doc structure

Related topics