Group by and sum operation for fields


(Zekeriya Kaplan) #1

First of all, I want to tell you that I am new to Elasticsearch.

I have a JSON document in the following format. I want to group by the "CallTo" field and sum the "Count" value for each "CallTo". I will use these values for visualization in Kibana.

My first question is: what should my index (mapping) look like for the group-by and sum operation I mentioned? And my second question is: what is the query?

My JSON:

{
  "Labels": "qwerty",
  "Type": "type1",
  "Id": "id12345",
  "FieldName": [
    {
      "CallTo": "Tom",
      "Count": 2
    },
    {
      "CallTo": "Jessica",
      "Count": 4
    },
    {
      "RegionCode": "US",
      "Count": 1
    },
    {
      "RegionCode": "DE",
      "Count": 5
    },
    {
      "CallCategory": "K1",
      "Count": 6
    }
  ],
  "OtherField": [
    {
      "Key": "bin5",
      "Value": 0
    },
    {
      "Key": "bin1",
      "Value": 0
    },
    {
      "Key": "bin3",
      "Value": 2
    },
    {
      "Key": "binOther",
      "Value": 0
    }
  ],
  "XField": [
    {
      "Key": "bin50000",
      "Value": 1
    },
    {
      "Key": "bin10000",
      "Value": 3
    },
    {
      "Key": "bin30000",
      "Value": 4
    },
    {
      "Key": "binOther",
      "Value": 7
    }
  ]
}

My expected result would be something like this:

[
  {
    "CallTo": "Tom",
    "Count": 23
  },
  {
    "CallTo": "Jessica",
    "Count": 44
  },
  {
    "RegionCode": "US",
    "Count": 18
  },
  {
    "RegionCode": "DE",
    "Count": 58
  },
  {
    "CallCategory": "K1",
    "Count": 46
  }
]
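With the array format above, one possible approach (a sketch only; the index name `myindex` is made up, the mapping syntax shown is for Elasticsearch 7.x, and it assumes `FieldName` is mapped as `nested` so each `CallTo`/`Count` pair stays associated) is a `nested` mapping plus a nested `terms` aggregation with a `sum` sub-aggregation:

```json
PUT myindex
{
  "mappings": {
    "properties": {
      "FieldName": {
        "type": "nested",
        "properties": {
          "CallTo":       { "type": "keyword" },
          "RegionCode":   { "type": "keyword" },
          "CallCategory": { "type": "keyword" },
          "Count":        { "type": "integer" }
        }
      }
    }
  }
}

POST myindex/_search
{
  "size": 0,
  "aggs": {
    "calls": {
      "nested": { "path": "FieldName" },
      "aggs": {
        "by_call_to": {
          "terms": { "field": "FieldName.CallTo" },
          "aggs": {
            "total_count": { "sum": { "field": "FieldName.Count" } }
          }
        }
      }
    }
  }
}
```

Note that nested fields and nested aggregations are awkward to use from the standard Kibana visualizations, which is one argument for flattening the data instead.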

Also, I am open to alternative solutions, even changing the JSON format.

Thanks


(David Pilato) #2

I wonder if you should index every single "call" rather than a list of calls.
But for that, it would be better to explain what the use case is about and what the documents represent, IMO.


(Zekeriya Kaplan) #3

It is another possible solution, but I guess this approach may cause some performance problems or other issues (I can't be sure about that because I am new to Elasticsearch).

The FieldName array size could be in the millions. If I index every single "call", will that cause any problems?

Can you show your solution for the input JSON, please?

Thanks


(David Pilato) #4

I guess this approach may cause some performance problems

I think the total opposite.


(Zekeriya Kaplan) #5

What is your input JSON format, then?


(David Pilato) #6

I'm not sure, because I don't know a lot about your data. But something like:

PUT calls/doc/1
{
  "Labels": "qwerty",
  "Type": "type1",
  "Id": "id12345",
  "CallTo": "Tom",
  "RegionCode": "US",
  "CallCategory": "K1"
}
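With one document per call, the group-by-and-sum reduces to a plain `terms` aggregation (a sketch; the index name `calls` follows the example above, and `CallTo` is assumed to be mapped as `keyword`):

```json
POST calls/_search
{
  "size": 0,
  "aggs": {
    "by_call_to": {
      "terms": { "field": "CallTo" }
    }
  }
}
```

Since each call is its own document, the `doc_count` of each bucket is already the per-callee total, and Kibana can build this kind of visualization directly without any custom query.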

(Zekeriya Kaplan) #7

First of all, I really appreciate your answer, but your proposition is not suitable for my case.
Let me explain in more depth what my input JSON (attached above) means.

I work on VoIP call events to extract some valuable data over a time interval. For each specified time interval I run an aggregation policy to produce the JSON; I mean, storing the aggregation result JSON is more important than storing every call event.


(David Pilato) #8

For each specified time interval I run an aggregation policy to produce the JSON; I mean, storing the aggregation result JSON is more important than storing every call event.

Why not do the aggregation in real time in Elasticsearch instead, using whatever period you want?
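For example, if the raw calls were indexed individually as above, a `date_histogram` could compute the per-period counts at query time (a sketch; it assumes each call document carries a `timestamp` date field, which is not in the thread's example, and `fixed_interval` is the Elasticsearch 7.2+ parameter name, earlier versions use `interval`):

```json
POST calls/_search
{
  "size": 0,
  "aggs": {
    "per_period": {
      "date_histogram": { "field": "timestamp", "fixed_interval": "1h" },
      "aggs": {
        "by_call_to": {
          "terms": { "field": "CallTo" }
        }
      }
    }
  }
}
```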


(Zekeriya Kaplan) #9

The aggregation process is out of my control. I just get the JSON, which I index into Elasticsearch for group-by and sum operations, etc. My responsibility is only the output JSONs (attached above), so we should focus on those.


(David Pilato) #10

I thought you said that you could control how the data is produced:

Also, I am open to alternative solutions, even changing the JSON format.

Can't you?
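If the upstream aggregation output can change, another option (a sketch; the index name `calltotals` and the request format following the thread's typed `doc` example are assumptions) is to flatten each bucket of the aggregation result into its own document, one per `CallTo`/`Count` pair per interval:

```json
PUT calltotals/doc/1
{
  "Labels": "qwerty",
  "Type": "type1",
  "CallTo": "Tom",
  "Count": 2
}
```

Summing across intervals is then a `terms` aggregation with a `sum` sub-aggregation (assuming `CallTo` is mapped as `keyword`):

```json
POST calltotals/_search
{
  "size": 0,
  "aggs": {
    "by_call_to": {
      "terms": { "field": "CallTo" },
      "aggs": {
        "total_count": { "sum": { "field": "Count" } }
      }
    }
  }
}
```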


(Zekeriya Kaplan) #11

I used this approach, and it works for me now =)
Thanks


(system) #13

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.