Aggregate with DevTool!

Beuhlet_Reseau · September 7, 2017, 9:51am

Hello,

I am trying to run an aggregation on my data.

Here is a part of my data:

TYPE           Usage
FREEUSER       345
PREMIUM_USER   8653
FREEUSER       1369
FREEUSER       87654
PREMIUM_USER   43678
FREEUSER       8654
PREMIUM_USER   2387
FREEUSER       98723
FREEUSER       45873
PREMIUM_USER   2847
PREMIUM_USER   89235
USER_UNKNOW    16235
USER_GOLD      32457

My aim is:

To sum Usage by type of client. For example:

Leaving out the gold users, I want the summed usage of all the other user types as a single series on the same graph:

So PREMIUM_USER, FREEUSER and USER_UNKNOW together have used a total of 405 653 bytes today.

Currently, when I test this in Kibana, I get the sum of usage per user type, so I have 3 curves on the same graph. But I want just one, with the cumulative sum of the 3 specific client types.

I thought of this:

{
  "query": {
    "constant_score": {
      "filter": {
        "match": { "TYPE": "PREMIUMUSER or FREEUSER or UNKNOW_USER" }
      }
    }
  },
  "aggs": {
    "sum_no_gold_user": { "sum": { "field": "USAGE" } }
  }
}

What do you think, friends?

Stacey_Gammon · September 7, 2017, 2:20pm

I'm not exactly sure what type of graph you are looking for, but you should be able to do this with a scripted painless field.

My example is using a bytes value per ip address, but yours would be a usage value per type.

Once you have the scripted field you can use it in visualizations to compare the total sum vs the sum per no_gold_user.

In my example, the lines match up except at two timestamps; this is because the IP I selected only has values for those two times:

Hope this helps.

Beuhlet_Reseau · September 7, 2017, 2:45pm

@Stacey_Gammon If I take 2 types of client and sum them, here is the graph ... :

But I only want one, depending on the type of users I choose.

I don't understand your post. Which is better: the method I posted previously:

{
  "query": {
    "constant_score": {
      "filter": {
        "match": { "TYPE": "PREMIUMUSER or FREEUSER or UNKNOW_USER" }
      }
    }
  },
  "aggs": {
    "sum_no_gold_user": { "sum": { "field": "USAGE" } }
  }
}

or your painless scripted field? Your solution seems to be a bit deprecated, no?

Otherwise, if I take your mystery solution, this would give:

if (doc['TYPE'].value = 'PREMIUMUSER' , '23953', '962', 'FREEUSER', 'UNKNOW_USER' {
return doc['USAGE'].value
}
return 0

Stacey_Gammon · September 7, 2017, 3:26pm

Painless fields are not deprecated though they do come with some performance issues.

If you use painless fields, yours would look like this:

if (doc['TYPE'].value != 'USER_GOLD') {
 return doc['USAGE'].value
}
return 0

It's less to write to do a single "not equals" than OR'ing all the types you do want, though you can do it that way too:

if (doc['TYPE'].value == 'PREMIUM_USER' || doc['TYPE'].value == 'FREEUSER' || doc['TYPE'].value == 'USER_UNKNOWN' ) {
 return doc['USAGE'].value
}
return 0
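As a sanity check, the logic of both scripts can be mirrored outside Elasticsearch. The following is a Python sketch (not Painless) run over the sample rows from the first post; the helper names `not_gold` and `in_allowed` are hypothetical. It assumes the stored value is spelled `USER_UNKNOW`, as in the sample data, in which case the OR version written with `'USER_UNKNOWN'` would skip that row.

```python
# Sample rows copied from the first post: (TYPE, Usage)
ROWS = [
    ("FREEUSER", 345), ("PREMIUM_USER", 8653), ("FREEUSER", 1369),
    ("FREEUSER", 87654), ("PREMIUM_USER", 43678), ("FREEUSER", 8654),
    ("PREMIUM_USER", 2387), ("FREEUSER", 98723), ("FREEUSER", 45873),
    ("PREMIUM_USER", 2847), ("PREMIUM_USER", 89235),
    ("USER_UNKNOW", 16235), ("USER_GOLD", 32457),
]

def not_gold(user_type, usage):
    """Mirror of the "not equals" script: usage for non-gold rows, else 0."""
    return usage if user_type != "USER_GOLD" else 0

def in_allowed(user_type, usage, allowed):
    """Mirror of the OR script: usage only for explicitly listed types, else 0."""
    return usage if user_type in allowed else 0

total_not_gold = sum(not_gold(t, u) for t, u in ROWS)

# Same spelling as in the OR script above ('USER_UNKNOWN' with a trailing N).
or_types = {"PREMIUM_USER", "FREEUSER", "USER_UNKNOWN"}
total_or = sum(in_allowed(t, u, or_types) for t, u in ROWS)

print(total_not_gold)  # 405653, the total quoted in the first post
print(total_or)        # 389418, the USER_UNKNOW row (16235) is not matched
```

If the real field values differ from the sample (for example `USER_UNKNOWN`), the two versions would agree; the point is only that the OR version depends on exact spelling.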

As for your proposed method - where are you putting that JSON?

But I only want one, depending on the type of users I choose.

What do you want only one of? One bar on that graph instead of two? If you are simply looking for a single number, the total sum over your whole time range, you might want to look into a metric visualization.

If you want to do this dynamically (e.g. easily change the sum of the types you are looking for) you can just use a filter. Perhaps that is where you were putting your JSON above. Your filter JSON could look either like this:

{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "ip": "1.104.179.62"
          }
        },
        {
          "match_phrase": {
            "ip": "0.137.97.198"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

for the "or" version, or for the "not" version:

{
  "query": {
    "match": {
      "ip": {
        "query": "10.92.69.153",
        "type": "phrase"
      }
    }
  }
}

Beuhlet_Reseau · September 7, 2017, 4:05pm

Oh, excuse me ... I want this type of graph:

Filtered by type of user, here is the normal behavior:

But if I want to focus on a few particular types of users, how can I do it?

=> Thank you for

In your experience, what is the best method? Painless, or dynamically with DevTool?

One other thing: I have 120 000 000 messages per day. I want to make a graph over a month, but it is impossible because there are 3 480 000 000 messages and I think Kibana is not robust enough (some timeouts). [This is just an aside.]

Stacey_Gammon · September 7, 2017, 5:06pm

Is the filter method shown above not sufficient?

In your experience, what is the best method? Painless, or dynamically with DevTool?

Can you explain a bit more what you mean by dynamically with devtools? I don't see how devtools would help you when creating a visualization.

I want to make a graph over a month,

You can create month intervals if that is what you are looking for. The calculations are done in Elasticsearch, so Kibana won't actually be handling 3 billion messages; it will just get back the aggregates from Elasticsearch. If Elasticsearch can't handle the query (you can test this via Dev Tools), there are ways to improve performance. If that is the case, I encourage you to ask in the Elasticsearch room to get some more details on how to improve your setup if it's hanging.
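To test this via Dev Tools, one possible request body is sketched below in Python, which only builds and prints the JSON. It is a sketch under assumptions: the time field is taken to be `@timestamp` (the Logstash default), the usage field is taken to be `USAGE`, and the function name `build_monthly_sum_request` is hypothetical. The `interval` parameter is the name used by Elasticsearch versions from around the time of this thread; later versions call it `calendar_interval`.

```python
import json

def build_monthly_sum_request(time_field="@timestamp", usage_field="USAGE"):
    """Build a search body with one bucket per month and a usage sum in each."""
    return {
        "size": 0,  # return only aggregation results, no individual documents
        "aggs": {
            "per_month": {
                "date_histogram": {"field": time_field, "interval": "month"},
                "aggs": {"total_usage": {"sum": {"field": usage_field}}},
            }
        },
    }

monthly_body = build_monthly_sum_request()
print(json.dumps(monthly_body, indent=2))
```

The response to such a request contains one small bucket per month rather than the underlying messages, which is why the document count alone does not determine what Kibana has to handle.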

Beuhlet_Reseau · September 8, 2017, 10:16am

I understand, @Stacey_Gammon. With DevTool I can obtain the result, but not a field that can be used to create a graph...

What I meant to ask is: which is better, a JSON filter or a painless field?

As a reminder, I have disabled the _all field in the Elasticsearch mapping and the message field in Logstash.

So to create a JSON filter I can do this (but I can't use a field with it, can I?):

{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "TYPE": "PREMIUM_USER"
          }
        },
        {
          "match_phrase": {
            "TYPE": "UNKNOW_USER"
          }
        },
        {
          "match_phrase": {
            "TYPE": "FREEUSER"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

If FREEUSER is only part of the TYPE value, I can use "*FREEUSER*", can't I?

But it will just filter, and in the end I would still have 3 curves on my graph, no?

To have the total sum on a single curve, I must use a painless scripted field, no?
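For reference, a filter only restricts which documents are counted; whether one curve or several appear depends on whether the aggregation is also split by TYPE. A single sum over the filtered documents can be sketched as the request body below, built in Python so it can be printed and pasted into Dev Tools. The assumptions are that TYPE is indexed as an exact-value field (so `terms` matches whole values), that the usage field is named `USAGE`, and that the type values are spelled as in the sample data; the function name `build_sum_request` is hypothetical.

```python
import json

# Type values as spelled in the sample data from the first post (assumption).
WANTED_TYPES = ["PREMIUM_USER", "FREEUSER", "USER_UNKNOW"]

def build_sum_request(types, type_field="TYPE", usage_field="USAGE"):
    """Build a search body: keep only the given types, then sum usage once."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"bool": {"filter": {"terms": {type_field: types}}}},
        # No terms aggregation on TYPE here, so the result is a single sum
        # rather than one sum per user type.
        "aggs": {"sum_no_gold_user": {"sum": {"field": usage_field}}},
    }

body = build_sum_request(WANTED_TYPES)
print(json.dumps(body, indent=2))
```

In a Kibana visualization the equivalent would presumably be the same filter combined with a Sum metric and no split on TYPE, though that is not confirmed in this thread.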

Beuhlet_Reseau · September 11, 2017, 12:22pm

if (doc['TYPE'].value == 'PREMIUM_USER' || doc['TYPE'].value == 'FREEUSER' || doc['TYPE'].value == 'USER_UNKNOWN' ) {
return doc['USAGE'].value
}
return 0

It doesn't work.

system · October 9, 2017, 12:22pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.