Grouping elements through a query for use in a Vega visualisation

Hi,
I'm currently learning about queries and related topics.
To do so, I aim to build a Vega visualisation that shows per-process memory usage.
So far, I managed to do it using both Lens and TSVB. Unfortunately, they did not meet all my requirements (or at least I didn't manage to find a way to build the visualisation I wanted).

This led me to try creating a Vega visualisation.
I managed to get the data I need, but unfortunately I didn't manage to group it as I wanted.
My question is: how can I group my data?
With this query:

"query": {
    "exists": {
        "field": "process.pid"
    }
},
"fields": [
    "@timestamp",
    "process.pid",
    "system.process.memory.rss.bytes",
    "process.name",
    "user.name"
],
"_source": false

I get this kind of data:

{
  "took": 45,
  "timed_out": false,
  "_shards": {
    "total": 13,
    "successful": 13,
    "skipped": 11,
    "failed": 0
  },
  "hits": {
    "total": 128594,
    "max_score": 1,
    "hits": [
      {
        "_index": ".ds-metricbeat-cl01ptocor00-dev-elastic_stack-2021.04.14-000001",
        "_type": "_doc",
        "_id": "_aUX0XgBDu3u7xL3cpWD",
        "_score": 1,
        "fields": {
          "system.process.memory.rss.bytes": [
            568008704
          ],
          "process.name": [
            "java"
          ],
          "@timestamp": [
            "2021-04-14T15:54:37.043Z"
          ],
          "user.name": [
            "elasticsearch"
          ],
          "process.pid": [
            26131
          ]
        }
      }, 
      // lots of other hits
    ]
  }
}

My target would be to have something like this:

{
  "took": 45,
  "timed_out": false,
  "_shards": {
    "total": 13,
    "successful": 13,
    "skipped": 11,
    "failed": 0
  },
  "hits": {
    "total": 128594,
    "max_score": 1,
    "hits": [
      {
        "process.pid" : 26131,
        "process.name": "java"
        "user.name": "elasicsearch",
        "memory_usage": [
            {
                "@timestamp" : "2021-04-14T15:54:37.043Z",
                "bytes": 568008704
            },
            // metrics on other timestamps
        ]
      }
      // other process
    ]
  }
}

I really do not understand how I am supposed to do this, and some help would be appreciated.
Thanks in advance.

I'm a little confused, but I will take a stab at it. I think you want to do something like this instead:

"query": {
    "exists": {
        "field": "process.pid"
    }
},
"_source": [
    "@timestamp",
    "process.pid",
    "system.process.memory.rss.bytes",
    "process.name",
    "user.name"
]

Hi @Dzious I see you are progressing...

Curious, could you describe the visualization you are looking for?

Did you see this ... it is a 2-level Tree Map. If you use a KQL filter on user.name at the top, it would be 3 levels :slight_smile:

If you are looking for a 3-level Tree Map, say

User / Process Name / PID average with Process Memory, I already built one in Vega.

It is made to be a 3-level Tree Map with a value... like Disk Space / CPU, etc.

Perhaps take a look, it is here ... bvader is me...

Use at your own risk :slight_smile: It has been a long time...

And by the way, here is a line chart split by

Username, Process Name, PID, with process memory values.

This is the line chart underneath:


Don't let me talk you out of Vega; Vega is fun, but there's definitely a learning curve.
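For anyone who wants to try that kind of split line chart in Vega, here is a minimal Vega-Lite sketch of the idea. Treat it as a starting point only: the index pattern, the document size limit, the schema version, and colouring by process.name rather than by user/PID are all assumptions, and pulling raw hits like this only scales to a few thousand documents.

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "data": {
    "url": {
      // Kibana injects the dashboard time range and filters for us
      "%context%": true,
      "%timefield%": "@timestamp",
      "index": "metricbeat-*",
      "body": {
        "size": 10000,
        "_source": ["@timestamp", "process.name", "process.pid", "system.process.memory.rss.bytes"]
      }
    },
    // read the raw documents rather than an aggregation
    "format": { "property": "hits.hits" }
  },
  "mark": "line",
  "encoding": {
    "x": { "field": "_source.@timestamp", "type": "temporal", "title": "Time" },
    "y": { "field": "_source.system.process.memory.rss.bytes", "type": "quantitative", "title": "RSS (bytes)" },
    "color": { "field": "_source.process.name", "type": "nominal", "title": "Process" }
  }
}

To split by user / process / PID instead, one option is a calculate transform that concatenates _source.user.name, _source.process.name and _source.process.pid into a single series key and colouring by that key.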

1 Like

Hi, thanks for your reply.
After my post I kept trying new things, and the solution you gave was a step in my learning experience. Unfortunately, this was still not the data structure I wanted.
With this solution I still have my memory usage / timestamp pairs in different hits.
My aim is to group them into one hit per process which contains all of its memory usage / timestamp pairs.

But thanks for trying to help me :smiley:
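One way to get close to that one-bucket-per-process shape (a sketch only, with the index pattern, interval and bucket sizes as assumptions to adjust) is to drop the raw hits and let Elasticsearch do the grouping, using a terms aggregation on process.pid with a date_histogram sub-aggregation:

POST metricbeat-*/_search
{
  "size": 0,
  "query": {
    "exists": { "field": "process.pid" }
  },
  "aggs": {
    "per_process": {
      // one bucket per PID
      "terms": { "field": "process.pid", "size": 50 },
      "aggs": {
        // grab the (single) process name for that PID
        "process_name": { "terms": { "field": "process.name", "size": 1 } },
        // memory over time inside each process bucket
        "memory_usage": {
          "date_histogram": { "field": "@timestamp", "fixed_interval": "1m" },
          "aggs": {
            "bytes": { "avg": { "field": "system.process.memory.rss.bytes" } }
          }
        }
      }
    }
  }
}

Each object under aggregations.per_process.buckets then carries one process (key is the PID, process_name.buckets[0].key the name) together with its own memory_usage.buckets array of timestamp/value pairs, which is close to the target structure above. In a Kibana Vega spec you would put this body under data.url.body and set format.property to "aggregations.per_process.buckets".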

Hi @stephenb
Yeah, I am progressing :slight_smile:

The visualisation I am aiming for is the exact same as the one you showed in your other reply :smiley:

I don't quite understand which ... you're referring to, could you tell me a bit more? :sweat_smile:
I will have a look at it, thanks. Even if I do not use it, I'll learn things that can only help me :smiley:

Even if I leave Vega for the moment, I still have a lot of visualisations planned and I think I will need it for some of them. I am not abandoning Vega, but I need to move on to other types of configuration for the moment.

Thanks for the help you brought me along this first learning experience.

1 Like

Sorry, that's just the way I type.

This is a sample I wrote with a lot of help; it has been a long time since I looked at it, and I don't use Vega that often, so I am not the best resource and it could have bugs in it.

I am bvader on GitHub, some people are confused by that.

If you work in Vega, I would use the Vega editor; it is very helpful and can be found here.

Good luck on your journey...

No problem mate :smiley:
Yeah, I had a look at it yesterday before leaving and it seems a bit too complicated at first sight x)
I'll eventually go back to it later on when I'm building more visualisations ^^
I already use the Vega editor, it helped me a lot during my first hours of experimenting. Thanks for the tip anyway ^^
Thanks! But don't feel free of me just yet, I think we'll see each other again soon :stuck_out_tongue:
Have a great day :slight_smile:

1 Like

Well, I eventually have a question for you Stephen:
How did you manage to get your values in MiB? I don't manage to find it. :sweat_smile:
Thanks in advance

Values in MiB in which solution / visualization?

On this screenshot:


On the left-hand side, memory usage is shown in MiB.

On what seems to be the same visualisation, I have this:


As you can see, my memory usage is shown in bytes and not MiB or GiB.

So, assuming you ran metricbeat setup correctly, that formatting should happen automatically with Metricbeat, so I have a small concern there, but you can check and fix it.

Are you sure you ran metricbeat setup correctly? Also, I am a little confused, as in some places it seems like you are using Metricbeat and in others it seems like you are using the Elastic Agent.

For example, I am not sure where you got this index... it may just be my lack of understanding...

Anyway... go to

Stack Management / Index Patterns / metricbeat-*

Mine already had the format set... and it should be, as part of the module.

If not, edit it and set it... It should be set for you as part of the module...
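For reference, and purely as an assumption about how this is stored rather than something verified in this thread: that edit just records a "bytes" field formatter against the field in the index pattern's fieldFormatMap, roughly like this fragment of the index-pattern saved object:

{
  "fieldFormatMap": {
    "system.process.memory.rss.bytes": { "id": "bytes" }
  }
}

which is how visualisations that respect the index pattern's field formats (Lens, TSVB, etc.) know to render the value as MiB/GiB.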

That worked, thanks :smiley:

Also, I'm using Metricbeat and I installed it through the RPM package from the elastic.co download page. I'll double-check it is installed correctly, but I think so.
About the index: it comes from the fact that I am using a data stream to make the setup easier.
Anyway, that's not the topic :wink: Once again, thanks for your help.

Even though you install through RPM, you still need to run setup after you update your configs. You only need to run it once in total, no matter whether you have 1 or 1,000 hosts, just run it first.

metricbeat setup -e


Alright, thanks.
Unfortunately, I've got an error due to the Elasticsearch output not being enabled (I use Logstash).
This topic is not about Metricbeat installation. Shall I make a new topic for this?

To run setup, just temporarily point the Metricbeat output at Elasticsearch.

Then, when setup is over, point the Metricbeat output back to Logstash; that's actually the process.

And as I said, that only needs to happen once, so once you've done that you can just leave the Elasticsearch output portion commented out.

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.