In the table visualization I put in "user_name" as row, and count of Job_id as matrics, I'm expecting
User Name No. of Jobs
1550279 1
1666678 1
But the visualization is showing 2 for each user. How to correct the count?
The reason I put all the results into single document, is to overwrite the document every X mins. so that the visualization can monitor the running jobs in "realtime". If each result is one document, how to refresh the document every X mins?
I'm not sure. You can filter the dataset by submitted_ts to only look at the last x minutes? I don't know the exact use case so I can't comment further.
But it seems that you have timeseries data here, which is a good fit for time based indices and data streams.
But may be explain a bit more what you would like to do.
The data source is from Dremio sys_jobs table, this table only keeps the running jobs, once the job is completed, it will be removed from this table automatically by Dremio.
The dashboard built on ELK is to monitor the running jobs such as no. of running jobs, jobs by users etc.
For instance, job AAA is retrieved from the dataset the first time as one running job, the job count is 1. If this job is a long running job, the job count is 1 too for the second query. But if this job is completed before the second query, then the job count is 0.
I'm lost why two steps. The example given are two running jobs, one for each user. They are fetched at the same time from dremio sys_jobs, the total job count is 2, not 1.
on the ELK side, what jobs are running on Dremio is not known till logstash query the dataset sys_jobs via Dremio API.
That's why I use the single document in the index, and always refresh the document by X mins interval, to make sure Dremio dataset is mirrored in ELK index file. Is there any way Kibana can correctly display the metrics such as no. of running jobs owned by user?
To me Solving the Kibana issue looks much easier than creating an index with different content.
{
"dremiojob_result": {
"user_name": ["1666678","1550279"],
"query_type": ["FLIGHT","REST"],
// Other fields skipped for clearness
"status": ["METADATA_RETRIEVAL","RUNNING"]
}
}
So there's no way to know which user correspond to which status...
If you want to keep inner documents as separated documents, then you need to use nested type for the dremiojob_result field. Using nested documents makes the requests more complex as you need to use nested queries and things like that. Which makes also the usage in Kibana more complex IMO.
That's why, unless there's a need for it, I'd have separated documents first and try to see if with that way, you are able to achieve what you are looking for.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.