Aggregating large data in real time in Elasticsearch: what is the best solution?

Hi all,
I have a question.

Our system produces about a billion log documents per day. How can I get aggregated data like "count of request errors per host" or "count of requests per day, per hour, .." as a time series in real time? Ideally, when I request an aggregation, the query would reuse previously computed results instead of recalculating everything, for better performance. How can I do that in Elasticsearch, or is there a better way?

Thanks all. Sorry about my bad English.

The basic answer is that Elasticsearch calculates all of that at request time.
It doesn't run these sorts of calculations on a schedule and then store the results for any requests that are made.

So any request you make will be a real time one.
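For example, a request-time aggregation that counts request errors per host, per hour, might look like this (a sketch; the index pattern `logs-*` and the field names `host`, `status`, and `@timestamp` are assumptions about your mapping):

```json
POST /logs-*/_search
{
  "size": 0,
  "query": {
    "range": { "status": { "gte": 500 } }
  },
  "aggs": {
    "per_host": {
      "terms": { "field": "host" },
      "aggs": {
        "per_hour": {
          "date_histogram": {
            "field": "@timestamp",
            "fixed_interval": "1h"
          }
        }
      }
    }
  }
}
```

The `"size": 0` tells Elasticsearch to skip returning documents and only compute the aggregations, which is what you want for pure time-series counting.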


Hi, thanks for the reply.
Which Elasticsearch concept should I use to implement this?

It sounds like you might be interested in the rollup API. I believe there are videos and blog posts about it, but I don't have links handy at the moment.
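A rollup job pre-aggregates data on a schedule and stores the summarized results in a separate index, which is close to the "consume old results" behaviour described above. A hedged sketch of a job that summarizes counts per host per hour (the index pattern, field names, cron schedule, and job name are assumptions):

```json
PUT _rollup/job/logs_hourly
{
  "index_pattern": "logs-*",
  "rollup_index": "logs_rollup",
  "cron": "0 0 * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": {
      "field": "@timestamp",
      "fixed_interval": "1h"
    },
    "terms": { "fields": ["host"] }
  }
}
```

Queries against the rollup index then only touch the pre-summarized documents, so they stay fast even as the raw data grows.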


I think you need to look at aggregations.
Hi, thanks for the suggestion, but I looked at search aggregations: they recalculate everything on each search, so performance seems bad for a large data set. I need real-time tracking.

so performance seems bad for a large data set

And did you try it?

No, I only read the documentation, and it doesn't describe any mechanism for caching old results. With a billion documents per day and multiple users querying for real-time tracking, I think it would be a bad idea in this case.

The results need to be updated every second.

I don't think it is.
I'd really recommend that you try it before saying it's not going to work. You could be very surprised.
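One thing worth knowing: Elasticsearch does cache aggregation results at the shard level. Requests with `"size": 0` are stored in the shard request cache and reused until new data is refreshed into the shard, so repeated dashboard-style queries are often much cheaper than the first run. A minimal sketch (index pattern and field name are assumptions):

```json
GET /logs-*/_search?request_cache=true
{
  "size": 0,
  "aggs": {
    "requests_per_day": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "1d"
      }
    }
  }
}
```

The `request_cache=true` parameter is usually unnecessary since `size: 0` searches are cached by default, but it makes the intent explicit.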


Yep, I'll try that. Thank you.

