Low-latency multi-dimensional aggregation with a varying number/order of dimensions


(shadi) #1

Hello,

I would like to know whether ES can fit our use case before investing in a POC. Basically, we need to execute aggregation queries (sum/avg) over metrics in impression/click logs. The data size is in the TBs, the queries group by a varying number and order of dimensions (up to 20), and the response time must be under 3 seconds.

I understand that this is an easy job for Vertica, but I would like to know if ES can compete in this area as well. If yes, what is the typical cluster size?

Thank you


(shadi) #2

Does anybody have a clue?


(Alexander Reelsen) #3

Hey,

this question is very generic and does not contain many concrete requirements in terms of what your data looks like, what your queries look like, hardware, etc. I think it would be much easier to just set up Elasticsearch, index data, query it and see if you are happy with the response. You can ask further once you have a more concrete prototype up and running and can provide better insight into what you are doing, or get commercial help from Elastic for this. </sales> :slight_smile:

--Alex


(shadi) #4

Well, I took your advice and set up a single ES 5.x node on AWS with the following specs:
16 cores, 122 GB RAM, and SSD (3000 IOPS)

I imported 600M+ documents into ES using Logstash, and I executed an ES query that looks like:
SELECT Id, SUM(x), SUM(y), SUM(z) GROUP BY ID LIMIT 10
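
For readers following along: in the Elasticsearch query DSL, a query of this shape is usually written as a `terms` aggregation with `sum` sub-aggregations. The Python sketch below only builds such a request body. It assumes the fields are literally named `Id`, `x`, `y` and `z`, and that `Id` is indexed as a `keyword` or numeric field; the helper name `build_group_by_sum_query` and the aggregation names are made up for illustration, and this is not necessarily the exact request that was run here. Note that `terms` with `size: 10` returns the 10 buckets with the most documents, which is not quite the same as an arbitrary `LIMIT 10`.

```python
import json


def build_group_by_sum_query(group_field, sum_fields, limit=10):
    """Build a search body roughly equivalent to:
    SELECT group_field, SUM(f1), SUM(f2), ... GROUP BY group_field LIMIT limit
    (hypothetical helper, for illustration only)."""
    return {
        # size 0: return only aggregation buckets, not the matching documents
        "size": 0,
        "aggs": {
            "by_group": {
                # one bucket per distinct value of group_field, top `limit` by doc count
                "terms": {"field": group_field, "size": limit},
                # one sum metric per requested field, computed inside each bucket
                "aggs": {"sum_" + f: {"sum": {"field": f}} for f in sum_fields},
            }
        },
    }


body = build_group_by_sum_query("Id", ["x", "y", "z"])
print(json.dumps(body, indent=2))
```

The resulting JSON can be sent as the body of a `_search` request against the index holding the documents.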

The query took ~28 seconds.

Is this usual, or am I missing something here?

-SM

