Can ES solve the requirement below?

(Kramer Li) #1

First, I know ES can do this functionally, but the performance is not very good. So I want to know whether I am doing something wrong or ES just cannot do it well.

My requirement is this: I have data in the format below

appID, clientID, clientIP, times

1, 2, ,, 20

I want to run something like
SELECT clientIP, SUM(times) FROM table GROUP BY clientIP ORDER BY SUM(times)

I want to get the top 3 IPs that visit our website the most.
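
To make the intended result concrete, here is a minimal Python sketch of the same computation done in memory: sum `times` per `clientIP`, then keep the three largest totals. The sample rows and the function name `top_client_ips` are made up for illustration; only the column names come from the format above.

```python
from collections import Counter


def top_client_ips(rows, n=3):
    """Sum `times` per clientIP and return the n largest (ip, total) pairs."""
    totals = Counter()
    for app_id, client_id, client_ip, times in rows:
        totals[client_ip] += times
    # most_common sorts by total, largest first
    return totals.most_common(n)


# Hypothetical sample data in (appID, clientID, clientIP, times) order
rows = [
    (1, 2, "10.0.0.1", 20),
    (1, 3, "10.0.0.2", 5),
    (2, 2, "10.0.0.1", 7),
    (1, 4, "10.0.0.3", 12),
    (1, 5, "10.0.0.4", 1),
]

top3 = top_client_ips(rows)
for ip, total in top3:
    print(ip, total)
```

This is the equivalent of the SQL above with a descending sort and a limit of 3.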

ES can do this, but the performance is poor when there are too many unique IPs. I am using a machine with 16 GB of memory and 8 cores,
with 8 GB of memory given to ES and no swap. It took 1 minute to return a result for 10 million records, which is unacceptable.

So... am I doing this the wrong way, or is ES not a good fit for this requirement? Should I go with Spark?

(Mark Walkom) #2

Is there more to this? :slight_smile:

(Kramer Li) #3

Yes... I clicked the post button by accident. It is finished now :slight_smile:

(Mark Walkom) #4

That seems wrong; it should be able to do that very quickly.

What version are you on?

(Kramer Li) #5

ES 2.2. I am using a terms aggregation since I want to aggregate on a field, and I set shard_size to 0 to make it accurate.
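
For reference, a request of the kind described here might look roughly like the sketch below: a terms aggregation on `clientIP`, ordered by a sum sub-aggregation on `times`. It only builds the JSON body and does not talk to a cluster. The aggregation names `top_ips` and `total_times` and the helper `build_top_ips_query` are made up for illustration, and the exact request used in this thread was not posted, so treat this as an assumption about its shape.

```python
import json


def build_top_ips_query(n=3, shard_size=0):
    """Build a search body: top-n clientIP values by summed `times`."""
    return {
        "size": 0,  # no search hits wanted, only the aggregation result
        "aggs": {
            "top_ips": {  # hypothetical aggregation name
                "terms": {
                    "field": "clientIP",
                    "size": n,
                    # shard_size as described in the post above
                    "shard_size": shard_size,
                    # sort buckets by the sub-aggregation, largest first
                    "order": {"total_times": "desc"},
                },
                "aggs": {
                    "total_times": {"sum": {"field": "times"}}
                },
            }
        },
    }


body = build_top_ips_query()
print(json.dumps(body, indent=2))
```

The printed body would be sent to the index's `_search` endpoint.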
