First, I know ES can do this functionally, but the performance is not very good. I want to know whether I am doing something wrong or whether ES simply cannot handle this well.
My requirement is as follows. I have data in the format below:
1, 2, ,220.127.116.11, 20
I want to run something like:
SELECT clientIP, SUM(times) FROM table GROUP BY clientIP ORDER BY SUM(times) DESC LIMIT 3
That is, I want to get the top 3 IPs that visit our website the most.
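For reference, in ES this kind of GROUP BY with a sum usually maps to a terms aggregation with a sum sub-aggregation, ordered by that sum. The sketch below only builds the request body and is an assumption about the shape of such a request, not necessarily the exact query used here. The field names clientIP and times are taken from the SQL above; the aggregation names top_ips and total_times and the helper build_top_ips_query are made up for illustration, and it assumes clientIP is indexed as a single un-analyzed term.

```python
import json


def build_top_ips_query(ip_field="clientIP", count_field="times", top_n=3):
    """Build an Elasticsearch request body that groups by IP and sums the counts.

    The helper name and the aggregation names ("top_ips", "total_times")
    are hypothetical; only the field names come from the SQL in the question.
    """
    return {
        # size 0: return no search hits, only the aggregation result
        "size": 0,
        "aggs": {
            "top_ips": {
                "terms": {
                    "field": ip_field,
                    # keep only the top N buckets
                    "size": top_n,
                    # order buckets by the sum computed in the sub-aggregation
                    "order": {"total_times": "desc"},
                },
                "aggs": {
                    "total_times": {"sum": {"field": count_field}},
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent to the index's _search endpoint.
    print(json.dumps(build_top_ips_query(), indent=2))
```

With many unique IPs, a terms aggregation ordered by a sub-aggregation has to build a bucket for every distinct value before it can pick the top ones, which is the part that gets expensive.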
ES can do this, but the performance is poor when there are too many unique IPs. I am using a machine with 16 GB of memory and 8 cores,
with 8 GB of memory given to ES and no swap. It took 1 minute to return a result for 10 million records, which is unacceptable.
So, am I using ES the wrong way, or is ES not a good fit for this requirement? Should I go with Spark instead?