OLAP analytics in Elasticsearch

I am working on event analytics. I use Hadoop to process the logs and
store some of the results in MySQL. This no longer works because of
scalability issues, as new logs keep arriving daily.

We need to show stats per year, month, week, day, and hour, along with
filtering capability.
Our data can grow to 100k users, each using 20 websites every hour:
100,000 (users) * 20 (unique websites) * 24 (hours) = 48,000,000 (48 million
records per day at most)
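
For reference, the per-year/month/week/day/hour requirement might be expressed as a date_histogram aggregation with a sum sub-aggregation. This is only a rough sketch of a search request body, written as a Python dict. The field names "time" and "stats" are assumed from the table description, the helper name stats_over_time is made up for illustration, and the name of the interval parameter depends on the Elasticsearch version, so it is left configurable.

```python
# Sketch only: field names "time" and "stats" are assumptions taken from
# the table description; stats_over_time is a hypothetical helper.

def stats_over_time(bucket, interval_key="calendar_interval"):
    """Build a search body that sums `stats` per time bucket.

    bucket: one of "year", "month", "week", "day", "hour".
    interval_key: "calendar_interval" on recent releases; older releases
    used "interval" instead (check the docs for the version in use).
    """
    allowed = {"year", "month", "week", "day", "hour"}
    if bucket not in allowed:
        raise ValueError("unsupported bucket: %r" % (bucket,))
    return {
        "size": 0,  # return aggregation results only, no individual hits
        "aggs": {
            "over_time": {
                "date_histogram": {"field": "time", interval_key: bucket},
                "aggs": {"total_stats": {"sum": {"field": "stats"}}},
            }
        },
    }


# Example: hourly totals.
hourly_body = stats_over_time("hour")
```

The resulting dict would be sent as the JSON body of a search request; a "query" clause could be added next to "aggs" to cover the filtering requirement.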

Our table looks like this:
event_src_id, Time, User, Website, location, some stats

Some example queries are:

  1. select website, sum(stats), count(distinct(user_id)) from table group
    by website;
  2. select website, sum(stats), count(distinct(user_id)) from table
    where user_id = 1 group by website;
  3. select website, sum(stats), count(distinct(user_id)) from table
    where time > '2014-01-01' and time <= '2014-01-31' group by website;
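
As a rough sketch, the three queries above might map onto a terms aggregation (the group by) with sum and cardinality sub-aggregations (the sum and the distinct count), plus an optional query clause for the where part. The bodies are built as Python dicts here. The field names "website", "stats", "user_id", and "time" are assumed from the SQL, and the helper name per_website is made up for illustration.

```python
# Sketch only: field names are assumptions taken from the SQL above;
# per_website is a hypothetical helper, not part of any client library.

def per_website(query=None):
    """Build a search body: group by website, sum stats, count distinct users."""
    body = {
        "size": 0,  # aggregation results only, no individual hits
        "aggs": {
            "by_website": {
                "terms": {"field": "website"},  # group by website
                "aggs": {
                    "total_stats": {"sum": {"field": "stats"}},
                    "distinct_users": {"cardinality": {"field": "user_id"}},
                },
            }
        },
    }
    if query is not None:
        body["query"] = query  # plays the role of the SQL where clause
    return body


# Query 1: no filter.
query_1 = per_website()
# Query 2: where user_id = 1.
query_2 = per_website({"term": {"user_id": 1}})
# Query 3: where time > 2014-01-01 and time <= 2014-01-31.
query_3 = per_website({"range": {"time": {"gt": "2014-01-01", "lte": "2014-01-31"}}})
```

Two caveats with this approach: the cardinality aggregation returns an approximate distinct count, and the terms aggregation returns only the top buckets unless its "size" is raised.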

I tried Elasticsearch with Hadoop, and it seems the insertion part can be
handled with that; I am more worried about the reading part.
The aggregation framework seems to give some hope, but I could not get it to
work for query 1. How do I group, sum, and count distinct values at the same
time?
How can I best use Elasticsearch with Hadoop to get the scalability and
performance needed for OLAP-style queries?
Any help will be appreciated.

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/23d92880-9f23-49ce-b28c-ce99762b0d51%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Any pointers on how to get started, or is this not possible at all?

On Tuesday, September 16, 2014 10:38:29 PM UTC+5, Maaz wrote:
