Using ES as an alternative for a data warehouse

Parikshit_Samant · July 23, 2012, 3:11am

Hi,

I have been investigating whether we could use Elasticsearch as an
alternative to a data warehouse.

The following features of ES are useful to us: schemaless storage, fault
tolerance, scalability via distribution, the JSON API, and CRUD operations.
ACID properties and transactions are not crucial to us; where needed, we
could handle them at the application level, so that is not a problem
either.
Even joins and other set-based SQL operations such as existence checks and
intersections can be managed somehow.

However, I am not sure about the following, which is crucial for us:

Group by + having + aggregates over large data sets: ES does have
facets, which can possibly be used when we want counts as aggregates, but
that is probably not optimal.

If the aggregation capabilities are going to be very unnatural/heavy for
ES, we might as well go with some other alternative (Mongo/Couch or
maybe MySQL, I don't know), and use ES only for search, or just full-text
search.

Would appreciate any help on the above.

Regards,
--Parikshit N. Samant.

Radu_Gheorghe1 · July 23, 2012, 5:52am

Hi,

Well, if facets are not enough for what you need, I think you should
look at something with map/reduce functionality, put Elasticsearch on
top of it, and use it for what it offers (search, facets).

Best regards,
Radu

Clinton_Gormley · July 23, 2012, 9:25am

Hiya

Facets are a natural part of ES and Lucene. Depending on your data and
your queries, they may use quite a lot of memory. Without concrete
examples, it's difficult to say whether it is a good match or not, but
based upon what you've said so far, I would think that ES fits in very
well with your requirements.
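
For illustration only, here is a rough sketch of how a single-field
"group by + sum" might be phrased as a facet request. It assumes the
terms_stats facet with its key_field and value_field parameters; the
facet name and the field names "department" and "revenue" are made up
for the example. It only builds and prints the JSON body, it does not
talk to a cluster, and a "having" condition would still have to be
applied to the returned entries on the client side.

```python
import json


def terms_stats_facet_body(key_field, value_field,
                           facet_name="revenue_by_department"):
    """Build a search body asking for per-term statistics of value_field.

    facet_name and the field names passed in are hypothetical; adjust
    them to the actual mapping.
    """
    return {
        "query": {"match_all": {}},
        "size": 0,  # only the facet results are wanted, not the hits
        "facets": {
            facet_name: {
                "terms_stats": {
                    "key_field": key_field,      # the "group by" column
                    "value_field": value_field,  # the column being summed
                }
            }
        },
    }


body = terms_stats_facet_body("department", "revenue")
print(json.dumps(body, indent=2))
```

This groups on one field only; grouping on two columns at once is not
covered by this sketch.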

clint

Parikshit_Samant · July 24, 2012, 2:30pm

Thanks for the help.

I guess we will continue using facets for aggregates like counts.
We might need to move over to something else (Hadoop etc.) for other
things, e.g. for queries like: "select department, account, sum(revenue)
from accountDetail group by department, account having sum(revenue) > 40000".
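
To make the requirement concrete, here is a small plain-Python sketch
of what that query computes: sum revenue per (department, account)
pair, then keep only the groups whose total exceeds the threshold. The
sample rows and the function name are invented for illustration; a
real job over a large data set would distribute this work rather than
hold everything in one process.

```python
from collections import defaultdict


def group_sum_having(rows, keys, value, threshold):
    """Sum `value` per combination of `keys`; keep totals > threshold."""
    totals = defaultdict(float)
    for row in rows:
        group = tuple(row[k] for k in keys)  # composite "group by" key
        totals[group] += row[value]
    # the "having" step: filter on the aggregated value
    return {g: t for g, t in totals.items() if t > threshold}


# Hypothetical accountDetail rows
rows = [
    {"department": "sales", "account": "A", "revenue": 30000},
    {"department": "sales", "account": "A", "revenue": 15000},
    {"department": "sales", "account": "B", "revenue": 20000},
    {"department": "ops", "account": "C", "revenue": 50000},
    {"department": "ops", "account": "C", "revenue": 1000},
]

result = group_sum_having(rows, ("department", "account"), "revenue", 40000)
print(result)
```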

Regards,
--Parikshit N. Samant.
