Using ES as an alternative for a data warehouse


(Parikshit Samant) #1

Hi,

I am investigating whether we could use Elasticsearch as an
alternative to a data warehouse.

  • The following features of ES are useful: schemaless storage, fault
    tolerance, scalability via distribution, JSON API, CRUD operations.
  • ACID properties and transactions are not crucial to us. Where
    applicable, we could handle them at the application level, so that is
    not a problem either.
  • Even joins and other set-based SQL operations, such as existence and
    intersection, can be managed somehow.

However, I am not sure about the following, which is crucial:

  • Group by + having + aggregates over large data sets: ES does have
    facets, which can possibly be used when we want counts as aggregates,
    but that is probably not optimal.

If the aggregation capabilities are going to be very unnatural/heavy for
ES, we might need to go with some other alternative (Mongo/Couch or
maybe MySQL, I don't know) and use ES only for search, or just full-text
search.

Would appreciate any help on the above.

Regards,
--Parikshit N. Samant.


(Radu Gheorghe) #2

Hi,

Well, if facets are not enough for what you need, I think you should
look at something with map/reduce functionality, put Elasticsearch on
top of it, and use ES for what it offers (search, facets).

Best regards,
Radu

On Monday, July 23, 2012 6:11:22 AM UTC+3, Parikshit Samant wrote:



(Clinton Gormley) #3

Hiya

>   • Group by + having + aggregates over large data sets: ES does have
>     facets, which can possibly be used when we want counts as
>     aggregates, but that is probably not optimal.
>
> If the aggregation capabilities are going to be very unnatural/heavy
> for ES, we might need to go with some other alternative
> (Mongo/Couch or maybe MySQL, I don't know) and use ES only for search,
> or just full-text search.

Facets are a natural part of ES and Lucene. Depending on your data and
your queries, they may use quite a lot of memory. Without concrete
examples, it's difficult to say whether it is a good match or not, but
based upon what you've said so far, I would think that ES fits in very
well with your requirements.
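
For the "counts as aggregates" case, a terms facet request might look
roughly like the sketch below. It only builds the request body and reads
counts back out of a response; the field name `department`, the facet name
`by_department`, and the sample response are assumptions made up for
illustration, and the facet syntax is the one from the ES releases of this
period (facets were later replaced by aggregations).

```python
import json


def terms_facet_body(field, size=10):
    """Build a search body asking for per-term document counts on `field`."""
    return {
        "query": {"match_all": {}},
        "facets": {
            # Facet name is arbitrary; "by_<field>" is just a convention here.
            "by_" + field: {"terms": {"field": field, "size": size}}
        },
    }


def counts_from_response(response, facet_name):
    """Convert the facet section of a search response into {term: count}."""
    entries = response["facets"][facet_name]["terms"]
    return {entry["term"]: entry["count"] for entry in entries}


body = terms_facet_body("department")
print(json.dumps(body, indent=2))

# Hypothetical response fragment, shaped like a terms facet result.
sample_response = {
    "facets": {
        "by_department": {
            "_type": "terms",
            "terms": [
                {"term": "sales", "count": 3},
                {"term": "support", "count": 1},
            ],
        }
    }
}
print(counts_from_response(sample_response, "by_department"))
```

This is the equivalent of a "select department, count(*) ... group by
department"; how much memory it needs depends on how many distinct terms
the field holds.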

clint



(Parikshit Samant) #4

Thanks for the help.

I guess we will continue using facets for aggregates like counts.
We might need to move to something else (Hadoop etc.) for other things,
e.g. for queries like: "select department, account, sum(revenue) from
accountDetail group by department, account having sum(revenue) > 40000".
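
One possible way to approximate the query above with facets is a
terms_stats facet, which reports a per-key `total`, with the "having"
condition applied on the client. The following is only a sketch under
assumptions: `department_account` is a hypothetical field that combines
department and account at index time (the facet takes a single key field),
the facet name `revenue_by_key` is arbitrary, and the sample response is
invented for illustration. Whether this performs acceptably on large data
sets is not something this sketch can show.

```python
def terms_stats_body(key_field, value_field):
    """Build a search body that sums `value_field` per distinct `key_field`."""
    return {
        "query": {"match_all": {}},
        "facets": {
            "revenue_by_key": {
                "terms_stats": {
                    "key_field": key_field,
                    "value_field": value_field,
                }
            }
        },
    }


def having_total_over(response, facet_name, threshold):
    """Client-side HAVING: keep keys whose summed value exceeds `threshold`."""
    entries = response["facets"][facet_name]["terms"]
    return {e["term"]: e["total"] for e in entries if e["total"] > threshold}


# `department_account` is a hypothetical combined field, e.g. "sales|A17".
body = terms_stats_body("department_account", "revenue")

# Hypothetical response fragment, shaped like a terms_stats facet result.
sample_response = {
    "facets": {
        "revenue_by_key": {
            "_type": "terms_stats",
            "terms": [
                {"term": "sales|A17", "count": 4, "total": 52000.0},
                {"term": "sales|B02", "count": 2, "total": 12500.0},
                {"term": "support|A17", "count": 5, "total": 40000.0},
            ],
        }
    }
}

print(having_total_over(sample_response, "revenue_by_key", 40000))
```

Note that the comparison is strict, matching "sum(revenue) > 40000", so a
key whose total is exactly 40000 is dropped.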

Regards,
--Parikshit N. Samant.

On Monday, 23 July 2012 14:55:24 UTC+5:30, Clinton Gormley wrote:


