Hi,
I have been investigating whether we could use Elasticsearch as an
alternative to a data warehouse.
- The following features of ES are useful: schemaless storage, fault
tolerance, scalability via distribution, JSON API, CRUD operations.
- ACID properties and transactions are not crucial to us. Where
applicable, we could handle them at the application level, so that is not a
problem either.
- Even joins and other set-based SQL operations such as existence checks and
intersections can be managed somehow.
However, I am not sure about the following, which is crucial for us:
- Group by + having + aggregates over large data sets: ES does have
facets, which can possibly be used when we want counts as aggregates, but
that is probably not optimal.
If the aggregation capabilities are going to be very unnatural/heavy for
ES, we might as well go with some other alternative (Mongo/Couch or
maybe MySQL, I don't know) and use ES only for search, or just full-text
search.
Would appreciate any help on the above.
Regards,
--Parikshit N. Samant.
Hi,
Yes. If facets are not enough for what you need, I think you should
look at something with map/reduce functionality, put Elasticsearch on
top of it, and use ES for what it offers (search, facets).
Best regards,
Radu
On Monday, July 23, 2012 6:11:22 AM UTC+3, Parikshit Samant wrote:
[quoted text trimmed]
Hiya
> - Group by + having + aggregates over large data sets: ES does have
> facets, which can possibly be used when we want counts as aggregates,
> but that is probably not optimal.
> If the aggregation capabilities are going to be very unnatural/heavy
> for ES, we might as well go with some other alternative
> (Mongo/Couch or maybe MySQL, I don't know) and use ES only for search,
> or just full-text search.
Facets are a natural part of ES and Lucene. Depending on your data and
your queries, they may use quite a lot of memory. Without concrete
examples, it's difficult to say whether ES is a good match or not, but
based upon what you've said so far, I would think that ES fits in very
well with your requirements.
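As a concrete starting point, a sum per group can be requested with a terms_stats facet, with the "having" step applied on the client. This is only a rough sketch: the field names (department, revenue), the facet name, the helper functions, and the sample response are all hypothetical, and the response shape is assumed to match what the terms_stats facet returns. It also groups by a single field only; a two-field key would need something extra, such as a combined field.

```python
import json


def build_terms_stats_request(key_field, value_field, facet_name="revenue_by_key"):
    """Build a search body asking for per-term statistics over value_field."""
    return {
        "query": {"match_all": {}},
        "facets": {
            facet_name: {
                "terms_stats": {
                    "key_field": key_field,
                    "value_field": value_field,
                }
            }
        },
    }


def having_total_above(response, facet_name, threshold):
    """Client-side HAVING: keep facet entries whose total exceeds threshold."""
    entries = response["facets"][facet_name]["terms"]
    return {e["term"]: e["total"] for e in entries if e["total"] > threshold}


# Hypothetical response, shaped like a terms_stats facet result (assumption).
sample_response = {
    "facets": {
        "revenue_by_key": {
            "_type": "terms_stats",
            "terms": [
                {"term": "sales", "count": 3, "total": 52000.0},
                {"term": "support", "count": 2, "total": 18000.0},
            ],
        }
    }
}

request_body = build_terms_stats_request("department", "revenue")
print(json.dumps(request_body, indent=2))
print(having_total_above(sample_response, "revenue_by_key", 40000))
```

The memory caveat above still applies: the facet has to load the values of both fields, so whether this is practical depends on how many distinct keys there are.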
clint
Thanks for the help.
I guess we will need to continue using facets for aggregates like counts.
We might need to move over to something else (Hadoop etc.) for other stuff,
e.g. for things like: "select department, account, sum(revenue) from
accountDetail group by department, account having sum(revenue) > 40000".
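To make clear what that query has to compute, here is a rough map/reduce-style sketch in plain Python. The sample rows and the function names are made up for illustration; a real job would run over the actual accountDetail data in whatever framework we end up choosing.

```python
from collections import defaultdict

# Hypothetical accountDetail rows; real data would come from the warehouse.
rows = [
    {"department": "sales", "account": "A1", "revenue": 30000},
    {"department": "sales", "account": "A1", "revenue": 15000},
    {"department": "sales", "account": "B2", "revenue": 20000},
    {"department": "support", "account": "C3", "revenue": 41000},
]


def map_row(row):
    """Map phase: emit ((department, account), revenue) for one row."""
    return (row["department"], row["account"]), row["revenue"]


def reduce_sums(pairs):
    """Reduce phase: sum the revenue emitted for each key."""
    totals = defaultdict(int)
    for key, revenue in pairs:
        totals[key] += revenue
    return dict(totals)


def having(totals, threshold):
    """HAVING step: keep groups whose summed revenue exceeds threshold."""
    return {key: total for key, total in totals.items() if total > threshold}


result = having(reduce_sums(map_row(r) for r in rows), 40000)
print(result)
```

The grouping key is the (department, account) pair, and the "having" filter runs only after all sums are complete, which is the part that counts-only facets do not cover.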
Regards,
--Parikshit N. Samant.
On Monday, 23 July 2012 14:55:24 UTC+5:30, Clinton Gormley wrote:
[quoted text trimmed]