Is using Elasticsearch an option or not?

akriti · August 10, 2016, 5:07am

Hi,

I am new to Elasticsearch and am considering using it. I have some questions and would appreciate it if someone could help me with them.

A brief introduction of my problem statement:

I have to store and analyse a large amount of collected data and generate reports/charts to determine trends, frequency of usage for certain data, etc.
The data arrives at a high rate, and the stored data will grow very large over time. As I understand it, Elasticsearch computes aggregations at search time, which may make report generation slow when the amount of historical data is very large. We want an alternative to search-time aggregation in Elasticsearch (stored aggregations, like in a data warehouse) that will not slow down searches in the long run as historical data accumulates. Is it possible to achieve this with Elasticsearch, or is a data warehouse a better approach in this case?

Any thoughts will be helpful. Thanks in advance.

dadoonet · August 10, 2016, 6:57am

How slow is it? Did you test it?

Christian_Dahlqvist · August 10, 2016, 7:00am

How much data are you looking to collect and analyse? "Huge" is a very subjective term and does not really tell us much.

akriti · August 10, 2016, 10:54am

I have not tested it yet; right now I am choosing between Elasticsearch and a data warehouse, whichever best suits my needs. Just as an example, maybe I get 1000 entries from 270 data sources each day. After, say, 10 years we will have a considerable amount of data. When I search this amount of data (which grows day by day), will Elasticsearch still generate reports quickly, given that it runs the aggregations at search time on such a large data set, or should I consider a data warehouse for my problem?

Christian_Dahlqvist · August 10, 2016, 11:11am

Elasticsearch is designed to scale horizontally with increasing ingest or total data volume. It is common for even small clusters to ingest hundreds of GB per day, so the volumes you are describing sound quite small from an Elasticsearch perspective.
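
As a rough back-of-the-envelope check of the example figures, the sketch below works out the entry counts. It assumes the example means 1000 entries per source per day (the post does not say whether that is per source or in total), ignores leap days, and uses a hypothetical average of 1 KiB per entry, which is purely an illustrative guess and not a number from this thread.

```python
# Back-of-the-envelope sizing for the example figures in this thread.
ENTRIES_PER_SOURCE_PER_DAY = 1000  # assumption: 1000 entries per source, not in total
DATA_SOURCES = 270
DAYS_PER_YEAR = 365                # leap days ignored
YEARS = 10
ASSUMED_BYTES_PER_ENTRY = 1024     # hypothetical average entry size (1 KiB)


def entries_per_day() -> int:
    """Entries ingested per day across all sources."""
    return ENTRIES_PER_SOURCE_PER_DAY * DATA_SOURCES


def entries_total() -> int:
    """Entries accumulated over the whole period."""
    return entries_per_day() * DAYS_PER_YEAR * YEARS


def gib(entries: int) -> float:
    """Raw size in GiB for a number of entries, using the assumed entry size."""
    return entries * ASSUMED_BYTES_PER_ENTRY / 1024**3


if __name__ == "__main__":
    print(f"entries per day: {entries_per_day():,}")
    print(f"entries after {YEARS} years: {entries_total():,}")
    print(f"raw size per day: {gib(entries_per_day()):.2f} GiB")
    print(f"raw size after {YEARS} years: {gib(entries_total()):.0f} GiB")
```

Under these assumptions that is 270,000 entries per day and roughly 985 million entries after ten years, with a daily raw volume far below the hundreds of GB per day mentioned above.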

akriti · August 10, 2016, 12:16pm

Thanks for the reply. The data I mentioned is just an example, not the real data. I agree Elasticsearch can ingest hundreds of GB per day, but will report generation become slower after a considerable period of time? Also, I want to avoid aggregations being computed at the time we ask for reports. I would prefer the data to be aggregated beforehand, so that when we generate a report that filters some data, nothing is computed at that point and a pre-populated result is used instead, as with dimensions in a data warehouse. Any help in this direction would be appreciated.
Thanks.

Christian_Dahlqvist · August 10, 2016, 12:29pm

You can query and aggregate against large volumes of data with good performance in Elasticsearch. Exactly how long this takes will, however, depend on the type of data you have, the types of queries/aggregations you run, and the type and amount of hardware available. I know users that pre-aggregate data in Elasticsearch in order to make certain types of queries faster, but I also know users that just run queries against the raw data and see good performance. When you aggregate you lose some information, which may make it harder to later ask questions you did not realise you needed to ask when you decided how to aggregate.

If you are looking at a 10 year perspective, Elasticsearch as well as hardware will most likely develop quite a lot in that time, meaning that performance characteristics we see today may no longer apply.
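
To illustrate the trade-off between pre-aggregating and querying the raw data, here is a minimal sketch in plain Python with no Elasticsearch involved. The field names (`day`, `source`, `status`) and the helper functions are hypothetical and only serve the illustration. The rollup keeps counts per day and source, so it answers questions along those two dimensions, but a question about `status` can only be answered from the raw events.

```python
from collections import Counter

# Hypothetical raw events; in practice these would be documents in an index.
raw_events = [
    {"day": "2016-08-10", "source": "a", "status": "ok"},
    {"day": "2016-08-10", "source": "a", "status": "error"},
    {"day": "2016-08-10", "source": "b", "status": "ok"},
    {"day": "2016-08-11", "source": "a", "status": "ok"},
]


def rollup_by_day_and_source(events):
    """Pre-aggregate: keep one count per (day, source), dropping everything else."""
    return Counter((e["day"], e["source"]) for e in events)


def count_from_raw(events, **filters):
    """Count raw events matching every field=value filter (any field is allowed)."""
    return sum(1 for e in events if all(e[k] == v for k, v in filters.items()))


def count_from_rollup(rollup, day=None, source=None):
    """Count from the rollup; only the rolled-up dimensions can be filtered on."""
    return sum(
        n
        for (d, s), n in rollup.items()
        if (day is None or d == day) and (source is None or s == source)
    )


rollup = rollup_by_day_and_source(raw_events)

# Both paths agree on questions the rollup was designed for.
print(count_from_raw(raw_events, source="a"), count_from_rollup(rollup, source="a"))

# A question about "status" needs the raw events; the rollup no longer has that field.
print(count_from_raw(raw_events, status="error"))
```

The rollup is much smaller than the raw data, but any dimension left out when it was built is gone, which is the loss of information described above.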

akriti · August 11, 2016, 5:42am

Is there any way in ES to store the counts for queries so that later, when we want results involving them, we use the stored counts and only calculate what remains, rather than recalculating everything? I am using Kibana with ES for reporting/charting and am trying to calculate some statistics for the collected data.

dadoonet · August 11, 2016, 8:10am

If you run a query, you get back a JSON document. Send it to Elasticsearch with a PUT.

But again I'd not try to solve problems I don't have.
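
For what it is worth, the "store the counts and only calculate what is remaining" idea can be sketched as follows. This is plain Python, not Elasticsearch code: the `stored_counts` dictionary stands in for an index of stored results, and `count_day`, `total_count` and `open_day` are hypothetical names made up for this illustration. Days that no longer receive data are counted once and stored; the day still receiving data is always recounted.

```python
def count_day(events, day):
    """Stand-in for running a count query for a single day against the raw data."""
    return sum(1 for e in events if e["day"] == day)


def total_count(events, days, stored_counts, open_day=None):
    """Sum per-day counts, reusing stored results where they exist.

    Returns (total, recomputed), where recomputed lists the days that
    actually had to be counted from the raw events in this call.
    """
    total = 0
    recomputed = []
    for day in days:
        if day == open_day:
            # Still receiving data: always recount, never store.
            n = count_day(events, day)
            recomputed.append(day)
        elif day in stored_counts:
            # Closed day already counted: reuse the stored result.
            n = stored_counts[day]
        else:
            # Closed day seen for the first time: count once and store.
            n = count_day(events, day)
            stored_counts[day] = n
            recomputed.append(day)
        total += n
    return total, recomputed


events = [
    {"day": "2016-08-09"}, {"day": "2016-08-09"},
    {"day": "2016-08-10"}, {"day": "2016-08-10"}, {"day": "2016-08-10"},
    {"day": "2016-08-11"},
]
days = ["2016-08-09", "2016-08-10", "2016-08-11"]
stored_counts = {}

first = total_count(events, days, stored_counts, open_day="2016-08-11")
events.append({"day": "2016-08-11"})  # new data arrives for the open day
second = total_count(events, days, stored_counts, open_day="2016-08-11")
print(first, second)
```

On the second call only the open day is recounted. Whether this is worth the extra bookkeeping depends on whether the search-time aggregations are actually slow, which is something to measure first.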
