Optimizing Elasticsearch for Timeseries data


(Spiral Circ) #1

Hi,

We use Elasticsearch to store financial stock data, and I was wondering if there are any optimizations that can be done to improve performance for time series data.

We have around 3 million documents on a 2-node cluster. Each document has a field called "Time" which holds a date+time value with millisecond precision in epoch format, e.g. 1448841600000.

Our basic use case is to fetch a given number of documents going back in time, e.g. fetch the latest 1000 documents. This currently takes 5 seconds.
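
For reference, a "latest N documents" request like the one described above can be sketched as a search body sorted on the "Time" field in descending order. This is only a minimal sketch: the helper name `latest_n_query` is hypothetical, and the body is built as a plain dict so that it can be sent with whatever client is in use.

```python
import json


def latest_n_query(n, time_field="Time"):
    """Build a search body that returns the n most recent documents.

    `latest_n_query` is a hypothetical helper; `time_field` defaults to the
    "Time" field from the mapping in this thread.
    """
    return {
        "size": n,                                   # number of hits to return
        "query": {"match_all": {}},                  # no filtering, all documents
        "sort": [{time_field: {"order": "desc"}}],   # newest first
    }


if __name__ == "__main__":
    # Print the JSON body that would be POSTed to the index's _search endpoint.
    print(json.dumps(latest_n_query(1000), indent=2))
```

Sorting on a numeric field like this is a standard part of the search DSL; whether it explains the 5 second response time depends on details not given in this thread.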

Is there any tweaking that can be done so that Elasticsearch optimizes the way it stores data for such use cases?

Our current mapping for the "Time" field:

    "Time": {
        "type": "long",
        "fielddata": { }
    },

Any help is appreciated.

Thanks,
Srikanth


(Mark Walkom) #2

Check out https://www.elastic.co/blog/elasticsearch-as-a-time-series-data-store to start.

Are you using time series indices?


(Spiral Circ) #3

Warkolm,

Thanks for the link. We have a similar mapping on our side.
If by time series indices you mean having different indices based on date/time, we don't. We have one index which holds all the data. Our use case is to fetch the latest N documents irrespective of the date they belong to.

Thanks,
Srikanth


(Mark Walkom) #4

That's not ideal, I would change that.
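
As a rough illustration of what time-based indices could look like here, the sketch below derives a monthly index name from the epoch-millisecond "Time" value. The `stocks` prefix, the monthly granularity, and the helper name `index_for` are all assumptions made for the example, not something stated in this thread.

```python
from datetime import datetime, timezone


def index_for(epoch_ms, prefix="stocks"):
    """Return a monthly index name for a timestamp in epoch milliseconds.

    Both the "stocks" prefix and the monthly granularity are hypothetical;
    daily or yearly indices follow the same pattern with a different format.
    """
    dt = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    return f"{prefix}-{dt:%Y.%m}"


if __name__ == "__main__":
    # The sample value from the first post falls on 2015-11-30 UTC.
    print(index_for(1448841600000))  # stocks-2015.11
```

With names like these, a "latest N" search would usually only need to target the most recent index or two, rather than one index holding all the data.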

