Hi there,
We are looking into solutions for storing and searching time-series data, and Elasticsearch came up as one of the candidates. After reading about it and thinking about our domain, I have a question regarding mapping and type management.
In short, is it better to have multiple types, each holding a single data point that is searched on its own, or a single type holding all the data points?
For example:
1>
HostCPU {
name: string; //hostName
value: float; //cpu usage
timestamp: date; //collection time
}
HostMemory{
name: string; //hostName
value: float; //memory usage
timestamp: date; //collection time
}
.....
other types of interest
........
or
2>
Host{
name: string; //hostName
cpuUsage: float;
memoryUsage: float;
...otherDataOfInterest..
timestamp: date; //collection time
}
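To make the two layouts concrete, here is a rough Python sketch of one collected sample in each shape. The field and type names follow the examples above; the host name, the values, the timestamp, and the helper `split_into_narrow_docs` are made up for illustration and are not part of any Elasticsearch API.

```python
from datetime import datetime, timezone

# Hypothetical sample in the shape of option 2: one wide document per host.
wide_doc = {
    "name": "host-001",      # made-up host name
    "cpuUsage": 0.42,        # made-up values
    "memoryUsage": 0.73,
    "timestamp": datetime(2014, 1, 1, tzinfo=timezone.utc).isoformat(),
}

# Which option-1 type each option-2 field would be stored under.
METRIC_TYPES = {"cpuUsage": "HostCPU", "memoryUsage": "HostMemory"}


def split_into_narrow_docs(doc):
    """Turn one option-2 document into option-1 style (type, document) pairs.

    Each narrow document repeats the host name and timestamp and carries
    a single metric in its "value" field.
    """
    pairs = []
    for field, type_name in METRIC_TYPES.items():
        if field in doc:
            narrow = {
                "name": doc["name"],
                "value": doc[field],
                "timestamp": doc["timestamp"],
            }
            pairs.append((type_name, narrow))
    return pairs


narrow_docs = split_into_narrow_docs(wide_doc)
for type_name, narrow in narrow_docs:
    print(type_name, narrow)
```

The one wide document becomes one narrow document per metric, and every narrow document carries its own copy of the host name and timestamp.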
With option 1> we would retrieve only the CPU or memory data as needed; with option 2> we would always get all the data, even if the user is interested in only one of the data points.
With option 2> there would be far fewer documents and less duplicated data (the host name and timestamp are stored once per sample rather than once per data point), hence a smaller footprint, but more I/O when not all the data is needed. Also, when multiple data points are needed, option 2> takes one search where option 1> takes several.
There are thousands of hosts to collect and search data for. For each host we have 20 or so data points of interest.
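As a back-of-the-envelope check on document counts, the sketch below compares the two options for a single collection pass. The figure of 1000 hosts is only a placeholder for "thousands", 20 stands in for "20 or so", and the function name is hypothetical.

```python
def docs_per_collection(num_hosts, metrics_per_host):
    """Return (option_1_docs, option_2_docs) for one collection pass.

    Option 1 writes one document per host per metric;
    option 2 writes one document per host.
    """
    option_1 = num_hosts * metrics_per_host
    option_2 = num_hosts
    return option_1, option_2


# Placeholder numbers: 1000 hosts, 20 data points each.
option_1, option_2 = docs_per_collection(1000, 20)
print(option_1, option_2)  # 20000 1000
```

Under these assumptions option 1> produces 20 times as many documents per pass as option 2>, and each of those documents repeats the host name and timestamp.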
I'd appreciate any feedback, and any pointers to design principles to keep in mind regarding the number of indices, types, and documents, as well as any hard limits on those.
Thanks
Jasmine