Single or multiple types


#1

Hi there,

We are looking into solutions for storing and searching time-series data, and Elasticsearch has come up as one of the candidates. After reading up and thinking about our domain, I have a question regarding mapping and type management.

In short, is it better to have multiple types, each holding a single data point so that each search returns only that data point, or a single type holding all of the data points?

For example:
Option 1:
HostCPU {
name: string; //hostName
value: float; //cpu usage
timestamp: date; //collection time
}

HostMemory {
name: string; //hostName
value: float; //memory usage
timestamp: date; //collection time
}

...
(other types of interest)
...

Or option 2:
Host {
name: string; //hostName
cpuUsage: float;
memoryUsage: float;
...other data points of interest...
timestamp: date; //collection time
}
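To make the two layouts concrete, here is a minimal Python sketch that builds both document shapes from one sample. The host name, timestamp, metric values and the `FIELD_TO_TYPE` mapping are hypothetical; only the field and type names come from the schemas above.

```python
from __future__ import annotations

from typing import Any

# Hypothetical mapping from option-2 field names to option-1 type names.
FIELD_TO_TYPE = {"cpuUsage": "HostCPU", "memoryUsage": "HostMemory"}


def option1_docs(name: str, timestamp: str, sample: dict[str, float]) -> list[tuple[str, dict[str, Any]]]:
    """Option 1: one (type, document) pair per data point."""
    return [
        (FIELD_TO_TYPE[field], {"name": name, "value": value, "timestamp": timestamp})
        for field, value in sample.items()
    ]


def option2_doc(name: str, timestamp: str, sample: dict[str, float]) -> dict[str, Any]:
    """Option 2: a single document carrying every data point as its own field."""
    return {"name": name, **sample, "timestamp": timestamp}


sample = {"cpuUsage": 12.5, "memoryUsage": 63.0}
docs1 = option1_docs("host-01", "2016-03-01T13:05:00Z", sample)
doc2 = option2_doc("host-01", "2016-03-01T13:05:00Z", sample)
print(len(docs1), docs1[0])
print(doc2)
```

The same sample becomes two documents under option 1 and one under option 2; with 20 data points the gap grows to 20 to 1.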

With option 1, we would retrieve only the CPU or memory data as needed; with option 2, we would always get all of the data, even if the user is interested in only one data point.
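One way to narrow the gap for option 2 is Elasticsearch source filtering: the `_source` parameter of a search request limits which fields of each hit are returned. The sketch below only builds request paths and bodies as plain Python dicts (no client library). The index name and host name are hypothetical, and it assumes `name` is mapped as `not_analyzed` so that a `term` query matches the exact host name. Source filtering trims the response, but the full `_source` is still read on each shard, so it does not remove the extra I/O.

```python
from __future__ import annotations

from typing import Any


def option1_search(index: str, metric_type: str, host: str) -> tuple[str, dict[str, Any]]:
    """Option 1: restrict the search to one mapping type via the URL path."""
    path = f"/{index}/{metric_type}/_search"
    # Assumes `name` is mapped not_analyzed, so `term` matches the exact host name.
    body = {"query": {"term": {"name": host}}}
    return path, body


def option2_search(index: str, fields: list[str], host: str) -> tuple[str, dict[str, Any]]:
    """Option 2: search the single Host type, but return only the requested fields."""
    path = f"/{index}/Host/_search"
    body = {
        "query": {"term": {"name": host}},
        # Source filtering: only these fields appear in each hit's _source.
        "_source": ["name", "timestamp", *fields],
    }
    return path, body


print(option1_search("metrics-2016.03.01", "HostCPU", "host-01"))
print(option2_search("metrics-2016.03.01", ["cpuUsage"], "host-01"))
```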

With option 2, there would be far fewer documents and less duplicated data (the host name and timestamp are stored once per sample rather than once per data point), hence a smaller footprint, but more I/O when not all of the data is needed. On the other hand, when multiple data points are needed, option 2 takes one search instead of several.
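The duplication argument can be made concrete by counting stored field values per collection cycle for one host. With N data points, option 1 writes N documents that each repeat `name` and `timestamp` (3N values), while option 2 writes one document with N + 2 values. This sketch counts values only; real on-disk size also depends on per-document overhead, doc values, the inverted index and compression, none of which it models.

```python
def values_per_cycle(data_points: int) -> dict[str, tuple[int, int]]:
    """Return (documents, stored field values) per host per collection cycle."""
    option1 = (data_points, 3 * data_points)  # name + value + timestamp, per data point
    option2 = (1, data_points + 2)            # name + timestamp once, plus N metric fields
    return {"option1": option1, "option2": option2}


print(values_per_cycle(20))
```

For the roughly 20 data points mentioned in the thread, that is 20 documents and 60 values versus 1 document and 22 values.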

There are thousands of hosts to collect and search data for, and about 20 data points of interest per host.
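A back-of-envelope estimate of daily document volume helps compare the options at this scale. The figures below are assumptions, not from the thread: 1,000 hosts (the low end of "thousands") and a 60-second collection interval.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def docs_per_day(hosts: int, data_points: int, interval_s: int) -> dict[str, int]:
    """Documents indexed per day under each option."""
    cycles = SECONDS_PER_DAY // interval_s  # collection cycles per host per day
    return {
        "option1": hosts * data_points * cycles,  # one document per data point
        "option2": hosts * cycles,                # one document per host per cycle
    }


print(docs_per_day(hosts=1_000, data_points=20, interval_s=60))
```

Under these assumptions option 1 indexes 28,800,000 documents per day against 1,440,000 for option 2. Both are far below the hard cap of roughly 2.1 billion documents per shard, which comes from Lucene.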

I'd appreciate any feedback, and any pointers to design principles to keep in mind regarding the number of indices, types and documents, as well as any hard limits on those.

Thanks
Jasmine


(Jason Tedor) #2

Not directly an answer to your question, but the definitive write-up on types is "Index vs. Type". I hope it helps you understand how to think about these sorts of issues.


(Mark Walkom) #3

Compression might help with option 1.

Basically, it's going to be a case of trying both and seeing what works 🙂


#4

Thanks Jason,

We are planning to use one index per day with many types.
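With one index per day, writes go to the index for the sample's date, and a time-range query has to target every daily index the range covers. A minimal sketch follows; the `metrics-YYYY.MM.DD` naming pattern is a hypothetical choice (similar to the Logstash `logstash-YYYY.MM.dd` convention), and days are taken in UTC.

```python
from __future__ import annotations

from datetime import date, datetime, timedelta, timezone


def daily_index(prefix: str, ts: datetime) -> str:
    """Name of the daily index a sample taken at `ts` belongs to (expects an aware datetime)."""
    return f"{prefix}-{ts.astimezone(timezone.utc):%Y.%m.%d}"


def indices_for_range(prefix: str, start: datetime, end: datetime) -> list[str]:
    """All daily indices a query over [start, end] must search, in date order."""
    day: date = start.astimezone(timezone.utc).date()
    last: date = end.astimezone(timezone.utc).date()
    names = []
    while day <= last:
        names.append(f"{prefix}-{day:%Y.%m.%d}")
        day += timedelta(days=1)
    return names


start = datetime(2016, 2, 29, 22, 0, tzinfo=timezone.utc)
end = datetime(2016, 3, 1, 2, 0, tzinfo=timezone.utc)
print(indices_for_range("metrics", start, end))
```

The resulting names can be joined with commas in the search URL, e.g. `/metrics-2016.02.29,metrics-2016.03.01/_search`.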

What we are uncertain about is whether to use many different types, each with fewer data points, so that the matching documents contain only the data points needed; or fewer types, each with more data points, so that a search returns all of the data and it is up to the consumer to pick out the data points it needs.
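On the consumer side the two choices lead to different post-processing: with fewer, wider types the client picks the needed fields out of each hit, while with many narrow types it may need to merge hits from several types back into one row per host and timestamp. This sketch works on plain hit dicts shaped like Elasticsearch 2.x hits (`_type`, `_source`), assumes the field names from the earlier schemas, and leaves out fetching the hits.

```python
from __future__ import annotations

from typing import Any


def pick(source: dict[str, Any], wanted: list[str]) -> dict[str, Any]:
    """Wide layout: keep the identifying fields plus only the wanted data points."""
    keep = {"name", "timestamp", *wanted}
    return {k: v for k, v in source.items() if k in keep}


def merge_narrow(hits: list[dict[str, Any]]) -> dict[tuple[str, str], dict[str, Any]]:
    """Narrow layout: fold one-hit-per-data-point results into one row per (host, timestamp)."""
    rows: dict[tuple[str, str], dict[str, Any]] = {}
    for hit in hits:
        src = hit["_source"]
        key = (src["name"], src["timestamp"])
        # The mapping type (e.g. HostCPU) tells us which data point `value` holds.
        rows.setdefault(key, {})[hit["_type"]] = src["value"]
    return rows


wide = {"name": "host-01", "timestamp": "2016-03-01T13:05:00Z", "cpuUsage": 12.5, "memoryUsage": 63.0}
hits = [
    {"_type": "HostCPU", "_source": {"name": "host-01", "value": 12.5, "timestamp": "2016-03-01T13:05:00Z"}},
    {"_type": "HostMemory", "_source": {"name": "host-01", "value": 63.0, "timestamp": "2016-03-01T13:05:00Z"}},
]
print(pick(wide, ["cpuUsage"]))
print(merge_narrow(hits))
```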


(Mark Walkom) #5

Be careful with conflicting field mappings across types: https://www.elastic.co/guide/en/elasticsearch/reference/2.2/breaking_20_mapping_changes.html#_conflicting_field_mappings
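The linked breaking change means that, from 2.0, fields with the same name in different types of the same index must share the same mapping. A simple client-side check can flag such clashes before the types are created. This is a simplified sketch over hypothetical mapping dicts: it compares only the top-level `type` of each field and ignores nested `properties` and any other mapping parameters.

```python
from __future__ import annotations

from typing import Any


def conflicting_fields(mappings: dict[str, dict[str, Any]]) -> dict[str, set[str]]:
    """Return field names whose `type` differs between mapping types in one index."""
    seen: dict[str, set[str]] = {}
    for type_mapping in mappings.values():
        for field, spec in type_mapping.get("properties", {}).items():
            # A field without an explicit `type` is treated as an object field here.
            seen.setdefault(field, set()).add(spec.get("type", "object"))
    return {field: types for field, types in seen.items() if len(types) > 1}


mappings = {
    "HostCPU": {"properties": {"name": {"type": "string"}, "value": {"type": "float"}, "timestamp": {"type": "date"}}},
    "HostMemory": {"properties": {"name": {"type": "string"}, "value": {"type": "long"}, "timestamp": {"type": "date"}}},
}
print(conflicting_fields(mappings))
```

Here `HostMemory` hypothetically maps `value` as `long`, which clashes with `HostCPU`'s `float`. Keeping `value` as `float` in every type, as in the option 1 schemas, avoids the conflict.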


#6

Thanks, Mark. It's very useful to be aware of the pitfalls.

My main concern with multiple types versus a single catch-all type is storage and performance. From what you said previously, both are reasonable approaches depending on usage, and the only way to find out is by prototyping and benchmarking. On paper, from an expert's point of view, neither option 1 nor option 2 is clearly better.

