Hi,
I have a few questions about time series data streams (TSDS). I've tried looking through the documentation, but I'm still confused about a few small things.
I was wondering whether fields that are not mapped as a dimension ("time_series_dimension": true) or as a metric (e.g. "time_series_metric": "gauge") are still stored more efficiently than they would be in a normal data stream.
For example, if I have this mapping:
{
  "properties": {
    "Dimension1": {
      "type": "keyword",
      "time_series_dimension": true
    },
    "Dimension2": {
      "type": "keyword",
      "time_series_dimension": true
    },
    "metric1": {
      "type": "integer",
      "time_series_metric": "gauge"
    },
    "other": {
      "type": "keyword"
    },
    "@timestamp": {
      "type": "date",
      "format": "strict_date_optional_time"
    }
  }
}
Would the "other" keyword be stored more efficiently than just using a normal data stream? If they are not stored as efficiently, then should I make every non-numeric field into a dimension? In this example "other" keyword would be a keyword that will only have 5 possible values - and I would like to keep count of logs with those values over time.
I also have one other question. I have some logs that I'm turning into metrics, and some of them end up with exactly the same dimensions and @timestamp. I know that a TSDS derives _id from a hash of the dimension values and @timestamp, but this poses a problem: around every 1000th log is being treated as a duplicate because it has the same @timestamp (to the millisecond) and dimension values as another log. Is there any way to stop the TSDS from automatically rejecting these documents as duplicates, besides hard-coding a uniqueness value into a dimension through something like Logstash?
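To make the collision concrete, two documents like these (values made up) are the ones being treated as duplicates, since they only differ in the metric value:

{ "@timestamp": "2024-01-01T00:00:00.123Z", "Dimension1": "a", "Dimension2": "b", "metric1": 10, "other": "x" }
{ "@timestamp": "2024-01-01T00:00:00.123Z", "Dimension1": "a", "Dimension2": "b", "metric1": 12, "other": "x" }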
Thanks!