Users of ES-Hadoop can specify one of their fields to be used as the document’s ID, and Elasticsearch manages ID based writes consistently. We can’t know ahead of time what your documents’ IDs are, so it’s up to each user to ensure their streaming data contains an ID of some sort.
Are there any best practices for generating IDs for time series data in a way that optimizes ingest? I started with UUID v4, but I see that there are better implementations, such as one and two.
I already understand that using an auto-generated ID skips the duplicate check and so saves the lookup cost. What I'm looking for here is an ID generation scheme that gives an exactly-once guarantee while keeping ingest performance good.
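
To make the requirement concrete, here is a minimal sketch of the kind of scheme I have in mind, not a recommended or official implementation. It derives the ID deterministically from the record itself, so a replayed record maps to the same ID and overwrites rather than duplicates, and it puts a fixed-width timestamp prefix in front so that IDs sort by event time. The function name `make_doc_id`, the `source` argument, the 12-digit hex prefix, and the truncated SHA-1 digest are all my own assumptions for illustration; whether a time-ordered prefix actually helps ingest compared to UUID v4 is exactly what I'm asking about.

```python
import hashlib
import json


def make_doc_id(timestamp_ms: int, source: str, payload: dict) -> str:
    """Build a deterministic, time-prefixed document ID (hypothetical scheme).

    Assumes timestamp_ms is a non-negative epoch time in milliseconds.
    """
    # Fixed-width hex timestamp so IDs sort lexicographically by event time.
    prefix = format(timestamp_ms, "012x")
    # Canonical JSON so the same logical record always hashes the same way,
    # regardless of key order in the incoming dict.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    # Truncated digest keeps the ID short; 20 hex chars = 80 bits (assumption:
    # enough to make collisions within one millisecond and source negligible).
    digest = hashlib.sha1(f"{source}|{canonical}".encode("utf-8")).hexdigest()[:20]
    return f"{prefix}-{digest}"


if __name__ == "__main__":
    record = {"temp": 21.5, "unit": "C"}
    print(make_doc_id(1700000000000, "sensor-1", record))
```

The resulting string would go into whichever field is configured as the document ID (in ES-Hadoop, the field named by `es.mapping.id`), so a retried batch produces the same IDs as the first attempt.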