How to conditionally clone incoming documents to a different index and add a field

I have a custom Beat publishing documents for a number of "system" types (ex. Web Server, App Server, Client Device, Database Server, etc.). Each system type can have tens, hundreds or thousands of instances (ex. Database Server 1, Database Server 2, ..., Database Server 2500). Each document the Beat publishes contains several hundred fields, including an "InstanceId" field (uniquely identifying the system) and a "Created Time" field (the time the system sent the data to the Beat).

I want to have 2 separate indices for each "system" type being published. The first index, the "All" index, contains every document received for the day for the system type and would typically contain tens of millions of records. The second index, the "Latest" index, contains only the latest record for each "InstanceId" within the type. The reason for this approach is the "Latest" index will be queried/aggregated on a per second basis and performing the queries/aggregations against a few thousand records will be much more efficient than performing the queries/aggregations against tens of millions of records. However, each record needs to be available (in the "All" index) for data analysis and reporting.

I am trying to determine the most efficient way to clone each incoming document for the "All" index to the "Latest" index, such that an upsert occurs on the "Latest" index only if the "Created Time" field on the incoming document is later than the "Created Time" field of the matching document (based on "InstanceId") already in the "Latest" index.

The incoming document would not have the "Id" field set so that a new record is created in the "All" index for every document received. The clone process would need to copy the "InstanceId" field into the "Id" field of the cloned document so that the Upsert occurs in the "Latest" index.
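For reference, one server-side way to get this conditional behavior is Elasticsearch's scripted upsert on the update API. This is only a sketch under assumptions I'm making: the index name (`dbserver-latest`), document id, and values are placeholders, and I'm assuming "Created Time" is stored as an ISO-8601 string so that lexical comparison matches chronological order:

```json
POST dbserver-latest/_update/DatabaseServer2500
{
  "scripted_upsert": true,
  "script": {
    "lang": "painless",
    "source": "if (ctx._source['Created Time'] == null || params.event['Created Time'].compareTo(ctx._source['Created Time']) > 0) { ctx._source.putAll(params.event); } else { ctx.op = 'noop'; }",
    "params": {
      "event": {
        "InstanceId": "DatabaseServer2500",
        "Created Time": "2019-06-01T12:00:00Z"
      }
    }
  },
  "upsert": {}
}
```

With `"scripted_upsert": true` and an empty `upsert` document, the script runs even when the document does not yet exist; in that case `ctx._source['Created Time']` is null and the incoming event's fields are copied in, otherwise the write is skipped with `ctx.op = 'noop'`.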

I could obviously do the duplication on the Beat side, but that would result in twice as much data being transferred over the wire to Elasticsearch, which would not be possible given data transfer constraints. I'm currently publishing from the Beat directly to Elasticsearch, generating the "All" index. Is it possible to accomplish the stated goal solely in Elasticsearch, or do I need to introduce Logstash into the process? If I do need Logstash, is there an existing plugin that could accomplish the stated goal, or would I need to author a custom plugin?
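If Logstash is an option, the stock `clone` filter can do the duplication after the data arrives, so nothing extra crosses the wire from the Beat (the Beat ships to Logstash instead of directly to Elasticsearch). A hedged sketch; the hosts, index names, and stored-script id are placeholders, and the stored script would hold the "Created Time" comparison described in the question:

```
input {
  beats { port => 5044 }    # Beat ships here instead of directly to Elasticsearch
}

filter {
  # Emit a copy of every event; the copy's "type" field is set to "latest"
  clone {
    clones => ["latest"]
  }
}

output {
  if [type] == "latest" {
    elasticsearch {
      hosts           => ["localhost:9200"]
      index           => "dbserver-latest"
      document_id     => "%{InstanceId}"    # InstanceId becomes _id, enabling the upsert
      action          => "update"
      scripted_upsert => true
      script_lang     => "painless"
      script_type     => "indexed"          # reference a stored script by id
      script          => "latest-upsert"    # placeholder stored-script id
    }
  } else {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "dbserver-all"               # plain index, auto-generated _id
    }
  }
}
```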

Thanks for any insight/suggestions.


Why don't you just use time-based indices, and maybe even index sorting, to speed things up? No need to store the data twice.


Hi Alexander. Thanks for responding.

My thinking was that if I'm doing hundreds of varying queries/aggregations per minute against just the latest document for each distinct entity (roughly a couple of thousand distinct entities per index), in indices containing several million to 50 million documents, it would be worth the additional indices/disk space to get the performance gains of hitting an index with just the <0.01% of documents I'm interested in, rather than having to include something in every query to isolate the latest documents.

I'll look into your suggestion, but I don't have much experience with either of those features. Is it plausible that using time-based indices (and maybe even index sorting) could yield similar gains to what I described?
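For what it's worth, index sorting is just a static setting applied at index-creation time, so trying it is cheap. A sketch under assumptions (index name, mapping, and sort choice are all made up for illustration; the field names come from my documents):

```json
PUT dbserver-2019.06.01
{
  "settings": {
    "index.sort.field": ["InstanceId", "Created Time"],
    "index.sort.order": ["asc", "desc"]
  },
  "mappings": {
    "properties": {
      "InstanceId":   { "type": "keyword" },
      "Created Time": { "type": "date" }
    }
  }
}
```

My understanding is that with segments pre-sorted like this, a query sorted on "Created Time" descending can terminate early instead of scanning every matching document, which is where the speedup for "latest per instance" style queries would come from.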

Thanks again.

