Let's say I have two indices, index_poll and index_latest.
index_poll receives metrics data from devices via Logstash and Beats. When data is ingested into index_poll, I want that to trigger ingestion or update of the same documents in index_latest.
What am I trying to achieve?
index_poll will hold time series data. It can be a data stream as well, which would be the best case. index_latest will hold only the latest data (the last 5 minutes of data in index_poll).
That is not possible using ingest pipelines. You could however write to multiple indices using Logstash. Why do you want to do this? What problem are you trying to solve?
In ES, rather than getting the data from a particular device for a particular poll in one document (one doc per device per poll), I am receiving it split across multiple documents, like below. Let the data stream be device-metrics.
{
device: 1,
sensorDetails: {...},
...
}
{
device: 1,
userDetails: {...},
...
}
...
So my aim is to create an index, say latest-device-data, which will always have one document per device, holding the latest values from the device-metrics data stream.
All the data from the devices should be in the device-metrics data stream, and each time a document is ingested into device-metrics, the same data should also be routed to the latest-device-data index.
OK. Then I think you need to add Logstash between Metricbeat and Elasticsearch. You can then either use the clone filter to generate a second copy of each event for the other index, setting the document ID so the existing document is updated correctly, or have two separate Elasticsearch outputs.
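To make the idea concrete, here is a rough Python sketch of what the two writes amount to at the Elasticsearch bulk API level. It is not Logstash configuration, only an illustration under assumptions: the device field is used as the document ID for latest-device-data, the sample field values and the @timestamp field are made up, and the helper names bulk_lines and bulk_body are hypothetical. In Logstash, the equivalent would presumably be an elasticsearch output with action set to update, doc_as_upsert enabled, and document_id taken from the device field.

```python
import json

DATA_STREAM = "device-metrics"        # data stream name from this thread
LATEST_INDEX = "latest-device-data"   # "latest" index name from this thread


def bulk_lines(event):
    """Return the bulk API lines that write one event to both targets."""
    # Assumption: the device field identifies the device and is a stable ID.
    device_id = str(event["device"])
    return [
        # Data streams only accept "create"; Elasticsearch assigns the _id.
        {"create": {"_index": DATA_STREAM}},
        event,
        # Partial update keyed by device: fields from separate events are
        # merged into one document, which is created if it does not exist.
        {"update": {"_index": LATEST_INDEX, "_id": device_id}},
        {"doc": event, "doc_as_upsert": True},
    ]


def bulk_body(events):
    """Build a newline-delimited JSON body for the _bulk endpoint."""
    lines = [line for event in events for line in bulk_lines(event)]
    return "\n".join(json.dumps(line) for line in lines) + "\n"


if __name__ == "__main__":
    # Hypothetical events: two partial documents from the same device.
    events = [
        {"@timestamp": "2024-01-01T00:00:00Z", "device": 1,
         "sensorDetails": {"temperature": 21}},
        {"@timestamp": "2024-01-01T00:00:00Z", "device": 1,
         "userDetails": {"name": "example"}},
    ]
    print(bulk_body(events), end="")
```

Because both updates target the same _id, the sensorDetails and userDetails fields end up in a single document per device in latest-device-data, while device-metrics keeps every event as received.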
Is there any specific reason why the data is split into different docs? It would be great if the data from a particular device for a particular poll were in a single document (as it ideally should be), rather than relying on workarounds that won't be feasible for large amounts of data.
I tried to create script processors to group documents, but in vain.
Because of this, when I create dashboards, some of the visualizations are empty, especially when filters are applied on a particular field (because that field may not be present in all docs).