Hello,
We have an index which has multiple documents with the same phone number.
Each document always contains the phone number and may contain additional fields (see the example below).
The documents are written into the index constantly.
For a certain time window, for example 1 minute, we would like to have only one record for each unique phone number.
This data is sent via Logstash into an S3 bucket.
We do not care which record is selected (first, last, or random), but we need exactly one.
Input index example:
| phone no | time | data | address | … | … |
|---|---|---|---|---|---|
| 50-5325471 | 1:10:50 | A | | | |
| 50-5325471 | 1:10:51 | B | | | |
| 50-5325471 | 1:10:52 | C | | | |
| 55-6789345 | 1:10:50 | A | | | |
| 55-6789345 | 1:10:53 | B | | | |
| 57-3434345 | 1:10:55 | C | | | |
| 50-5325471 | 1:15:50 | D | | | |
| 50-5325471 | 1:15:55 | E | | | |
| 55-6789345 | 1:15:50 | F | | | |
Output data example:
| phone no | time | data | address | … | … |
|---|---|---|---|---|---|
| 50-5325471 | 1:10:50 | A | | | |
| 55-6789345 | 1:10:50 | A | | | |
| 50-5325471 | 1:15:55 | E | | | |
| 55-6789345 | 1:15:50 | F | | | |
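To make the intended selection concrete, here is a minimal Python sketch of "one record per phone number per time window". It is only an illustration of the logic, not a Logstash or Elasticsearch configuration. It assumes timestamps are `H:MM:SS` strings, that windows are aligned to whole minutes, and that keeping the first record seen is acceptable; the function name `first_per_window` and the field names `phone`, `time`, and `data` are made up for this example.

```python
def first_per_window(records, window_seconds=60):
    """Keep the first record seen for each (phone, time window) pair."""
    seen = set()
    kept = []
    for rec in records:
        hours, minutes, seconds = (int(part) for part in rec["time"].split(":"))
        # Integer division assigns each timestamp to a fixed-size window.
        bucket = (hours * 3600 + minutes * 60 + seconds) // window_seconds
        key = (rec["phone"], bucket)
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


# Rows taken from the input index example.
records = [
    {"phone": "50-5325471", "time": "1:10:50", "data": "A"},
    {"phone": "50-5325471", "time": "1:10:51", "data": "B"},
    {"phone": "50-5325471", "time": "1:10:52", "data": "C"},
    {"phone": "55-6789345", "time": "1:10:50", "data": "A"},
    {"phone": "55-6789345", "time": "1:10:53", "data": "B"},
    {"phone": "57-3434345", "time": "1:10:55", "data": "C"},
    {"phone": "50-5325471", "time": "1:15:50", "data": "D"},
    {"phone": "50-5325471", "time": "1:15:55", "data": "E"},
    {"phone": "55-6789345", "time": "1:15:50", "data": "F"},
]

for rec in first_per_window(records):
    print(rec["phone"], rec["time"], rec["data"])
```

Because this sketch always keeps the first record in a window, it selects D rather than E for 50-5325471 in the 1:15 window, and it also keeps the single 57-3434345 record; either choice would be fine for us as long as there is one record per number per window.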
The idea is to have a query that returns one record per phone number for each Logstash iteration.
Can someone please advise how to do this?
Thanks
From: Moshe Sharon <moshe.sharon@cellwize.com>
Sent: Wednesday, 7 September 2022 12:02
To: Tomer Bruchiel <tomer.bruchiel@cellwize.com>
Subject: Please check and correct
Logstash Filter problem
We have an index which has multiple documents with the same phone number.
Example index:
| phone no | time | name | address | … | … |
|---|---|---|---|---|---|
| 50-5325471 | | | | | |
| 50-5325471 | | | | | |
| 50-5325471 | | | | | |
| 55-6789345 | | | | | |
| 55-6789345 | | | | | |
| 57-3434345 | | | | | |
| 50-5325471 | | | | | |
| 50-5325471 | | | | | |
| 55-6789345 | | | | | |
I want Logstash to output the latest (or first) occurrence of each phone number from every batch, where a batch is a run of the same phone number appearing one after the other.
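As a rough illustration of the "batch" idea (consecutive documents with the same phone number), here is a small Python sketch using `itertools.groupby`. It is not Logstash configuration; the function name `one_per_run`, the `keep` parameter, and the `phone`/`seq` fields are hypothetical and only serve to show the expected behaviour.

```python
from itertools import groupby


def one_per_run(records, keep="last"):
    """Collapse each run of consecutive records sharing a phone number to one record."""
    result = []
    # groupby only groups adjacent items, which matches "one after the other".
    for _, run in groupby(records, key=lambda rec: rec["phone"]):
        run = list(run)
        result.append(run[-1] if keep == "last" else run[0])
    return result


# Phone numbers in the same order as the example index; "seq" is the row position.
phones = [
    "50-5325471", "50-5325471", "50-5325471",
    "55-6789345", "55-6789345",
    "57-3434345",
    "50-5325471", "50-5325471",
    "55-6789345",
]
rows = [{"phone": phone, "seq": i} for i, phone in enumerate(phones)]

for rec in one_per_run(rows):
    print(rec["seq"], rec["phone"])
```

With the example index this yields five records, one per run, so 50-5325471 and 55-6789345 each appear twice because their runs are separated by other numbers.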
thanks
Moshe
sharon.moshe@gmail.com