Is it possible?

Can someone answer this for me?

I have a database with several configurations, and several of them can be linked to the same serial_number. To analyze the data correctly, I need to filter so that each serial_number is represented by a single configuration.
So can I use an ingest pipeline to set up a condition that says "if there are already documents with the same serial_number, only ingest the newest one of each"?

Is something like this possible with the ingest processors and the Painless language?

Can you elaborate more on why you only want the latest one stored?


The configurations get stored in the database when a user presses save in the program, and if the user presses save multiple times, the same configuration gets stored again. This can happen if a user is inexperienced or forgets a setting before saving. So it's an incorrect design of the system, I guess. They admit this, but to get accurate statistics from the database they want to filter the redundant data out.

You have two options:

  1. Keep all state changes and only return the latest one per serial, using a top hits aggregation. This lets you track changes over time and do analysis on them.
  2. Use the serial as the document ID, so each new document overwrites the previous one and only the most recently indexed one is kept.
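
To make the two options a bit more concrete, here is a rough sketch in Python that only builds the request bodies as plain dicts, so nothing here talks to a cluster. The field name `@timestamp`, the index name `configurations`, and both function names are assumptions for illustration; `serial_number` is taken from the question and is assumed to be mapped as a `keyword` field.

```python
def latest_per_serial_query(serial_field="serial_number",
                            time_field="@timestamp",
                            max_serials=1000):
    """Option 1: search body returning the newest document per serial.

    A terms aggregation creates one bucket per serial, and a top_hits
    sub-aggregation keeps only the most recent document in each bucket.
    """
    return {
        "size": 0,  # skip the regular hits, only the aggregation matters
        "aggs": {
            "by_serial": {
                "terms": {"field": serial_field, "size": max_serials},
                "aggs": {
                    "latest": {
                        "top_hits": {
                            "size": 1,
                            "sort": [{time_field: {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


def actions_keyed_by_serial(docs, index="configurations",
                            serial_field="serial_number"):
    """Option 2: bulk actions that use the serial as the document ID.

    Indexing a second document with the same _id replaces the first one,
    so the index ends up holding one document per serial.
    """
    for doc in docs:
        yield {"_index": index, "_id": doc[serial_field], "_source": doc}


if __name__ == "__main__":
    import json

    print(json.dumps(latest_per_serial_query(), indent=2))
    saves = [
        {"serial_number": "A1", "mode": "draft"},
        {"serial_number": "A1", "mode": "final"},
    ]
    for action in actions_keyed_by_serial(saves):
        print(action)
```

The first dict can be sent as the body of a `_search` request, and the actions are in the shape that `elasticsearch.helpers.bulk` from the Python client accepts. Note that with option 2 the document that survives is whichever one was indexed last, which is only the newest save if documents arrive in order.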

Thank you very much. I don't understand what state changes are, but at least I know it is possible.

A state change happens every time the config is changed, i.e. it goes from one state to another.


Hi, can I ask you to elaborate more on the first option? I got hired to implement Elasticsearch, so I need more information on how to move forward.

Elaborate in what sense?

I'm not sure I understand the concept of state changes yet. There are several configurations with the same serial_number, so how would their state change? And would you still do this with ingest pipelines? Which processor?

I guess it is not possible to both keep all documents and also use just one of each serial number to analyze statistics?

Basically you want to use Elasticsearch as a time series datastore, where each event comes in with a timestamp, a serial, and whatever else is logged. Then you can graph changes over time, either at an individual serial level or at an aggregated level.
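
As a rough illustration of graphing changes over time, the sketch below builds a search body with a date histogram, which counts saved events per time bucket either for one serial or across all of them. The `@timestamp` field name and the function name are assumptions, not something from this thread.

```python
def saves_over_time_query(serial=None, interval="1d",
                          time_field="@timestamp",
                          serial_field="serial_number"):
    """Search body that counts events per calendar interval."""
    body = {
        "size": 0,  # only the histogram buckets are of interest
        "aggs": {
            "saves_over_time": {
                "date_histogram": {
                    "field": time_field,
                    "calendar_interval": interval,
                }
            }
        },
    }
    if serial is not None:
        # Restrict to one serial; leave out for the aggregated view.
        body["query"] = {"term": {serial_field: serial}}
    return body


if __name__ == "__main__":
    import json

    print(json.dumps(saves_over_time_query(serial="A1"), indent=2))
```

Calling it without a serial gives the aggregated view, and passing a serial narrows it to that one device.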

If you want to retrieve only the latest event, which contains whatever state is logged, then you can do that easily.
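
For a single serial, one way to do that is to filter on the serial, sort by timestamp in descending order, and ask for one hit. This is only a sketch of the search body; `@timestamp` and the function name are assumed, and `serial_number` is assumed to be a `keyword` field.

```python
def latest_event_query(serial, time_field="@timestamp",
                       serial_field="serial_number"):
    """Search body that returns only the newest event for one serial."""
    return {
        "size": 1,  # keep just the most recent document
        "query": {"term": {serial_field: serial}},
        "sort": [{time_field: {"order": "desc"}}],
    }


if __name__ == "__main__":
    import json

    print(json.dumps(latest_event_query("A1"), indent=2))
```

All earlier saves stay in the index, so the full history is still available for analysis while this query only surfaces the current state.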