We now have a working process that can ingest single-line CSV logs, lots of them. The next stage is to add some sort of anomaly detection, but here is the problem I am facing.
I am not a domain expert, and the end user says that "all" of the values in the CSV are important to him. The CSV has 75+ columns.
I cannot see myself building 75+ separate population analyses, each based on the serial number and a single field.
Is there an option to monitor all the fields in an event for a given serial number?
A sample of the data I have:
{
  "_index": "logstash-starkindustries-2021.01",
  "_type": "_doc",
  "_id": "fgibhjdgsfib356",
  "_version": 1,
  "_score": null,
  "_source": {
    "serial": 45689457890,
    "sensor_A": 76.5,
    "sensor_B": 65.9,
    "sensor_C": 608,
    "sensor_D": 200,
    "sensor_E": 20,
    "sensor_F": 65,
    ....
    "Result": "Pass",
    "Time_Taken": 56
  }
}
Each serial number has exactly one single-line CSV log, and hence only one event in the index.
I have created a population analysis based on Time_Taken, but there are many other fields of interest, and any one of them showing something out of the ordinary could be very valuable.
The processing is done in Logstash, so if there is a suggestion to change the way I am ingesting the data from CSV, I will be happy to implement it.