I'm developing a log analysis system. The input is log files. I have an external Python program that reads the log files and decides whether each record (line) of a log file is "normal" or "malicious". I want to use the Elasticsearch Update API to write the detection result ("normal" or "malicious") back to Elasticsearch by adding a new field called "result" to each indexed record, so I can see the result clearly in the Kibana UI.
Simply put, my Python code and Elasticsearch both read the same log files independently. Now I want to push the results from my Python code into Elasticsearch. What's the best way to do it?
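To make the goal concrete, here is a minimal sketch of the kind of partial updates I have in mind, written for the elasticsearch-py bulk helper. The index name `logs`, the document ids, and the verdicts are all hypothetical placeholders; building the action list needs no live cluster.

```python
# Sketch: build the bulk-update actions that would attach a "result"
# field to already-indexed log records. Index name "logs" and the
# verdicts dict are hypothetical stand-ins for my detector's output.

def make_update_action(doc_id, verdict):
    """One action for the elasticsearch-py bulk helper: a partial
    update that adds/overwrites the "result" field of document doc_id."""
    return {
        "_op_type": "update",        # partial update, not a reindex
        "_index": "logs",            # hypothetical index name
        "_id": doc_id,               # must match the stored _id
        "doc": {"result": verdict},  # the new field to add
    }

# Example: verdicts my detector produced, keyed by document id.
verdicts = {"a1": "normal", "a2": "malicious"}
actions = [make_update_action(i, v) for i, v in verdicts.items()]

# Against a live cluster this would then be sent as, roughly:
#   from elasticsearch import Elasticsearch, helpers
#   helpers.bulk(Elasticsearch("http://localhost:9200"), actions)
```

The open question below is where `doc_id` comes from, i.e. how my Python side can know each record's `_id`.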
I can think of several ways:
Elasticsearch automatically assigns an ID (_id) to each record. If I could find out how Elasticsearch computes _id, my Python code could calculate the same id for each record and then update the record via _id. The problem is that the docs don't describe the algorithm used to generate _id.
Add an ID (such as the line number) to the log records, then use this ID to update. But I think I would have to search for this ID every time, since it's an ordinary field rather than the built-in _id, so performance would be very bad.
Have my Python code fetch the logs from Elasticsearch instead of reading the log files directly. But this makes the system fragile, since Elasticsearch becomes a single point of failure, and performance would also suffer.
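For option 1, the variant I could imagine if the id were under my control: derive a deterministic _id from the log file path and line number, set it explicitly at index time, and recompute it in the Python detector, so no search is needed to address a record. A sketch under that assumption (the id scheme below is my own invention, not anything Elasticsearch provides):

```python
import hashlib

def record_id(file_path: str, line_no: int) -> str:
    """Deterministic document id derived from (file, line-number).

    If the indexing side sets this as the explicit _id, the Python
    analyzer can recompute it for any record without searching.
    The sha1-of-"path:line" scheme is hypothetical.
    """
    key = f"{file_path}:{line_no}".encode("utf-8")
    return hashlib.sha1(key).hexdigest()

# Same inputs always yield the same id on both sides, and
# different records get different ids:
assert record_id("/var/log/auth.log", 42) == record_id("/var/log/auth.log", 42)
assert record_id("/var/log/auth.log", 42) != record_id("/var/log/auth.log", 43)
```

This sidesteps the undocumented auto-generation algorithm entirely, at the cost of controlling the id at index time.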
So the first solution looks ideal at the moment. Any suggestions? Thanks.