How to know the _id of a record externally?

hsluoyz · October 5, 2018, 4:34pm

I'm developing a log analysis system. The inputs are log files. I have an external Python program that reads the log files and decides whether each record (line) of the log files is "normal" or "malicious". I want to use the Elasticsearch Update API to write the detection result ("normal" or "malicious") back to the Elasticsearch index by adding a new field called "result", so I can see the result clearly in the Kibana UI.

Simply put, my Python code and Elasticsearch each read the same log files independently. Now I want to push the result from my Python code into Elasticsearch. What's the best way to do it?
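For reference, a partial update that only adds the "result" field needs the document's _id and a small "doc" body; the fields in "doc" are merged into the existing source. The sketch below only builds the request path and JSON body. It assumes the Elasticsearch 7.x+ URL form `POST /<index>/_update/<_id>` (6.x clusters use `POST /<index>/_doc/<_id>/_update`), and the index name and ID in the usage lines are made up.

```python
import json
from typing import Tuple
from urllib.parse import quote


def build_update_request(index: str, doc_id: str, result: str) -> Tuple[str, str]:
    """Return (path, JSON body) for a partial update that sets the `result` field.

    Assumes the 7.x+ Update API path: POST /<index>/_update/<_id>.
    """
    if result not in ("normal", "malicious"):
        raise ValueError(f"unexpected result: {result!r}")
    # Percent-encode the ID so characters like spaces or slashes stay in one path segment.
    path = f"/{index}/_update/{quote(doc_id, safe='')}"
    # Fields under "doc" are merged into the stored _source; other fields are kept.
    body = json.dumps({"doc": {"result": result}})
    return path, body


# Hypothetical index name and document ID, for illustration only.
path, body = build_update_request("logs-2018.10.05", "abc123", "malicious")
print("POST", path)
print(body)
```

The open question in this thread is where the `doc_id` argument comes from.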

I can think of several ways:

1. Elasticsearch automatically assigns an ID (_id) to each record. If I could find out how Elasticsearch calculates _id, my Python code could calculate the same ID for each record and then update the record via _id. But the docs don't describe the algorithm used to generate _id.
2. Add an ID (like the line number) to the log files, then use this ID to update. But I think I would have to search for this ID every time, because it's only a normal field rather than the built-in _id, so performance would be very bad.
3. My Python code gets the logs from Elasticsearch instead of reading the log files directly. But this makes the system fragile, as Elasticsearch becomes a single point of failure. Performance would also be degraded.
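Option 2 need not cost a search per update if the self-assigned ID is also used as the document's _id at index time (for example via the `document_id` setting of Logstash's elasticsearch output, or by passing the ID in the index request). A lookup by _id is then a direct get rather than a query. Below is a minimal sketch of one hypothetical scheme: hash the file path and 1-based line number with SHA-1 so that both the indexer and the detector can derive the same ID independently. `line_id` and `iter_with_ids` are illustrative names, not part of any library.

```python
import hashlib
from typing import Iterable, Iterator, Tuple


def line_id(path: str, line_no: int) -> str:
    """Hypothetical scheme: a stable ID derived from the file path and 1-based line number."""
    key = f"{path}:{line_no}".encode("utf-8")
    # Same (path, line_no) -> same 40-character hex digest, in any process.
    return hashlib.sha1(key).hexdigest()


def iter_with_ids(path: str, lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, line) pairs, numbering lines from 1 as they are read."""
    for n, line in enumerate(lines, start=1):
        yield line_id(path, n), line.rstrip("\n")


# Illustrative input; in practice `lines` would be an open file object.
for doc_id, text in iter_with_ids("/var/log/app.log", ["GET /index\n", "POST /login\n"]):
    print(doc_id, text)
```

This assumes log files are append-only. If a file is rotated or rewritten, line numbers shift and the IDs change with them. Hashing only the line content avoids that, but then identical lines collide.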

So, as I see it, the first solution would be ideal. Any suggestions? Thanks.

s1monw · October 12, 2018, 11:51am

When the indexing request returns, we also return the _id to you. There is no deterministic way to recalculate the ID; we take things like the MAC address and wall-clock time into account.
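One way to use the returned _id is to record it at index time. With the Bulk API, the response's `items` list has one entry per action in the order the actions were submitted, so entry i belongs to the i-th document sent. The sketch below assumes documents are indexed through the Bulk API without an explicit _id. The `sample` dict is a trimmed, hand-written example of the response shape, not real cluster output, and `ids_from_bulk_response` is a hypothetical helper.

```python
from typing import Any, Dict, List


def ids_from_bulk_response(response: Dict[str, Any]) -> List[str]:
    """Return the _id of each document in a bulk response, in request order."""
    ids = []
    for item in response["items"]:
        # Each item has exactly one key naming its action, e.g. "index" or "create".
        (action_result,) = item.values()
        if "error" in action_result:
            raise RuntimeError(f"indexing failed: {action_result['error']}")
        ids.append(action_result["_id"])
    return ids


# Hand-written example of the response shape (most fields omitted).
sample = {
    "errors": False,
    "items": [
        {"index": {"_id": "a1", "status": 201}},
        {"index": {"_id": "b2", "status": 201}},
    ],
}
# Pair the line numbers that were sent with the IDs Elasticsearch generated.
line_to_id = dict(zip([1, 2], ids_from_bulk_response(sample)))
print(line_to_id)  # {1: 'a1', 2: 'b2'}
```

The detector would then need to persist this mapping somewhere, which reintroduces coordination between the two programs.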

hsluoyz · November 8, 2018, 2:53am

Thanks!

I finally went with the 2nd way: I add my own ID to each line of the log file and use this ID in both Elasticsearch and my Python program, so Python can update the document based on that ID.
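With many results to write back, one request per line is slow. The Bulk API accepts newline-delimited JSON where each `update` action line is followed by its partial document. The sketch below assumes the custom ID was also used as each document's _id when indexing. If it is only an ordinary field, Update By Query would be needed instead. `bulk_update_body` is a hypothetical helper. In practice, the official Python client's `helpers.bulk` can build and send such requests for you.

```python
import json
from typing import Iterable, Tuple


def bulk_update_body(index: str, results: Iterable[Tuple[str, str]]) -> str:
    """Build an NDJSON Bulk API body that sets `result` on each document by _id."""
    lines = []
    for doc_id, result in results:
        # Action/metadata line: which document to update.
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        # Source line: fields merged into the existing document.
        lines.append(json.dumps({"doc": {"result": result}}))
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical IDs and results, for illustration only.
print(bulk_update_body("logs", [("id1", "normal"), ("id2", "malicious")]), end="")
```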

system · December 6, 2018, 2:53am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
