Hello, I want to import CVE entries from a feed.
The feed consists of several files, which are generated every hour.
Each file contains a list of CVE entries (JSON). Most newly generated CVE entries are later enriched with data (like CPEs, advisories, ...). That means a future file will contain an already published CVE entry, just with new values in some fields.
I used the CVE IDs as the "_id" in Elasticsearch.
When I import all the files with a Python tool, I get the following result in the index stats:
If you add up docs.count + docs.deleted, you get exactly the number of CVE entries that have been fetched, which is reasonable because most CVE entries appear twice in the feed.
I want to reduce the number of docs.deleted.
I import the CVE entries with Python by putting them in a bulk action like this:
cve_bulk = {
    "_op_type": "update",
    "_index": index,
    "_id": cve["vuln_id"],
    "doc_as_upsert": True,
    "doc": cve,
}
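For context, a minimal sketch of how I build these actions before feeding them to `elasticsearch.helpers.bulk` (the index name `cve-index` and the sample CVE data here are just examples):

```python
def build_upsert_actions(cves, index):
    """Turn a list of CVE dicts into bulk actions for elasticsearch.helpers.bulk.

    doc_as_upsert means: insert the doc if the _id is new, otherwise merge
    the given fields into the existing doc.
    """
    for cve in cves:
        yield {
            "_op_type": "update",
            "_index": index,
            "_id": cve["vuln_id"],
            "doc_as_upsert": True,
            "doc": cve,
        }


# Example input: one CVE entry as it might appear in a feed file.
cves = [{"vuln_id": "CVE-2021-0001", "cvss": 7.5}]
actions = list(build_upsert_actions(cves, "cve-index"))
# These actions would then be passed to elasticsearch.helpers.bulk(es, actions).
```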
To reduce the number of deleted docs, I wanted to change the bulk action and add a script for the case that a CVE entry is fetched a second time, so that I only change the fields I know can have changed and leave the other fields untouched. But after a bit of research I understood: since I'm using _op_type: "update", a new doc will be created and the old one marked as deleted anyway.
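The scripted variant I had in mind looks roughly like this (a sketch; the field names `cpe` and `advisories` are just examples of the fields I know can change, and `scripted_upsert` with a Painless script replaces `doc_as_upsert`):

```python
def build_scripted_action(cve, index):
    """Bulk action that only overwrites fields known to change, via a Painless
    script. scripted_upsert=True makes Elasticsearch run the script even when
    the doc doesn't exist yet, with the 'upsert' doc as the starting point."""
    return {
        "_op_type": "update",
        "_index": index,
        "_id": cve["vuln_id"],
        "scripted_upsert": True,
        "script": {
            "lang": "painless",
            # Copy only the fields that may have been enriched; all other
            # fields of the stored doc stay untouched.
            "source": """
                for (def f : params.fields) {
                    ctx._source[f] = params.cve[f];
                }
            """,
            "params": {"cve": cve, "fields": ["cpe", "advisories"]},
        },
        "upsert": cve,
    }


action = build_scripted_action(
    {"vuln_id": "CVE-2021-0001", "cpe": [], "advisories": []}, "cve-index"
)
```

As said, even this scripted update still writes a new document version and marks the old one as deleted, so by itself it doesn't reduce docs.deleted.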
This brings me to my question: is there any way to change the field values of a certain doc without deleting the doc itself, so that the stats show docs.deleted = 0?
If that's not possible, could my current situation cause a performance issue?
Kind regards.