A couple of weeks ago, we noticed that some Google Workspace logs received by Filebeat were being duplicated.
I searched the internet for a possible cause and found one similar issue here on Elastic Discuss, "Google Workspace module using wrong field to avoid duplicates", which says that "json.id.time", "json.id.uniqueQualifier", "json.id.applicationName", and "json.id.customerId" are used to generate the _id.
After updating Filebeat to the most recent version (8.10.2, run from docker.elastic.co/beats/filebeat:8.10.2), I found that the issue is still there: the same event is indexed with different _id values.
Here are two examples from a few minutes ago.
First:
{
  "_index": "google_ws-2023.10.04",
  "_id": "4rrh-YoBiE7xzynem1Cm",
  "_source": {
    "json": {
      "id": {
        "time": "2023-10-04T08:50:13.677Z"
      },
      "etag": "\"rQ3qpTrpjMqlOD9Fi6ZCgnpo6zAdUtM4Y4wU0J6c8Yw/UiNqGB-f4anaOLIVD9ya9Z-pAP0\"",
      "events": {},
      "actor": {}
    },
    "event": {
      "id": "-8909398197392254316",
      "created": "2023-10-04T08:50:25.347Z",
      "original": "{\"id\":{\"applicationName\":\"drive\",\"customerId\":\"C00hvn0vt\",\"time\":\"2023-10-04T08:50:13.677Z\",\"uniqueQualifier\":\"-8909398197392254316\"}}"
    },
    "@timestamp": "2023-10-04T08:50:13.677Z"
  }
}
Second:
{
  "_index": "google_ws-2023.10.04",
  "_id": "QePm-YoBq7bjVLXLMFU_",
  "_source": {
    "json": {
      "id": {
        "time": "2023-10-04T08:50:13.677Z"
      },
      "etag": "\"rQ3qpTrpjMqlOD9Fi6ZCgnpo6zAdUtM4Y4wU0J6c8Yw/UiNqGB-f4anaOLIVD9ya9Z-pAP0\"",
      "events": {},
      "actor": {}
    },
    "event": {
      "created": "2023-10-04T08:55:25.376Z",
      "original": "{\"id\":{\"applicationName\":\"drive\",\"customerId\":\"C00hvn0vt\",\"time\":\"2023-10-04T08:50:13.677Z\",\"uniqueQualifier\":\"-8909398197392254316\"}}"
    },
    "@timestamp": "2023-10-04T08:50:13.677Z"
  }
}
That is, uniqueQualifier, applicationName, and customerId are missing under the "json.id" key, where they are supposed to be, although they all still exist under the "id" key inside "event.original".
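To make the observation concrete, here is a minimal Python sketch that checks which id fields are missing under "json.id" and shows that both documents still carry the same four values inside "event.original". It is only an illustration, not how Filebeat computes the _id: the helper names (ID_FIELDS, missing_id_fields, dedup_key) and the SHA-256 key are my own assumptions, and the two dictionaries are trimmed-down copies of the documents above.

```python
import hashlib
import json

# Fields that, according to the linked thread, are used to generate the _id.
ID_FIELDS = ("time", "uniqueQualifier", "applicationName", "customerId")


def missing_id_fields(source: dict) -> list:
    """Return the id fields that are absent under json.id in a _source document."""
    present = source.get("json", {}).get("id", {})
    return [name for name in ID_FIELDS if name not in present]


def dedup_key(source: dict) -> str:
    """Build a hypothetical dedup key from the id fields kept in event.original."""
    ident = json.loads(source["event"]["original"])["id"]
    joined = "|".join(str(ident[name]) for name in ID_FIELDS)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# The raw event as both documents store it in event.original (trimmed).
raw_event = json.dumps({
    "id": {
        "applicationName": "drive",
        "customerId": "C00hvn0vt",
        "time": "2023-10-04T08:50:13.677Z",
        "uniqueQualifier": "-8909398197392254316",
    }
})

# Trimmed-down copies of the two _source documents.
first = {
    "json": {"id": {"time": "2023-10-04T08:50:13.677Z"}},
    "event": {"created": "2023-10-04T08:50:25.347Z", "original": raw_event},
}
second = {
    "json": {"id": {"time": "2023-10-04T08:50:13.677Z"}},
    "event": {"created": "2023-10-04T08:55:25.376Z", "original": raw_event},
}

print(missing_id_fields(first))
print(dedup_key(first) == dedup_key(second))
```

With the data above, only "time" is found under "json.id", while a key built from "event.original" is identical for both documents, which is what I would expect a working deduplication to rely on.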
Could you please tell me how this can be fixed?