Filebeat (google_workspace module) retrieving partial events on initial startup

Hello I would like to ask question related to below topic.

I want to retrieve events backtracking up to 96 h for drive events .
However, I was only able to retrieve partial of them .

Looking at DEBUG log , the startTime was replaced by different value in the follow-on request .

  • Initial polling (startTime=2022-03-26T14%3A33%3A42Z)
2022-03-30T14:33:42.912Z        INFO    [input.httpjson-cursor] compat/compat.go:111    Input httpjson-cursor starting  {"id": "BBFA557C8D443451"}
2022-03-30T14:33:42.914Z        DEBUG   [input.httpjson-cursor] v2/cursor.go:27 new cursor: nothing loaded      {"id": "BBFA557C8D443451", "input_source": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive", "input_url": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive"}
2022-03-30T14:33:42.914Z        INFO    [input.httpjson-cursor] v2/input.go:112 Process another repeated request.       {"id": "BBFA557C8D443451", "input_source": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive", "input_url": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive"}
2022-03-30T14:33:42.914Z        DEBUG   [input.httpjson-cursor] v2/value_tpl.go:84      template execution: falling back to default value       {"id": "BBFA557C8D443451", "input_source": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive", "input_url": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive"}
2022-03-30T14:33:42.914Z        DEBUG   [input.httpjson-cursor] v2/value_tpl.go:97      template execution: evaluated template "2022-03-26T14:33:42Z"   {"id": "BBFA557C8D443451", "input_source": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive", "input_url": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive"}
2022-03-30T14:33:42.914Z        DEBUG   [input.httpjson-cursor] v2/value_tpl.go:97      template execution: evaluated template "2022-03-26T14:33:42Z"   {"id": "BBFA557C8D443451", "input_source": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive", "input_url": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive"}
2022-03-30T14:33:42.914Z        DEBUG   [input.httpjson-cursor] v2/request.go:77        new request: v2.transformable{"header":http.Header{"Accept":[]string{"application/json"}, "User-Agent":[]string{"Elastic-Filebeat/7.17.1 (linux; amd64; 1d05ba86138cfc9a5ae5c0acc64a57b8d81678ff; 2022-02-23 23:38:04 +0000 UTC)"}}, "url":(*url.URL)(0xc00080d3b0)}    {"id": "BBFA557C8D443451", "input_source": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive", "input_url": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive"}
2022-03-30T14:33:42.916Z        DEBUG   [input.httpjson-cursor.retryablehttp]   go-retryablehttp@v0.6.6/client.go:504   performing request%!(EXTRA string=method, string=POST, string=url, *url.URL=https://oauth2.googleapis.com/token)   {"id": "BBFA557C8D443451", "input_source": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive", "input_url": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive"}
2022-03-30T14:33:43.088Z        DEBUG   [input.httpjson-cursor.retryablehttp]   go-retryablehttp@v0.6.6/client.go:504   performing request%!(EXTRA string=method, string=GET, string=url, *url.URL=https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive?startTime=2022-03-26T14%3A33%3A42Z)        {"id": "BBFA557C8D443451", "input_source": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive", "input_url": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive"}```
  • Polling for page2 (startTime=2022-03-30T14%3A33%3A46Z)
2022-03-30T14:33:46.758Z        DEBUG   [input.httpjson-cursor] v2/value_tpl.go:97      template execution: evaluated template "2022-03-30T14:33:46Z"   {"id": "BBFA557C8D443451", "input_source": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive", "input_url": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive"}
2022-03-30T14:33:46.758Z        DEBUG   [input.httpjson-cursor] v2/cursor.go:56 cursor.last_execution_datetime stored with 2022-03-30T14:33:46Z {"id": "BBFA557C8D443451", "input_source": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive", "input_url": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive"}
2022-03-30T14:33:46.759Z        DEBUG   [input.httpjson-cursor] v2/value_tpl.go:97      template execution: evaluated template "2022-03-30T14:33:46Z"   {"id": "BBFA557C8D443451", "input_source": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive", "input_url": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive"}
2022-03-30T14:33:46.759Z        DEBUG   [input.httpjson-cursor] v2/value_tpl.go:97      template execution: evaluated template "A:1648627960313000:-5113882487620209611:777491262838:C04ha7adf" {"id": "BBFA557C8D443451", "input_source": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive", "input_url": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive"}
2022-03-30T14:33:46.759Z        DEBUG   [input.httpjson-cursor] v2/request.go:77        new request: v2.transformable{"header":http.Header{"Accept":[]string{"application/json"}, "User-Agent":[]string{"Elastic-Filebeat/7.17.1 (linux; amd64; 1d05ba86138cfc9a5ae5c0acc64a57b8d81678ff; 2022-02-23 23:38:04 +0000 UTC)"}}, "url":(*url.URL)(0xc004fd82d0)}    {"id": "BBFA557C8D443451", "input_source": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive", "input_url": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive"}
2022-03-30T14:33:46.759Z        DEBUG   [input.httpjson-cursor.retryablehttp]   go-retryablehttp@v0.6.6/client.go:504   performing request%!(EXTRA string=method, string=GET, string=url, *url.URL=https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive?pageToken=A%3A1648627960313000%3A-5113882487620209611%3A777491262838%3AC04ha7adf&startTime=2022-03-30T14%3A33%3A46Z)       {"id": "BBFA557C8D443451", "input_source": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive", "input_url": "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/drive"}

As described in the doc , I believe the parameters except the pageToken should be the same as initial request when paginating.

In your follow-on request getting the next page of the report, enter the nextPageToken value in the pageToken query string.

Would there be anyway to set same values for the startTIme on all pagination requests ?

Filebeat I used is 7.16.1 and configuration used for my testing is on below gist.

Any advice is appreciated !

Thanks,
Yu watanabe

@Marius_Iversen seems like the same issue I was experiencing with the Atlassian integrations.

@YuWatanabe when looking at the debug logs, the events that are returned from the API, are the timestamps after the new start time or so they look like they're picking up where page 1 ended? I'm trying to gauge if the Google API ignores the start time parameter if the next page token is present.

@legoguy1000

Looks like it's this. Published event in page2 is before startTime .

so they look like they're picking up where page 1 ended?

  • published event in page 2 , @timestamp": "2022-03-30T08:12:40.313Z"
  • request param for page 2 , &startTime=2022-03-30T14%3A33%3A46Z

I see the log like this.

These are the first 10 @timestamp s in the debug log.

[ssm-user@ip-10-0-10-12 filebeat]$ grep @timestamp filebeat | head -10
  "@timestamp": "2022-03-30T14:31:33.368Z",
  "@timestamp": "2022-03-30T14:27:08.492Z",
  "@timestamp": "2022-03-30T14:27:07.676Z",
  "@timestamp": "2022-03-30T14:23:32.720Z",
  "@timestamp": "2022-03-30T14:23:28.352Z",
  "@timestamp": "2022-03-30T14:23:18.592Z",
  "@timestamp": "2022-03-30T14:19:40.606Z",
  "@timestamp": "2022-03-30T14:15:09.675Z",
  "@timestamp": "2022-03-30T14:12:08.090Z",
  "@timestamp": "2022-03-30T14:02:48.930Z",
[ssm-user@ip-10-0-10-12 filebeat]$

These are the last 10 @timestamp s in the debug log.

[ssm-user@ip-10-0-10-12 ~]$ grep @timestamp filebeat/filebeat | tail -10
  "@timestamp": "2022-03-30T08:13:30.641Z",
  "@timestamp": "2022-03-30T08:13:26.285Z",
  "@timestamp": "2022-03-30T08:13:11.725Z",
  "@timestamp": "2022-03-30T08:13:10.832Z",
  "@timestamp": "2022-03-30T08:13:10.065Z",
  "@timestamp": "2022-03-30T08:13:01.041Z",
  "@timestamp": "2022-03-30T08:12:57.216Z",
  "@timestamp": "2022-03-30T08:12:56.669Z",
  "@timestamp": "2022-03-30T08:12:56.669Z",  <- last published event in page 1
  "@timestamp": "2022-03-30T08:12:40.313Z",  <- single event  published in page 2

Sorry, can u actually look at the timestamp field from the Google event data, not filebeat's @timestamp? That field won't be properly updated until it's in Elasticsearch.

@legoguy1000
Yeah . Understood. I believe your talking about the id.time

[ssm-user@ip-10-0-10-12 filebeat]$ grep -oP '\\"id\\":{.*?}' filebeat | sed -e "s/CUSTOMERID/MASKED/g" | head -10
\"id\":{\"applicationName\":\"drive\",\"customerId\":\"MASKED\",\"time\":\"2022-03-30T14:31:33.368Z\",\"uniqueQualifier\":\"-8678004808945565637\"}
\"id\":{\"applicationName\":\"drive\",\"customerId\":\"MASKED\",\"time\":\"2022-03-30T14:27:08.492Z\",\"uniqueQualifier\":\"7801333792512993495\"}
\"id\":{\"applicationName\":\"drive\",\"customerId\":\"MASKED\",\"time\":\"2022-03-30T14:27:07.676Z\",\"uniqueQualifier\":\"3073723659828988113\"}
\"id\":{\"applicationName\":\"drive\",\"customerId\":\"MASKED\",\"time\":\"2022-03-30T14:23:32.720Z\",\"uniqueQualifier\":\"7398511666600379244\"}
\"id\":{\"applicationName\":\"drive\",\"customerId\":\"MASKED\",\"time\":\"2022-03-30T14:23:28.352Z\",\"uniqueQualifier\":\"-4403104155775101877\"}
\"id\":{\"applicationName\":\"drive\",\"customerId\":\"MASKED\",\"time\":\"2022-03-30T14:23:18.592Z\",\"uniqueQualifier\":\"5667666403298765403\"}
\"id\":{\"applicationName\":\"drive\",\"customerId\":\"MASKED\",\"time\":\"2022-03-30T14:19:40.606Z\",\"uniqueQualifier\":\"-1473095349496659424\"}
\"id\":{\"applicationName\":\"drive\",\"customerId\":\"MASKED\",\"time\":\"2022-03-30T14:15:09.675Z\",\"uniqueQualifier\":\"4920740105111875611\"}
\"id\":{\"applicationName\":\"drive\",\"customerId\":\"MASKED\",\"time\":\"2022-03-30T14:12:08.090Z\",\"uniqueQualifier\":\"-1230078526531462997\"}
\"id\":{\"applicationName\":\"drive\",\"customerId\":\"MASKED\",\"time\":\"2022-03-30T14:02:48.930Z\",\"uniqueQualifier\":\"-6261765094560388394\"}
[ssm-user@ip-10-0-10-12 filebeat]$
[ssm-user@ip-10-0-10-12 filebeat]$
[ssm-user@ip-10-0-10-12 filebeat]$ grep -oP '\\"id\\":{.*?}' filebeat | sed -e "s/CUSTOMERID/MASKED/g" | tail -10
\"id\":{\"applicationName\":\"drive\",\"customerId\":\"MASKED\",\"time\":\"2022-03-30T08:13:30.641Z\",\"uniqueQualifier\":\"-6682738832970861995\"}
\"id\":{\"applicationName\":\"drive\",\"customerId\":\"MASKED\",\"time\":\"2022-03-30T08:13:26.285Z\",\"uniqueQualifier\":\"-154616765354559012\"}
\"id\":{\"applicationName\":\"drive\",\"customerId\":\"MASKED\",\"time\":\"2022-03-30T08:13:11.725Z\",\"uniqueQualifier\":\"4496001752976791343\"}
\"id\":{\"applicationName\":\"drive\",\"customerId\":\"MASKED\",\"time\":\"2022-03-30T08:13:10.832Z\",\"uniqueQualifier\":\"-6706812910553500356\"}
\"id\":{\"applicationName\":\"drive\",\"customerId\":\"MASKED\",\"time\":\"2022-03-30T08:13:10.065Z\",\"uniqueQualifier\":\"681135857486310931\"}
\"id\":{\"applicationName\":\"drive\",\"customerId\":\"MASKED\",\"time\":\"2022-03-30T08:13:01.041Z\",\"uniqueQualifier\":\"-8521522634179346711\"}
\"id\":{\"applicationName\":\"drive\",\"customerId\":\"MASKED\",\"time\":\"2022-03-30T08:12:57.216Z\",\"uniqueQualifier\":\"1891074331260643701\"}
\"id\":{\"applicationName\":\"drive\",\"customerId\":\"MASKED\",\"time\":\"2022-03-30T08:12:56.669Z\",\"uniqueQualifier\":\"-2307473763742687566\"}
\"id\":{\"applicationName\":\"drive\",\"customerId\":\"MASKED\",\"time\":\"2022-03-30T08:12:56.669Z\",\"uniqueQualifier\":\"-2307473763742687566\"}
\"id\":{\"applicationName\":\"drive\",\"customerId\":\"MASKED\",\"time\":\"2022-03-30T08:12:40.313Z\",\"uniqueQualifier\":\"-5113882487620209611\"}
[ssm-user@ip-10-0-10-12 filebeat]$

I've updated the snippet as well for your reference.

Just looking at what u posted, it seems the API is ignoring the start time parameter when the page token is present. So I think ur ok and there is no issue but can't say for certain with the truncated logs.

Thank you for the reply.

This is a problem for us because filebeat does not return expected number of events in initial startup.

If you use google sdk , it crawls all the way to last 24 hours. I have attached the code in above gist.

  • go sdk returns 6,231 events, total 7 pages for last 24 hrs
  • python sdk returns 6,225 events, total 7 pages for last 24 hrs
  • filebeat returns 1,143 events, total 2 pages

So I thought I am missing some parameter in configuration to work this out but not sure how.

I also attached example debug output including full original event.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.