Google_workspace polls loads of data

Hi,
With Filebeat 7.12 and 7.11.2, the google_workspace module does not respect var.initial_interval, so it polls a huge amount of data and polls it multiple times. I confirmed this from the Google API dashboard as well.

> |2021-03-24T20:32:54.452+0200|DEBUG|[input.httpjson-cursor]|v2/value_tpl.go:57|template execution failed: template: :1:9: executing  at <.cursor.last_execution_datetime>: map has no entry for key last_execution_datetime|{input_source: https://www.googleapis.com/admin/reports/v1/activity/users/USER_AT_DOMAIN/applications/admin, input_url: https://www.googleapis.com/admin/reports/v1/activity/users/USER_AT_DOMAIN/applications/admin}|
> |---|---|---|---|---|---|
> |2021-03-24T20:32:54.452+0200|DEBUG|[input.httpjson-cursor]|v2/value_tpl.go:60|template execution: falling back to default value|{input_source: https://www.googleapis.com/admin/reports/v1/activity/users/USER_AT_DOMAIN/applications/admin, input_url: https://www.googleapis.com/admin/reports/v1/activity/users/USER_AT_DOMAIN/applications/admin}|
> |2021-03-24T20:32:54.452+0200|DEBUG|[input.httpjson-cursor]|v2/value_tpl.go:57|template execution failed: template: :1:12: executing  at <now>: wrong type for value; expected string; got time.Time|{input_source: https://www.googleapis.com/admin/reports/v1/activity/users/USER_AT_DOMAIN/applications/admin, input_url: https://www.googleapis.com/admin/reports/v1/activity/users/USER_AT_DOMAIN/applications/admin}|
> |2021-03-24T20:32:54.452+0200|DEBUG|[input.httpjson-cursor]|v2/value_tpl.go:70|template execution: evaluated template |{input_source: https://www.googleapis.com/admin/reports/v1/activity/users/USER_AT_DOMAIN/applications/admin, input_url: https://www.googleapis.com/admin/reports/v1/activity/users/USER_AT_DOMAIN/applications/admin}|
> |2021-03-24T20:32:54.452+0200|DEBUG|[input.httpjson-cursor]|v2/value_tpl.go:70|template execution: evaluated template |{input_source: https://www.googleapis.com/admin/reports/v1/activity/users/USER_AT_DOMAIN/applications/admin, input_url: https://www.googleapis.com/admin/reports/v1/activity/users/USER_AT_DOMAIN/applications/admin}|
> |2021-03-24T20:32:54.452+0200|DEBUG|[input.httpjson-cursor]|v2/value_tpl.go:57|template execution failed: template: :1:16: executing  at <.last_response.body.nextPageToken>: map has no entry for key nextPageToken|{input_source: https://www.googleapis.com/admin/reports/v1/activity/users/USER_AT_DOMAIN/applications/admin, input_url: https://www.googleapis.com/admin/reports/v1/activity/users/USER_AT_DOMAIN/applications/admin}|
> |2021-03-24T20:32:54.452+0200|DEBUG|[input.httpjson-cursor]|v2/value_tpl.go:70|template execution: evaluated template |{input_source: https://www.googleapis.com/admin/reports/v1/activity/users/USER_AT_DOMAIN/applications/admin, input_url: https://www.googleapis.com/admin/reports/v1/activity/users/USER_AT_DOMAIN/applications/admin}|

I tried the httpjson input separately, and it does not save the cursor value last_execution_datetime.

I'm seeing the same behavior with a clean install of Filebeat 7.12 and the google_workspace module.

Seeing the same behaviour. Ingesting the same events thousands of times.
Had one event 16,000 times before I checked and killed the input.

We're also seeing the same thing. If you have 100,000 accounts in your domain with hundreds of millions of events over the six-month period, it can take days for a single pass to finish. We use a fingerprint of the Google event ID to drop duplicates so we don't have huge indexes, but the run time is massively problematic since the module doesn't respect the interval value.
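
For anyone wanting to try the fingerprint idea, here is a minimal Python sketch of the general technique, not the actual pipeline described above. It assumes each event carries an `id` object (the field names `time` and `uniqueQualifier` in the example are assumptions), and the helper names `fingerprint` and `deduplicate` are hypothetical.

```python
import hashlib
import json


def fingerprint(event_id: dict) -> str:
    """Return a stable SHA-256 hex digest of an event ID object."""
    # Sort keys so the same ID always serializes to the same bytes.
    canonical = json.dumps(event_id, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(events: list) -> list:
    """Keep the first event seen for each fingerprint, in arrival order."""
    seen = {}
    for event in events:
        # Assumption: the unique identifier lives under the "id" key.
        seen.setdefault(fingerprint(event["id"]), event)
    return list(seen.values())


if __name__ == "__main__":
    sample = [
        {"id": {"time": "2021-03-24T18:00:00Z", "uniqueQualifier": "1"}},
        {"id": {"uniqueQualifier": "1", "time": "2021-03-24T18:00:00Z"}},
        {"id": {"time": "2021-03-24T18:05:00Z", "uniqueQualifier": "2"}},
    ]
    print(len(deduplicate(sample)))
```

Using the digest as the document ID at index time would have the same effect, since repeated events overwrite each other instead of piling up. It does not reduce the polling time itself.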

Hey all! Thanks for reporting the issue. I will check this tomorrow and come back with a possible workaround while we work on a fix, provided we are able to reproduce it. Since many people have reported it, that should be quite likely.

The debug messages reported by @mkorayem are mostly just debug output. However, I do wonder whether you are getting this error only once:

|2021-03-24T20:32:54.452+0200|DEBUG|[input.httpjson-cursor]|v2/value_tpl.go:57|template execution failed: template: :1:9: executing at <.cursor.last_execution_datetime>: map has no entry for key last_execution_datetime|

The first time the Beat runs, last_execution_datetime does not exist, because it is retrieved from the first response, so the template falls back to the value you specify in initial_interval. However, it seems that either it does not pick up the correct timestamp from the first response, or it is stuck paginating the same page(s), which would explain the duplicate entries.
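
To make the intended fallback concrete, here is a rough Python sketch of the logic as described above. It is an illustration only, not the httpjson implementation: the function name `start_time`, the dictionary-based cursor, and the ISO 8601 string format are all assumptions.

```python
from datetime import datetime, timedelta, timezone


def start_time(cursor: dict, initial_interval: timedelta, now: datetime) -> str:
    """Pick the start of the polling window as an ISO 8601 string."""
    # Assumption: the cursor stores the timestamp of the last run as a string.
    last = cursor.get("last_execution_datetime")
    if last is not None:
        # Later runs: resume from where the previous run stopped.
        return last
    # First run: no cursor entry yet, so fall back to now - initial_interval.
    return (now - initial_interval).isoformat()


if __name__ == "__main__":
    now = datetime(2021, 3, 24, 18, 0, 0, tzinfo=timezone.utc)
    print(start_time({}, timedelta(hours=24), now))
    print(start_time({"last_execution_datetime": "2021-03-24T17:00:00+00:00"},
                     timedelta(hours=24), now))
```

If the cursor value is never saved, every run takes the first-run branch, and if the fallback is not applied either, the request has no lower time bound at all. Either case would be consistent with the repeated polling reported in this thread.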

Hey all, just to confirm: the fixes for this have been merged for 7.12.1 and 7.13. If anyone wants, I can also provide a snapshot build of 7.12.1 for testing.

The relevant PRs: