OSQuery Live Queries don't go through

Hey,

I'm using a 7.13 stack with 7.13 Elastic Agents on multiple machines. I have the Osquery Manager integration installed (v0.2.3) and have applied a policy containing it to three Agents (2x Ubuntu 20.04, 1x Windows Server 2019).

When I run live queries, they either don't reach the targets or don't come back to Kibana (they stay stuck on "pending") - I'm not sure what's happening and could use some guidance on where to start debugging.

On one Linux host, queries sometimes work; on the other two hosts they have never come back successfully. I'm using the simple query "SELECT * FROM os_version;".
I have confirmed that osqueryd and osquerybeat are running on the hosts. The hosts also appear "green" in the live query host selector.
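
Roughly how I checked the processes, plus a way to run the same query locally to rule out osquery itself (a sketch only - the exact path under data/elastic-agent-*/install/ depends on the agent version and install directory):

# check that the agent-managed osquery processes are up
ps aux | grep -E '[o]squeryd|[o]squerybeat'
# optionally run the same query directly against the bundled binary
# (-S starts osqueryd in interactive shell mode; path may differ per host)
sudo <install_path>/data/elastic-agent-054e22/install/osquerybeat-7.13.0-linux-x86_64/osquery/osqueryd -S "SELECT * FROM os_version;"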

Some log files from the host that does work occasionally:

<install_path>/data/elastic-agent-054e22/install/osquerybeat-7.13.0-linux-x86_64/osquery/osqueryd.INFO

I0527 17:27:18.444630 702059 eventfactory.cpp:156] Event publisher not enabled: BPFEventPublisher: Publisher disabled via configuration
I0527 17:27:18.444713 702059 eventfactory.cpp:156] Event publisher not enabled: auditeventpublisher: Publisher disabled via configuration
I0527 17:27:18.444742 702059 eventfactory.cpp:156] Event publisher not enabled: inotify: Publisher disabled via configuration
I0527 17:27:18.444759 702059 eventfactory.cpp:156] Event publisher not enabled: syslog: Publisher disabled via configuration

There's nothing else in this log file.

<install_path>/data/elastic-agent-054e22/install/osquerybeat-7.13.0-linux-x86_64/osquery/osqueryd.results.log is empty on every host.
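
A simple way to watch both files while firing a live query from Kibana (sketch only; the elastic-agent hash and osquerybeat version in the path differ per host):

# follow the osquery daemon log and the osquerybeat log while a live query is issued
tail -f <install_path>/data/elastic-agent-054e22/install/osquerybeat-7.13.0-linux-x86_64/osquery/osqueryd.INFO \
        <install_path>/data/elastic-agent-054e22/logs/default/osquerybeat-json.log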

Thanks @nemhods for raising the issue. Sorry you're having this unexpected experience with Osquery Manager.

Could you try running "SELECT * FROM users;" to see if anything comes back from that?

@nemhods thank you for giving our first beta a try.

There are a couple of files that can give us more information on what's going on: the elastic-agent log and the osquerybeat log. They should be in the data directory, something like:
data/elastic-agent-054e22/logs/elastic-agent-json.log-*
and
/data/elastic-agent-054e22/logs/default/osquerybeat-json.log
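
If it's easier, a quick way to surface anything suspicious from those files (just a sketch - the JSON field names can vary slightly between versions, so adjust the pattern if needed):

# pull warnings and errors out of the agent and osquerybeat JSON logs
grep -iE '"log\.level" *: *"(warn|error)"' \
    data/elastic-agent-054e22/logs/elastic-agent-json.log-* \
    data/elastic-agent-054e22/logs/default/osquerybeat-json.log*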

You can also change the agent logging level to debug to get more details. To do that, click on the specific agent on the Fleet page, choose the "Logs" tab, and at the bottom of the tab set "Agent logging level" to "debug", then apply the changes.
This will update agent.logging.level in the fleet.yml file and set the logging level to "debug".
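
To double check that the change actually reached the agent, you can also look at fleet.yml on the host (sketch only - the file's location depends on how the agent was installed):

# confirm the new level landed in fleet.yml (path varies per install)
sudo grep -i -A2 'logging' <install_path>/fleet.yml
# expect to see something like: level: debug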

Hey,

I have a few updates, though nothing really stands out. I also uncovered more issues; maybe something more fundamental is wrong with my install. I'll mention all of them, because for one of you the puzzle pieces might add up to a whole picture. Apologies if this makes the issue more complex...

  1. First of all, thanks for pointing me to the logging level setting for agents - I did not know it existed. Is there a way to set this globally on an Agent Policy?! Also, when I close the page and re-open it, the logging setting goes back to "info" - is this only a local filter, or does it actually instruct the agent to change its log level for all integrations?
  2. The osquerybeat logs on the machines themselves are empty (apart from the regular "Non-zero metrics" log lines). cat data/elastic-agent-054e22/logs/default/osquerybeat-json.log* | grep -v "Non-zero" returns nothing.
  3. I'm seeing some weird behavior with the fleet server. When I update a policy, it shows the hosts as "out of date", even though they appear "green" and check in regularly. I had to restart the fleet server for the changes to go through. Are the Osquery queries relayed through the fleet server? I'm getting the feeling that the fleet server is the culprit here. It does not produce any log lines besides the following at startup (a way to double-check this is sketched after the excerpt):
fleet_server_1  | 2021-05-29T14:44:56.697Z      INFO    warn/warn.go:18 The Elastic Agent is currently in BETA and should not be used in production
fleet_server_1  | 2021-05-29T14:44:56.697Z      INFO    application/application.go:68   Detecting execution mode
fleet_server_1  | 2021-05-29T14:44:56.698Z      INFO    application/application.go:93   Agent is managed by Fleet
fleet_server_1  | 2021-05-29T14:44:56.698Z      INFO    capabilities/capabilities.go:59 capabilities file not found in /usr/share/elastic-agent/state/capabilities.yml
fleet_server_1  | 2021-05-29T14:44:56.949Z      INFO    [composable]    composable/controller.go:46     EXPERIMENTAL - Inputs with variables are currently experimental and should not be used in production
fleet_server_1  | 2021-05-29T14:44:57.052Z      INFO    [composable.providers.docker]   docker/docker.go:43     Docker provider skipped, unable to connect: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
fleet_server_1  | 2021-05-29T14:44:57.054Z      INFO    store/state_store.go:327        restoring current policy from disk
fleet_server_1  | 2021-05-29T14:44:57.060Z      INFO    stateresolver/stateresolver.go:48       New State ID is hH81eU-q
fleet_server_1  | 2021-05-29T14:44:57.060Z      INFO    stateresolver/stateresolver.go:49       Converging state requires execution of 2 step(s)
fleet_server_1  | 2021-05-29T14:44:57.100Z      INFO    operation/operator.go:259       operation 'operation-install' skipped for fleet-server.7.13.0
fleet_server_1  | 2021-05-29T14:44:57.306Z      INFO    log/reporter.go:40      2021-05-29T14:44:57Z - message: Application: fleet-server--7.13.0[e03065cf-b804-4c2a-85aa-0c034ff3d6ed]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'
fleet_server_1  | 2021-05-29T14:44:57.486Z      INFO    operation/operator.go:259       operation 'operation-install' skipped for filebeat.7.13.0
fleet_server_1  | 2021-05-29T14:44:57.763Z      INFO    log/reporter.go:40      2021-05-29T14:44:57Z - message: Application: filebeat--7.13.0--36643631373035623733363936343635[e03065cf-b804-4c2a-85aa-0c034ff3d6ed]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'
fleet_server_1  | 2021-05-29T14:44:58.018Z      INFO    operation/operator.go:259       operation 'operation-install' skipped for metricbeat.7.13.0
fleet_server_1  | 2021-05-29T14:44:58.335Z      INFO    log/reporter.go:40      2021-05-29T14:44:58Z - message: Application: fleet-server--7.13.0[e03065cf-b804-4c2a-85aa-0c034ff3d6ed]: State changed to RUNNING: Running on default policy with Fleet Server integration - type: 'STATE' - sub_type: 'RUNNING'
fleet_server_1  | 2021-05-29T14:44:58.820Z      INFO    log/reporter.go:40      2021-05-29T14:44:58Z - message: Application: metricbeat--7.13.0--36643631373035623733363936343635[e03065cf-b804-4c2a-85aa-0c034ff3d6ed]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'
fleet_server_1  | 2021-05-29T14:44:58.822Z      INFO    stateresolver/stateresolver.go:66       Updating internal state
fleet_server_1  | 2021-05-29T14:44:58.822Z      WARN    [tls]   tlscommon/tls_config.go:98      SSL/TLS verifications disabled.
fleet_server_1  | 2021-05-29T14:44:58.827Z      INFO    stateresolver/stateresolver.go:48       New State ID is hH81eU-q
fleet_server_1  | 2021-05-29T14:44:58.827Z      INFO    stateresolver/stateresolver.go:49       Converging state requires execution of 0 step(s)
fleet_server_1  | 2021-05-29T14:44:58.827Z      INFO    stateresolver/stateresolver.go:66       Updating internal state
fleet_server_1  | 2021-05-29T14:44:58.931Z      INFO    log/reporter.go:40      2021-05-29T14:44:58Z - message: Application: filebeat--7.13.0--36643631373035623733363936343635[e03065cf-b804-4c2a-85aa-0c034ff3d6ed]: State changed to RUNNING: Running - type: 'STATE' - sub_type: 'RUNNING'
fleet_server_1  | 2021-05-29T14:44:59.156Z      INFO    [api]   api/server.go:62        Starting stats endpoint
fleet_server_1  | 2021-05-29T14:44:59.157Z      INFO    application/managed_mode.go:291 Agent is starting
fleet_server_1  | 2021-05-29T14:44:59.157Z      INFO    [api]   api/server.go:64        Metrics endpoint listening on: /usr/share/elastic-agent/state/data/tmp/elastic-agent.sock (configured: unix:///usr/share/elastic-agent/state/data/tmp/elastic-agent.sock)
fleet_server_1  | 2021-05-29T14:44:59.257Z      WARN    application/managed_mode.go:304 failed to ack update open /usr/share/elastic-agent/state/data/.update-marker: no such file or directory
fleet_server_1  | 2021-05-29T14:45:00.209Z      INFO    log/reporter.go:40      2021-05-29T14:45:00Z - message: Application: metricbeat--7.13.0--36643631373035623733363936343635[e03065cf-b804-4c2a-85aa-0c034ff3d6ed]: State changed to RUNNING: Running - type: 'STATE' - sub_type: 'RUNNING'

Edit: the Agents also continuously log:
[elastic_agent][info] No events received within 10m0s, restarting watch call
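
One way to double-check that the fleet server really stays silent after startup (sketch only - this assumes fleet_server_1 is the container name produced by my docker-compose setup, as the prefix in the excerpt suggests):

# filter the fleet-server container logs for anything beyond plain INFO lines
docker logs fleet_server_1 2>&1 | grep -iE 'warn|error'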

Osquery queries are routed through the fleet-server.

Can you attach all of the osquerybeat-json.log files for the three hosts?
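
Something like this should grab them all in one go (just a sketch - the elastic-agent-* directory name differs per host):

# bundle the osquerybeat JSON logs from the agent data directory on each host
tar czf osquerybeat-logs-$(hostname).tgz data/elastic-agent-*/logs/default/osquerybeat-json.log*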

Thanks!