Some Agents Do Not Send Data with Custom Logs (Filestream)

I've created a cronjob that runs a script and writes ndjson output to a file. Then I created an integration policy that reads the log file using the Custom Logs (Filestream) integration and added it to the agent policies.

Some of the agents sent the data, but most of them didn't. I've checked the file permissions and the paths used in the cronjob; all agents use the same content. There are no error entries in the agents' logs, by the way.

Thanks for your help.

This can happen for various reasons. Check the following one by one and let me try to help you.

  1. Does the data exist in the defined path? Please try to add a test file and see if it's ingested.
  2. Is the data older than 72 hours? Check the ignore_older parameter in the integration. If it's set, the data must be newer than that for the beat to harvest it.
  3. Make sure there are no index rejections by checking GET /_stats?filter_path=**.index_failed (see the example below).
  4. Double-check the Elasticsearch logs and the agent logs to see if there is any explanation for the rejection.
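
For point 3, a quick way to run that check from a shell; the host and credentials here are placeholders, so adjust the auth to your cluster:

curl -sk -u elastic:<password> "https://your-es-host:9200/_stats?filter_path=**.index_failed"

Any non-zero index_failed counter means documents are being rejected at index time.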

The most important question, though, is this one: what is the difference between those agents?

  • Yes, the data exists and is in the correct format for all agents.
  • New data is generated by the script hourly, so it's younger than 72 hours. I've checked this as well.
  • ignore_older wasn't set; however, I've set it to 0 just in case.
  • I've checked for index rejections; there are none.

I've checked the agent logs located in /opt/Elastic/Agent/data/elastic-agent-*/logs; some of them have written error logs, some of them haven't.

The crontab:

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
0 * * * * bash /usr/local/bin/script.sh | tee /var/log/mylogs/log.ndjson

Note: I've configured the parser in the integration policy:

- ndjson:
    target: ""
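
For reference, a slightly fuller version of that parsers block; add_error_key is an optional ndjson parser setting I haven't enabled, but it would attach an error.message to any event whose line fails to decode:

- ndjson:
    target: ""
    add_error_key: true   # surface JSON decoding failures on the ingested event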

Sample log:

{"field1":"value1","field2":"101","field3":"somestr","field4":"running","field5":12,"field6":32768,"field7":100}
{"field1":"value1","field2":"103","field3":"somestr","field4":"running","field5":4,"field6":16384,"field7":200}
{"field1":"value1","field2":"119","field3":"somestr","field4":"stopped","field5":4,"field6":8192,"field7":50}
{"field1":"value1","field2":"120","field3":"somestr","field4":"stopped","field5":4,"field6":8192,"field7":50}
{"field1":"value1","field2":"121","field3":"somestr","field4":"stopped","field5":4,"field6":8192,"field7":50}
{"field1":"value1","field2":"122","field3":"somestr","field4":"running","field5":12,"field6":32768,"field7":250}
{"field1":"value1","field2":"125","field3":"somestr","field4":"running","field5":8,"field6":32768,"field7":100}
{"field1":"value1","field2":"129","field3":"somestr","field4":"running","field5":4,"field6":8192,"field7":100}

Sample error logs:

jq 'select(."log.level" == "error" and .message != " ") | .message' elastic-agent*|sort -u
"2025-09-22 07:56:29: debug: Exec.cpp:189 ChildMonitor is pid 3734993 and monitoring pids 3734922 and 3734971"
"2025-09-22 07:56:29: debug: ProcFile.cpp:855 Found 1 cgroups for pid(3734922)"
"2025-09-22 07:56:29: debug: ProcFile.cpp:861 cgroup: id=0 type= path=/system.slice/elastic-agent.service"
"2025-09-22 07:56:29: info: InstallLib.cpp:610 Running [/opt/Elastic/Endpoint/elastic-endpoint] [version --log stdout]"
"2025-09-22 07:56:29: info: InstallLib.cpp:650 Installed endpoint is expected version (version: 8.17.3, compiled: Wed Feb 26 21:00:00 2025, branch: HEAD, commit: e54b5de09796d1b3601f7d5472359c11fafafc67)"
"2025-09-22 07:56:29: info: MainPosix.cpp:389 Verifying existing installation"
"Error dialing EOF"
"Error dialing read tcp xx.xx.xx.xx:38084->xx.xx.xx.xx:9200: read: connection reset by peer"
"Error dialing read tcp xx.xx.xx.xx:38090->xx.xx.xx.xx:9200: read: connection reset by peer"
"Error dialing read tcp xx.xx.xx.xx:49014->xx.xx.xx.xx:9200: read: connection reset by peer"
"Error dialing read tcp xx.xx.xx.xx:50976->xx.xx.xx.xx:9200: read: connection reset by peer"
"Error dialing read tcp xx.xx.xx.xx:50998->xx.xx.xx.xx:9200: read: connection reset by peer"
"Error dialing read tcp xx.xx.xx.xx:51204->xx.xx.xx.xx:9200: read: connection reset by peer"
"Error dialing read tcp xx.xx.xx.xx:56974->xx.xx.xx.xx:9200: read: connection reset by peer"
"Error dialing read tcp xx.xx.xx.xx:58798->xx.xx.xx.xx:9200: read: connection reset by peer"
"Exiting: context canceled"
"failed accept conn info connection: accept unix /opt/Elastic/Agent/.eaci.sock: use of closed network connection"
"Failed to connect to backoff(elasticsearch(https://xx.xx.xx.xx:9200)): Get \"https://xx.xx.xx.xx:9200\": EOF"
"Failed to connect to backoff(elasticsearch(https://xx.xx.xx.xx:9200)): Get \"https://xx.xx.xx.xx:9200\": EOF"
"Failed to connect to backoff(elasticsearch(https://xx.xx.xx.xx:9200)): Get \"https://xx.xx.xx.xx:9200\": read tcp xx.xx.xx.xx:49014->xx.xx.xx.xx:9200: read: connection reset by peer"
"Failed to connect to backoff(elasticsearch(https://xx.xx.xx.xx:9200)): Get \"https://xx.xx.xx.xx:9200\": EOF"
"Failed to connect to backoff(elasticsearch(https://xx.xx.xx.xx:9200)): Get \"https://xx.xx.xx.xx:9200\": read tcp xx.xx.xx.xx:38090->xx.xx.xx.xx:9200: read: connection reset by peer"
"Failed to connect to backoff(elasticsearch(https://xx.xx.xx.xx:9200)): Get \"https://xx.xx.xx.xx:9200\": read tcp xx.xx.xx.xx:58798->xx.xx.xx.xx:9200: read: connection reset by peer"
"Failed to connect to backoff(elasticsearch(https://xx.xx.xx.xx:9200)): Get \"https://xx.xx.xx.xx:9200\": EOF"
"Failed to connect to backoff(elasticsearch(https://xx.xx.xx.xx:9200)): Get \"https://xx.xx.xx.xx:9200\": read tcp xx.xx.xx.xx:50976->xx.xx.xx.xx:9200: read: connection reset by peer"
"Failed to connect to backoff(elasticsearch(https://xx.xx.xx.xx:9200)): Get \"https://xx.xx.xx.xx:9200\": read tcp xx.xx.xx.xx:50998->xx.xx.xx.xx:9200: read: connection reset by peer"
"Failed to connect to backoff(elasticsearch(https://xx.xx.xx.xx:9200)): Get \"https://xx.xx.xx.xx:9200\": read tcp xx.xx.xx.xx:51204->xx.xx.xx.xx:9200: read: connection reset by peer"
"Failed to connect to backoff(elasticsearch(https://xx.xx.xx.xx:9200)): Get \"https://xx.xx.xx.xx:9200\": read tcp xx.xx.xx.xx:56974->xx.xx.xx.xx:9200: read: connection reset by peer"
"failed to perform any bulk index operations: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
"failed to perform any bulk index operations: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": EOF"
"failed to perform any bulk index operations: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
"failed to perform any bulk index operations: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
"failed to perform any bulk index operations: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": EOF"
"failed to perform any bulk index operations: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
"failed to perform any bulk index operations: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
"failed to perform any bulk index operations: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": EOF"
"failed to perform any bulk index operations: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
"failed to perform any bulk index operations: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": read tcp xx.xx.xx.xx:38084->xx.xx.xx.xx:9200: read: connection reset by peer"
"failed to perform any bulk index operations: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
"failed to perform any bulk index operations: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": EOF"
"failed to perform any bulk index operations: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
"failed to publish events: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
"failed to publish events: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": EOF"
"failed to publish events: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
"failed to publish events: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
"failed to publish events: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": EOF"
"failed to publish events: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
"failed to publish events: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
"failed to publish events: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": EOF"
"failed to publish events: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
"failed to publish events: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": read tcp xx.xx.xx.xx:38084->xx.xx.xx.xx:9200: read: connection reset by peer"
"failed to publish events: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
"failed to publish events: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": EOF"
"failed to publish events: Post \"https://xx.xx.xx.xx:9200/_bulk?filter_path=errors%2Citems.%2A.error%2Citems.%2A.status\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
"runOsquery exited with error: context canceled"

By the way, I've sent a request to the ES node from the agent using curl, so the connection errors may be temporary.
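
The check was essentially this (credentials and host are placeholders); a 200 response here only proves basic reachability from the agent host to port 9200:

curl -sk -u <user>:<password> "https://xx.xx.xx.xx:9200/" -w '\nHTTP %{http_code}\n'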

Hi @lomar

What version of the agent, Elastic Stack, and integrations are you running?
How did you install?
Do you use self-generated certs?

That does not necessarily mean the agent is connecting correctly

Have you run

./elastic-agent status

Have you run and looked at the output in detail
./elastic-agent inspect
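
One way to check that the filestream input even made it into the running policy (a sketch, assuming the default Linux install path):

sudo /opt/Elastic/Agent/elastic-agent inspect | grep -B 2 -A 8 "type: filestream"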

It is hard to tell from your sampled error messages, but it sure looks like a connectivity issue.


Yes. But other integrations assigned to the agent's policy do send logs, so I don't think it's caused by SSL.

β”Œβ”€ fleet
β”‚  └─ status: (HEALTHY) Connected
└─ elastic-agent
   └─ status: (HEALTHY) Running

Yes, I've checked these sections; they're similar to those of other agents that work well and share the same policy. I'll share it in the next post due to the character limit.

Elastic agent version: 8.17.3
Elasticsearch node versions: 9.1.3
Custom Logs File Stream Integration version: 1.1.0

agent:
  download:
    sourceURI: https://artifacts.elastic.co/downloads/
  logging:
    level: info
  monitoring:
    enabled: true
    logs: true
    metrics: true
    namespace: default
  protection:
    enabled: false
fleet:
  enabled: true
  hosts:
    - <REDACTED_FLEET_SERVER>
  ssl:
    verification_mode: none
  timeout: 10m0s
host:
  os: linux
  osinfo:
    family: debian
    version: 13 (trixie)
inputs:
  - name: Auditd Logs
    type: logfile
    streams:
      - dataset: auditd.log
        paths:
          - /var/log/audit/audit.log*
  - name: System Audit
    type: audit/system
    streams:
      - dataset: system_audit.package
        period: 15m
  - type: logfile
    streams:
      - dataset: system.auth
        paths:
          - /var/log/auth.log*
          - /var/log/secure*
      - dataset: system.syslog
        paths:
          - /var/log/messages*
          - /var/log/syslog*
  - name: Windows Event Logs
    type: winlog
    streams:
      - dataset: system.application
      - dataset: system.security
      - dataset: system.system
  - name: System Metrics
    type: system/metrics
    streams:
      - dataset: system.cpu
      - dataset: system.memory
      - dataset: system.network
      - dataset: system.process
      - dataset: system.uptime
  - name: Journald Logs
    type: journald
    streams:
      - dataset: system.auth
      - dataset: system.syslog
  - name: Endpoint Security
    type: endpoint
    meta:
      package: endpoint
      version: 9.1.0
    policy:
      linux:
        malware:
          mode: detect
      windows:
        malware:
          mode: detect
  - name: Osquery Manager
    type: osquery
  - name: Server VM Inventory
    type: filestream
    streams:
      - dataset: vm.inventory
        paths:
          - /var/log/mylog/mylog.ndjson
outputs:
  default:
    type: elasticsearch
    hosts:
      - <REDACTED_ES_HOST_1>
      - <REDACTED_ES_HOST_2>
      - <REDACTED_ES_HOST_3>
      - <REDACTED_ES_HOST_4>
    ssl:
      ca_trusted_fingerprint: <REDACTED>
revision: 13
runtime:
  arch: amd64

On the agent host, there's a directory called events under the logs directory. Did you check in there? That's where it'll show issues with actually sending the data.

Please look in there, and if you share the logs, please don't use jq or similar tools to parse them; look for errors and share those events in raw form.

Example

/opt/Elastic/Agent/data/elastic-agent-9.2.0-SNAPSHOT-c4b645/logs/events
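
A quick way to see whether the filestream input ever touched your file is to grep those event logs for its path (filename taken from the policy you shared):

sudo grep -r "mylog.ndjson" /opt/Elastic/Agent/data/elastic-agent-*/logs/events/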

Can you stop and start the agent and then share the full latest one of the regular logs (not the events directory)?

A quick glance at the event log just shared: there is no reference at all to the log file you're trying to harvest.

So that leads me to think it's failing to access that path, or something like that.

Assuming you checked basic things like permissions for reading the file...

How many lines are you adding to the file ... just 1? 100s?

If you only add 1 line and do not end it with a newline, it will not be read.
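
For example, appending a test line that is guaranteed to end with a newline (the JSON content is just a placeholder; the path is the one from your policy):

printf '%s\n' '{"host_type":"test","vm_id":"999"}' >> /var/log/mylog/mylog.ndjson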

There is something basic going on; Custom Logs is used widely.

Try another file... try another path

- /var/log/mylog/*

Here are the logs after restarting the agent:

I've already tried this.

{"host_type":"Proxmox","vm_id":"101","vm_name":"vm-1","vm_state":"running","cpu_count":12,"memory_mb":32768,"disk_gb":100}
{"host_type":"Proxmox","vm_id":"103","vm_name":"vm-2","vm_state":"running","cpu_count":4,"memory_mb":16384,"disk_gb":200}
{"host_type":"Proxmox","vm_id":"119","vm_name":"vm-3","vm_state":"stopped","cpu_count":4,"memory_mb":8192,"disk_gb":50}
{"host_type":"Proxmox","vm_id":"120","vm_name":"vm-4","vm_state":"stopped","cpu_count":4,"memory_mb":8192,"disk_gb":50}
{"host_type":"Proxmox","vm_id":"121","vm_name":"vm-5","vm_state":"stopped","cpu_count":4,"memory_mb":8192,"disk_gb":50}
{"host_type":"Proxmox","vm_id":"122","vm_name":"vm-6","vm_state":"running","cpu_count":12,"memory_mb":32768,"disk_gb":250}
{"host_type":"Proxmox","vm_id":"125","vm_name":"vm-7","vm_state":"running","cpu_count":8,"memory_mb":32768,"disk_gb":100}
{"host_type":"Proxmox","vm_id":"129","vm_name":"vm-8","vm_state":"running","cpu_count":4,"memory_mb":8192,"disk_gb":100}

All log files across the agents have the same format and similar output. I'm already using the same cronjob, script, and log path on all of them. I checked all of them and there are no differences.

I've already mentioned this; all permissions are correct. If that weren't true, it wouldn't work on the other agents either.

By the way, the agent runs with root permissions, so file permissions shouldn't be a problem either.
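
For anyone checking the same thing, namei (from util-linux) walks every directory on the path and shows its ownership and permissions in one go:

namei -l /var/log/mylog/mylog.ndjson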

Looked through... I do not see any messages about that filestream collector at all... not sure what to tell you. I would turn up the log level to DEBUG, which it currently is not... do you know where to do that? It's kinda hidden.

Do that then look through all the logs again...

With debug you should see it start the collector for that file...

I would turn off everything else, if you can, to isolate the issue, but that is up to you...

Add some different paths, add some different files, use wildcards.

Remove the integration and put it back.

I notice that you set the data stream to something specific... vm.inventory. Assuming you actually looked in the correct data stream, I would remove it and add it back with all the defaults, etc.
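
With the Custom Logs (Filestream) integration, that dataset should end up in a data stream named roughly logs-vm.inventory-default (assuming the default namespace), so a direct query shows whether anything arrived at all:

GET logs-vm.inventory-default/_search?size=1&sort=@timestamp:desc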

Not sure what to tell you... Custom Logs is used widely; it's something simple/basic at this point...

There sure are a lot of connection errors.... I don't usually see that.
Lots of errors trying to write
Lots of errors trying to connect to fleet

Hi, I couldn't find the debug logs, but I created a new integration policy and tried reading the same log file. I encountered the same issue, but when I ran the following command, the log appeared.


cat vm_inventory.ndjson | tee -a /vm_inventory.ndjson

Of course, doing this created duplicate data. But at least the data is coming through. Before that, I tried running the echo "" | tee /vm_inventory.ndjson command to write logs to the file, but that didn't work.
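
For clarity, the difference between the two commands I compared (same paths as above): plain tee truncates the target file before writing, while tee -a appends to it.

echo "" | tee /vm_inventory.ndjson                       # truncates, leaves only an empty line
cat vm_inventory.ndjson | tee -a /vm_inventory.ndjson    # appends the full contents again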

echo "" | tee /vm_inventory.ndjson that writes nothing to a file in the root directory not sure why you would expect that to add lines to the log file.

So I am confused... is it reading the logs? Are you actually writing to the path the integration is reading?

Your integration is set to

    streams:
      - dataset: vm.inventory
        paths:
          - /var/log/mylog/mylog.ndjson

and your commands are writing to
/vm_inventory.ndjson

I am confused... why not just point it at an actual log file... or just concatenate into /var/log/mylog/mylog.ndjson?
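
i.e., a crontab entry along these lines, appending straight into the path the integration already watches (a sketch; keep whichever directory you actually use, and rotate or trim the file deliberately if it grows too large):

0 * * * * bash /usr/local/bin/script.sh >> /var/log/mylog/mylog.ndjson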