Fleet Pipeline does not exist - Data Streams not getting data

I have Elasticsearch set up and working fine using regular Beats. I have since set up Fleet and am trying to add servers using that; however, I am seeing a bunch of errors about pipelines not existing, but only for certain items. See an example below.

[elastic_agent.metricbeat][warn] Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0xc04f1568a69c8d10, ext:3720939793352, loc:(*time.Location)(0x564bda2ce700)}, Meta:{"raw_index":"metrics-system.process.summary-default"}, Fields:{"agent":{"ephemeral_id":"aba05f5e-49d8-4c29-ab20-736ecfe5b833","hostname":"pacc-intranet.pac.internal","id":"556d8a1b-10b4-49f4-96b8-4951d95d460c","name":"pacc-intranet.pac.internal","type":"metricbeat","version":"7.15.0"},"data_stream":{"dataset":"system.process.summary","namespace":"default","type":"metrics"},"ecs":{"version":"1.11.0"},"elastic_agent":{"id":"556d8a1b-10b4-49f4-96b8-4951d95d460c","snapshot":false,"version":"7.15.0"},"event":{"dataset":"system.process.summary","duration":25277107,"module":"system"},"host":{"architecture":"x86_64","containerized":false,"hostname":"pacc-intranet.pac.internal","id":"822f79372d6b43fc9557929197ffcb48","ip":["192.168.200.30","fe80::e7fe:aca2:ad20:dd7d"],"mac":["00:50:56:ad:00:46"],"name":"pacc-intranet.pac.internal","os":{"family":"","kernel":"4.18.0-305.7.1.el8_4.x86_64","name":"Rocky Linux","platform":"rocky","type":"linux","version":"8.4 (Green Obsidian)"}},"metricset":{"name":"process_summary","period":10000},"service":{"type":"system"},"system":{"process":{"summary":{"dead":0,"idle":79,"running":1,"sleeping":139,"stopped":0,"total":219,"unknown":0,"zombie":0}}}}, Private:interface {}(nil), TimeSeries:true}, Flags:0x0, Cache:publisher.EventCache{m:common.MapStr(nil)}} (status=400): {"type":"illegal_argument_exception","reason":"pipeline with id [metrics-system.process.summary-1.1.2] does not exist"}, dropping event!

I am also not actually getting any data into my data streams, I assume because the pipelines do not exist. Again, only for some of them. I did upgrade from 7.14 to 7.15, but I can't remember the exact date.

Is Fleet supposed to create these pipelines automatically? I would assume so, because some are working and I've never made one.
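For reference, the error can be checked directly against Elasticsearch by fetching the pipeline named in the message; a 404 would confirm it was never installed. A minimal sketch that just builds the request (the host, port, and credentials below are placeholders, not values from my setup):

```shell
# Placeholder Elasticsearch host; the pipeline id is the one from the error message.
ES_HOST="http://localhost:9200"
PIPELINE="metrics-system.process.summary-1.1.2"
URL="${ES_HOST}/_ingest/pipeline/${PIPELINE}"
# Run this against your cluster; a 404 response means the pipeline is missing.
echo "curl -s -u <username>:<password> ${URL}"
```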


Hi @Heroj04, we're experiencing the same issue with a large quantity of logging (just purged a 58 GB metricbeat JSON log file). We have recently upgraded from 7.14 to 7.15 as well. Happy to provide logs to assist in resolving this one.

No luck yet.
I have reinstalled elastic-agent on the clients, reinstalled the agent on the server, defined new policies, and removed and re-added certain integrations.
All to no avail so far.
I'm thinking about looking for a way to clean-install all the Fleet configuration in Elasticsearch again, but I can't seem to think of anything.

Sounds like you've done a similar thing to me. I've also rolled over the indices to see if that would help, but it looks like it hasn't. @ruflin, does anyone on the Elastic team have any suggestions as to what we may be doing wrong here? And a clarification to my previous post: the 58 GB log file is the local metricbeat log file on the host that the agent is installed on.

I have hit this a few times when upgrading Fleet versions, as I always upgrade with every minor version.

Something happens in the upgrade/migration of pipelines: an integration gets upgraded but the pipeline for the latest version is not there. It happens more often when you upgrade just an integration rather than the whole Fleet stack while upgrading the ELK stack.

The only way I have found around this was to totally remove every instance of the Fleet Server policy and its agents and start with a fresh Fleet Server and agent install, completely removing all data, files, and folders.

There is certainly a problem that comes with upgrading existing fleet setups.


I usually remove all traces of the agents via this command. Just unenrolling and uninstalling the agent does not always remove all the files, folders, and data.

sudo find / -type d,f -name "*elastic-agent*" -exec sudo rm -vr {} +

Thanks @zx8086. Not sure if I should wait for (hopefully) someone from the Elastic team to pop up so that they can grab the diagnostics they need, or if I should go ahead and reinstate the whole config. We have Windows agents as well as Linux, but I'd assume it would be the same string I'm looking for anyway.

I've just been doing some more testing: removed the Fleet Server/elastic-agent and created a new server policy to reinstall, still with no luck.
Does anybody know how to completely remove the Fleet configuration from Elasticsearch/Kibana and just start fresh?

1. Unenroll the Fleet agents.
2. Uninstall elastic-agent from the server and clients.
3. Use the command below to remove the Elastic Agent data and files completely.
4. Install from scratch.

sudo find / -type d,f -name "*elastic-agent*" -exec sudo rm -vr {} +
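Since that find command deletes recursively from /, it may be worth previewing what it matches before running the rm form. A harmless sketch of that dry-run idea, exercised against a scratch directory (the file names below are made up for illustration; GNU find is assumed for the -type d,f syntax):

```shell
# Build a scratch directory with fake agent leftovers so this is safe to run anywhere.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/opt/Elastic/Agent"
touch "$tmpdir/opt/Elastic/Agent/elastic-agent.yml" "$tmpdir/elastic-agent.log"

# Dry run: -print lists the matches instead of deleting them.
# Once the output looks right, swap -print for: -exec rm -vr {} +
matches=$(find "$tmpdir" -type d,f -name "*elastic-agent*" -print)
echo "$matches"

rm -rf "$tmpdir"
```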

I did give this a go but was still stuck with the same issue; it just would not create those pipelines again.
Since this is a new server I'm testing with, I ended up completely blowing away ELK and setting it up again. Now it seems to be working correctly.
Will have to wait and see if it breaks again on the next update.

@Heroj04

Sounds like it was related to corrupt data.

Thanks @zx8086
That did resolve the issue. However, I'm a little disappointed that I have to blow away the agents on ~30 fleet servers each time there is an integration update. Hopefully the Elastic team can work to resolve this glitch or provide some more pointed advice on how to resolve this issue. I'm not sure why the updated pipelines weren't automatically created.

Thanks Again

@hamiland, no problem. I had to do the same, which is why I logged the call with the Elastic team. It seems to be better with 7.15.1, so fingers crossed.

I think if this happens again, just reinstalling the Elastic Agent on the Fleet server might resolve the problem. My test setup is 5 agents and it is automated, so usually this isn't a big pain. I would advise automating that so it isn't a pain however large your estate becomes.


Hi all, thanks for being some of our first adopters of Fleet and Elastic Agent. It seems you all have encountered one of our rough spots in integration upgrades. This is definitely not the experience we want and is something we're planning to address in an upcoming release as part of [Fleet] Handle common transient errors during package installs with a retry · Issue #111859 · elastic/kibana · GitHub.

In the meantime, there is an API workaround you can use to force these assets to be reinstalled without having to wipe everything and start over. This command will force the base package to be reinstalled:

curl -XPOST \
  --url http://<your kibana host>/api/fleet/epm/packages/system-1.1.2 \
  -u <username>:<password> \
  -H 'content-type: application/json' \
  -H 'kbn-xsrf: x' \
  --data '{"force": true}'

You'll want to be sure you use the correct version number at the end of the URL (I used 1.1.2 here). You can see which version is currently installed by visiting http://<your kibana host>/app/integrations/detail/system/settings. This can be used for other packages as well: just change "system" in the URL to another package name and use the correct version number.
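For anyone scripting this across several packages, the installed versions can also be pulled from the Fleet package list endpoint rather than the UI. A sketch that just assembles the request (host and credentials are placeholders; verify the endpoint against your Kibana version's Fleet API docs):

```shell
# Placeholder Kibana host; same authentication and kbn-xsrf header as the
# force-reinstall call above.
KIBANA_HOST="http://localhost:5601"
LIST_URL="${KIBANA_HOST}/api/fleet/epm/packages"
# The JSON response includes each package's name, installed version, and status,
# which gives you the version number to plug into the reinstall URL.
echo "curl -s -u <username>:<password> -H 'kbn-xsrf: x' ${LIST_URL}"
```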

I hope this helps!