Datafeed [datafeed-packetbeat_dns_tunneling] cannot retrieve data because no index matches datafeed's indices [packetbeat-*]

Hi folks
I have been searching high and low regarding this error and I am very new to the ELK stack.

ELK 8.7.1, using Elastic Agents on clients.

When I try to enable the ML job packetbeat_dns_tunneling, it fails with the above error message.
In general, every ML job that involves the packetbeat-* indices fails to be created/run.

I did add the packetbeat integration to a policy and deployed it to 50+ agents, and the index is there, but I am clueless now as to what to do about the above error.

Can someone point me in a direction for further investigation here?

Cheers

Hi guys

do I have to set 'index: packetbeat' in the logstash configuration here?

..... is there a 'fast lane' here for questions? We have an Enterprise License, but I would rather have everybody benefiting from this discussion/issue here.

Cheers

anyone?

From what I can read in the documentation, an elastic agent will automatically create all the beats indices.

packetbeat-* is not created, and hence my ML jobs fail (those involving packetbeat-*)

so what to do here?

my 2 logstash servers in front of elasticsearch have the elastic_agent input specified, and the elasticsearch output has data_stream => true (+ ssl, apikey etc.)

can someone help me troubleshoot this? Where to begin :man_shrugging:
:face_with_monocle:

You really should open a support ticket.

This forum is certainly not a fast lane. It might be sometimes but other times not... There are no SLAs here :slight_smile:

Can you show the index? Just because the index is there does not mean the data is in it.

Second, I think you're leaving out some key pieces. It sounds like you're trying to use agent through logstash. Is that correct?

If so, you need to share your agent configuration and logstash configuration.

Find and show where you believe the packetbeat data is ending up.
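For example, something like this in Kibana Dev Tools (just a sketch; adjust the patterns to your setup):

# the classic indices the ML datafeed expects
GET _cat/indices/packetbeat-*?v

# every data stream the cluster actually has
GET _data_stream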

Worst comes to worst, you can just clone that ML job, or GET it and edit where it's getting the data from.

It looks like what you're trying to do should work according to the docs, so show us where the data is ending up. You might just need to edit the ML job. Maybe it hasn't been updated for agent yet.

Events indexed into Elasticsearch with the Logstash configuration shown here will be similar to events directly indexed by Elastic Agent into Elasticsearch.

Hi Stephen
thank you for your reply here, appreciate it.

No, sorry Stephen, I meant the logs-elastic_agent.packetbeat-default data stream is there, but packetbeat-* is not.

That is correct, I did not explicitly state this. I am new to all this and figured the L in ELK was enough; sorry this was not clear.
I have 2 logstash servers in front of Elasticsearch, and Elastic Agents deployed with various integrations, including packetbeat (renamed to Network Packet Capture, I believe).
Nothing has been done on the agent configuration; it runs the default agent config. The same applies to logstash.yml: I only changed it to support persistent queues (memory -> persistent), nothing else has been changed. Perhaps I need to run a plugin; the docs do indicate I don't have to do anything special here, but :man_shrugging:

The output in the Fleet UI has been configured to Logstash, and I have added load balancing and 4 workers. This is reflected on the agent side when running an inspect.
Literally everything else is running as expected (ML jobs, security rules), except for the packetbeat index not being created.

How can I verify this? An API, some version setting? I made sure it is 8.7.1 across the stack.

I stopped believing :slight_smile: I cannot show you this, as I have no idea where the data is ending up, should it actually arrive, but suggestions are welcome on how to diagnose this further.

That is not possible either, as this results in an error message telling me packetbeat-* is missing :joy:

I am looking at my logstash output, which is configured to handle data streams. I was wondering if I have to create one more elasticsearch output specifying:

index => "%{[@metadata][beat]}-%{[@metadata][version]}"

?

support ticket, sure, as a last resort.

My main beef with this is that 'most' topics in this forum seem to end with a "DM" or "raise a support ticket". All this is great for the individual having a problem, but what about people with similar problems? They are left behind. All I am saying is, it should be accessible to everybody.

Anyway, I am not here to change anything, be gentle :slight_smile: I am new to all this.

Stephen I have an update,

I followed this guide here.

This created the index :heavy_check_mark:, but now I see no data is coming in.

This can only mean either Logstash or the agents are configured wrong? (right :man_facepalming:)
So far my bet is on the logstash config. The jury is still out on the extra output suggested in my previous reply.

Do you really want me to post the massive output from elastic-agent inspect :grin:? The logstash.yml is the default configuration with the exception of the persistent queue modification.
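For reference, that modification is just the queue setting in logstash.yml, something like:

# switch the event queue from in-memory to disk-backed
queue.type: persisted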

There is some confusion in this.

First, the Elastic Agent will store data in data streams where the name of the data stream starts with logs-*.

Since you are using the Network Packet Capture integration, it will save the logs in a data stream with network_traffic in its name, according to the documentation.

Do you have a similar data stream in your cluster?
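You can check this directly in Dev Tools, something like this (the dataset pattern below is just what I would expect from the integration):

# list any Network Packet Capture data streams
GET _data_stream/logs-network_traffic.*

# count the documents in them
GET logs-network_traffic.*/_count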

Regardless of the name of the data stream, using the Elastic Agent you will not have any data stream named packetbeat-*; those would be created only by a stand-alone packetbeat installation.

Second, when using the Elastic Agent, Elastic expects that you will send the data directly to Elasticsearch. When you add Logstash between them, things can get a little more complicated; it works, but you need to configure it as indicated in the documentation.

For example, you should use the elastic_agent input and your output will basically only have the hosts and data_stream set to true.

Also, I'm not 100% sure, but if I'm not wrong the index option is ignored when you use data streams; it doesn't matter what you put there. What matters are the data_stream_* settings, which you also should not set when using the Elastic Agent, as this will come from the agent itself.

You should share your logstash configuration, or at least the inputs and outputs you are using.

And third, from what I saw here, I don't think that this issue is on your side, but on Elastic's side.

The ML job packetbeat_dns_tunneling is a built-in job, and this job is configured to look at the packetbeat-* indices/data streams. But those indices/data streams are not created by the Elastic Agent, only by the stand-alone packetbeat, so the job needs to be updated by Elastic or a new one needs to be created.
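You can confirm this in the datafeed configuration itself:

# the indices array will show the packetbeat-* pattern the job is looking for
GET _ml/datafeeds/datafeed-packetbeat_dns_tunneling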

You should open a support ticket to check on this.


Hi Leandro, thanks for contributing here.

yes, absolutely, and it begins with me thinking the "old" packetbeat was integrated into the Network Packet Capture integration. I believe I've read it was originally called packetbeat and then renamed to Network Packet Capture.

:heavy_check_mark:

I kinda get that now :joy:

This works fine for me!

I could not find documentation stating Agents are expected to talk directly to Elasticsearch.
From a security pov this is naive/bad: having a jwt token and a direct endpoint in your agent config telling you where to abuse Elasticsearch is not, and should not be considered, best practice.

Leandro, if you have any other solutions here that could shut up the security guys, please chip in :pray:.

:heavy_check_mark: this has been working from day 1.

idk :man_shrugging:, which is why I suggested a 2nd output in logstash using this behaviour.
At the moment it doesn't matter, as we have established Network Packet Capture does not create the packetbeat index.

not sure I follow here, Leandro. When you're using logstash, you have to specify at least the elastic_agent input and data_stream => true. I can't say much about other data_stream_* settings; I did not explicitly configure these in my logstash ... or I am missing your point.
see here:

input {
  elastic_agent {
    port => 5044
    ssl => true
    ssl_certificate_authorities => [".."]
    ssl_certificate => ".."
    ssl_key => ".."
    ssl_verify_mode => "force_peer"
  }
}

output {
  elasticsearch {
    hosts => "ip:9200"
    api_key => "..."
    data_stream => true
    ssl => true
    cacert => "path to crt"
  }
}

Initially I figured I could just add another elasticsearch output with the index specified. Again, this is irrelevant, as the Network Packet Capture integration does not create the packetbeat index.

got it. More stupid questions here: can stand-alone packetbeat and an agent with the Network Packet Capture integration run side by side? There seems to be so much overlap here.

I really appreciate the input I get here from you guys, thanks.

The Elastic Agent runs beats behind the scenes. At first it was only filebeat, then other beats were added, like winlogbeat, metricbeat, auditbeat and packetbeat, but this is transparent to the user, as you would just add the integrations.

You won't find it; this is not documented. It is just how the Elastic Agent started and works now.

The Elastic Agent relies on ingest pipelines to parse the data. Those ingest pipelines are created and executed on Elasticsearch ingest nodes. Since the main functionality of Logstash is to parse the data, it becomes redundant to have it, as you can do almost everything with ingest pipelines.
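For example, you can list the pipelines the Network Packet Capture integration installed, something like this (the exact pipeline names depend on the package version):

GET _ingest/pipeline/logs-network_traffic*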

If I'm not wrong, the first versions of Elastic Agent didn't even support Logstash as an output.

If you are not doing any parsing or enrichment in Logstash, and just have an input and output, I see no reason to use it; it will be just another tool to maintain.

Not sure what you mean by that, how this is an issue, and how this would be different from having the logstash output.

The elasticsearch output in logstash has some data stream settings like data_stream_namespace etc., but you do not need to configure them, as the information is already present on the documents that come from the elastic agent.

The data stream to which logstash will write is derived from the data_stream fields present in every event; this happens because data_stream_auto_routing is set to true by default.
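For example, an event coming from the agent carries fields like these (illustrative values), and logstash will route it to the logs-network_traffic.dns-default data stream, following the type-dataset-namespace naming scheme:

{
  "data_stream": {
    "type": "logs",
    "dataset": "network_traffic.dns",
    "namespace": "default"
  }
}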

Not sure, but I see no reason to do that; you should just run one or the other, not both.

exactly, and I just mistakenly thought the integration would behave like the stand-alone packetbeat package. Furthermore, the stand-alone packages seem to be decommissioned eventually, and the general advice is to embrace the Agent approach.

Right, there will be use cases for us where the need to parse very old custom logfiles must be handled; logstash seems to fit here, but again, I am a novice when it comes to elasticsearch.

If I know where 9200 is located in the network, I can start enumerating data, dump indexes, steal corporate secrets, even manipulate data. Maybe delete some of the security indexes before doing a dcsync :smile:, I mean, this will not trigger an alert for the blue team now.
Slowing down attackers with authentication is of course always nice (if not disabled by default :crazy_face:), no matter the authentication type. It's always best practice to put an intermediate in front of the stack.
Having Logstash in front here seems like a good choice. One thing I like with Logstash in front of Elasticsearch is the Agent client-side support for load balancers; this always scales better than server-side + gotta love them persistent queues. There are other considerations here too, like different subnets, but I am not a certified lumberjack on our current setup :stuck_out_tongue_winking_eye:.

:+1: noted, thank you for your valuable input.

Everything you mentioned requires the client to be authenticated and have the right permissions; with security enabled, every elasticsearch request needs to be authenticated.

I'm not sure what you mean with the following:

From a security pov this is naive/bad: having a jwt token and a direct endpoint in your agent config telling you where to abuse Elasticsearch is not, and should not be considered, best practice.

If I'm not wrong, when you use fleet, the fleet server will generate an API key for the agents to use and will send it along with the configuration to the agent, and this will be encrypted on the agent server; this is explained here in the documentation.

So, using fleet, the endpoint and the API key are not present as plain text in any files on the agent server.

The endpoint and the API key would be present in plain text in the configuration file only if you were using the Elastic Agent in standalone mode, which is an advanced use case.

In this scenario you need to make sure to limit access to the Elastic Agent configuration file, and this is no different from having the API key and endpoint in your Logstash configuration, as you would also need to limit access to the configuration files/Logstash server.

Elastic strongly recommends using the Elastic Agent with a Fleet Server.

Hi Leandro

Yes, so? You think that would stop a hacker :innocent:? As I said, authentication is just a way of slowing down a hacker; this has never been the main concern. Do kerberos authentication; rest assured I will get a TGT elsewhere in the corporate network.

right, but I'm not worried about fleet. Take for instance the Elastic Defend integration.
Look at the output section :wink:, there you find Elasticsearch with the official endpoint and an ApiKey. I don't know what this ApiKey grants me access to, but I am sure I can enumerate templates, nodes and other useful stuff + I know an admin will f#¤k up the permissions eventually.
That is my main concern: here is an endpoint and an apikey to go.

sure? I believe I never questioned the Fleet Server; this is simply an amazing way of pushing tedious configuration to clients.

Leandro, you provided me with the right answer: Elastic Agent does not create the packetbeat-* index (yet), and some ML jobs (still) depend on the stand-alone packetbeat, hence this needs to be installed as well.

Yes, it looks like the packetbeat ML jobs have not been ported to the agent integration yet.

However, there is no great magic in the ML jobs; they just take a datafeed, some fields, and a config. We can most likely just load that ML job through packetbeat, then clone/edit that job and change the datafeed... perhaps a few field names.

Perhaps, if I get a chance, I will take a look; it will probably be later in the week. I used to build the DNS Tunnelling / Exfil ML job by hand all the time...

It probably comes down to:

  • Running packetbeat setup -e
  • GETting the ML job
  • Editing a couple of items in the JSON: the datafeed, perhaps a few field names
  • PUTting it back, and then it will work against the new data stream (see the sketch after this list)
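As a rough sketch of that flow in Dev Tools (the new index pattern and dataset value are my assumptions based on the Network Packet Capture integration, and the datafeed must be stopped while you update it):

# get the built-in job config to see what needs editing
GET _ml/anomaly_detectors/packetbeat_dns_tunneling

# repoint the datafeed from packetbeat-* to the Agent data stream
POST _ml/datafeeds/datafeed-packetbeat_dns_tunneling/_update
{
  "indices": ["logs-network_traffic.*"],
  "query": {
    "bool": {
      "filter": [
        { "term": { "event.dataset": "network_traffic.dns" } }
      ]
    }
  }
}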

If you are interested let me know.

Agreed. Pick one:

  • Use packetbeat and you get the ML job for free.
  • Use agent and we (or support :wink:) can probably get the ML job ported.

Let me know if you want me to look at the ML job...

PS: not sure where you got that; that is factually inaccurate / an over-exaggeration at the very least ... seems like an odd way to ask for assistance, especially from a user that is self-admittedly relatively new to the community.

For a little perspective:

  • I referred you to support, as most people that pay for support with SLAs want to use it; we are all volunteers here, and with no SLA there is no guarantee that any / every topic will even be addressed.
  • The vast majority of users that come to this forum use Basic / Free and thus do not have support, AND we do not cover licensing costs or specific account issues etc. here.
  • We do, however, appreciate that you want to share the issues and results with the community.

Let's see if we can get this to work for you and the community.

Well OK, I just looked into it, and it looks like the ML jobs are actually there ... you just need to create them with the correct data view... took me 5 mins. There is one hitch: a slight but important misconfiguration will need to be corrected, i.e. the correct event.dataset ... I will show you how.

I understand there is already a PR to fix this; I don't have it handy.

EDIT: 8.8.0 should already be fixed; 8.7.1 still has this error.

1st, I am doing this with Elastic Agent Network Packet Capture -> Elasticsearch
(no logstash in the middle, although that should work according to the documentation).

This assumes the agent is already sending data.

Go to ML - Jobs - Create Job

Select the correct data view: logs-network_traffic

When you do that, it will recognize it, and then select the correct job group. A little confusing because it says packetbeat (that should get cleaned up).

Select it

And you will get this screen; select Create Jobs.

You need to go in and make one edit...

Edit the datafeed (the event.dataset is wrong):

{
  "bool": {
    "filter": [
      {
        "term": {
          "event.dataset": "network_traffic.dns"   <---- THIS
        }
      },
      ...
    ]
  }
}

Save and then test the datafeed ...
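There is also a preview endpoint if you want to test it through the API instead:

# preview what the datafeed would fetch, without starting the job
GET _ml/datafeeds/datafeed-packetbeat_dns_tunneling/_preview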

It should look something like this...

And voilà, you have the correct jobs pulling from the correct data view. You can just start it.

You can start it when you are ready.... You'll probably need to do the same with the other jobs; their event.dataset will also be incorrect.

Stephen, fantastic job; this works flawlessly.

Thank you for your support here.

:dart::pray::heavy_check_mark:
look how far we got around various topics, thanks to Leandro and you :+1:

......
If you need anything from me, like license info etc., let me know, just so you can verify I'm not freeloading. Next time I will absolutely reconsider asking topics here and go straight to support instead; the road seems less bumpy, as both of you advised me to do. Again, thank you for your time and support.

@A113n you're welcome!

Glad you got it working. Please come here and ask questions anytime you like... There is no freeloading. This is a community.

I think we just got off on the wrong foot, because often when a user has support, that may be a quicker, more direct method, or may involve more sensitive data etc.

Also, it does depend on the level of support. Sometimes a user may only have break / fix; others may have more consultative support, depending on the volume of your license.

Come on back. Bring us a good question!

