Parsing problem when streaming a log file

Hey everyone,

I followed the Stream any log file guide and have set up a local agent that listens to my log file. But every time I add a new log line (manually, to test), it doesn't get parsed when it is indexed in Kibana. The whole line just falls under the message field, like so:

But when I use the API and make a POST call to the same data stream, it parses correctly:

POST logs-generic-default/_doc
{"log_time":"2023-11-28T09:50:33.026Z","project":"Public_Documentation_Mappings","last_activity":"2023-11-25T09:50:33.026Z"}

I already created the project, last_activity and log_time fields.

It is the same with the filebeat log file (where it parses correctly for each field), but when I copy a log line and paste it into my own file that the agent is also listening to, the entire line just falls under the message field.

Thanks

Hi @Kyps Welcome to the community.

First, we discourage screenshots of text; they are hard to read and cannot be copied, searched, or debugged.

Also, you should use Kibana -> Discover to look at your logs.

What version are you on?

Did you look at the next step...

When you read a log file the entire content of the log line ends up in the message field...

If you want to parse it you will need an ingest pipeline...

If your message field is JSON you can use the JSON processor in an ingest pipeline

Hey Stephen,

Thank you for the answer.

Regarding the pipeline, I have created a pipeline (I think); I did it with the Console in Dev Tools, like this:

PUT _ingest/pipeline/logs-generic-default
{
  "description": "Extracts the log time, project name and project last activity",
  "processors": [
    {
      "dissect": {
        "field": "message",
        "pattern": """{"log_time":"%{log_time}","project":"%{project}","last_activity":"%{last_activity}"}"""
      }
    }
  ]
}
POST _ingest/pipeline/logs-generic-default/_simulate
{
  "docs": [
    {
      "_source": {
        "message": """{"log_time":"2023-11-28T09:50:33.026Z","project":"Public_Documentation_Mappings","last_activity":"2023-11-25T09:50:33.026Z"}"""
      }
    }
  ]
}

PUT _index_template/logs-generic-default-template
{
  "index_patterns": [ "logs-generic-*" ],
  "data_stream": { },
  "priority": 500,
  "template": {
    "settings": {
      "index.default_pipeline":"logs-generic-default"
    }
  },
  "composed_of": [
    "logs-mappings",
    "logs-settings",
    "logs@custom",
    "ecs@dynamic_templates"
  ],
  "ignore_missing_component_templates": ["logs@custom"]
}

The _simulate POST call works as expected, but my logs don't seem to go through the pipeline before getting indexed. Maybe it has something to do with my elastic-agent.yml, which looks like this:

outputs:
  default:
    type: elasticsearch
    hosts: <my-host>
    #api_key: 'your-api-key'
    username: <my-user>
    password: <my-pass>
    pipeline: logs-generic-default # is this right to apply the pipeline?
inputs:
  - id: logs-generic-default
    type: filestream
    streams:
      - id: logs-generic-default
        data_stream.dataset: logs-generic-default
        paths:
          - C:\Program Files\Elastic\Agent\data\elastic-agent-03ef9d\logs\myapp.log

I'll try to use the JSON processor, but I'm not sure how to apply the ingest pipeline, i.e. where would I add this code? (Sorry for the image; this is from the JSON processor guide you linked.)

Thanks again, Stephen!

Perhaps this will help..

There are a lot of parameters on the json processor, so look carefully.

There are pros and cons to putting the fields at root, so you might want to put them under a different field instead (see the variant after the simulate result below).


PUT _ingest/pipeline/logs-generic-default
{
  "description": "Extracts the log time, project name and project last activity",
  "processors": [
    {
      "json": {
        "field": "message",
        "add_to_root": true
      }
    }
  ]
}


POST _ingest/pipeline/logs-generic-default/_simulate
{
  "docs": [
    {
      "_source": {
        "message": """{"log_time":"2023-11-28T09:50:33.026Z","project":"Public_Documentation_Mappings","last_activity":"2023-11-25T09:50:33.026Z"}"""
      }
    }
  ]
}

# result

{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_version": "-3",
        "_id": "_id",
        "_source": {
          "project": "Public_Documentation_Mappings",
          "last_activity": "2023-11-25T09:50:33.026Z",
          "message": """{"log_time":"2023-11-28T09:50:33.026Z","project":"Public_Documentation_Mappings","last_activity":"2023-11-25T09:50:33.026Z"}""",
          "log_time": "2023-11-28T09:50:33.026Z"
        },
        "_ingest": {
          "timestamp": "2023-12-01T16:14:14.270428604Z"
        }
      }
    }
  ]
}
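
If you decide not to add them at root, a variant like this uses the json processor's target_field instead of add_to_root (just a sketch; the "app" field name is arbitrary, pick whatever makes sense for you):

# variant: keep the parsed fields under "app" instead of at root
PUT _ingest/pipeline/logs-generic-default
{
  "description": "Parses the JSON message into fields under app.*",
  "processors": [
    {
      "json": {
        "field": "message",
        "target_field": "app"
      }
    }
  ]
}

With that, the simulate result would have app.log_time, app.project and app.last_activity instead of the three fields at root.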

Hey Stephen,

Yeah, I got stuck on this step. I get the same results as you with my other pipeline using the dissect processor. So back to my initial problem: why is it that my logs don't get parsed? This is from my myapp.log, which is the file in the path in my elastic-agent.yml:

There are no project, last_activity, or log_time fields.
Only:
"message": """{"log_time":"2023-11-28T09:50:33.026Z","project":"Public_Documentation_Mappings","last_activity":"2023-11-25T09:50:33.026Z"}"""

It's like it never goes through the pipeline before getting indexed.
I can show you the View details output from the Stream section for each log it reads from myapp.log.

Thanks for your time Stephen.

See here

Elastic data stream naming scheme

The Elastic data stream naming scheme is made for time series data and consists of splitting datasets into different data streams using the following naming convention.

  • type: Generic type describing the data
  • dataset: Describes the data ingested and its structure
  • namespace: User-configurable arbitrary grouping

These three parts are combined by a “-” and result in data streams like logs-nginx.access-production. In all three parts, the “-” character is not allowed. This means all data streams are named in the following way: <type>-<dataset>-<namespace>.

so
data_stream.dataset: logs-generic-default

is not allowed... you are making assumptions

My suggestion is to follow the instructions exactly, get it working, and then start changing names etc. Putting those “-”s in is definitely part of the problem.

And with that it looks like the first page and the second are NOT aligned UGH!!

So, looking at the next page, you should set this in your agent:

data_stream.dataset: example

index_pattern – Needs to match your log data stream. Naming conventions for data streams are <type>-<dataset>-<namespace> . In this example, your logs data stream is named logs-example-* . Data that matches this pattern will go through your pipeline.

This will then be aligned with the template and everything on that page.

That now aligns with

PUT _index_template/logs-example-default-template
{
  "index_patterns": [ "logs-example-*" ],
  "data_stream": { },
  "priority": 500,
  "template": {
    "settings": {
      "index.default_pipeline":"logs-example-default"
    }
  },
  "composed_of": [
    "logs-mappings",
    "logs-settings",
    "logs@custom",
    "ecs@dynamic_templates"
  ],
  "ignore_missing_component_templates": ["logs@custom"]
}
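
And then in the agent, the input would look something like this (a sketch based on the config you posted; the ids are arbitrary, the path is the one from your config, and only the dataset really matters here):

inputs:
  - id: logs-example
    type: filestream
    streams:
      - id: logs-example
        # dataset without any "-", matching the template above
        data_stream.dataset: example
        paths:
          - C:\Program Files\Elastic\Agent\data\elastic-agent-03ef9d\logs\myapp.log

With the namespace left at its default ("default"), the data stream becomes logs-example-default, which matches logs-example-*, so index.default_pipeline from the template gets applied.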

Hey Stephen,

Yeah, I'll change it back to generic. I changed it because I was just experimenting with everything to get the pipeline applied... and forgot to change it back.
But I can say with confidence that even with the default elastic-agent.yml (following the Stream any log file guide) it still didn't parse correctly. The only things I added were the credentials and the path.

As of right now I changed my .yml file to look like this:

outputs:
  default:
    type: elasticsearch
    hosts: '<host>:<port>'
    #api_key: 'your-api-key'
    username: <user>
    password: <pass>
inputs:
  - id: your-log-id
    type: filestream
    streams:
      - id: your-log-stream-id
        data_stream.dataset: generic
        paths:
          - C:\Program Files\Elastic\Agent\data\elastic-agent-03ef9d\logs\myapp.log
        

Regarding this:
index_pattern – Needs to match your log data stream. Naming conventions for data streams are <type>-<dataset>-<namespace> . In this example, your logs data stream is named logs-example-* . Data that matches this pattern will go through your pipeline.

Is it the id under inputs -> streams -> id?

I really appreciate the time Stephen.

No, set it to example if you want it to work with the 2nd page... I just found that:

data_stream.dataset: example

All the code on the Parsing Page expects the dataset to be example

I reported this to our docs people... that is not good..

So follow the first and second page, but in the agent.yml use

data_stream.dataset: example

Then try the json processor I gave you in the ingest pipeline.

Hey Stephen,

I changed data_stream.dataset to example (the listener restarted) and then added a new log to myapp.log. It still doesn't parse correctly, and it looks like the dataset is still generic?

What do you think?

Did you restart the agent?

Does not look like it

I ran this in PowerShell:

Stop-Service "Elastic Agent"
Start-Service "Elastic Agent"

In the root of the Agent, this folder:

And you saved the file

C:\Program Files\Elastic\Agent\elastic-agent.yml

Yes, sir!

Edit: is there a cache?

not sure what to tell you....

Uninstall and reinstall... and start over

Alright, will do.

FYI: even with the logs that were coming into elastic-agent-20231201-3.ndjson, i.e. the default filebeat ("Non-zero metrics in the last 30s") messages with no custom field names, when I copied those logs into my myapp.log it still didn't parse correctly. That is, the entire log was in the message field.

But I'll reinstall and start over.

OK Need to slow down a bit...

Get everything aligned....

The docs have some issues, sorry about that...

Of course: the ingest pipeline is not getting executed because the template is not getting applied (which is what defines the pipeline, etc.), because the dataset is wrong... it is all related...
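
One way to check that the wiring is in place (a sketch, assuming the example dataset and the template above) is to run these in Dev Tools:

# does the template exist and match logs-example-*?
GET _index_template/logs-example-default-template

# has the agent actually created the data stream?
GET _data_stream/logs-example-default

# is index.default_pipeline set on the backing index?
GET logs-example-default/_settings

If the data stream is missing, or default_pipeline is not set, the documents will never go through the pipeline.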

Let me work through this and I will get back... it is all close, just an issue or two.

Please verify: are you using standalone, or do you have a Fleet Server?

If so, there are easier ways to do this... Standalone is fine, I just want to know what you have.


Alright,

Thanks a lot for the help Stephen!

I'm certain I pressed 'n' during the setup when it asked for something about 'fleet', so Standalone.

I followed this in the Stream any log file guide:

Right, but you are NOT doing Fleet Managed... (i.e. you did not install a Fleet Server).
Looks like that is correct... no Fleet Server... OK, no problem, give me 20 mins...

This should not be this hard... sorry... I will be doing it on Linux but it should translate.


No, I did not install any Fleet Server.
