Converting logstash to filebeat module strangeness

I've created a pipeline in elasticsearch to process some jetty output:

PUT _ingest/pipeline/filebeat-6.7.2-jetty-log-pipeline
{
  "description" : "Ingest pipeline for jetty stderror",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["(?m)%{CATALINA_DATESTAMP:jetty_timestamp} %{DATA:java_class} %{DATA:java_method}\n%{LOGLEVEL:log_level}: %{JAVALOGMESSAGE:log_message}"],
        "ignore_missing": false
      }
    },
    {
      "remove": {
        "field": "message"
      }
    },
    {
      "date": {
        "field": "jetty_timestamp",
        "target_field": "@timestamp",
        "formats": ["MMM dd, yyyy HH:mm:ss a", "EE MMM dd HH:mm:ss z yyyy"]
      }
    },
    {
      "remove": {
        "field" : "jetty_timestamp"
      }
    }
  ],
  "on_failure" : [
    {
      "set" : {
        "field" : "error.message",
        "value" : "{{ _ingest.on_failure_message }}"
      }
    }
  ]
}
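As a sanity check that the pipeline was actually stored, it can be fetched back with the get pipeline API:

GET _ingest/pipeline/filebeat-6.7.2-jetty-log-pipeline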

Here's a capture from tcpdump of filebeat transmitting it to ES:

{"index":{"_index":"filebeat-6.7.2-2019.06.12","_type":"doc","pipeline":"filebeat-6.7.2-jetty-log-pipeline"}}
{"@timestamp":"2019-06-12T00:56:48.979Z","fileset":{"module":"app","name":"stderroutlog"},"event":{"dataset":"jetty.log"},"beat":{"name":"server-web","hostname":"server-web","version":"6.7.2"},"source":"/var/log/jetty/2019_06_08.stderrout.log","offset":2546,"message":"Jun 08, 2019 12:19:46 AM org.springframework.orm.jpa.AbstractEntityManagerFactoryBean destroy\nINFO: Closing JPA EntityManagerFactory for persistence unit 'defaultPU'","prospector":{"type":"log"},"log":{"file":{"path":"/var/log/jetty/2019_06_08.stderrout.log"},"flags":["multiline"]},"input":{"type":"log"},"host":{"name":"server-web","os":{"name":"Ubuntu","codename":"bionic","platform":"ubuntu","version":"18.04.2 LTS (Bionic Beaver)","family":"debian"},"id":"xxxxxxxxxxxxxxx","containerized":false,"architecture":"x86_64"},"meta":{"cloud":{"provider":"ec2","machine_type":"r4.large","region":"us-east-1","availability_zone":"us-east-1a","instance_id":"i-xxxxxxxxxxxx"}}}

Here's a test of it working with simulate:

POST _ingest/pipeline/filebeat-6.7.2-jetty-log-pipeline/_simulate
{
"docs": [
{
"_source": {
"message": """
Jun 08, 2019 12:19:46 AM org.springframework.orm.jpa.AbstractEntityManagerFactoryBean destroy
INFO: Closing JPA EntityManagerFactory for persistence unit 'defaultPU'
"""
}
}
]
}

{
"docs" : [
{
"doc" : {
"_index" : "_index",
"_type" : "_type",
"_id" : "_id",
"_source" : {
"@timestamp" : "2019-06-08T12:19:46.000Z",
"java_method" : "destroy",
"log_level" : "INFO",
"log_message" : "Closing JPA EntityManagerFactory for persistence unit 'defaultPU'",
"java_class" : "org.springframework.orm.jpa.AbstractEntityManagerFactoryBean"
},
"_ingest" : {
"timestamp" : "2019-06-12T01:39:33.216Z"
}
}
}
]
}

However, nothing shows up when I attempt to search for it in Kibana. The ES logs don't show anything out of the ordinary, and error.message:* returns zero results, so I think that means it ingested correctly. My time range is set to "last 24 hours." Have I missed something obvious?

This is running 6.7.2 of ES and Filebeat.

I’m tempted to say yes. :speak_no_evil:

Jun 08, 2019 12:19:46 AM

This is what you're stuffing into @timestamp, and it's older than 24 hours?
Riddle me that!?

Look closer:

"@timestamp":"2019-06-12T00:56:48.979Z",

I also changed the range to "this week" and "this month" with no difference in results.

Assuming the @timestamp I'm seeing captured by tcpdump isn't the end-result timestamp that ES ends up with (pipeline overrides it), is there a default expiration of entries?

Better still, is there a log setting that will expose what ES is doing with these?
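One check I can think of is querying the daily index from the tcpdump capture directly for one of the fields the grok processor adds -- something like:

GET filebeat-6.7.2-2019.06.12/_search
{
  "query": {
    "exists": { "field": "log_message" }
  }
}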

hehe, I did look closer: your pipeline IS overwriting the @timestamp in ES before indexing. That's what you wrote the pipeline for; I alluded to that in my answer.

So your intention is to overwrite the @timestamp with, in this case, a timestamp from June 08. And then you stuff the event into a daily index named for June 12...

Now, we don't know at this point how you configured the index pattern in Kibana, and this is key here: an index pattern has a specific field it considers to be the timestamp of events. Is it @timestamp in your case? What are the other settings of the index pattern in Kibana...?

How I debug those things in plain sight:
You ship such events into an index where there is nothing else, a test index for your debugging session. Then you configure the index pattern for that index in Kibana to have NO timestamp field... which means "my events have no timestamp".
When you use the Kibana Discover tab on such an index pattern to view events, there is no time picker, and the entire timestamp/picker machinery is out of your way, so you can see all the events in the index and actually look at them to understand how they ended up.
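A concrete sketch of that approach (the jetty-debug index name here is made up for the example): index one raw event through the pipeline into a scratch index, then look at what actually got stored:

PUT jetty-debug/doc/1?pipeline=filebeat-6.7.2-jetty-log-pipeline
{
  "message": "Jun 08, 2019 12:19:46 AM org.springframework.orm.jpa.AbstractEntityManagerFactoryBean destroy\nINFO: Closing JPA EntityManagerFactory for persistence unit 'defaultPU'"
}

GET jetty-debug/_search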

It's possible that Kibana, for a time-based index, will not show you an event which is basically in the wrong index according to its timestamp... I'm tempted to say it should work anyway, but maybe it depends on how you've configured the index pattern. I've never put events for June 08 into the June 12 daily index, so I have never needed to debug this further.

For debug logs, I can't answer; I route my pipeline failures into an index made for ingest failures and look at the failures over there.

Maybe if ES is in debug mode, it prints logs for successful pipelines? I've never looked into that.

Sorry -- I need to look closer, not you. Hand slapped.

My limited understanding is that, using default settings, index names don't have any special meaning. Just because I create daily indexes doesn't automatically mean they will only accept matching timestamps. As long as I'm using Kibana's Discover tab to search the filebeat-* indexes, and the time range includes what @timestamp is set to, I should find it.

I'm using the template that comes with Filebeat 6.7.2. It has not been altered. My understanding here is that fields not explicitly mapped are covered by the fields.* catchall.
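For reference, the installed template (and its dynamic mappings) can be pulled back to double-check that:

GET _template/filebeat-6.7.2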

This is where things get cloudy for me -- according to tcpdump, the index (and pipeline) is being chosen for me:
{"index":{"_index":"filebeat-6.7.2-2019.06.12","_type":"doc","pipeline":"filebeat-6.7.2-jetty-log-pipeline"}}

I'll try setting these in the log.yml file and see if they stick. I want to say I tried setting the pipeline before but tcpdump was showing that this setting wasn't being transmitted, but I'll revisit this.
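What I have in mind for the fileset's log.yml is along these lines (the paths and multiline pattern here are illustrative, not my exact config):

type: log
paths:
  - /var/log/jetty/*.stderrout.log
multiline.pattern: '^[A-Z][a-z]{2} [0-9]{2}, [0-9]{4}'
multiline.negate: true
multiline.match: after
pipeline: filebeat-6.7.2-jetty-log-pipeline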

Okay, I've updated the pipeline so that any breakage should go into a new index:

PUT _ingest/pipeline/filebeat-6.7.2-jetty-log-pipeline
{
  "description" : "Ingest pipeline for jetty stderror",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["(?m)%{CATALINA_DATESTAMP:jetty_timestamp} %{DATA:java_class} %{DATA:java_method}\n%{LOGLEVEL:log_level}: %{JAVALOGMESSAGE:log_message}"],
        "ignore_missing": false
      }
    },
    {
      "remove":{
        "field": "message"
      }
    },
    {
      "date": {
        "field": "jetty_timestamp",
        "target_field": "@timestamp",
        "formats": ["MMM dd, yyyy HH:mm:ss a", "EE MMM dd HH:mm:ss z yyyy" ]
      }
    },
    {
      "remove": {
        "field" : "jetty_timestamp"
      }
    }
  ],
  "on_failure" : [
    {
      "set" : {
        "field" : "error.message",
        "value" : "{{ _ingest.on_failure_message }}"
      }
    },
    {
      "set" : {
        "field" : "_index",
        "value" : "failed-{{ _index }}"
      }
    }
  ]
}

Here's a new tcpdump capture -- the log entry occurred today, so there can't be any timestamp shenanigans:

{"index":{"_index":"filebeat-6.7.2-2019.06.12","_type":"doc","pipeline":"filebeat-6.7.2-jetty-log-pipeline"}}
{"@timestamp":"2019-06-12T18:38:21.062Z","offset":9761,"fileset":{"name":"stderroutlog","module":"app"},"event":{"dataset":"jetty.log"},"input":{"type":"log"},"beat":{"version":"6.7.2","name":"server-web","hostname":"server-web"},"host":{"architecture":"x86_64","os":{"family":"debian","name":"Ubuntu","codename":"bionic","platform":"ubuntu","version":"18.04.2 LTS (Bionic Beaver)"},"id":"xxxxxxxx","containerized":false,"name":"server-web"},"message":"Jun 12, 2019 6:38:20 PM org.springframework.orm.jpa.AbstractEntityManagerFactoryBean destroy\nINFO: Closing JPA EntityManagerFactory for persistence unit 'defaultPU'","source":"/var/log/jetty/2019_06_12.stderrout.log","prospector":{"type":"log"},"meta":{"cloud":{"machine_type":"r4.large","region":"us-east-1","availability_zone":"us-east-1a","provider":"ec2","instance_id":"xxxxxxxxx"}},"log":{"file":{"path":"/var/log/jetty/2019_06_12.stderrout.log"},"flags":["multiline"]}}

In the Discover tab, with the filebeat-* index selected and time range set to "This Month", I searched for "EntityManagerFactory" and got back no results.

Under Kibana -> Management -> Elasticsearch -> Index Management, there are no indexes that start with 'failed'. Monitoring shows everything is "green."

It bothers me that ES's default setting is to silently drop. At least with logstash I'd get something in the logs when there were problems consuming the entries.

As I thought - it was something simple (albeit not quite obvious). Everything was working fine. The search I was attempting assumed that the field was of type "text" -- instead it defaulted to "keyword."

I assume this is why searching for:

  • EntityManagerFactory
  • *EntityManagerFactory*

failed, but

log_message: *EntityManagerFactory*

worked?

Where should I look to see why ES determined that log_message should be of type keyword?
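For reference, this is how I've been inspecting the field's mapping, using the get field mapping API against the daily index from the capture above:

GET filebeat-6.7.2-2019.06.12/_mapping/field/log_message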