Incomplete data in ES


(Lo Zio) #1

Using filebeat 6.1.1 to gather apache logs.
It all started when I tried to change the index name for filebeat, which turned out to be surprisingly complicated despite seeming like an easy task. I eventually reached the point where the ES index was created with the name I wanted, but when I started filebeat and data went into the index, each record only contained basic data (@timestamp, _id and a couple of other fields) and nothing from my logs. Commenting out the lines that change the index name and the setup.template.* settings reverts to the standard index name, filebeat-6.1.1, and the data is complete, so the input pipeline is fine.
I thought it was a problem with the template, but found no documentation explaining how to work with it, just "set it up" type docs. A very basic howto would be very handy; I'll write one once my setup is working.
So I switched to using Logstash, since I had to modify the logs anyhow. I set up my listener, with output going both to ES (with an index name of my choice) and to the rubydebug console output.
Starting filebeat, I can see the console output containing all the fields coming from filebeat, but the data in ES only has the same basic fields as in my first attempt.
After two days of googling I suppose it is not as obvious as I thought. Any suggestion on:

  • where to find comprehensive docs explaining how to change the index name and set up the templates accordingly, something finer-grained than "you have to set up templates"?
  • why data that looks complete in the console output does not arrive in ES complete, with all its fields?

Thanks


(Adrian Serrano) #2

This seems like a problem with loading the index pattern.

Changing the index name and pattern is as easy as:

    setup.template:
      name: "my-index-%{[beat.version]}-%{+yyyy.MM.dd}"
      pattern: "my-index-*"

    output.elasticsearch:
      index: "my-index-%{[beat.version]}-%{+yyyy.MM.dd}"

And the template is installed automatically by filebeat. Can you share your configuration and a debug log (run with -d '*') so we can see if the template is installing correctly?
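For reference, a debug run can be started like this (assuming the filebeat binary is on the PATH and the config lives at /etc/filebeat/filebeat.yml; adjust both to your installation):

    # Log to stderr (-e) with all debug selectors enabled (-d '*')
    filebeat -e -d '*' -c /etc/filebeat/filebeat.yml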


(Lo Zio) #3

This is the currently running config

filebeat.prospectors:
- type: log
  enabled: true
  paths:
    - /var/log/apache2/*.log
  exclude_files: ['.gz$','access']
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: true
  reload.period: 10s
setup.template.settings:
  index.number_of_shards: 1
fields:
  env: DEV
setup.kibana:
  host: "one-syslog:5601"
output.elasticsearch:
  hosts: ["one-syslog:9200"]

I get records like this in my ES:


This is fine.

If I add the following lines, which are the ones you pointed out

setup.template.name: "apache-error-dev-%{+yyyy.MM.dd}"
setup.template.pattern: "apache-error-*"

output.elasticsearch:
  hosts: ["one-syslog:9200"]
  index: "apache-error-dev-%{+yyyy.MM.dd}"

The index is created, I get something like (clipped)

but then data in ES is incomplete:

If I pass the whole thing through Logstash, I see the full fields in the rubydebug console output, like the first image above, but ES contains only clipped records, as in the last image.

In the debug log I have several of these

2017-12-27T08:04:33+01:00 DBG  [publish] Publish event: {
  "@timestamp": "2017-12-27T07:04:33.218Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "6.1.1"
  },
  "beat": {
    "name": "dev-20170509-172-31-34-138",
    "hostname": "dev-20170509-172-31-34-138",
    "version": "6.1.1"
  },
  "source": "/var/log/apache2/sXXX.log",
  "offset": 85061,
  "message": "[Wed Dec 27 08:04:20.673257 2017] [wsgi:error] [pid 30779:tid 140234579130112] [remote 127.0.0.1:60574] user_id is 'XXX'",
  "fields": {
    "env": "DEV"
  },
  "prospector": {
    "type": "log"
  }
}

I see the mapping being created (clipped here):

2017-12-27T08:04:34+01:00 DBG [elasticsearch] PUT http://one-syslog:9200/_template/apache-error-dev-2017.12.27 map[mappings:{"default":{"_meta":{"version":"6.1.1"},"date_detection":false,"dynamic_templates":[{"fields":{"mapping":{"type":"keyword"},"match_mapping_type":"string","path_match":"fields."}},{"docker.container.labels":{"mapping":{"type":"keyword"},"match_mapping_type":"string","path_match":"docker.container.labels."}},{"strings_as_keyword":{"mapping":{"ignore_above":1024,"type":"keyword"},"match_mapping_type":"string"}}],"properties":{"@timestamp":{"type":"date"},"apache2":{"properties":{"access":{"properties":{"agent":{"norms":false,"type":"text"},"body_sent":{"properties":{"bytes":{"type":"long"}}},"geoip":{"properties":{"city_name":{"ignore_above":1024,"type":"keyword"}
...
2017-12-27T08:04:34+01:00 INFO Elasticsearch template with name 'apache-error-dev-2017.12.27' loaded
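The template that got loaded can also be inspected directly (hostname and template name taken from the log above):

    # Show the template filebeat reports as loaded
    curl -s 'http://one-syslog:9200/_template/apache-error-dev-2017.12.27?pretty'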

Thanks


(Lo Zio) #4

No one with an idea?


(Steffen Siering) #5

Filebeat modules are a mix of filebeat configuration and Elasticsearch Ingest Node pipelines.

The plain log message is contained in the message field in your debug output. Do you see any information missing here?

Can you also check Elasticsearch logs for grok failures?
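One way to check for ingest failures from the Elasticsearch side, assuming the index names used in this thread, is to search for documents where the pipeline recorded a parsing error:

    # Documents where the apache2 ingest pipeline hit a grok failure
    curl -s 'http://one-syslog:9200/apache-error-dev-*/_search?q=_exists_:error.message&pretty'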


(Lo Zio) #6

This is EXACTLY the problem: there is NO message field at all. Look at the screenshot above: fields start with @timestamp and end with _type. No other fields are present.
Leaving the default index name and changing no other settings, all the fields appear (first screenshot).


(Steffen Siering) #7

Slow logs are multiline events. The module configures multiline here. Can you share some sample logs (events with missing contents, plus the events before/after), so we can have a look for mismatches between the multiline filter configuration and your logs? Please redact sensitive data from the logs.


(Lo Zio) #8

Here is one line from the log:

[Fri Jan 05 09:39:20.985349 2018] [wsgi:error] [pid 8406:tid 139678587021056] [remote 172.31.43.158:18753] connecting log xyz to listening address '/dev/log' with level 10

It is definitely a single line with a \n at the end. It is from apache 2.4, nothing strange.
I doubt the problem is related to single/multiline handling, since if I just leave the destination index at the default (that is, I do not specify anything) the very same lines are sent with no problems.

As you can see from the first screenshots, with the index name set to the default (filebeat-xyz) the records in ES contain all the data.
Renaming the index I get the second screenshot, that is, an essentially empty record.


(Lo Zio) #9

I just tried to upload an excerpt from the logfile but I can't, since only images are supported; I uploaded it here instead: https://pastebin.com/aFqT5xeN

You can easily get a sample the same way I just did:
tail /var/log/apache2/error.log


(Steffen Siering) #10

I just noticed you are using both a prospector and a module for the apache logs:

filebeat.prospectors:
- type: log
  enabled: true
  paths:
    - /var/log/apache2/*.log
  exclude_files: ['.gz$','access']

Both collect /var/log/apache2/error.log. This is not safe, as both prospectors can overwrite the file's read state in the registry. When using the module, make sure the prospector cannot read any files being processed by the module.
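One way to keep them apart, sticking with the exclude_files approach already used in the config above (a sketch; the exact patterns are an assumption to adapt to your filenames):

    filebeat.prospectors:
    - type: log
      enabled: true
      paths:
        - /var/log/apache2/*.log
      # also exclude error logs, which the apache2 module already collects
      exclude_files: ['.gz$', 'access', 'error']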

The parsing is configured to be done in Elasticsearch at ingest time. Sending the event to Logstash gets you the original, unparsed event.

Apache2 error log is indeed not multiline. The ingest node pipeline configuration is available here: https://github.com/elastic/beats/blob/master/filebeat/module/apache2/error/ingest/pipeline.json

The pipeline removes the message field right after grok. If all parsing goes well, you will have only the fields apache2.error.timestamp, apache2.error.level, ... apache2.error.message. If parsing fails, your event will have the field error.message set instead.
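That pipeline can be exercised directly with the Ingest _simulate API. The pipeline id below follows filebeat 6.x's usual filebeat-&lt;version&gt;-&lt;module&gt;-&lt;fileset&gt;-pipeline naming, but treat it as an assumption and check the first command for the real id on your cluster:

    # List installed ingest pipelines, then simulate one against a sample line
    curl -s 'http://one-syslog:9200/_ingest/pipeline?pretty'
    curl -s -H 'Content-Type: application/json' -X POST \
      'http://one-syslog:9200/_ingest/pipeline/filebeat-6.1.1-apache2-error-pipeline/_simulate' \
      -d '{"docs":[{"_source":{"message":"[Fri Jan 05 09:39:20.985349 2018] [wsgi:error] [pid 8406] sample message"}}]}'

If grok succeeds you should see the apache2.error.* fields in the simulated result; on failure the response shows the error instead.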

The index name should have no effect on the pipeline.

Do you get the incomplete message when sending to a different index only?


(Lo Zio) #11

Starting from the end: yes. As I wrote, if I just remove the lines that change the index name, I get my data in ES with all the parsed fields OK. Just changing the name of the index makes the ES records carry only the reduced field set shown in the picture. So NO CHANGES OTHER THAN THE INDEX NAME make the indexed data miss all the fields.
I understand the grok part; I have dozens of ELK stack installations, and this is the only one where I use filebeat with a modified index name and get neither a message field nor parsed fields (and no grok parse error fields either).
About using both the module and the prospector: I just used what came in the default config and howtos, and now I understand there's something wrong with it, but nevertheless, without changing the index name it works correctly. I will test again after removing the modules, since I have now changed the default apache error log format to a custom one and need to go through Logstash for parsing/filtering anyhow.
To be honest, I found a much easier and lighter way to get apache error logs: just send them out using syslog and parse them with Logstash, removing filebeat from the pipeline and from each production server.
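For anyone wanting to try that route, a minimal Logstash sketch; the port, grok pattern and index name are all assumptions to adapt to your own log format (HTTPDERROR_DATE is a stock grok pattern):

    input {
      syslog { port => 5514 }
    }
    filter {
      # adjust the pattern to whatever your custom error log format emits
      grok {
        match => { "message" => "\[%{HTTPDERROR_DATE:timestamp}\] \[%{DATA:module}:%{LOGLEVEL:loglevel}\] %{GREEDYDATA:errormsg}" }
      }
    }
    output {
      elasticsearch {
        hosts => ["one-syslog:9200"]
        index => "apache-error-dev-%{+yyyy.MM.dd}"
      }
    }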


(system) #12

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.