Templates/mappings defined by logstash

Hi

I'm running several logstash instances, each one parsing logs from one application.
On each instance, I use grok to extract all the fields I need before indexing them into Elasticsearch.

As each Logstash instance is isolated from the others, I was wondering if there's a feature to define the Elasticsearch template/mapping each application should use WITHIN LOGSTASH, instead of creating it manually.

e.g. setting number/date formats and analyzed/not_analyzed behaviour from LS
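To illustrate, this is the kind of template I'd like each instance to own (index pattern, type, and field names below are just made-up examples in 2.x-style syntax):

```json
{
  "template": "index-app1-*",
  "mappings": {
    "logs": {
      "properties": {
        "took_ms":   { "type": "float" },
        "timestamp": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" },
        "user":      { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}
```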

Update:
Just realized there's something that could do the trick: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-template
Unfortunately, this doesn't seem to work:

template => "%{[@metadata][template]}"

output {
    elasticsearch {
      # This setting must be a path
      # File does not exist or cannot be opened %{[@metadata][template]}
      template => "%{[@metadata][template]}"
      ...
    }
  }

Hope I explained myself properly.
Thanks in advance.

Unfortunately, this is not possible. The Elasticsearch output plugin connects, and uploads the template at initialization time, before any events are processed. How would there be a template in the metadata to send if there are no events with any metadata at that time?

Aside from this, if we tried to add to Logstash the ability to add custom mappings on the fly, it would constantly have to stay in sync with Elasticsearch and reload/re-push mapping changes. This would be a huge performance bottleneck.

The best you can do is use the mutate filter (and/or grok) to cast values as integer and float in Logstash, so you at least have numeric types matched. This is suboptimal, as Elasticsearch will err on the side of caution and assign the largest primitive data type it can, e.g. long and double. But Elasticsearch uses all of the Java primitive types, so you can use byte, short, integer, etc. We've even added half_float to Elasticsearch, to save space with reduced-precision floating-point numbers.
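For example, a minimal sketch of that casting in a filter block (the field names here are hypothetical):

```text
filter {
    mutate {
        # Cast string fields to numeric types before they reach Elasticsearch,
        # so dynamic mapping at least picks a numeric type instead of string.
        convert => {
            "status_code" => "integer"
            "took_ms"     => "float"
        }
    }
}
```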


Unfortunately, this is not possible.
:cry:

Nor will it set not_analyzed and so on.

Considering this scenario:
input -> redis -> logstash_per_application -> redis -> logstash_indexer -> elasticsearch

I was trying to delegate the management (creation, updates, ...) of each application's template to Logstash. BTW: each stage runs in a Docker container.

  • Is there any way to import app-template.json into elastic from the logstash stage?
  • Is manually creating the template the only way to do it? (remember the not_analyzed feature)
  • Does anyone have a magic idea to solve this?
  • Will I get something nice for Christmas?

Is there something like an --on_launch_run_this_command option to run a pipeline just once, which could be used to set up templates?

Thanks, regards, and have a nice day

The default Logstash templates—2.x or 5.x—set all string/text fields to be both analyzed and not_analyzed (see the .raw and .keyword multi-field configurations in those templates).

Yes. Use template => "app-template.json" in the output block.
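For example (the path, index, and template name below are placeholders; adjust them to your setup):

```text
output {
    elasticsearch {
        hosts              => ["localhost:9200"]
        index              => "index-app1-%{+YYYY.MM.dd}"
        # Push this template file when the plugin initializes
        template           => "/etc/logstash/app-template.json"
        template_name      => "app1"
        template_overwrite => true
    }
}
```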

Yes. You should know enough about your data to be able to create this. If not, as already mentioned, the included Logstash template does a multi-field approach, adding a .raw (2.x) or .keyword (5.x) version of each string/text field you send. Dynamic templating does help with unknowns.

Hi

I have already read about .keyword fields, but I don't think I understood how to use them properly. Anyway, if you don't mind, I'll deal with that later.


I'm looking for a way for each Logstash instance to import its own template.

logstash-app1 imports template for index-app1 into elasticsearch, while logstash-app2 imports index-app2 template.

Remember:

  • One logstash per application
  • Chained architecture: input -> redis -> logstash_per_application -> redis -> logstash_indexer -> elasticsearch

Using template => "%{[@metadata][template]}" would do the trick if it weren't an initialization task.
Also, it's important to bear in mind that this import task should only run once.


After taking a look at the ruby plugin, I found two ugly alternatives that could act as workarounds:
logstash-app will have 2 files:

  • pipe.conf: input-filters-output (current one)
  • template.conf: filter-special_output_if_first_event

Use the sleep filter to add a field, and conditionals to index the event using a template. template.conf may look like:

filter {
    sleep {
        add_field => { "first_event" => "yes" }
        time => "1"
        every => 1000000
    }
}
output {
    if [first_event] {
        elasticsearch {
            # Although template will be loaded at initialization time, 
            # I can hardcode the path here, because it will only be _imported_ if the field is present
            # (1 of each 1000000 events)
            template => "my-template-path"
        }
    } else {
        # event must be indexed normally.
    }
}

Another approach could be using throttle:

filter {
    throttle {
        before_count => 1
        after_count => -1
        period => 86400
        max_age => 86400
        key => "%{message}"
        add_field => { "first_event" => "yes" }
    }
}

Have a thousand cats died while you were reading this?

Is there such a thing as a run-once pipeline/thread?

This seems like a lot of effort.

Do you really not know anything about the data that is coming in? If you're defining one Logstash instance per app, do you really not know what the log entries for the app will look like? You can't create a dedicated template file in advance for it?

One of the tricks we've used in the past is to index a single document, or even a few dozen, into Elasticsearch with nothing but the default, out-of-the-box template. Once indexed, read the mapping via the API. You should be able to easily tweak the resultant mapping for fields to be .keyword or .raw as needed. Save that as your template, and you're done.
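Roughly, the steps look like this (the URL, index, type, and fields are made up, and this is 2.x-style curl without content-type headers):

```shell
# 1. Index a representative document against the default template
curl -XPOST 'localhost:9200/index-app1/logs' \
     -d '{"status_code": 200, "took_ms": 12.5, "user": "jdoe"}'

# 2. Read back the mapping Elasticsearch inferred, tweak it
#    (.raw / .keyword, not_analyzed, etc.), and save it as your template
curl -XGET 'localhost:9200/index-app1/_mapping?pretty'
```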

It is not just a question of knowing each index's fields, but of each application/Logstash instance being self-contained.

I guess a startup script for each container could PUT/POST templates before running logstash, but I'll have to deal with running as root (which, IIRC, was an issue in the past)

Actually, I also start to think that approach is too much effort/has drawbacks.

Trying to make templates deterministic is not ideal. Elasticsearch officially advises explicit mapping for performance and storage reasons. If you must do this, make the template before building your container, or make it accessible via a URL, so it can be pulled and pushed via curl or something. Logstash will likely never support a way of doing deterministic template pre-mapping in a single pipeline.

Hi

I think that's the final approach I'll take:

Docker container entrypoint+cmd=

curl -XPUT template
logstash

Thanks a lot. I'll let you know if I find any issue.