Logstash 2.x: Dynamic Mapping

Hi,
Based on the breaking changes in Elasticsearch 2.0 and other information:

https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking_20_mapping_changes.html

...it's apparent that field names with leading underscores and/or containing dots are a bad thing. I have observed that Logstash will generate a mapping against fields with leading underscores.

Are there plans to make logstash's mapping logic more aware of the elasticsearch schema? And, in the meantime, how can I handle unstructured log data coming in that is quite likely to occasionally break both of the above rules?

And finally, are there any other restrictions I should be mindful of with field naming etc?

Regards,
David

You can use the de_dot filter plugin to help here.
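
A minimal filter block might look like this, assuming the logstash-filter-de_dot plugin is installed (as far as I know it is a separate plugin, not bundled by default):

filter {
  de_dot {
    # replace dots in field names; "_" is the default separator anyway
    separator => "_"
  }
}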

What do you mean by "logstash's mapping logic"? Logstash just emits JSON documents according to the rules that you set up.

I think I must have been misunderstanding something then. I thought that logstash was creating new mappings in elasticsearch for new data streams. It must be elasticsearch doing that itself then...

The de_dot plugin only deals with dots. If fields come in that clash with meta-field names, that can cause all sorts of problems too, as we saw when we received log output that contained an '_uid' field of type string...
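
One stopgap (a rough sketch only, using the Logstash 2.x ruby-filter event API and touching only top-level fields) would be to rename anything with a leading underscore before it reaches ES:

filter {
  ruby {
    code => "
      event.to_hash.keys.each do |name|
        next unless name.start_with?('_')
        # copy the value to a name without the leading underscore(s), then drop the original
        event[name.sub(/^_+/, '')] = event[name]
        event.remove(name)
      end
    "
  }
}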


Yes, ES chooses how to map fields on its own. However, Logstash by default does provide an index template for logstash-* indexes with rules for the mapping that ES should apply, so it's not completely black and white. You can of course modify the index template so it fits your data.
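
If you do modify it, the elasticsearch output can be pointed at your own copy; a sketch (the template path here is only an example):

output {
  elasticsearch {
    hosts              => ["localhost:9200"]
    template           => "/etc/logstash/templates/logstash-custom.json"
    template_name      => "logstash"
    template_overwrite => true
  }
}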

The problem we have is that we don't know up front what the format of the inbound data is...

This should help with that

output {
  stdout { codec => rubydebug }
}

We don't know exactly how many log sources we have (a lot) or their format (mostly bespoke). So sending data to stdout is likely to lead to data overload, aside from the hit on throughput.

You only need to look at one log file (or better yet, one log event) using a file input. And you don't have to run this on your production Logstash node if performance is a concern (I don't imagine running one event through would have that significant an impact).
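
For example, a throwaway config like this (the path is hypothetical) shows exactly what Logstash would hand to ES for a single sample file:

input {
  file {
    path           => "/tmp/sample.log"
    start_position => "beginning"
    sincedb_path   => "/dev/null"   # don't remember read position between runs
  }
}
output {
  stdout { codec => rubydebug }
}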

P.S. If you don't know the log format... what are you planning on sending to ES? Just the raw message field? May we see your current Logstash config? Maybe it will make more sense.