...it's apparent that field names with leading underscores and/or containing dots are a bad thing. I have observed that Logstash will generate a mapping for fields with leading underscores.
Are there plans to make logstash's mapping logic more aware of the elasticsearch schema? And, in the meantime, how can I handle unstructured log data coming in that is quite likely to occasionally break both of the above rules?
And finally, are there any other restrictions I should be mindful of with field naming etc?
I think I must have been misunderstanding something then. I thought that logstash was creating new mappings in elasticsearch for new data streams. It must be elasticsearch doing that itself then...
The de_dot plugin only deals with dots. If fields come in that clash with meta-field names, that can cause all sorts of problems too, as we saw when we received log output that contained a '_uid' field of type string...
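To make the problem concrete, here is a minimal sketch of the kind of renaming that would cover both cases (dots and leading underscores) before an event is indexed. It is plain Python for illustration only, not Logstash configuration and not how de_dot works internally; the function names and the "f" prefix are hypothetical choices, and the same logic would have to be ported to whatever filter stage you actually use.

```python
def sanitize_key(key, prefix="f"):
    """Return a field name with no dots and no leading underscore.

    Assumption: keys are strings. The prefix "f" is an arbitrary,
    hypothetical choice used to move the name out of the meta-field space.
    """
    key = key.replace(".", "_")
    if key.startswith("_"):
        key = prefix + key
    return key


def sanitize_event(event, prefix="f"):
    """Recursively rename keys in a decoded event (dicts, lists, scalars)."""
    if isinstance(event, dict):
        # Note: if two keys sanitize to the same name, the later one wins.
        return {
            sanitize_key(k, prefix): sanitize_event(v, prefix)
            for k, v in event.items()
        }
    if isinstance(event, list):
        return [sanitize_event(v, prefix) for v in event]
    return event


if __name__ == "__main__":
    raw = {"_uid": "abc", "a.b": {"_x": 1}, "ok": [{"c.d": 2}]}
    print(sanitize_event(raw))
```

With the sample event above, '_uid' becomes 'f_uid' and 'a.b' becomes 'a_b', so neither name clashes with a meta-field or contains a dot. Silent overwrites on name collisions are a limitation of this sketch.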
Yes, ES chooses how to map fields on its own. However, Logstash by default does provide an index template for logstash-* indexes with rules for the mapping that ES should apply, so it's not completely black and white. You can of course modify the index template so it fits your data.
We don't know exactly how many log sources we have (a lot) or their format (mostly bespoke). So sending data to stdout is likely to lead to data overload, aside from the hit on throughput.
You only need to look at one log file (or better yet, one log event) using file input. And you don't have to run this on your production logstash node if performance is a concern to you (I don't imagine running one event through would cause that significant of an impact).
p.s. If you don't know the log format, what are you planning on sending to ES? Just the raw message field? May we see your current Logstash config? Maybe it will make more sense then.