I'm wondering what the limitations are for field names. I've seen some crazy things in our dynamic mappings, so I bet Lucene and ES are super flexible.
For example, can you use a leading underscore?
I notice that ES metadata fields have the convention of starting with an underscore. We'll try to be careful not to collide with those, of course.
(We're also using ECS, so I'll avoid existing fields/sets or base fields and will follow the guidelines.)
I'm not sure what the hard limitations for field names are, or whether there are any, but there are some things that I would avoid.
For example I would avoid having spaces, square brackets and curly brackets in the field names.
I would also avoid using dots in the field name, as this can lead to confusion about whether the dot is part of the field name or indicates a nested object.
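To make the dot ambiguity concrete, here is a small Python sketch. The helper `expand_dots` is hypothetical, not part of any Elastic client, and it only imitates the default behaviour where a dotted name is read as a path into an object. Under that assumption the two documents below end up looking the same.

```python
def expand_dots(doc):
    """Expand dotted keys into nested dicts, e.g. {"a.b": 1} -> {"a": {"b": 1}}.

    Illustrative only: mimics treating a dot as an object path separator.
    """
    out = {}
    for key, value in doc.items():
        node = out
        parts = key.split(".")
        # Walk (or create) the intermediate objects for every part but the last.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return out


dotted = {"user.name": "alice"}            # dot inside a single field name
nested = {"user": {"name": "alice"}}       # real object with a sub-field

# Both forms collapse to the same structure once dots are read as paths.
print(expand_dots(dotted) == nested)
```

So a reader of the mapping can't tell which of the two the sender meant, which is the confusion I would try to avoid.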
Leading underscores are no issue; I use them on target fields for Logstash parsing filters like the json, kv and csv filters.
In this case, when I parse a JSON string for example, the filter expands its fields under a target field starting with an underscore, like _json, and I map this field as flattened.
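As a rough illustration of the mapping side of this, the snippet below builds the body of a create-index request as a plain Python dict. The field name `_json` comes from my setup above, while the rest of the layout is only a minimal sketch that assumes the `flattened` field type is available in your Elasticsearch version.

```python
import json

# Minimal mapping sketch: the parsed JSON lands under "_json",
# which is mapped as a single flattened field instead of many sub-fields.
mapping = {
    "mappings": {
        "properties": {
            "_json": {"type": "flattened"}
        }
    }
}

# Print the body you would send when creating the index.
print(json.dumps(mapping, indent=2))
```

Mapping the target as flattened keeps whatever keys show up in the parsed payload from creating new mapped fields.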
For example I would avoid having spaces, square brackets and curly brackets in the field names.
Good point.
I would also avoid using dots in the field name, as this can lead to confusion about whether the dot is part of the field name or indicates a nested object.
Good point.
Leading underscores are no issue; I use them on target fields for Logstash parsing filters like the json, kv and csv filters.
Good info, thanks for the input.
So maybe other than avoiding the existing metadata field names
_index
_id
_source
_size
_doc_count
_field_names
_ignored
_routing
_meta
_tier
it might be good to avoid terms that might become future metadata fields. I probably shouldn't create fields called _schema or _shard or _node.
And if I don't want to put in the effort to think carefully about whether a term might become a future metadata field, I could probably default to using all-caps names as another way of helping avoid collision. _LOGS, for example.
We probably also want to avoid Logstash-internal fields or Elastic Agent fields like @metadata.
It feels like the @ (at sign) was intended by folks at different points to carve out spaces for reserved use (@timestamp, @metadata), so it's probably best to avoid that as a start character.
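Pulling the suggestions in this thread together, here is a small Python sketch of a name check we might run before adding a custom field. The function `check_field_name` and its messages are hypothetical, and the rules are only the conventions discussed above (the metadata names I listed, a leading @, and spaces or brackets), not limits enforced by Elasticsearch. Dots are deliberately left out of the check, since ECS field names use them as path separators.

```python
# Metadata field names listed earlier in this thread.
RESERVED_METADATA = {
    "_index", "_id", "_source", "_size", "_doc_count",
    "_field_names", "_ignored", "_routing", "_meta", "_tier",
}

# Characters suggested above as worth avoiding: space, square and curly brackets.
AWKWARD_CHARS = set(" []{}")


def check_field_name(name):
    """Return a list of reasons to reconsider a field name (empty if none)."""
    problems = []
    if name in RESERVED_METADATA:
        problems.append("collides with an existing metadata field")
    if name.startswith("@"):
        problems.append("starts with '@'")
    if any(ch in AWKWARD_CHARS for ch in name):
        problems.append("contains a space or bracket")
    return problems


for candidate in ["_LOGS", "_id", "@custom", "my field"]:
    print(candidate, "->", check_field_name(candidate) or "ok")
```

It won't catch names that become metadata fields in the future, so the all-caps convention would still be doing that part of the work.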