I need to configure filebeat to write a particular field as a string, even when it's a number. I'm trying to use the "convert" processor but it doesn't seem to be doing the job. I am not 100% sure I've added the convert processor to the right locations in the configuration, however.
I have some log lines coming from applications I don't control. Two of the applications use the same field name for different types of values. Thankfully, one is using numbers and the other strings, so I don't have to worry about messy object conversion.
If the "number-emitting" service sends a log message first, the field in the filebeat mapping is permanently set to a long. If I'm lucky, the "string-emitting" service sends a log message first, and then there's no problem: subsequent longs are converted to strings (keywords).
I'm really, really hoping there's not some secret magic going on in ES that's converting the values silently, overriding the conversion in filebeat. If there is, I guess I'm kinda screwed as I don't think I can manage creating mappings by hand for all these logs with lots of various fields.
Here's (I believe) all of the relevant bits from filebeat.yml:
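It boils down to something like this (paths and field names are simplified here; "id" is the conflicting field):

```yaml
filebeat.autodiscover:
  providers:
    - type: kubernetes
      hints.enabled: true
      hints.default_config:
        type: container
        # path pattern simplified for this post
        paths:
          - /var/log/containers/*${data.kubernetes.container.id}.log
        json.keys_under_root: true
        json.add_error_key: true

processors:
  - drop_fields:
      # illustrative field list
      fields: ["agent.ephemeral_id", "ecs.version"]
      ignore_missing: true
  - convert:
      fields:
        - {from: "id", type: "string"}
      ignore_missing: true
      fail_on_error: false
```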
I've tried putting the convert processor under filebeat.autodiscover.processors but that made no difference.
(For what it's worth, that drop_fields processor does work, so I know some processors go here, and kubernetes metadata is appearing in the logs, so I know some processors go there).
So if the value comes in as a string (even if the string is just a number), you should get a keyword mapping in Elasticsearch. You can test this out yourself:
```
PUT /mynewindex/_doc/1
{
  "test": "54"
}

GET /mynewindex/_doc/1
GET /mynewindex
```
That being said, once the index is created and the first value for a field has arrived, the mapping for that field is fixed, and all subsequent values must be compatible with that initial mapping.
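You can see the lock-in by running the same test in the other order:

```
PUT /numfirst/_doc/1
{
  "test": 54
}

PUT /numfirst/_doc/2
{
  "test": "54"
}

GET /numfirst/_mapping
```

The mapping stays long: the string "54" gets coerced to a long on the way in, and a non-numeric string for that field would be rejected with a mapping error.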
I think your configuration is just fine. If you're looking to see what type the field has coming out of filebeat, I'd recommend using the console output (output.console with pretty: true) to print the events to the console, where you can verify that the type is as you expect before it lands in Elasticsearch.
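That looks like this in the config (note that only one output can be enabled at a time, so comment out output.elasticsearch while testing):

```yaml
output.console:
  pretty: true
```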
The easiest way to ensure the field is always a keyword/text field is to produce a mapping for it. You do not have to map every field in your mapping, just the ones you want to have a specific type for. You can specify the mapping for just a couple of the fields while leaving the rest of the fields to be dynamically defined.
With dynamic mappings you can use regex to match field names or you can do things like force all fields that would be mapped as a long to be mapped as text. More info here: Dynamic templates | Elasticsearch Guide [8.17] | Elastic
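For example, something along these lines (the index name and the field pattern here are just illustrations):

```
PUT /mynewindex
{
  "mappings": {
    "dynamic_templates": [
      {
        "ids_as_keywords": {
          "match": "*id",
          "match_mapping_type": "long",
          "mapping": {
            "type": "keyword"
          }
        }
      }
    ]
  }
}
```

This maps any new field ending in "id" whose first value would have been detected as a long to keyword instead, while leaving every other field to normal dynamic mapping.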
Another option would be to place the logs into indices based on the source application or based on a tag or annotation on the k8s pod or the namespace or other attributes from the deployment. This would ensure that you don't have to worry about these conflicts going forward and won't have to worry about future conflicts between applications you don't control.
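In the Elasticsearch output that routing can be done with conditional index names, for example (the label and index names here are made up):

```yaml
output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  indices:
    - index: "filebeat-app-a-%{[agent.version]}"
      when.equals:
        kubernetes.labels.app: "app-a"
    - index: "filebeat-app-b-%{[agent.version]}"
      when.equals:
        kubernetes.labels.app: "app-b"
```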
So if the value comes in as a string (even if the string is just a number), you should get a keyword mapping in Elasticsearch. You can test this out yourself:
OK, cool, so we can confirm it's not ES itself doing the conversion.
I'd recommend using the console output (output.console with pretty: true) to print the events to the console, where you can verify that the type is as you expect before it lands in Elasticsearch.
Thanks for this tip, I've tried that now. I can see that the numeric values are indeed not being converted to strings by filebeat. I've tried the convert processor both under filebeat.processors and under the top-level processors key. I also tried {from: "id", type: "string", to: "some_other_key"} and filebeat is not writing the new key, so I suspect I'm configuring the convert processor wrong. Could the json settings under the kubernetes provider's default_config be overriding the convert processor somehow?
(edit: I see that processors can also be put under inputs. I tried adding a processors array under the filebeat.autodiscover.providers objects and had the same result, no conversion and no new key.)
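One placement I haven't tried yet is inside hints.default_config itself, which I gather becomes the per-container input config. If I understand correctly, that would look something like this (paths simplified as before):

```yaml
filebeat.autodiscover:
  providers:
    - type: kubernetes
      hints.enabled: true
      hints.default_config:
        type: container
        paths:
          - /var/log/containers/*${data.kubernetes.container.id}.log
        # processors attached at the input level
        processors:
          - convert:
              fields:
                - {from: "id", type: "string"}
              ignore_missing: true
              fail_on_error: false
```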
With dynamic mappings you can use regex to match field names or you can do things like force all fields that would be mapped as a long to be mapped as text.
This looks like a good direction to go. I'm comfortable enforcing a rule that any field that is "id" or ends in "id" must be a string or a number -- so far that seems to be the case, anyway.
Do you know how to implement dynamic templates for the indices created by filebeat? My understanding is that filebeat starts out with a ~28k-line index template, which includes some dynamic templates near the top, but there doesn't appear to be a way to add more. There looks to be an add_fields option under setup, but I don't see how to add a dynamic template there.
Can you share the configuration you used (including the convert processor), a sample input message and what was printed with output.console: pretty? That way we can verify if the convert issue is a configuration issue or a bug we should file.
For changing the template you can run filebeat export template > filebeat-template.json and then modify the exported template.
You may be able to get filebeat to load the custom template for you by adding this to the config, though I have not tried it:
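Something along these lines (untested; the path is just wherever you saved the exported template):

```yaml
setup.template.enabled: true
setup.template.overwrite: true
setup.template.json.enabled: true
setup.template.json.path: "filebeat-template.json"
setup.template.json.name: "filebeat"
```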
I can post the config and input message, but I'm not sure how to share all of the autodiscover-related metadata. Is there a way to simulate the entire environment so you're testing what I'm testing, down to the autodiscovery? I'm concerned that there's some interaction between the autodiscover kubernetes provider (and its json config) and processors.
Let's start with just the simple stuff and see if we can reproduce the same issue.
Ideally we find the simplest reproduction that has the same issue.
Autodiscover ultimately compiles down to a normal Filebeat config at runtime, and once the log message leaves the input it's basically just a JSON object by the time it hits the processors.