Dynamic data type for one field

Hi, I don't know if this topic has already been discussed, but I couldn't find an appropriate one, so I'm creating a new one.
I have an nginx log; a partial excerpt follows. First, a record for a request that finished successfully:

... 1 1.615 0.0 0.003 - - - -

And the next one that finished with an error:

... 2 - - 0.003 10 Too many requests

The fields' names are as follows (from left to right):

revision of running code
backend duration
post-backend duration
pre-backend duration
error code
error description

The field separator is the tab character.
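A minimal Python sketch of the layout above, assuming each record ends with exactly these six tab-separated fields. The snake_case names other than backend_duration are hypothetical labels, and the sample records are made up to match the six-field list. Splitting on the tab character keeps a multi-word description such as "Too many requests" in one field.

```python
# Field names in log order; all except backend_duration are hypothetical labels.
FIELDS = [
    "revision",
    "backend_duration",
    "post_backend_duration",
    "pre_backend_duration",
    "error_code",
    "error_description",
]


def split_record(tail: str) -> dict:
    """Split the six tab-separated trailing fields into a name -> raw string dict."""
    values = tail.split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


ok = split_record("1\t1.615\t0.0\t0.003\t-\t-")
err = split_record("2\t-\t-\t0.003\t10\tToo many requests")
print(ok["backend_duration"], err["error_description"])  # 1.615 Too many requests
```

At this stage every value, including the durations, is still a raw string.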

When a request finishes successfully, my current nginx config writes the "-" character, indicating that there were no errors with the request (at least in the partial log above). Otherwise, it writes the appropriate description in the field.

The problem I have is with the data types of the request-processing durations. I would like Logstash to cast numbers to float and to keep the "-" character as a string. The float values would be used to calculate mean times and so on.

So far, I have been able to write the following pattern for the durations in the log above (example for the backend duration):

\t(%{NUMBER:backend_duration:float}|-)\t

However, when the given field contains "-", Logstash simply omits the field and does not create it in the Elasticsearch database. On the other hand, "-" is hardly a number :wink:
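The omission can be reproduced outside Logstash with a rough Python equivalent of the pattern. This sketch assumes a simplified NUMBER regex (grok's real NUMBER pattern differs in details) and models only the backend duration. When the "-" branch of the alternation matches, the named group never participates in the match, so there is nothing to store and no field is created.

```python
import re

# Simplified stand-in for grok's NUMBER pattern (assumption: plain decimals only).
NUMBER = r"[+-]?(?:\d+(?:\.\d+)?|\.\d+)"

# Rough Python equivalent of \t(%{NUMBER:backend_duration:float}|-)\t
PATTERN = re.compile(r"\t(?:(?P<backend_duration>" + NUMBER + r")|-)\t")


def extract(line: str) -> dict:
    """Return the fields the pattern would create for this line."""
    event = {}
    m = PATTERN.search(line)
    if m and m.group("backend_duration") is not None:
        # Only a matched number produces the field, converted like the :float suffix.
        event["backend_duration"] = float(m.group("backend_duration"))
    return event


print(extract("1\t1.615\t0.0\t"))  # {'backend_duration': 1.615}
print(extract("2\t-\t-\t"))        # {}
```

The line still matches in both cases; only the numeric branch yields a named value.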

Is there a way to do this kind of "dynamic data typing" in Logstash? For example, if the value is a number, cast it to float; if it is "-", leave it as a string. This would be decided conditionally, based on what kind of data Logstash finds in the given field. I have read the docs, but I didn't find an appropriate solution to my problem.
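Per event, the conditional typing asked about here is a small branch: convert the value to float when it parses as a number, otherwise keep the raw string. A minimal Python sketch follows, where coerce_duration is a hypothetical helper. In Logstash itself, one common way to express the same idea is to capture the value as a string and wrap a `mutate` filter's `convert` option in a conditional that skips "-".

```python
from typing import Union


def coerce_duration(raw: str) -> Union[float, str]:
    """Return a float for numeric values; keep placeholders such as "-" as strings."""
    try:
        return float(raw)
    except ValueError:
        # Not a number (e.g. "-"): keep the original string unchanged.
        return raw


for raw in ["1.615", "0.0", "-"]:
    value = coerce_duration(raw)
    print(repr(value), type(value).__name__)
```

This yields a different type per event, which Logstash can hold; whether the destination store accepts mixed types for one field is a separate question.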

If I haven't provided enough information, please tell me; I will try to provide as much as I can if someone can help me deal with this issue :slight_smile:

In Logstash, the type of a field can be different on every event. However, in Elasticsearch, once the type of a field is set for an index, it cannot change. If you first index a document where backend_duration is "-" and afterwards index one where it is 0.1, the 0.1 will be treated as the string "0.1". If you first index a document where it is 0.1, then any later document where it is "-" will be rejected with a mapping exception and will not be indexed.
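A toy Python model of this "first type wins" behavior, assuming a single index whose field type is fixed by the first document indexed into it. ToyIndex is a hypothetical class for illustration only. It is not how Elasticsearch is implemented, and it ignores explicit mappings and all types other than float and string.

```python
class ToyIndex:
    """Toy model of one index: the first value seen for a field fixes its type.

    Illustration only; this is not how Elasticsearch is implemented.
    """

    def __init__(self) -> None:
        self.mapping: dict = {}  # field name -> "float" or "string"

    def index(self, doc: dict) -> dict:
        pending = dict(self.mapping)  # committed only if the whole doc is accepted
        stored = {}
        for field, value in doc.items():
            kind = "float" if isinstance(value, float) else "string"
            mapped = pending.setdefault(field, kind)
            if mapped == "string":
                # A field already mapped as string keeps numbers only as text.
                stored[field] = str(value)
            else:
                try:
                    stored[field] = float(value)
                except (TypeError, ValueError):
                    # "-" cannot go into a float field: the document is rejected.
                    raise ValueError(
                        f"mapping exception: {field!r} is mapped as float, got {value!r}"
                    ) from None
        self.mapping = pending
        return stored


first_string = ToyIndex()
print(first_string.index({"backend_duration": "-"}))  # {'backend_duration': '-'}
print(first_string.index({"backend_duration": 0.1}))  # {'backend_duration': '0.1'}

first_float = ToyIndex()
first_float.index({"backend_duration": 0.1})
try:
    first_float.index({"backend_duration": "-"})
except ValueError as exc:
    print(exc)
```

The order of the first two documents alone decides which of the two outcomes you get.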

If I understand correctly, Elasticsearch fixes the data type of a field when it is first indexed and sticks to it from then on? Is that correct?

In that case, the data type in Logstash is irrelevant, since Elasticsearch fixes the type of the field: once it has been indexed as a string in the Elasticsearch database, it doesn't matter which data type (string or float) Logstash assigned to it based on the grok pattern? Is that also correct?

Yes, both of those are correct.

Sorry for the late reply. I have analyzed everything and, unfortunately, I will have to change my flows to meet my current requirements.

Thanks for helping me out with verifying that. I am accepting your first answer as the solution :slightly_smiling_face:
