Drawbacks of mapping all types as string?

Hi!

I'm using an ELK (2.x) stack to make logs from our web application searchable. This usually works pretty well, but I'm logging request parameters, and the dynamic mapping gets confused when the type of a parameter changes. Often the change is reasonable (sometimes it isn't particularly, but that's a separate issue...). When I know that something is supposed to be a string, for example, I add an exception to my dynamic mapping, and when the type is just different I do some mutation with grok to rename the parameter. But this feels like playing whack-a-mole, as more and more new code paths with new and different parameter types cause new issues.

My question: what's the drawback of just mapping all fields as strings? For instance, what do I gain by having number fields indexed as number types? Does it use more/less space?

Also, in particular, will this work for object-valued parameters, or nested arrays of objects?

I guess the two options I see for mapping everything as a string are

  1. Disable automatic type creation a la https://www.elastic.co/guide/en/elasticsearch/reference/2.4/dynamic-mapping.html#_disabling_automatic_type_creation
  2. Define a dynamic mapping for all field names to string, something like what is discussed in the thread "Force all the JSON values as String type - Error - Merging dynamic updates triggered a conflict"

As I understand it, if I do the latter I could still add exceptions for specific whitelisted fields if I know their type in advance? But would that be useful?
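To make option 2 concrete, here is a rough sketch of what such a mapping might look like, built as a Python dict so it can be printed as JSON. It assumes the 2.x `dynamic_templates` syntax with one template per detected scalar type; `build_mapping` and the `user_id` field are hypothetical names for illustration, and this has not been tested against a cluster. It deliberately does not cover object-valued parameters.

```python
import json

# Scalar types that 2.x dynamic mapping can detect from incoming JSON values.
DETECTED_SCALAR_TYPES = ["long", "double", "boolean", "date", "string"]


def build_mapping(typed_fields):
    """Build a mapping body that stores every new scalar field as a string.

    typed_fields is a dict of field name -> Elasticsearch type for the
    whitelisted fields whose type is known in advance (hypothetical names).
    """
    templates = [
        {
            detected + "_as_string": {
                "match": "*",
                "match_mapping_type": detected,
                "mapping": {"type": "string"},
            }
        }
        for detected in DETECTED_SCALAR_TYPES
    ]
    return {
        "mappings": {
            "_default_": {
                "dynamic_templates": templates,
                # Explicitly mapped fields are not subject to dynamic templates.
                "properties": {
                    name: {"type": es_type}
                    for name, es_type in typed_fields.items()
                },
            }
        }
    }


mapping = build_mapping({"user_id": "long"})
print(json.dumps(mapping, indent=2))
```

The printed JSON would presumably go into the index template that Logstash installs, with the whitelisted fields listed under `properties`.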

Overall, my use case is mainly just to be able to search through logs for events and find specific incidents, filtering by things like pages or users affected. Most of the time the integer valued fields are things like database IDs.

Thanks for the help!

Oh, one related issue: I sometimes have object values with dynamic property names. If I do keep dynamic mapping enabled, is there a way to accept any property names? Is that bad for performance?

If you think about response times, for example, then 1000ms will come out as faster than 200ms, because strings are compared character by character ("1000" sorts before "200"), which is obviously not true.

So basically if you need to compare, sort, aggregate on such fields, use numbers. If it's just about search then String is fine IMO. Will probably take more space than numbers though.
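The ordering problem with response times can be reproduced outside Elasticsearch. This is just a plain Python illustration of string versus numeric comparison, with made-up values, not a statement about how Elasticsearch implements sorting internally:

```python
# Made-up response times in milliseconds.
response_times_ms = [1000, 200, 35]

# The same values as they would look if every field were a string.
as_strings = [str(value) for value in response_times_ms]

# Strings compare character by character, so "1000" comes before "200".
sorted_as_strings = sorted(as_strings)

# Numbers compare by value.
sorted_as_numbers = sorted(response_times_ms)

print(sorted_as_strings)  # ['1000', '200', '35']
print(sorted_as_numbers)  # [35, 200, 1000]
print("1000" < "200")     # True
```

The same effect applies to range filters: a string range "less than 200" would match "1000".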

Ah okay, that's about what I figured. For the most part I'm fine with just searching. I just want my records to get stored at all, which right now they sometimes aren't if the dynamic mapping has given a field a type that's not inclusive enough.
