User_agent.version gets mapped as date

Hi,

I keep running into this error when I try to index documents and use user_agent to parse a field that includes user agent information:

{
  "type": "mapper_parsing_exception",
  "reason": "failed to parse field [connections_user_agent.version] of type [date] in document with id '2'. Preview of field's value: '119.0.0.0'",
  "caused_by": {
    "type": "illegal_argument_exception",
    "reason": "failed to parse date field [119.0.0.0] with format [strict_date_optional_time||epoch_millis]",
    "caused_by": {
      "type": "date_time_parse_exception",
      "reason": "Failed to parse with all enclosed parsers"
    }
  }
}

This happens because, if I feed user_agent with

Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:123.0) Gecko/20100101 Firefox/123.0

the "20100101" part is parsed as a date, causing user_agent.version to be mapped as date.

So when I later feed user_agent with

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36

I get the error above because "119.0.0.0" is not a valid date.

I tried to solve this by explicitly mapping user_agent.version as text / keyword while the index is empty, but without success: ES seems to ignore the existing mapping and maps user_agent.version as date again.

Suggestions?

Hello,

How are you indexing your data? What tools are you using? Also, how did you map it? Please share your index template.

user_agent.version needs to be mapped as a keyword, so you would need to configure an index template with this mapping, and that index template should have an index pattern that matches your index name.
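A minimal sketch of such a template, assuming a version that supports composable index templates (7.8 or later). The template name `user_agent_template` and the pattern `my-index-*` are hypothetical placeholders, and the field path must match whatever target field your processor writes to:

```console
# Hypothetical template name and index pattern; adjust both to your setup.
PUT _index_template/user_agent_template
{
  "index_patterns": ["my-index-*"],
  "template": {
    "mappings": {
      "properties": {
        "user_agent": {
          "properties": {
            "version": { "type": "keyword" }
          }
        }
      }
    }
  }
}
```

Keep in mind that a template is only applied when an index is created, so an index that already exists has to be deleted and recreated (or reindexed) to pick it up.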

Hi, Leandro.

I am using the mapping API via cURL. I'll try to explain better using the console:

DELETE my_empty_index
PUT _ingest/pipeline/my_user_agent
{
  "description" : "Add user agent information",
  "processors" : [
    {
      "user_agent" : {
        "field" : "agent"
      }
    }
  ]
}
PUT my_empty_index/_doc/1?pipeline=my_user_agent
{
  "agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:123.0) Gecko/20100101 Firefox/123.0"
}
PUT my_empty_index/_doc/2?pipeline=my_user_agent
{
  "agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
}

Doc 2 will throw the mapping error. If you check the mapping, you'll see that user_agent.version has been mapped as date.
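One way to check this is the get field mapping API, run right after indexing doc 1. This is only a sketch and assumes the processor writes to the default `user_agent` target field:

```console
# Returns the mapping that was dynamically created for this one field.
GET my_empty_index/_mapping/field/user_agent.version
```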

Fun fact: If you reverse the document order (i.e. put doc 2 before doc 1) it will work just fine.

Question 1: shouldn't the processor provide proper mapping?

If I provide an explicit mapping

PUT my_empty_index
{
  "mappings": {
    "properties": {
      "agent": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "user_agent": {
        "properties": {
          "device": {
            "properties": {
              "name": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          },
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "original": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "os": {
            "properties": {
              "full": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "name": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "version": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          },
          "version": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

...this problem does not arise on my localhost installation, but it insists on happening on the live server (not set up by me).

Question 2: has this something to do with the way ES is installed?

The client is still on ES 7.9 (and so is my localhost because of that), so I don't think this could be a version issue...

Thanks for your help!

[SOLVED] but...

So, the issue was caused by logic in my code: on my client's server, a mapping template was being applied to new indices, and because of that, my code was not updating the existing mapping with the properties for the user_agent fields.

But Question 1 ("shouldn't the processor provide proper mapping?") is still relevant.

Why are we forced to create explicit mapping for the user_agent fields?

According to the "User agent Fields" page of the Elastic Common Schema (ECS) Reference [8.11], one would expect such a mapping to be put in place by the processor...

Thoughts?

It should not. Processors are used to parse or transform the data in an ingest pipeline. The data is indexed only after the pipeline has finished, and the mapping validation happens during indexing.
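To see that separation in practice, the simulate pipeline API runs the processors and returns the transformed document without indexing anything. A sketch using the pipeline and sample value from earlier in this thread:

```console
# Runs the pipeline only; nothing is indexed and no mapping is created.
POST _ingest/pipeline/my_user_agent/_simulate
{
  "docs": [
    {
      "_source": {
        "agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:123.0) Gecko/20100101 Firefox/123.0"
      }
    }
  ]
}
```

The response contains only the field values the processor produced. Field types are decided later, when the document is actually indexed.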

This is how Elasticsearch works; it is a core concept. The mapping must be provided when the index is created, either by making a request with the proper mapping or by using an index template, which is the better approach.

If you do not provide the mapping for a field, Elasticsearch will infer its type from its value. It may get it right, but it may get it wrong as well, and since the type of an existing field cannot be changed without creating a new index and reindexing, it is better to provide your mapping beforehand.
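As a sketch of what providing it beforehand can look like for the index from this thread (assuming the default `user_agent` target field), you can map the field explicitly and, optionally, turn off date detection so that string values are never dynamically mapped as dates:

```console
# The index must not exist yet when this request is sent.
PUT my_empty_index
{
  "mappings": {
    "date_detection": false,
    "properties": {
      "user_agent": {
        "properties": {
          "version": { "type": "keyword" }
        }
      }
    }
  }
}
```

`date_detection` applies to the whole index, so only disable it if no other field relies on dynamic date mapping.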

ECS is a reference for how the fields should be mapped in your indices. It provides a common schema to normalize your data, but you need to create the mappings yourself.
