Pipeline cannot ingest lines with the word "value" in them

I am trying to create an ingest pipeline for a GitLab log. I ran a simulate on the message in Dev Tools and it works fine:-

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "gsub": {
          "field": "message",
          "pattern": "value",
          "replacement": "dummy_value"
        }
      },
      {
        "json": {
          "field": "message",
          "add_to_root": true
        }
      },
      {
        "remove": {
          "field": "message"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": """{"time":"2020-04-01T01:36:54.435Z","severity":"INFO","duration":30.62,"db":15.87,"view":14.750000000000002,"status":204,"method":"POST","path":"/api/v4/jobs/request","params":[{"key":"info","value":{"name":"gitlab-runner","version":"12.8.0","revision":"1b659122","platform":"linux","architecture":"amd64","executor":"shell","shell":"bash","features":{"variables":"[FILTERED]","image":null,"services":null,"artifacts":null,"cache":null,"shared":null,"upload_multiple_artifacts":null,"upload_raw_artifacts":null,"session":null,"terminal":null,"refspecs":null,"masking":null,"proxy":null}}},{"key":"token","value":"[FILTERED]"},{"key":"last_update","value":"8b749dd0becc849c3ecc2b93bace9261"}],"host":"dtcrhgitd01.dtc.rccad.net","remote_ip":"10.236.0.73, 127.0.0.1","ua":"gitlab-runner 12.8.0 (12-8-stable; go1.13.7; linux/amd64)","route":"/api/:version/jobs/request","queue_duration":7.64,"correlation_id":"HRrJf5E3Pk9"}"""
      }
    }
  ]
}

It works well so I create the pipeline:-

PUT _ingest/pipeline/gitlab_api_json_pipe
{
  "processors": [
    {
      "gsub": {
        "field": "message",
        "pattern": "value",
        "replacement": "dummy_value"
      }
    },
    {
      "json": {
        "field": "message",
        "add_to_root": true
      }
    },
    {
      "remove": {
        "field": "message"
      }
    }
  ]
}

I then run the message against the pipeline:-

PUT stu_test/_doc/1P?pipeline=gitlab_api_json_pipe
{


         "message": """{"time":"2020-04-01T01:36:54.435Z","severity":"INFO","duration":30.62,"db":15.87,"view":14.750000000000002,"status":204,"method":"POST","path":"/api/v4/jobs/request","params":[{"key":"info","value":{"name":"gitlab-runner","version":"12.8.0","revision":"1b659122","platform":"linux","architecture":"amd64","executor":"shell","shell":"bash","features":{"variables":"[FILTERED]","image":null,"services":null,"artifacts":null,"cache":null,"shared":null,"upload_multiple_artifacts":null,"upload_raw_artifacts":null,"session":null,"terminal":null,"refspecs":null,"masking":null,"proxy":null}}},{"key":"token","value":"[FILTERED]"},{"key":"last_update","value":"8b749dd0becc849c3ecc2b93bace9261"}],"host":"dtcrhgitd01.dtc.rccad.net","remote_ip":"10.236.0.73, 127.0.0.1","ua":"gitlab-runner 12.8.0 (12-8-stable; go1.13.7; linux/amd64)","route":"/api/:version/jobs/request","queue_duration":7.64,"correlation_id":"HRrJf5E3Pk9"}"""

}

And I get the following error:-

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "mapper [params.dummy_value] of different type, current_type [text], merged_type [ObjectMapper]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "mapper [params.dummy_value] of different type, current_type [text], merged_type [ObjectMapper]"
  },
  "status": 400
}

I know what the problem is. If I rename the "value" key in the first params entry ("params":[{"key":"info","value") to "dummy_value" in the message itself, it parses correctly. That is why I added the gsub. The gsub does run, but it doesn’t fix this issue. The message only parses when I change the key in the message itself.

Has anybody seen this before?

Hey Stu,

there is an inconsistency between your data and the way Elasticsearch handles it.

First, the simulate pipeline API only creates the JSON document but does not try to index anything. This means that not all checks are run; only the conversion happens.

Let's take a look at the JSON document, specifically the params field:

"params" : [
            {
              "key" : "info",
              "dummy_value" : {
                "features" : {
                   ...
                }
              }
            },
            {
              "key" : "token",
              "dummy_value" : "[FILTERED]"
            },
            {
              "key" : "last_update",
              "dummy_value" : "8b749dd0becc849c3ecc2b93bace9261"
            }
          ]

The dummy_value field is a string field in two cases, but an object field in one case. This confuses the indexing component: a field can only have one mapping type, so it cannot be mapped as text and as an object at the same time, which is what the error message is reporting.

You need to make sure that those fields are always either a string or a complex JSON object (or just not index them and not make them searchable at all, then this is fine as well).
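
To see the conflict outside of Elasticsearch, here is a small Python sketch. It is only an illustration: the params array is a shortened copy of the one above, and json_kinds is a hypothetical helper name, not anything Elasticsearch provides. It lists every key whose values do not all have the same JSON type.

```python
import json
from collections import defaultdict

# Shortened copy of the "params" array after the first gsub (illustrative data only).
params = json.loads("""
[
  {"key": "info", "dummy_value": {"name": "gitlab-runner", "version": "12.8.0"}},
  {"key": "token", "dummy_value": "[FILTERED]"},
  {"key": "last_update", "dummy_value": "8b749dd0becc849c3ecc2b93bace9261"}
]
""")


def json_kinds(items):
    """Map each key to the set of JSON kinds its values take across all items."""
    kinds = defaultdict(set)
    for item in items:
        for key, value in item.items():
            if isinstance(value, dict):
                kinds[key].add("object")
            elif isinstance(value, str):
                kinds[key].add("string")
            else:
                kinds[key].add(type(value).__name__)
    return dict(kinds)


kinds = json_kinds(params)
# Keys with more than one kind are the ones a single mapping type cannot cover.
conflicts = {key: found for key, found in kinds.items() if len(found) > 1}
print(conflicts)
```

Only dummy_value shows up as a conflict here, because it is an object in one entry and a string in the other two, while key is a string everywhere.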

--Alex

Thank you Alex.

With the help of support, we tweaked the gsub to only match the object field and not the string fields, and now it works:-

PUT _ingest/pipeline/gitlab_api_json_pipe
{
  "processors": [
    {
      "gsub": {
        "field": "message",
        "pattern": "value\":\\{",
        "replacement": "dummy_value\":\\{"
      }
    },
    {
      "json": {
        "field": "message",
        "add_to_root": true
      }
    },
    {
      "remove": {
        "field": "message"
      }
    }
  ]
}
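
For anyone who wants to check what the narrower pattern does before touching a pipeline, here is a rough Python sketch. It uses Python's re.sub as a stand-in for the gsub processor (an assumption, since the processor uses Java regular expressions, although this pattern behaves the same in both) and a shortened, made-up version of the message.

```python
import json
import re

# Shortened sample message (illustrative only, not the full log line).
message = (
    '{"params":[{"key":"info","value":{"name":"gitlab-runner"}},'
    '{"key":"token","value":"[FILTERED]"}]}'
)

# The gsub pattern with the JSON string escaping removed is: value":\{
# It only matches a "value" key that is immediately followed by an object.
renamed = re.sub(r'value":\{', 'dummy_value":{', message)

doc = json.loads(renamed)
keys = [sorted(entry.keys()) for entry in doc["params"]]
print(keys)  # [['dummy_value', 'key'], ['key', 'value']]
```

The object-valued entry ends up under dummy_value and the string-valued entry stays under value, so each field has a single type and the mapping conflict goes away.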
