Data getting SWAPPED in Elasticsearch with pipeline

tusharnemade · August 28, 2023, 10:09am

Hello Team

We are using Elasticsearch version 7.8.0.

We have an index with an ingest pipeline defined.

Our problem is that data is getting swapped between two fields in Elasticsearch.

Data from "OBJ_NAM_FILDT" is being written to "USER_TAGS_AK" and vice versa. This happens for only a few records, not for all of the records being pushed to Elasticsearch.

Could you please let us know if you are aware of such behaviour in Elasticsearch when a pipeline with a split processor is defined on a field?

Field definitions:

        "OBJ_NAM_FILDT" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },

        "USER_TAGS_AK" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          },
          "analyzer" : "autocomplete"
        },

Ingest pipeline:

  "tf_usr_tags_smry_array_tags" : {
    "processors" : [
      {
        "split" : {
          "if" : "ctx.USER_TAGS_AK != null ",
          "field" : "USER_TAGS_AK",
          "separator" : ",",
          "ignore_failure" : true
        }
      }
    ]
  }
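
One way to narrow this down is to run the pipeline against a few sample documents with the simulate API (`POST _ingest/pipeline/_simulate`) and compare the fields going in with the fields coming out. The sketch below only builds the request body in Python; the sample field values are made up for illustration, and sending the request is left to whatever HTTP client is at hand.

```python
import json

# Pipeline definition copied from the post above.
PIPELINE = {
    "processors": [
        {
            "split": {
                "if": "ctx.USER_TAGS_AK != null ",
                "field": "USER_TAGS_AK",
                "separator": ",",
                "ignore_failure": True,
            }
        }
    ]
}


def build_simulate_body(sources):
    """Wrap each sample _source in the shape the simulate API expects."""
    return {"pipeline": PIPELINE, "docs": [{"_source": src} for src in sources]}


# Hypothetical sample documents: normal tags, empty tags, and a missing field.
SAMPLES = [
    {"OBJ_NAM_FILDT": "report.pdf", "USER_TAGS_AK": "tag1,tag2"},
    {"OBJ_NAM_FILDT": "report.pdf", "USER_TAGS_AK": ""},
    {"OBJ_NAM_FILDT": "report.pdf"},
]

body = build_simulate_body(SAMPLES)
print(json.dumps(body, indent=2))
```

Posting this body to `_ingest/pipeline/_simulate` returns each document as it would look after the pipeline has run. If OBJ_NAM_FILDT comes back unchanged in every case, that would suggest the swap happens before the documents reach the pipeline, for example in the client that builds them.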

We also have a tokenizer defined in our index settings, as shown below:

    "settings" : {
      "index" : {
        "default_pipeline" : "tf_usr_tags_smry_array_tags",
        "creation_date" : "1626846772008",
        "analysis" : {
          "analyzer" : {
            "autocomplete" : {
              "filter" : [
                "lowercase"
              ],
              "tokenizer" : "alz_tkn_tej_usr_tags_sum_9_a3"
            }
          },
          "tokenizer" : {
            "alz_tkn_tej_usr_tags_sum_9_a3" : {
              "punctuation" : {
                "pattern" : "[-]",
                "type" : "pattern"
              },
              "token_chars" : [ ],
              "min_gram" : "1",
              "side" : "front",
              "type" : "edge_ngram",
              "max_gram" : "10"
            }
          }
        }
      }
    }

tusharnemade · August 29, 2023, 4:04am

We have noticed that the swapping occurs when USER_TAGS_AK is actually null or empty.

Could someone please confirm whether this is a bug, or whether the processor's condition needs to be changed?
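
The condition `ctx.USER_TAGS_AK != null` skips the processor only for a missing or null field; an empty string still passes it. The following is a rough Python model of that behaviour, not Elasticsearch code. The function name `apply_split` is invented here, and the handling of empty strings and non-string values is an assumption about how the 7.x split processor behaves (Java `String.split` semantics, and a non-string value treated as a failure that `ignore_failure` swallows), so it is worth confirming against a real cluster.

```python
import re


def apply_split(doc, field="USER_TAGS_AK", separator=","):
    """Rough model of the posted split processor, including its `if` guard."""
    result = dict(doc)
    value = result.get(field)

    # "if": "ctx.USER_TAGS_AK != null" -> a missing or null field skips the processor.
    if value is None:
        return result

    # Assumption: a non-string value (for example an array) makes the processor
    # fail, and "ignore_failure": true lets the document through unchanged.
    if not isinstance(value, str):
        return result

    if value == "":
        # Java's String.split returns [""] for an empty input.
        parts = [""]
    else:
        parts = re.split(separator, value)
        # Java's String.split drops trailing empty strings.
        while parts and parts[-1] == "":
            parts.pop()
    result[field] = parts
    return result


for doc in (
    {"OBJ_NAM_FILDT": "name", "USER_TAGS_AK": "a,b"},
    {"OBJ_NAM_FILDT": "name", "USER_TAGS_AK": ""},
    {"OBJ_NAM_FILDT": "name", "USER_TAGS_AK": None},
    {"OBJ_NAM_FILDT": "name"},
):
    print(apply_split(doc))
```

In this model OBJ_NAM_FILDT is never touched, which is consistent with the split processor writing only to `field` (or `target_field` when one is set). If empty strings should be skipped as well, a stricter Painless condition such as `ctx.USER_TAGS_AK instanceof String && !ctx.USER_TAGS_AK.isEmpty()` is one option to try.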

tusharnemade · August 29, 2023, 5:33am

I have now updated the split processor with an on_failure handler.

I will update this thread once I know whether it has fixed the data.

    "processors" : [
      {
        "split" : {
          "if" : "ctx.USER_TAGS_AK != null ",
          "field" : "USER_TAGS_AK",
          "separator" : ",",
          "on_failure" : [
            {
              "set" : {
                "field" : "USER_TAGS_AK",
                "value" : "'NULL'"
              }
            }
          ]
        }
      }
    ]
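
Two details of this change may be worth checking. First, `on_failure` runs only when the split processor actually executes and fails; documents skipped by the `if` condition (null or missing USER_TAGS_AK) never reach it. Second, the value `"'NULL'"` is a JSON string that includes the single quotes. The short Python check below only decodes the posted fragment to show what string would be written, assuming the set processor stores the value as given.

```python
import json

# The on_failure "set" fragment as posted above.
SET_FRAGMENT = """{"set": {"field": "USER_TAGS_AK", "value": "'NULL'"}}"""

value = json.loads(SET_FRAGMENT)["set"]["value"]

# The decoded value is six characters long: the quotes are part of the data.
print(repr(value), len(value))
```

If the bare text NULL is what is wanted in the field, the inner single quotes can be dropped.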
