Issue with ingest pipeline

I have data coming in through Logstash (8.9.1) that does not have a hostname field, only an ip field.

I created a pipeline using the Elasticsearch API (8.9.1) like this:

PUT _ingest/pipeline/name_device_ping
{
  "processors": [
    {
      "script": {
        "source": """
          if (ctx.ip == '192.168.203.1') {
            ctx.hostname = 'SWITCH-CORE';
          }
        """,
        "lang": "painless"
      }
    }
  ],
  "on_failure": [
    {
      "append": {
        "field": "meta.errors",
        "value": "{{ _ingest.on_failure_message }}, {{ _ingest.on_failure_processor_type }}, {{ _ingest.on_failure_processor_tag }}"
      }
    },
    {
      "set": {
        "field": "_index",
        "value": "failed-{{ _index }}"
      }
    }
  ]
}

When executed, it returns "acknowledged": true.

When simulating:

{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_version": "-3",
        "_source": {
          "loss": 0,
          "hostname": "SWITCH-CORE",
          "@timestamp": "2023-10-06T09:30:52-03:00",
          "jitter": 0,
          "ip": "192.168.203.1",
          "latency": 2,
          "error": ""
        },
        "_ingest": {
          "timestamp": "2023-10-06T15:19:01.551366204Z"
        }
      }
    }
  ]
}

Apparently the hostname field was created successfully.
However, when new data is ingested, the field is always empty.

I don't know what I'm missing.

Hi @rafaelrangel

try

host.hostname

That is technically the right field to use with ECS (Elastic Common Schema)

Can you provide a sample doc? You only showed the results

You can also try to actually post the doc to the index

POST my-index/_doc/?pipeline=name_device_ping
{
  ...
}

Let us know what you see...

Hi @stephenb ,

I changed it to host.hostname:


PUT _ingest/pipeline/name_device_ping
{
  "processors": [
    {
      "script": {
        "source": """
          if (ctx.ip == '192.168.203.1') {
            ctx.host.hostname = 'SWITCH-CORE';
          }
        """,
        "lang": "painless"
      }
    }
  ],
  "on_failure": [
    {
      "append": {
        "field": "meta.errors",
        "value": "{{ _ingest.on_failure_message }}, {{ _ingest.on_failure_processor_type }}, {{ _ingest.on_failure_processor_tag }}"
      }
    },
    {
      "set": {
        "field": "_index",
        "value": "failed-{{ _index }}"
      }
    }
  ]
}
POST ping_data-2023.10.05/_doc/pipeline=name_device_ping
{
  "ip": "192.168.203.1",
  "jitter": 5,
  "latencia": 10
}


{
  "_index": "ping_data-2023.10.05",
  "_id": "pipeline=name_device_ping",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 27651,
  "_primary_term": 1
}

But the hostname field is still empty.

My doc sample:

{
  "_index": "ping_data-2023.10.07",
  "_id": "YGzyCosBzfIl7KF9J9_c",
  "_version": 1,
  "_score": 0,
  "_source": {
    "ip": "192.168.203.1",
    "latency": 2,
    "error": "",
    "@timestamp": "2023-10-07T13:22:03-03:00",
    "jitter": 0,
    "loss": 0
  },
  "fields": {
    "loss": [
      0
    ],
    "error.keyword": [
      ""
    ],
    "@timestamp": [
      "2023-10-07T16:22:03.000Z"
    ],
    "jitter": [
      0
    ],
    "ip": [
      "192.168.203.1"
    ],
    "latency": [
      2
    ],
    "ip.keyword": [
      "192.168.203.1"
    ],
    "error": [
      ""
    ]
  }
}


Hi @rafaelrangel

When I do this

PUT _ingest/pipeline/name_device_ping
{
  "processors": [
    {
      "script": {
        "source": """
          if (ctx.ip == '192.168.203.1') {
            ctx.host.hostname = 'SWITCH-CORE';
          }
        """,
        "lang": "painless"
      }
    }
  ],
  "on_failure": [
    {
      "append": {
        "field": "meta.errors",
        "value": "{{ _ingest.on_failure_message }}, {{ _ingest.on_failure_processor_type }}, {{ _ingest.on_failure_processor_tag }}"
      }
    },
    {
      "set": {
        "field": "_index",
        "value": "failed-{{ _index }}"
      }
    }
  ]
}

POST _ingest/pipeline/name_device_ping/_simulate
{
  "docs": [
    {
      "_source": {
        "ip": "192.168.203.1",
        "jitter": 5,
        "latencia": 10
      }
    }
  ]
}

I get this

{
  "docs" : [
    {
      "doc" : {
        "_index" : "failed-_index",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "jitter" : 5,
          "latencia" : 10,
          "meta" : {
            "errors" : [
              "cannot access method/field [hostname] from a null def reference, script, "
            ]
          },
          "ip" : "192.168.203.1"
        },
        "_ingest" : {
          "timestamp" : "2023-10-07T16:46:05.587054251Z"
        }
      }
    }
  ]
}

I think it is much simpler and more efficient to do this
(I use the script processor as a last resort):

PUT _ingest/pipeline/name_device_ping
{
  "processors": [
    {
      "set": {
        "if": "ctx.ip == '192.168.203.1'", 
        "field": "host.hostname",
        "value": "SWITCH-CORE"
      }
    }
  ],
  "on_failure": [
    {
      "append": {
        "field": "meta.errors",
        "value": "{{ _ingest.on_failure_message }}, {{ _ingest.on_failure_processor_type }}, {{ _ingest.on_failure_processor_tag }}"
      }
    },
    {
      "set": {
        "field": "_index",
        "value": "failed-{{ _index }}"
      }
    }
  ]
}

POST _ingest/pipeline/name_device_ping/_simulate
{
  "docs": [
    {
      "_source": {
        "ip": "192.168.203.1",
        "jitter": 5,
        "latencia": 10
      }
    }
  ]
}

and get this:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "jitter" : 5,
          "latencia" : 10,
          "ip" : "192.168.203.1",
          "host" : {
            "hostname" : "SWITCH-CORE"
          }
        },
        "_ingest" : {
          "timestamp" : "2023-10-07T16:47:17.452088775Z"
        }
      }
    }
  ]
}

BUT if you have a list of IPs you want to map to values, the MUCH better way to do this is with an Enrich Processor.
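For example, a rough sketch (the host-lookup index and hostname-policy names here are just placeholders, not something from your setup): keep the ip-to-hostname mapping in a small lookup index, build an enrich policy from it, and reference that policy from the pipeline:

# Lookup index holding the ip -> hostname mapping (name is illustrative)
PUT host-lookup/_doc/1
{
  "ip": "192.168.203.1",
  "hostname": "SWITCH-CORE"
}

# Enrich policy that matches on ip and copies hostname
PUT _enrich/policy/hostname-policy
{
  "match": {
    "indices": "host-lookup",
    "match_field": "ip",
    "enrich_fields": ["hostname"]
  }
}

# Build the internal enrich index from the policy
POST _enrich/policy/hostname-policy/_execute

# Pipeline that looks up ip and writes the match under host
PUT _ingest/pipeline/name_device_ping
{
  "processors": [
    {
      "enrich": {
        "policy_name": "hostname-policy",
        "field": "ip",
        "target_field": "host"
      }
    }
  ]
}

With target_field set to host, the matched lookup document (its ip and hostname) ends up under host, so host.hostname gets populated without one set processor per IP.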

Hi @stephenb

I did exactly as you said; however, the SWITCH-CORE value is not populating the field.

Please show each step: pipeline, document, mapping, results, etc.

You can also use verbose=true to debug _simulate
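For example, using the same pipeline and sample doc as above:

POST _ingest/pipeline/name_device_ping/_simulate?verbose=true
{
  "docs": [
    {
      "_source": {
        "ip": "192.168.203.1",
        "jitter": 5,
        "latencia": 10
      }
    }
  ]
}

The verbose output lists each processor and what it changed, which makes it easier to see which processor (if any) failed.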

Also, did it not work on simulate, or did it not work when you actually tried to write a document to an index?

@stephenb ,

This is an example from my document:

{
  "_index": "ping_data-2023.10.09",
  "_id": "AnLcFIsBzfIl7KF9a0EO",
  "_version": 1,
  "_score": 0,
  "_source": {
    "ip": "192.168.203.5",
    "latency": 1,
    "error": "",
    "@timestamp": "2023-10-09T11:34:31-03:00",
    "jitter": 0,
    "loss": 0
  },
  "fields": {
    "loss": [
      0
    ],
    "error.keyword": [
      ""
    ],
    "@timestamp": [
      "2023-10-09T14:34:31.000Z"
    ],
    "jitter": [
      0
    ],
    "ip": [
      "192.168.203.5"
    ],
    "latency": [
      1
    ],
    "ip.keyword": [
      "192.168.203.5"
    ],
    "error": [
      ""
    ]
  }
}

Pipeline:

PUT _ingest/pipeline/name_device_ping
{
  "processors": [
    {
      "set": {
        "if": "ctx.ip == '192.168.203.1'", 
        "field": "host.hostname",
        "value": "SWITCH-CORE"
      }
    }
  ],
  "on_failure": [
    {
      "append": {
        "field": "meta.errors",
        "value": "{{ _ingest.on_failure_message }}, {{ _ingest.on_failure_processor_type }}, {{ _ingest.on_failure_processor_tag }}"
      }
    },
    {
      "set": {
        "field": "_index",
        "value": "failed-{{ _index }}"
      }
    }
  ]
}

Output:

{
  "acknowledged": true
}

Simulate:

POST _ingest/pipeline/name_device_ping/_simulate
{
  "docs": [
    {
      "_source": {
        "ip": "192.168.203.1",
        "jitter": 5,
        "latencia": 10
      }
    }
  ]
}

Output:

{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_version": "-3",
        "_source": {
          "host": {
            "hostname": "SWITCH-CORE"
          },
          "jitter": 5,
          "latencia": 10,
          "ip": "192.168.203.1"
        },
        "_ingest": {
          "timestamp": "2023-10-09T14:37:12.050424333Z"
        }
      }
    }
  ]
}

Applying pipeline to index:

POST ping_data-2023.10.05/_doc/pipeline=name_device_ping
{
  "ip": "192.168.203.1",
  "jitter": 5,
  "latencia": 10
}

Output:

{
  "_index": "ping_data-2023.10.05",
  "_id": "pipeline=name_device_ping",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

In the simulation it works. However, when new indices are created, the hostname field does not populate; it shows "No field data for the current search."

Please format your code going forward... Open your previous post and see what I did: 3 backticks before and after each block.

{
  "_index": "ping_data-2023.10.05",
  "_id": "pipeline=name_device_ping", <!---- NOT CORRECT 
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

Wrong syntax

POST ping_data-2023.10.05/_doc/?pipeline=name_device_ping
...............................^
the ? query parameter indicator is missing, so it used what came after /_doc/ as the _id.

Try again.

@stephenb ,

I've corrected it now:

POST ping_data-2023.10.05/_doc/?pipeline=name_device_ping
{
  "docs": [
    {
      "_source": {
        "ip": "192.168.203.1",
        "jitter": 5,
        "latencia": 10
      }
    }
  ]
}

Output:

{
  "_index": "ping_data-2023.10.05",
  "_id": "DXInFYsBzfIl7KF9_2qu",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 4,
  "_primary_term": 1
}


@rafaelrangel

again ... not formatted code. This will be my last response

I am taking time to answer ... perhaps you could take time (about 5 seconds) to format your code... as a courtesy and for ease of reading.

What is this result?

GET ping_data-2023.10.05/_doc/DXInFYsBzfIl7KF9_2qu

This will show the mapping
GET ping_data-2023.10.05

Or

GET ping_data-2023.10.05/_search
{
  "fields": ["*"]
}

@stephenb ,

Sorry, I'm new here and only now did I realize what you said regarding text formatting.
It will not happen again.
The new requested outputs follow:

GET ping_data-2023.10.05/_doc/DXInFYsBzfIl7KF9_2qu
{
  "_index": "ping_data-2023.10.05",
  "_id": "DXInFYsBzfIl7KF9_2qu",
  "_version": 1,
  "_seq_no": 4,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "docs": [
      {
        "_source": {
          "jitter": 5,
          "latencia": 10,
          "ip": "192.168.203.1"
        }
      }
    ]
  }
}

See, I could not see that without the formatting :wink: With formatting I would have caught that right away! :slight_smile:

Correct should be

POST ping_data-2023.10.05/_doc/?pipeline=name_device_ping
{
  "ip": "192.168.203.1",
  "jitter": 5, 
  "latencia": 10
}

The _simulate endpoint takes an array of docs, but when you post an actual doc to an index, the body is just the JSON content of the document.

@stephenb ,

Thank you for your patience.

I did it as below:

POST ping_data-2023.10.05/_doc/?pipeline=name_device_ping
{
  "ip": "192.168.203.1",
  "jitter": 5, 
  "latencia": 10
}

Output:

{
  "_index": "ping_data-2023.10.05",
  "_id": "F3KJFYsBzfIl7KF9oZ_j",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 20,
  "_primary_term": 1
}

See my mapping:

GET ping_data-2023.10.05/_mapping

{
  "ping_data-2023.10.05": {
    "mappings": {
      "properties": {
        "docs": {
          "properties": {
            "_source": {
              "properties": {
                "ip": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                },
                "jitter": {
                  "type": "long"
                },
                "latencia": {
                  "type": "long"
                }
              }
            }
          }
        },
        "host": {
          "properties": {
            "hostname": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        },
        "ip": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "jitter": {
          "type": "long"
        },
        "latencia": {
          "type": "long"
        }
      }
    }
  }
}

But I can't see the value in the hostname field.


And what does this return?

GET ping_data-2023.10.05/_doc/F3KJFYsBzfIl7KF9oZ_j

And your mapping is the default provided by Elastic, i.e. you did not create one, nor did you use one of the out-of-the-box (OOTB) ones.

I think @rafaelrangel perhaps needs to back up and learn a few basics about fields and mappings (schema).

You need to understand the difference between text (full-text search) and keyword (exact match, which is probably what you want).
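A quick illustration, assuming the default dynamic mapping shown above (host.hostname as text with a host.hostname.keyword sub-field):

# Likely no hit: the text field is analyzed into lowercase tokens ("switch", "core")
GET ping_data-2023.10.05/_search
{
  "query": { "term": { "host.hostname": "SWITCH-CORE" } }
}

# Hit: the keyword sub-field stores the exact, unanalyzed value
GET ping_data-2023.10.05/_search
{
  "query": { "term": { "host.hostname.keyword": "SWITCH-CORE" } }
}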

And you really want to create your own mapping BEFORE you index documents.
You will actually want to create a template

Try creating the mapping first to see how it works.
You will need to delete your old index first.

PUT ping_data-2023.10.05/
{
  "mappings": {
    "properties": {
      "host": {
        "properties": {
          "hostname": {
            "type": "keyword"
          }
        }
      },
      "ip": {
        "type": "ip"
      },
      "jitter": {
        "type": "long"
      },
      "latencia": {
        "type": "long"
      }
    }
  }
}

You have no timestamp, so you will not really see this correctly in Discover or Visualizations, etc.
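If you want manual test docs to show up in time-based views, one hedged option (a sketch only; your real Logstash data already has @timestamp, and the on_failure block is omitted here for brevity) is to add a set processor that falls back to the ingest timestamp:

PUT _ingest/pipeline/name_device_ping
{
  "processors": [
    {
      "set": {
        "if": "ctx['@timestamp'] == null",
        "field": "@timestamp",
        "value": "{{_ingest.timestamp}}"
      }
    },
    {
      "set": {
        "if": "ctx.ip == '192.168.203.1'",
        "field": "host.hostname",
        "value": "SWITCH-CORE"
      }
    }
  ]
}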

I really don't know what the simplest solution would be to resolve this issue.
My only need is that when certain IPs are identified, the hostname field is created according to the following:

PUT _ingest/pipeline/name_device_ping
{
  "processors": [
    {
      "set": {
        "if": "ctx.ip == '192.168.203.1'", 
        "field": "host.hostname",
        "value": "SWITCH-CORE"
      }
    },
    {
      "set": {
        "if": "ctx.ip == '192.168.203.7'", 
        "field": "host.hostname",
        "value": "NVR-OPERACAO"
      }
    },
    {
      "set": {
        "if": "ctx.ip == '192.168.203.6'", 
        "field": "host.hostname",
        "value": "CM1-CB250"
      }
    }
  ],
  "on_failure": [
    {
      "append": {
        "field": "meta.errors",
        "value": "{{ _ingest.on_failure_message }}, {{ _ingest.on_failure_processor_type }}, {{ _ingest.on_failure_processor_tag }}"
      }
    },
    {
      "set": {
        "field": "_index",
        "value": "failed-{{ _index }}"
      }
    }
  ]
}

I thought just creating this pipeline would be enough.

You mentioned that you are using Logstash; what does your Logstash configuration look like? Are you setting the pipeline option in your elasticsearch output in Logstash?

Also, any reason to not do this enrichment on the Logstash side?

For example, to do that on Logstash you just need the following filter:

filter {
	translate {
		source => "ip"
		target => "hostname"
		dictionary => {
			"192.168.203.1" => "SWITCH-CORE"
			"192.168.203.7" => "NVR-OPERACAO"
			"192.168.203.6" => "CM1-CB250"
		}
	}
}

@rafaelrangel

Even if you use @leandrojmp's excellent suggestion, you should really learn about field types, mappings and templates ... or use some of the out-of-the-box approaches.

Although I would suggest the target be "[host][hostname]" to be ECS compliant

Thanks @stephenb and @leandrojmp

I actually don't have access to the logstash configuration files. Therefore, I was trying to solve it directly through the Elastic API. I'll try to check with whoever has this access to do it using a dictionary directly in the Logstash filter.

To use an ingest pipeline in Elasticsearch with data coming from Logstash, you need to edit the Logstash configuration and tell it to use the ingest pipeline in the request. If the Logstash configuration does not have pipeline => name_device_ping in the elasticsearch output, it will not use the ingest pipeline.
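For example, a sketch of an elasticsearch output with the pipeline option set (the hosts value and index pattern are placeholders, not your actual configuration):

output {
  elasticsearch {
    hosts    => ["http://localhost:9200"]    # placeholder
    index    => "ping_data-%{+YYYY.MM.dd}"   # placeholder daily index
    pipeline => "name_device_ping"           # run the ingest pipeline on every event
  }
}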

Another alternative that you can try is to change some setting in your index to tell Elasticsearch to always run some specific Ingest Pipeline.

For example, the following request adds index.final_pipeline to the settings of your index; this forces Elasticsearch to run this pipeline as the last step before indexing the data.

PUT /your-index-name/_settings
{
  "index.final_pipeline": "name_device_ping"
}

If you are using daily indices you would need to add this setting to your template so new indices get it automatically.
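A sketch of such a template (the template name is illustrative; the mapping mirrors the one suggested earlier in the thread):

PUT _index_template/ping_data
{
  "index_patterns": ["ping_data-*"],
  "template": {
    "settings": {
      "index.final_pipeline": "name_device_ping"
    },
    "mappings": {
      "properties": {
        "host": {
          "properties": {
            "hostname": { "type": "keyword" }
          }
        },
        "ip": { "type": "ip" },
        "jitter": { "type": "long" },
        "latencia": { "type": "long" }
      }
    }
  }
}

New daily indices matching ping_data-* then get both the mapping and the final_pipeline setting automatically.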
