Convert two string fields into a single geo_point field

It is happening with every document that enters the Elasticsearch input buffer and goes through the pipeline. Without the pipeline, all the parsed fields are named correctly inside the index, and I can also visualize them correctly in Kibana.
Here is an example of the regex I'm using, if you want to see it: https://regex101.com/r/uDHpsP/1/
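To double-check how the parsed fields actually end up typed, the live index mapping can be inspected directly (a sketch; the index name stands in for my real daily index):

GET test-2019-03-25/_mapping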

Maybe there is some sort of issue when the messages are sent by Fluent Bit to Elasticsearch; this looks weird.
I also set up a "debug" regex that sends the entire message to Elasticsearch without parsing it, so I know the whole message reaches Elasticsearch without any loss of data between the two services.

So can you update the same reproduction script with a typical document that is stored in Elasticsearch without going through the pipeline?

Processing the message through the console into the index with the pipeline works well:

POST "my live indices"/flb_type?pipeline=geoip
{
  "@timestamp":"2019-03-25T12:10:30Z",
  "IP_Source" : "8.8.8.8",
  "IP_Destination" : "12.34.56.78",
  "Description":"Message de test (:"
}

I got the expected result.

It seems the problem comes from when the entries coming from my parser are processed.
I can also specify the field type of specific parsed fields before they are sent to Elasticsearch, but I don't think it changes anything, because the field "geolocalisation" is created in the pipeline.

If, for example, I set a geo_point string ("0,0" for example) in the field "geolocalisation", it works properly:

PUT _ingest/pipeline/geoip
{
  "description" : "Add geoip info",
  "processors" : [
    {
      "geoip" : {
        "field" : "IP_Source",
        "target_field" : "location",
        "ignore_failure" : true,
        "properties" : ["location","city_name"]
      }
    },
    {
      "set" : {
        "field" : "geolocalisation",
        "value" : "0,0"
      }
    }
  ]
}

The geo_point field parses that value correctly and does not throw any errors.
It happens ONLY when the incoming messages are processed by the "set" processor using the templated value. If I set a hard-coded value as above, it works correctly, even when it is added to incoming messages.

I'm running out of ideas.

I don't think I have a clear vision of what is actually being sent to Elasticsearch, i.e. the JSON document that is sent from your Fluent Bit to Elasticsearch.

If you can provide one untouched one, I can have a look.

Of course. Here is an untouched JSON document generated by Fluent Bit:

{"ID_Firewall":"MY_FIREWALL", "timestamp":"2019-03-25 15:14:04", "IP_Firewall":"1.1.1.1", "Niveau":"7", "MAC_Source":"aa:bb:cc:dd:ee:ff", "IP_Source":"2.2.2.2", "Port_Source":"51348", "INT_Source":"X1654", "Zone_Source":"WAN", "NAT_Source":"3.3.3.3", "NAT_Port_Source":"51348", "MAC_Destination":"gg:hh:ii:jj:kk:ll", "IP_Destination":"4.4.4.4", "Port_Destination":"443", "INT_Destination":"X0", "Zone_Destination":"LAN", "NAT_Destination":"5.5.5.5", "NAT_Port_Destination":"443", "Protocole":"tcp/https", "Regle":"(WAN->LAN)", "Note":"TCP Flag(s): ACK RST"}

So I can't reproduce the problem with:

DELETE _template/test_geoip
PUT _template/test_geoip
{
  "index_patterns": "test_geoip", 
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 1
  },
  "mappings": {
    "flb_type": {
	  	"properties" : {
          "geolocalisation" : {
            "type" : "geo_point"
        }
      }
    }
  }
}

DELETE _ingest/pipeline/test_geoip_pipeline
PUT _ingest/pipeline/test_geoip_pipeline
{
  "description" : "Add geoip info",
  "processors" : [
    {
      "geoip" : {
        "field" : "IP_Source",
        "ignore_failure" : true
      }
    },
    {
      "set" : {
        "field" : "geolocalisation",
        "value" : "{{geoip.location.lat}},{{geoip.location.lon}}"
      }
    }
  ]
}

DELETE test_geoip
PUT test_geoip/flb_type/1?pipeline=test_geoip_pipeline
{
  "ID_Firewall": "MY_FIREWALL",
  "timestamp": "2019-03-25 15:14:04",
  "IP_Firewall": "1.1.1.1",
  "Niveau": "7",
  "MAC_Source": "aa:bb:cc:dd:ee:ff",
  "IP_Source": "2.2.2.2",
  "Port_Source": "51348",
  "INT_Source": "X1654",
  "Zone_Source": "WAN",
  "NAT_Source": "3.3.3.3",
  "NAT_Port_Source": "51348",
  "MAC_Destination": "gg:hh:ii:jj:kk:ll",
  "IP_Destination": "4.4.4.4",
  "Port_Destination": "443",
  "INT_Destination": "X0",
  "Zone_Destination": "LAN",
  "NAT_Destination": "5.5.5.5",
  "NAT_Port_Destination": "443",
  "Protocole": "tcp/https",
  "Regle": "(WAN->LAN)",
  "Note": "TCP Flag(s): ACK RST"
}

GET test_geoip/flb_type/1

Yes, weirdly, it happens only when the live entries are processed.

  • If I set manual coordinates in the pipeline, it works
  • If we simulate a full message through the _simulate API, it works (see the sketch after this list)
  • If the pipeline processes the messages live, it does not work
  • If I don't set up the "set" processor in the pipeline, ingest-geoip works properly, but then I can't use the result for geohash purposes later (also sketched below)
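For reference, this is the kind of simulate call I mean (a minimal sketch against the test_geoip_pipeline defined earlier, with the source document trimmed to the one field the pipeline reads):

POST _ingest/pipeline/test_geoip_pipeline/_simulate
{
  "docs": [
    {
      "_index": "test_geoip",
      "_type": "flb_type",
      "_id": "1",
      "_source": {
        "IP_Source": "2.2.2.2"
      }
    }
  ]
}

And the geohash purpose mentioned above would be something like this aggregation (again only a sketch, assuming "geolocalisation" is mapped as geo_point):

GET test_geoip/_search
{
  "size": 0,
  "aggs": {
    "grid": {
      "geohash_grid": {
        "field": "geolocalisation",
        "precision": 4
      }
    }
  }
}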

I haven't tried setting up a new live index instead of an existing one. I'm testing this right now.

EDIT: I still get the same error with a fresh new template/index as above. And just like above, I can put a manual message through the pipeline and it works properly... This is not making any sense :thinking:

I think I'm going to open an issue on GitHub later.

Please don't do that until we know there is an issue.
So far, I can't see any issue here.

It's more likely a misusage.

This tells me that the rejected document is not exactly the document you shared.
I don't know fluentd, but maybe you can debug what is sent to Elasticsearch, how it is called, etc.

I see what you are saying, but I don't really know how to see exactly what is coming into Elasticsearch.
I'm going to try setting up Wireshark on the loopback interface to see the messages.

EDIT: Here is a raw capture from my server, taken on the loopback interface (I'm only pasting one request line and the answer):

POST /_bulk/?pipeline=geoip HTTP/1.1
Host: 9.8.7.6:9200
Content-Length: 18063
User-Agent: Fluent-Bit
Content-Type: application/x-ndjson

{"index":{"_index":"test-2019-03-25","_type":"flb_type"}}
{"@timestamp":"2019-03-25T15:14:14.038Z", "ID_Firewall":"MY-FIREWALL", "timestamp":"2019-03-25 16:14:14", "IP_Firewall":"1.1.1.1", "Niveau":"6", "Description":"Connection Opened", "IP_Source":"2.2.2.2", "Port_Source":"45724", "INT_Source":"X1", "NAT_Source":"3.3.3.3", "NAT_Port_Source":"45724", "IP_Destination":"4.4.4.4", "Port_Destination":"443", "INT_Destination":"X0", "NAT_Destination":"5.5.5.5", "NAT_Port_Destination":"443", "Protocole":"tcp/https"}

HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 6901

{"took":41,"ingest_took":46,"errors":false,"items":[{"index":{"_index":"test-2019-03-25","_type":"flb_type","_id":"rFxqtWkBJGtfi-AKlbyT","_version":1,"result":"created","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":41887,"_primary_term":1,"status":201}},{"index":{"_index":"test-2019-03-25","_type":"flb_type","_id":"rVxqtWkBJGtfi-AKlbyT","_version":1,"result":"created","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":42084,"_primary_term":1,"status":201}}, ... 

I also tested this with the pipeline configured, and I still get the exact same error:

{"index":{"_index":"test-2019-03-25","_type":"flb_type","_id":"X2V5tWkBJGtfi-AKnY0l","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse field [geolocalisation] of type [geo_point]","caused_by":{"type":"array_index_out_of_bounds_exception","reason":"0"}}}},

Sometimes the value is even out of range for the geo_point field:

,{"index":{"_index":"test-2019-03-25","_type":"flb_type","_id":"dmV5tWkBJGtfi-AKnY0l","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse field [geolocalisation] of type [geo_point]","caused_by":{"type":"illegal_argument_exception","reason":"illegal latitude value [-122.3321] for geolocalisation"}}}},

Does this mean the processor is working, but the "set" processor can't write the field correctly when the incoming messages come from a service other than Elasticsearch itself?
It's as if it can't find the "lat" and "lon" values when the message is processed this way.
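One thing worth ruling out (this is an assumption on my side, not something confirmed above): if the geoip lookup fails or does not populate geoip.location for a given document, the mustache template renders an empty or partial string such as ",", which geo_point cannot parse; the array_index_out_of_bounds_exception above would be consistent with that. A minimal sketch of a guarded pipeline, assuming Elasticsearch 6.5+ where processors accept an "if" condition:

PUT _ingest/pipeline/geoip
{
  "description" : "Add geoip info, only set geolocalisation when a location was resolved",
  "processors" : [
    {
      "geoip" : {
        "field" : "IP_Source",
        "ignore_failure" : true
      }
    },
    {
      "set" : {
        "if" : "ctx.geoip != null && ctx.geoip.location != null",
        "field" : "geolocalisation",
        "value" : "{{geoip.location.lat}},{{geoip.location.lon}}"
      }
    }
  ]
}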


Do you have any update?

I tried your last example:

DELETE _template/test_geoip
PUT _template/test_geoip
{
  "index_patterns": "test_geoip", 
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 1
  },
  "mappings": {
    "flb_type": {
	  	"properties" : {
          "geolocalisation" : {
            "type" : "geo_point"
        }
      }
    }
  }
}

DELETE _ingest/pipeline/test_geoip_pipeline
PUT _ingest/pipeline/test_geoip_pipeline
{
  "description" : "Add geoip info",
  "processors" : [
    {
      "geoip" : {
        "field" : "IP_Source",
        "ignore_failure" : true
      }
    },
    {
      "set" : {
        "field" : "geolocalisation",
        "value" : "{{geoip.location.lat}},{{geoip.location.lon}}"
      }
    }
  ]
}

DELETE test_geoip
PUT test_geoip/flb_type/2?pipeline=test_geoip_pipeline
{
  "@timestamp": "2019-03-25T15:14:14.038Z",
  "ID_Firewall": "MY-FIREWALL",
  "timestamp": "2019-03-25 16:14:14",
  "IP_Firewall": "1.1.1.1",
  "Niveau": "6",
  "Description": "Connection Opened",
  "IP_Source": "2.2.2.2",
  "Port_Source": "45724",
  "INT_Source": "X1",
  "NAT_Source": "3.3.3.3",
  "NAT_Port_Source": "45724",
  "IP_Destination": "4.4.4.4",
  "Port_Destination": "443",
  "INT_Destination": "X0",
  "NAT_Destination": "5.5.5.5",
  "NAT_Port_Destination": "443",
  "Protocole": "tcp/https"
}

GET test_geoip/flb_type/2

It works for me.

Unless you can share an example which is failing, I'm not sure I can help.

Well, it fails only when ingest-geoip processes an entry received from a service other than Elasticsearch itself. Any other example I could provide beyond what I posted above would show the same issue again, and again, and again... (I captured more conversations between the two services and I always get the same error, for no apparent reason.)
I'm also out of solutions on my side, and I can't afford to reinstall my node without losing all my indices. I also can't deploy another node and re-index all the data onto it :confused: I don't have access to the necessary resources to reinstall a node on my own. :confused:

