Adding GeoIP pipeline to APM data

Hi, We're essentially on ES 8.7 and I'm interested in adding geolocation data to the APM data from our Java application.

I did see what I think is an old post that talked about it, but I suspect that a new mechanism was introduced with @custom pipelines?

What is the recommended mechanism for this? FWIW we're not using RUM.

OK, I actually made some progress myself. I created a pipeline called metrics-apm@custom and had it look up a custom label that I had to add for the user's public IP address. However, when I test the pipeline against a sample document, I see a tag on the result: _geoip_database_unavailable_GeoLite2-City.mmdb.
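For reference, here is a minimal sketch of the kind of pipeline body described above, written as a Python dict so it can be printed and pasted into Dev Tools. It assumes the IP lives in the custom label `labels.IP` (as in the sample document later in this thread) and relies on the geoip processor's default `target_field` of `geoip`; the `description` text is illustrative only.

```python
import json


def geoip_custom_pipeline(ip_field: str = "labels.IP") -> dict:
    """Body for PUT _ingest/pipeline/<name>@custom that geolocates ip_field (sketch)."""
    return {
        "description": "Add GeoIP data from a custom APM label (illustrative)",
        "processors": [
            {
                "geoip": {
                    "field": ip_field,
                    # Skip documents without the label instead of failing them.
                    "ignore_missing": True,
                }
            }
        ],
    }


if __name__ == "__main__":
    # Paste the printed JSON after "PUT _ingest/pipeline/metrics-apm@custom" in Dev Tools.
    print(json.dumps(geoip_custom_pipeline(), indent=2))
```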

I ran the example test here and that all seemed to work just fine.

It looks to me like the processor config is exactly the same, yet the sample works and my custom one does not?

There is a little edge case where the DBs are lazy-loaded the first time they are used... check the APM data again.

Is it still saying the same thing?

Otherwise, can you share your exact custom pipeline?

And a sample document?

So here's my doc:

[
  {
    "_id": "Something",
    "_index": "whatever",
    "_source": {
      "labels": {
        "IP": "100.20.56.87"
      }
    }
  }
]

Processor:

[
  {
    "geoip": {
      "field": "labels.IP",
      "ignore_missing": true
    }
  }
]

Output looks like:

{
  "docs": [
    {
      "doc": {
        "_index": "whatever",
        "_id": "Something",
        "_version": "-3",
        "_source": {
          "labels": {
            "IP": "100.20.56.87"
          },
          "tags": [
            "_geoip_database_unavailable_GeoLite2-City.mmdb"
          ]
        },
        "_ingest": {
          "timestamp": "2023-05-01T20:34:45.576782612Z"
        }
      }
    }
  ]
}

I did just notice that I occasionally get SOME results when I run the test with the above document, but only maybe 25% of the time.
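To put a number on "maybe 25%", a small helper can classify the documents in a (non-verbose) `_simulate` response like the one above and compute the share of lookups that succeeded across repeated runs. It assumes the default `geoip` target field and only looks for the `_geoip_database_unavailable_` tag prefix seen in this thread; the `?verbose` response has a different shape and is not handled here.

```python
UNAVAILABLE_PREFIX = "_geoip_database_unavailable_"


def geoip_status(simulate_response: dict) -> list:
    """Label each doc in a non-verbose _simulate response."""
    statuses = []
    for entry in simulate_response.get("docs", []):
        source = entry.get("doc", {}).get("_source", {})
        tags = source.get("tags", [])
        if any(tag.startswith(UNAVAILABLE_PREFIX) for tag in tags):
            statuses.append("db_unavailable")
        elif "geoip" in source:  # default target_field of the geoip processor
            statuses.append("enriched")
        else:
            statuses.append("no_geoip")
    return statuses


def success_ratio(responses: list) -> float:
    """Fraction of docs across several _simulate runs that were actually enriched."""
    statuses = [s for r in responses for s in geoip_status(r)]
    if not statuses:
        return 0.0
    return statuses.count("enriched") / len(statuses)
```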

What does "We're essentially on ES 8.7" mean :slight_smile:

There was a change in some behavior...
What type of cluster are you running? Self-managed, Elastic Cloud, etc.? How many nodes?

Perhaps the GeoIP db settings are not consistent on all the nodes, which can result in intermittent results.

Is that output from the simulate pipeline API? How are you testing? It's hard to tell without the actual commands. Or are you doing that in the Ingest Pipeline Tester in Kibana?

Try this from this post

Disable the geoip databases

PUT _cluster/settings
{
  "persistent": {
    "ingest.geoip.downloader.enabled" : false
  }
}

GET _cat/indices/.ge*?v

GET _ingest/geoip/stats

Wait about 2 minutes, then re-enable:

PUT _cluster/settings
{
  "persistent": {
    "ingest.geoip.downloader.enabled" : true
  }
}

GET _cat/indices/.ge*?v

GET _ingest/geoip/stats
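If you want to script the disable/wait/re-enable sequence above instead of running it by hand, here is a sketch using only the Python standard library. `ES_URL` is a hypothetical address, there is no authentication or TLS handling, and the 120-second wait simply mirrors the "about 2 minutes" suggestion; it is not a documented requirement.

```python
import json
import time
import urllib.request

# Hypothetical cluster address; a real cluster will also need auth/TLS settings.
ES_URL = "http://localhost:9200"
SETTING = "ingest.geoip.downloader.enabled"


def downloader_request(enabled: bool, es_url: str = ES_URL) -> urllib.request.Request:
    """Build (without sending) the PUT _cluster/settings request from the steps above."""
    body = {"persistent": {SETTING: enabled}}
    return urllib.request.Request(
        url=f"{es_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def cycle_downloader(es_url: str = ES_URL, wait_seconds: int = 120) -> None:
    """Disable the GeoIP downloader, wait, then re-enable it."""
    with urllib.request.urlopen(downloader_request(False, es_url)) as resp:
        resp.read()
    time.sleep(wait_seconds)
    with urllib.request.urlopen(downloader_request(True, es_url)) as resp:
        resp.read()
```

After cycling, re-run GET _ingest/geoip/stats to confirm the databases were downloaded again.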

Also, how are you directing these requests? Only ingest nodes have the GeoIP databases, so if you direct the request to a data-only node (not an ingest node) it will fail.
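One way to check which nodes actually carry the ingest role is `GET _nodes`, whose response has a `roles` array per node. The helper below is a sketch that pulls the ingest node names out of that JSON; the node IDs and names in the usage example are made up.

```python
def ingest_node_names(nodes_response: dict) -> list:
    """Names of nodes whose roles include "ingest" in a GET _nodes response."""
    return sorted(
        info.get("name", node_id)
        for node_id, info in nodes_response.get("nodes", {}).items()
        if "ingest" in info.get("roles", [])
    )


# Hypothetical, trimmed GET _nodes response
example = {
    "nodes": {
        "a1": {"name": "ingest-1", "roles": ["ingest"]},
        "b2": {"name": "hot-1", "roles": ["data_hot", "data_content"]},
        "c3": {"name": "ingest-2", "roles": ["ingest"]},
    }
}
print(ingest_node_names(example))  # ['ingest-1', 'ingest-2']
```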

Sorry, 8.6.2 (doesn't round up! :wink: ). Self-managed node.
I am running the test in Kibana.
The stats look good:

{
    "stats": {
        "successful_downloads": 3,
        "failed_downloads": 0,
        "total_download_time": 14420,
        "databases_count": 3,
        "skipped_updates": 0,
        "expired_databases": 0
    },
    "nodes": {
        "m1JBAEpIQASeJp4ktyEaFQ": {
            "databases": [
                {
                    "name": "GeoLite2-Country.mmdb"
                },
                {
                    "name": "GeoLite2-ASN.mmdb"
                },
                {
                    "name": "GeoLite2-City.mmdb"
                }
            ],
            "files_in_temp": [
                "GeoLite2-ASN.mmdb_elastic-geoip-database-service-agreement-LICENSE.txt",
                "GeoLite2-ASN.mmdb_LICENSE.txt",
                "GeoLite2-City.mmdb_LICENSE.txt",
                "GeoLite2-Country.mmdb_elastic-geoip-database-service-agreement-LICENSE.txt",
                "GeoLite2-ASN.mmdb",
                "GeoLite2-City.mmdb_COPYRIGHT.txt",
                "GeoLite2-City.mmdb",
                "GeoLite2-City.mmdb_elastic-geoip-database-service-agreement-LICENSE.txt",
                "GeoLite2-Country.mmdb_LICENSE.txt",
                "GeoLite2-ASN.mmdb_COPYRIGHT.txt",
                "GeoLite2-Country.mmdb",
                "GeoLite2-Country.mmdb_COPYRIGHT.txt",
                "GeoLite2-City.mmdb_README.txt"
            ]
        },
        "yYLvtvJ0Qgav4mzrU4e7sQ": {
            "databases": [
                {
                    "name": "GeoLite2-Country.mmdb"
                },
                {
                    "name": "GeoLite2-ASN.mmdb"
                },
                {
                    "name": "GeoLite2-City.mmdb"
                }
            ],
            "files_in_temp": [
                "GeoLite2-ASN.mmdb_elastic-geoip-database-service-agreement-LICENSE.txt",
                "GeoLite2-ASN.mmdb_LICENSE.txt",
                "GeoLite2-City.mmdb_LICENSE.txt",
                "GeoLite2-Country.mmdb_elastic-geoip-database-service-agreement-LICENSE.txt",
                "GeoLite2-ASN.mmdb",
                "GeoLite2-City.mmdb_COPYRIGHT.txt",
                "GeoLite2-City.mmdb",
                "GeoLite2-City.mmdb_elastic-geoip-database-service-agreement-LICENSE.txt",
                "GeoLite2-Country.mmdb_LICENSE.txt",
                "GeoLite2-ASN.mmdb_COPYRIGHT.txt",
                "GeoLite2-Country.mmdb",
                "GeoLite2-Country.mmdb_COPYRIGHT.txt",
                "GeoLite2-City.mmdb_README.txt"
            ]
        }
    }
}
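Since intermittent failures can come from one node missing a database, a quick check over the `GET _ingest/geoip/stats` output can list nodes that have not loaded `GeoLite2-City.mmdb`. This is a sketch over the response shape shown above; a node that does not appear in the `nodes` object at all will not be flagged, so it is worth comparing the node IDs against your full list of ingest nodes as well.

```python
def nodes_missing_db(stats: dict, db_name: str = "GeoLite2-City.mmdb") -> list:
    """IDs of nodes in a GET _ingest/geoip/stats response that have not loaded db_name."""
    missing = []
    for node_id, node in stats.get("nodes", {}).items():
        loaded = {db.get("name") for db in node.get("databases", [])}
        if db_name not in loaded:
            missing.append(node_id)
    return sorted(missing)
```

For the stats posted above, both node IDs list GeoLite2-City.mmdb, so this would return an empty list.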

2 nodes? Exactly? Are there more nodes?

"I am running the test in Kibana."

There are multiple ways of running the tests... are you running them in Dev Tools or in the Ingest Pipeline Constructor?

Did you try disabling and re-enabling the GeoIP settings above and testing again?

Apologies, but when things are intermittent, every detail counts...

Exactly 2 ingestion nodes.
I am using the Test Pipeline in the pipeline editor in Kibana.
I did disable, and re-enable the setting.

So there are other nodes? Are there data-only nodes?

Good... And the tests now? Are they still intermittent?

Personally, I would use Dev Tools for testing, with the simulate pipeline API.
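As a sketch of that approach, the helper below builds a body for `POST _ingest/pipeline/_simulate` that runs inline processors against sample `_source` documents; the processor and document are the ones posted earlier in this thread. To test the stored pipeline instead, the same `docs` array can be sent to `POST _ingest/pipeline/<pipeline-name>/_simulate` without the `pipeline` key.

```python
import json


def simulate_body(processors: list, sources: list) -> dict:
    """Body for POST _ingest/pipeline/_simulate using an inline pipeline."""
    return {
        "pipeline": {"processors": processors},
        "docs": [{"_source": src} for src in sources],
    }


body = simulate_body(
    [{"geoip": {"field": "labels.IP", "ignore_missing": True}}],
    [{"labels": {"IP": "100.20.56.87"}}],
)
# Paste the printed JSON after "POST _ingest/pipeline/_simulate" in Dev Tools.
print(json.dumps(body, indent=2))
```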

Is Kibana pointed to an Ingest Node?

Is the APM Server pointed ONLY at the ingest nodes?

I suspect you may be directing requests to data-only nodes. I would think the coordinating part should redirect them, but for some reason... I am not completely clear on that, and I don't have an easy way to test it right now.

If the behavior is still inconsistent I would check that all the requests that need GeoIP are being directed to the ingest nodes.

Thanks. Maybe I should (for now) ignore my testing strategy.
I did create a VERY simple pipeline named metrics-apm@custom to add a new field to my APM app data, but I am not seeing the new field get added to that data.

I have 3 master nodes, 2 data-hot nodes, 2 ingest nodes, 1 APM server

Yup, that is the very first thing I do whenever I integrate a new pipeline: I use a set processor to add a field pipeline_run: true.
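A sketch of that marker approach: one set processor that writes `pipeline_run: true`, plus a check you can apply to the `_source` of a document fetched from the data stream. The field name comes from the post above; putting the marker first in the processor list is a choice made here, not a requirement.

```python
MARKER_FIELD = "pipeline_run"


def marker_processor() -> dict:
    """set processor that stamps documents which passed through the pipeline."""
    return {"set": {"field": MARKER_FIELD, "value": True}}


def with_marker(processors: list) -> dict:
    """Pipeline body with the marker first, followed by the real processors."""
    return {"processors": [marker_processor(), *processors]}


def pipeline_ran(source: dict) -> bool:
    """True if a document's _source carries the marker field set to true."""
    return source.get(MARKER_FIELD) is True
```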

If you are trying to add GeoIP data to the APM transactions, that is not the correct data stream...

I think you want to look at the following:

Data Stream : traces-apm-default

data_stream.dataset: apm
data_stream.namespace: default
data_stream.type: traces
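The data stream name is assembled from those three fields as `<type>-<dataset>-<namespace>`. The helper below also derives an `@custom` pipeline name; the `<type>-<dataset>@custom` pattern is an assumption based on the naming used in this thread (e.g. `metrics-apm@custom`), so check the APM integration's default ingest pipeline for the exact name your version calls.

```python
def data_stream_name(ds_type: str, dataset: str, namespace: str) -> str:
    """Data stream naming scheme: <type>-<dataset>-<namespace>."""
    return f"{ds_type}-{dataset}-{namespace}"


def custom_pipeline_name(ds_type: str, dataset: str) -> str:
    # Assumption: the integration's pipeline invokes "<type>-<dataset>@custom" if it exists.
    return f"{ds_type}-{dataset}@custom"


print(data_stream_name("traces", "apm", "default"))  # traces-apm-default
print(custom_pipeline_name("traces", "apm"))  # traces-apm@custom
```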

The data stream you are looking at is for APM metrics (which are aggregated data).
How familiar are you with sample rate, etc.?

Only sampled transactions will (or should) have the IP details, etc., that can be GeoIP'd.

This may help you understand it a bit more

Yeah, my pipeline was for metrics-apm, and I was looking at the documents with data_stream.type: metrics and wasn't seeing the additional field.

FWIW, I also added a simple pipeline for traces that just appends a new field/value, and I am not seeing any evidence that it is being used either.

I am clearly missing something...

OK, I think I got it working. Not sure what the problem was, but restarting my nodes appeared to resync some data. Looks good now. Thanks!
