How to update the GeoIP Databases for Elastic Ingest Pipelines

I've read through the docs and existing topics and haven't found how to properly update the GeoIP databases that ship with Elasticsearch for ingest pipelines.

I have downloaded the latest GeoLite2-Country.mmdb from MaxMind and placed it in the /usr/share/elasticsearch/modules/ingest-geoip/ folder. This is where the documentation starts to get sparse.
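
For reference, the steps I took looked roughly like this. A sketch only: the download URL needs your own MaxMind license key, and the path assumes a package install.

# download the tarball from MaxMind (requires your license key), extract it,
# and drop the new .mmdb over the bundled one
curl -sL -o GeoLite2-Country.tar.gz \
  "https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-Country&license_key=<YOUR_LICENSE_KEY>&suffix=tar.gz"
tar -xzf GeoLite2-Country.tar.gz
sudo cp GeoLite2-Country_*/GeoLite2-Country.mmdb /usr/share/elasticsearch/modules/ingest-geoip/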

Do I need to put this database file on all data nodes where ingest pipelines might be processed? I assume so and tried this.
Do I need to restart the elasticsearch service on said data nodes? I assumed so and tried this as well.

But this IP (45.153.227.50) continues to come up as Russia, even though MaxMind's database lookup shows it as Germany, and so do all of our other geo lookup tools.
https://www.maxmind.com/en/geoip-demo

Am I missing a step somewhere to get Elasticsearch to recognize the newly updated database files?

Hi @nverrill

You are on the right track.

First, be careful in the future when comparing the MaxMind GeoIP demo with results from the GeoLite2 database. The demo is based on their commercial offering, and GeoLite2, the free offering, does not have the same accuracy or data. I know this from some very long late-night frustration.

If you want to test the GeoLite2 database, follow the instructions here.

That said, I do see the new data in the new GeoLite2 database.

First, I recommend downloading all three GeoLite2 databases (ASN, Country, and City) and installing them.
Yes, you need to install them on all the nodes; I restarted my node(s) afterwards. If you are using dedicated ingest nodes, the DBs need to be on them as well.
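
Roughly how I refreshed mine, assuming the three tar.gz files are already downloaded into the module directory (the glob patterns rely on MaxMind's dated directory names, which you can see in the listing below):

cd modules/ingest-geoip
for db in ASN City Country; do
  tar -xzf GeoLite2-${db}_*.tar.gz            # unpacks e.g. GeoLite2-ASN_20210528/
  cp GeoLite2-${db}_*/GeoLite2-${db}.mmdb .   # replaces the bundled .mmdb (I stashed the originals in GeoLite2_orig/)
done
# then restart each node; on a package install, something like:
# sudo systemctl restart elasticsearch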

/Users/sbrown/workspace/elastic-install/7.13.0/elasticsearch-7.13.0/modules/ingest-geoip
ceres-2:ingest-geoip sbrown$ ls -lrt
total 111632
-rw-r--r--  1 sbrown  staff      1081 May 19 15:24 plugin-security.policy
-rw-r--r--  1 sbrown  staff      1747 May 19 15:24 plugin-descriptor.properties
-rw-r--r--  1 sbrown  staff     23384 May 19 15:24 maxmind-db-1.3.1.jar
-rw-r--r--  1 sbrown  staff   1404874 May 19 15:24 jackson-databind-2.10.4.jar
-rw-r--r--  1 sbrown  staff     68083 May 19 15:24 jackson-annotations-2.10.4.jar
-rw-r--r--  1 sbrown  staff     94678 May 19 15:24 ingest-geoip-7.13.0.jar
-rw-r--r--  1 sbrown  staff     49735 May 19 15:24 geoip2-2.13.1.jar
drwxr-xr-x@ 5 sbrown  staff       160 May 25 04:57 GeoLite2-Country_20210525/
drwxr-xr-x@ 6 sbrown  staff       192 May 25 05:03 GeoLite2-City_20210525/
drwxr-xr-x@ 5 sbrown  staff       160 May 27 10:02 GeoLite2-ASN_20210528/
drwxr-xr-x  5 sbrown  staff       160 May 29 08:51 GeoLite2_orig/
-rw-r--r--@ 1 sbrown  staff   4081989 May 29 08:52 GeoLite2-ASN_20210528.tar.gz
-rw-r--r--@ 1 sbrown  staff  31195858 May 29 08:52 GeoLite2-City_20210525.tar.gz
-rw-r--r--@ 1 sbrown  staff   2085704 May 29 08:52 GeoLite2-Country_20210525.tar.gz
-rw-r--r--@ 1 sbrown  staff   7335692 May 29 08:53 GeoLite2-ASN.mmdb
-rw-r--r--@ 1 sbrown  staff  63864684 May 29 08:53 GeoLite2-City.mmdb
-rw-r--r--@ 1 sbrown  staff   4076222 May 29 08:53 GeoLite2-Country.mmdb

I use this simple little test:

BEFORE

PUT _ingest/pipeline/geoip
{
  "description" : "Add geoip info",
  "processors" : [
    {
      "geoip" : {
        "field" : "ip"
      }
    }
  ]
}


POST _ingest/pipeline/geoip/_simulate
{
  "docs": [
    {
      "_source": {
        "ip": "45.153.227.50"
      }
    }
  ]
}

BEFORE RESULT

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "geoip" : {
            "continent_name" : "Europe",
            "country_name" : "Russia",
            "location" : {
              "lon" : 37.6068,
              "lat" : 55.7386
            },
            "country_iso_code" : "RU"
          },
          "ip" : "45.153.227.50"
        },
        "_ingest" : {
          "timestamp" : "2021-05-29T16:07:55.648116576Z"
        }
      }
    }
  ]
}

Now, after I installed the new databases and restarted:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "geoip" : {
            "continent_name" : "Europe",
            "region_iso_code" : "DE-BE",
            "city_name" : "Berlin",
            "country_iso_code" : "DE",
            "country_name" : "Germany",
            "region_name" : "Land Berlin",
            "location" : {
              "lon" : 13.4059,
              "lat" : 52.5155
            }
          },
          "ip" : "45.153.227.50"
        },
        "_ingest" : {
          "timestamp" : "2021-05-29T16:07:44.764339Z"
        }
      }
    }
  ]
}

Just to check, I put the old databases back in and got the old geo location.
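
(Restoring was just copying the stashed originals back from the GeoLite2_orig/ directory you can see in the listing above, and restarting again.)

cp GeoLite2_orig/GeoLite2-*.mmdb .
# then restart the node again, however your install is managed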

You could test this on a single-node Elasticsearch cluster first to check that you have it right, and then roll it out to all your nodes.

Hope this helps.


Thanks for the quick response @stephenb.

I spun up a test instance like yours and, like you, it worked.
Working my way back to my production cluster to see what went wrong, I realized something.

When I perform a pipeline/_simulate in the Kibana Dev console, it performs that simulation on the Elasticsearch node Kibana is currently connected to. In my case that was a dedicated master node, which I had not updated the GeoIP DB on, because it doesn't ingest anything.

As soon as I realized that, I started using curl commands directed at a specific node to test whether that node itself was updated, and this quickly showed that the data nodes were successfully updated and the master node was not. For example:
curl -XPOST "https://data001.domain.com:9200/_ingest/pipeline/geoip-test-pipeline/_simulate" \
  -H 'Content-Type: application/json' \
  -d'{"docs":[{"_index":"test","_source":{"source":{"ip":"79.116.78.121"}}}]}'
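
To compare every node at once, a small loop over the same request works too. A sketch: the node names are just placeholders for my hosts, and you may need extra curl flags (-u, --cacert) depending on your security setup:

for node in data001 data002 master001; do
  echo "== ${node} =="
  curl -s -XPOST "https://${node}.domain.com:9200/_ingest/pipeline/geoip-test-pipeline/_simulate?pretty" \
    -H 'Content-Type: application/json' \
    -d'{"docs":[{"_index":"test","_source":{"source":{"ip":"79.116.78.121"}}}]}' \
    | grep country_name
done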

Hmmm, just realizing that I also had trouble PUTting a GeoIP pipeline in the Kibana console when specifying a custom database file. It would give an error about not finding the custom database file. Since the pipeline is PUT for the whole cluster, I'd have to guess that ALL nodes must have the custom database file, including master-only nodes, for the PUT to succeed. Everything is making so much sense now...
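
For anyone who hits the same error, this is the shape of the pipeline I was trying to PUT. The database filename here is made up; per the 7.x docs, a custom file goes in the ingest-geoip directory under each node's config path and is referenced via the processor's database_file option:

PUT _ingest/pipeline/geoip-custom
{
  "description" : "Add geoip info from a custom database",
  "processors" : [
    {
      "geoip" : {
        "field" : "ip",
        "database_file" : "GeoLite2-City-Custom.mmdb"
      }
    }
  ]
}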

Glad you got everything going.

Technically, I do not think the GeoIP DB is required on a master-only node IF you make sure no document index requests, including those from your Kibana, are ever routed to the master-only node. (If you cannot guarantee that, it is probably safest to install it on the master-only node as well.)

With that in mind, when you are actually ingesting data at scale, best practice is to point those index operations at a data or ingest node, not a master-only node.

Make sense?

True, I had not considered pointing Kibana at a data node instead of the master-only node. Certainly all index operations already point at the data nodes only, but out of habit I have always kept Kibana pointed at the master-only nodes.
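
(If anyone else wants to make the same change, it's one line in kibana.yml; the host below is reused from my earlier curl example, so substitute your own data node.)

# kibana.yml
elasticsearch.hosts: ["https://data001.domain.com:9200"]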

