How to update the GeoIP Databases for Elastic Ingest Pipelines

I've read through the docs and existing topics and haven't found how to properly update the GeoIP databases that ship with Elasticsearch for ingest pipelines.

I have downloaded the latest GeoLite2-Country.mmdb from MaxMind and placed it in the /usr/share/elasticsearch/modules/ingest-geoip/ folder. This is where the documentation starts to get sparse.
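
For reference, the steps I took looked roughly like this. A sketch only: the download URL needs your own MaxMind license key, and the path assumes a package install.

# download the tarball from MaxMind (requires your license key), extract it,
# and drop the new .mmdb over the bundled one
curl -sL -o GeoLite2-Country.tar.gz \
  "https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-Country&license_key=<YOUR_LICENSE_KEY>&suffix=tar.gz"
tar -xzf GeoLite2-Country.tar.gz
sudo cp GeoLite2-Country_*/GeoLite2-Country.mmdb /usr/share/elasticsearch/modules/ingest-geoip/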

Do I need to put this database file on all data nodes where ingest pipelines might be processed? I assume so and tried this.
Do I need to restart the elasticsearch service on said data nodes? I assumed so and tried this as well.

But this IP (45.153.227.50) continues to come up as Russia, even though MaxMind's database lookup shows it as Germany, and so do all of our other geo lookup tools.
https://www.maxmind.com/en/geoip-demo

Am I missing a step somewhere to get Elasticsearch to recognize the newly updated database files?

Hi @nverrill

You are on the right track.

First, be careful in the future when comparing the MaxMind GeoIP demo with results from the GeoLite2 database. The demo is based on their commercial offering, and GeoLite2, the free offering, does not have the same accuracy or data. I know this from some very long late-night frustration.

If you want to test the GeoLite2 database, follow the instructions here.

That said, I do see the new data in the new GeoLite2 database.

First, I recommend downloading all three GeoLite2 databases (ASN, Country, and City) and installing them.
Yes, you need to install them on all the nodes; I restarted my node(s) afterwards. If you are using dedicated ingest nodes, the DBs need to be on them as well.
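
Roughly how I refreshed mine, assuming the three tar.gz files are already downloaded into the module directory (the glob patterns rely on MaxMind's dated directory names, which you can see in the listing below):

cd modules/ingest-geoip
for db in ASN City Country; do
  tar -xzf GeoLite2-${db}_*.tar.gz            # unpacks e.g. GeoLite2-ASN_20210528/
  cp GeoLite2-${db}_*/GeoLite2-${db}.mmdb .   # replaces the bundled .mmdb (I stashed the originals in GeoLite2_orig/)
done
# then restart each node; on a package install, something like:
# sudo systemctl restart elasticsearch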

/Users/sbrown/workspace/elastic-install/7.13.0/elasticsearch-7.13.0/modules/ingest-geoip
ceres-2:ingest-geoip sbrown$ ls -lrt
total 111632
-rw-r--r--  1 sbrown  staff      1081 May 19 15:24 plugin-security.policy
-rw-r--r--  1 sbrown  staff      1747 May 19 15:24 plugin-descriptor.properties
-rw-r--r--  1 sbrown  staff     23384 May 19 15:24 maxmind-db-1.3.1.jar
-rw-r--r--  1 sbrown  staff   1404874 May 19 15:24 jackson-databind-2.10.4.jar
-rw-r--r--  1 sbrown  staff     68083 May 19 15:24 jackson-annotations-2.10.4.jar
-rw-r--r--  1 sbrown  staff     94678 May 19 15:24 ingest-geoip-7.13.0.jar
-rw-r--r--  1 sbrown  staff     49735 May 19 15:24 geoip2-2.13.1.jar
drwxr-xr-x@ 5 sbrown  staff       160 May 25 04:57 GeoLite2-Country_20210525/
drwxr-xr-x@ 6 sbrown  staff       192 May 25 05:03 GeoLite2-City_20210525/
drwxr-xr-x@ 5 sbrown  staff       160 May 27 10:02 GeoLite2-ASN_20210528/
drwxr-xr-x  5 sbrown  staff       160 May 29 08:51 GeoLite2_orig/
-rw-r--r--@ 1 sbrown  staff   4081989 May 29 08:52 GeoLite2-ASN_20210528.tar.gz
-rw-r--r--@ 1 sbrown  staff  31195858 May 29 08:52 GeoLite2-City_20210525.tar.gz
-rw-r--r--@ 1 sbrown  staff   2085704 May 29 08:52 GeoLite2-Country_20210525.tar.gz
-rw-r--r--@ 1 sbrown  staff   7335692 May 29 08:53 GeoLite2-ASN.mmdb
-rw-r--r--@ 1 sbrown  staff  63864684 May 29 08:53 GeoLite2-City.mmdb
-rw-r--r--@ 1 sbrown  staff   4076222 May 29 08:53 GeoLite2-Country.mmdb

I use this simple little test:

BEFORE

PUT _ingest/pipeline/geoip
{
  "description" : "Add geoip info",
  "processors" : [
    {
      "geoip" : {
        "field" : "ip"
      }
    }
  ]
}


POST _ingest/pipeline/geoip/_simulate
{
  "docs": [
    {
      "_source": {
        "ip": "45.153.227.50"
      }
    }
  ]
}

BEFORE RESULT

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "geoip" : {
            "continent_name" : "Europe",
            "country_name" : "Russia",
            "location" : {
              "lon" : 37.6068,
              "lat" : 55.7386
            },
            "country_iso_code" : "RU"
          },
          "ip" : "45.153.227.50"
        },
        "_ingest" : {
          "timestamp" : "2021-05-29T16:07:55.648116576Z"
        }
      }
    }
  ]
}

Now, after I installed the new databases and restarted:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "geoip" : {
            "continent_name" : "Europe",
            "region_iso_code" : "DE-BE",
            "city_name" : "Berlin",
            "country_iso_code" : "DE",
            "country_name" : "Germany",
            "region_name" : "Land Berlin",
            "location" : {
              "lon" : 13.4059,
              "lat" : 52.5155
            }
          },
          "ip" : "45.153.227.50"
        },
        "_ingest" : {
          "timestamp" : "2021-05-29T16:07:44.764339Z"
        }
      }
    }
  ]
}

Just to check, I put the old databases back in and got the old geo location.
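
(Restoring was just copying the stashed originals back from the GeoLite2_orig/ directory you can see in the listing above, and restarting again.)

cp GeoLite2_orig/GeoLite2-*.mmdb .
# then restart the node again, however your install is managed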

You could test this on a single-node Elasticsearch cluster first to check that you have it right, and then roll it out to all your nodes.

Hope this helps.


Thanks for the quick response @stephenb.

I spun up a test instance like yours and, like you, it worked.
Working my way back to my production cluster to see what went wrong, I realized something.

When I perform a pipeline/_simulate in the Kibana Dev console, it performs that simulation on the Elasticsearch node Kibana is currently connected to. In my case that was a dedicated master node, which I had not updated the GeoIP DB on, because it doesn't ingest anything.

As soon as I realized that, I started using curl commands directed at a specific node to test whether that node itself was updated, and this quickly showed that the data nodes were successfully updated and the master node was not. For example:
curl -XPOST "https://data001.domain.com:9200/_ingest/pipeline/geoip-test-pipeline/_simulate" \
  -H 'Content-Type: application/json' \
  -d'{"docs":[{"_index":"test","_source":{"source":{"ip":"79.116.78.121"}}}]}'
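
To compare every node at once, a small loop over the same request works too. A sketch: the node names are just placeholders for my hosts, and you may need extra curl flags (-u, --cacert) depending on your security setup:

for node in data001 data002 master001; do
  echo "== ${node} =="
  curl -s -XPOST "https://${node}.domain.com:9200/_ingest/pipeline/geoip-test-pipeline/_simulate?pretty" \
    -H 'Content-Type: application/json' \
    -d'{"docs":[{"_index":"test","_source":{"source":{"ip":"79.116.78.121"}}}]}' \
    | grep country_name
done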

Hmmm, just realizing that I also had trouble PUTting a GeoIP pipeline in the Kibana console when specifying a custom database file. It would give an error about not finding the custom database file. Since the pipeline is PUT for the whole cluster, I'd have to guess that ALL nodes must have the custom database file, including master-only nodes, for the PUT to succeed. Everything is making so much sense now...
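
For anyone who hits the same error, this is the shape of the pipeline I was trying to PUT. The database filename here is made up; per the 7.x docs, a custom file goes in the ingest-geoip directory under each node's config path and is referenced via the processor's database_file option:

PUT _ingest/pipeline/geoip-custom
{
  "description" : "Add geoip info from a custom database",
  "processors" : [
    {
      "geoip" : {
        "field" : "ip",
        "database_file" : "GeoLite2-City-Custom.mmdb"
      }
    }
  ]
}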

Glad you got everything going.

Technically, I do not think the GeoIP DB is required on a master-only node IF you make sure no document index requests, including those from your Kibana, are ever routed to the master-only node. (If you cannot guarantee that, it is probably safest to install it on the master-only node as well.)

With that in mind, when you are actually ingesting data at scale, best practice is to point those index operations at a data or ingest node, not a master-only node.

Make sense?

True, I had not considered pointing Kibana at a data node instead of the master-only node. Certainly all index operations already point at the data nodes only, but out of habit I have always kept Kibana pointed at the master-only nodes.
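
(If anyone else wants to make the same change, it's one line in kibana.yml; the host below is reused from my earlier curl example, so substitute your own data node.)

# kibana.yml
elasticsearch.hosts: ["https://data001.domain.com:9200"]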

