GEOIP Database Update Issue: Documentation Followed, Databases Not Updating on Ingest Nodes

Summary: I followed the steps outlined in the section "Use a custom endpoint" of the documentation, but I encountered some unexpected behavior. The GEOIP databases are not updating on Elasticsearch ingest nodes. I'm not certain if I made a mistake or if there's an issue with the documentation or GEOIP itself.

Problem Description

Expected Behavior

According to the documentation, I expected the GEOIP databases to update successfully on ingest nodes.

Actual Behavior

However, when checking the status with "GET _ingest/geoip/stats", I can see that the updates have not occurred:

{
  "stats": {
    "successful_downloads": 0,
    "failed_downloads": 1,
    "total_download_time": 0,
    "databases_count": 0,
    "skipped_updates": 0,
    "expired_databases": 0
  },
  "nodes": {}
}

Node Configuration

I have 14 Elasticsearch nodes in total (Elastic Stack version: 8.11.1):

  • 8 data nodes
  • 3 ingest nodes
  • 3 master nodes

Troubleshooting Steps

I have taken several steps to troubleshoot the issue:

  1. Executed "elasticsearch-geoip" for updating databases, which appeared to complete successfully.

    elasticsearch@elasticsearch-5c64694f74-26mxt:~/bin$ ./elasticsearch-geoip -s /geoip/ -t /geoip/
    Found GeoIP2-City.mmdb, will compress it to GeoIP2-City.tgz
    Adding GeoIP2-City.tgz to overview.json
    overview.json created
    
  2. Checked file permissions on the target folder and confirmed they are appropriate.

    elasticsearch@elasticsearch-5c64694f74-26mxt:/geoip$ ls -ls
    total 207891
    135949 -rw-r--r--. 1 root          root          139210785 Jan 17 16:44 GeoIP2-City.mmdb
    71942  -rw-rw-rw-. 1 elasticsearch elasticsearch  73668587  Jan 17 16:48 GeoIP2-City.tgz
    1 -rw-rw-rw-. 1 elasticsearch elasticsearch       122 Jan 17 16:48 overview.json
    
  3. Made configuration and settings adjustments as described in the documentation, including updating endpoint URLs, restarting Elasticsearch, and confirming the settings.

    • All nodes have the setting "ingest.geoip.downloader.endpoint: https://mydomain.com/overview.json" in elasticsearch.yml.
    • The Elasticsearch service on all nodes was restarted after adding the endpoint setting.
    GET _cluster/settings?include_defaults
    
    "geoip": {
          "cache_size": "1000",
          "downloader": {
            "eager": {
              "download": "false"
            },
            "enabled": "true",
            "endpoint": "https://mydomain.com/overview.json"
          }
        }
    
    • I changed the poll interval from 3 days to 1 day. The cluster settings confirm the change:
    "ingest": {
        "geoip": {
          "downloader": {
            "poll": {
              "interval": "1d"
            }
          }
        }
      }  
    
  4. Tested the connection to the endpoint from all nodes, and all of them can retrieve data from the endpoint.

    • Command executed and result from all nodes:
    $ curl -XGET https://mydomain.com/overview.json
    [{
        "name": "GeoIP2-City.tgz",
        "md5_hash": "3fe7b4df652ad2b6679da1f043255fb1",
        "url": "GeoIP2-City.tgz",
        "updated": 1705510085337
    }]
    
  5. Attempted changing the URL part of overview.json to the full URL endpoint, but it did not resolve the issue.

    [{
        "name": "GeoIP2-City.tgz",
        "md5_hash": "3fe7b4df652ad2b6679da1f043255fb1",
        "url": "https://mydomain.com/GeoIP2-City.tgz",
        "updated": 1705510085337
    }]
    
  6. Set the log level to trace following the instructions in this article. Observed many log lines, including:

[2024-01-17T12:47:50,093][TRACE][o.e.i.g.DatabaseNodeService] [ingest_1] Not checking databases because geoip databases index does not exist

  7. Followed the instructions in this article for deleting the .geoip_databases index, but I cannot see that index in my cluster. I attempted to list all indices with the API call:

GET _cat/indices?format=JSON&bytes=b&expand_wildcards=all

However, there is no .geoip_databases index in the output.
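
One more sanity check on the endpoint files themselves, in case the single failed download is a checksum problem (a sketch; the paths are the ones from my listing in step 2 above, and the md5 comparison assumes GNU coreutils):

```shell
# 1) Does the md5_hash advertised in overview.json match the actual archive?
actual=$(md5sum /geoip/GeoIP2-City.tgz | awk '{print $1}')
advertised=$(grep -o '"md5_hash": *"[^"]*"' /geoip/overview.json | cut -d'"' -f4)
[ "$actual" = "$advertised" ] && echo "md5 OK" || echo "md5 MISMATCH"

# 2) "updated" is epoch milliseconds; confirm it is a plausible timestamp (GNU date):
date -u -d @$((1705510085337 / 1000))   # -> Wed Jan 17 16:48:05 UTC 2024
```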

Conclusion

I'm currently stuck and uncertain about the next steps to resolve this issue. I've followed the documentation and conducted troubleshooting steps as outlined, but the problem persists. Any assistance or guidance from the community would be greatly appreciated.

Hi @Zabulon

There is a setting now that makes loading lazy so they don't load until the first call.

ingest.geoip.downloader.eager.download

Set that to true

ingest.geoip.downloader.eager.download
(Dynamic, Boolean) If true, Elasticsearch downloads GeoIP2 databases immediately, regardless of whether a pipeline exists with a geoip processor. If false, Elasticsearch only begins downloading the databases if a pipeline with a geoip processor exists or is added.

That has tripped me up in the past.

Hi @stephenb ,

Thank you for your input.

I did test that setting and I get the same result... Nothing is downloading...

Here is what I did:

PUT _cluster/settings
{
  "persistent": {
    "ingest.geoip.downloader.eager.download": "true"
  }
}

PUT _cluster/settings
{
  "persistent": {
    "ingest.geoip.downloader.enabled": "false"
  }
}

PUT _cluster/settings
{
  "persistent": {
    "ingest.geoip.downloader.enabled": null
  }
}

And then:

GET _ingest/geoip/stats

{
  "stats": {
    "successful_downloads": 0,
    "failed_downloads": 1,
    "total_download_time": 0,
    "databases_count": 0,
    "skipped_updates": 0,
    "expired_databases": 0
  },
  "nodes": {}
}

That turns off the downloading...

Where did you turn it back on?

PUT _cluster/settings
{
  "persistent": {
    "ingest.geoip.downloader.enabled": true
  }
}

You also have a failed download... did you look at the logs... do you have a firewall in the way?

Oh you are expecting this to set the default

PUT _cluster/settings
{
  "persistent": {
    "ingest.geoip.downloader.enabled": null
  }
}

it should hmmm

did you add a pipeline?

PUT _ingest/pipeline/geoip
{
  "description" : "Add geoip info",
  "processors" : [
    {
      "geoip" : {
        "field" : "ip"
      }
    }
  ]
}
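
You can also exercise the pipeline without indexing anything via the simulate API. A minimal curl sketch (the host and credentials here are placeholders; adjust for your cluster):

```shell
# _simulate runs the pipeline on an inline document and returns the result,
# so a geoip lookup failure shows up immediately without writing to any index:
curl -s -k -u elastic:changeme \
  -XPOST "https://localhost:9200/_ingest/pipeline/geoip/_simulate" \
  -H 'Content-Type: application/json' \
  -d '{"docs":[{"_source":{"ip":"89.160.20.128"}}]}'
```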

Hello @stephenb,

So, I ran them all in sequence, from false to true to null (even though true and null give the same final result).

Yes, I created a pipeline.

Here is what I tried (in sequence):

PUT _ingest/pipeline/geoip
{
  "description" : "Add geoip info",
  "processors" : [
    {
      "geoip" : {
        "download_database_on_pipeline_creation": true,
        "field" : "ip"
        
      }
    }
  ]
}

PUT my-index-000001/_doc/my_id?pipeline=geoip
{
  "ip": "89.160.20.128"
}

Contents of the document:

GET my-index-000001/_doc/my_id

{
  "_index": "my-index-000001",
  "_id": "my_id",
  "_version": 4,
  "_seq_no": 3,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "ip": "89.160.20.128",
    "tags": [
      "_geoip_database_unavailable_GeoLite2-City.mmdb"
    ]
  }
}

And the GeoIP stats after running the previous requests:

{
  "stats": {
    "successful_downloads": 0,
    "failed_downloads": 1,
    "total_download_time": 0,
    "databases_count": 0,
    "skipped_updates": 0,
    "expired_databases": 0
  },
  "nodes": {}
}

Yes, we have a firewall. But I can download the TGZ file from all ingest nodes using wget or curl.

[root@ingest_1 ~]# wget https://mydomain.com/GeoIP2-City.tgz
--2024-01-18 14:24:14--  https://mydomain.com/GeoIP2-City.tgz
Resolving mydomain.com... 192.168.91.32
Connecting to mydomain.com|192.168.91.32|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 73668587 (70M) [application/octet-stream]
Saving to: ‘GeoIP2-City.tgz’

GeoIP2-City.tgz                         100%[=============================================================================>]  70.25M  51.3MB/s    in 1.4s

2024-01-18 14:24:15 (51.3 MB/s) - ‘GeoIP2-City.tgz’ saved [73668587/73668587]

You need to look at the elasticsearch logs to see the actual GEOIP download error. There should be details; you can turn up the logging for the GeoIP if needed, but if it is an error, you should see why. This is how you will figure this out.

So it looks like you are self-hosting... you did not mention that :slight_smile:
How did you configure it? I suspect there is an issue with that. Did you follow the steps?
https://mydomain.com/

Use a custom endpoint

You can create a service that mimics the Elastic GeoIP endpoint. You can then get automatic updates from this service.

  1. Download your .mmdb database files from the MaxMind site.
  2. Copy your database files to a single directory.
  3. From your Elasticsearch directory, run:

./bin/elasticsearch-geoip -s my/source/dir [-t target/directory]

  4. Serve the static database files from your directory. For example, you can use Docker to serve the files from an nginx server:

docker run -v my/source/dir:/usr/share/nginx/html:ro nginx

  5. Specify the service’s endpoint URL in the ingest.geoip.downloader.endpoint setting of each node’s elasticsearch.yml file. By default, Elasticsearch checks the endpoint for updates every three days. To use another polling interval, use the cluster update settings API to set ingest.geoip.downloader.poll.interval.
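
For a quick test of the serving step, any static file server will do in place of nginx; for example, Python's built-in one (the port is arbitrary, and my/source/dir is the directory from the docs above):

```shell
# Serve overview.json and the .tgz archives for a quick smoke test
# (not for production; any static file server works):
cd my/source/dir && python3 -m http.server 8000
```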

Hi @stephenb ,

I changed level to trace with that command:


PUT /_cluster/settings
{
  "persistent": {
    "logger.org.elasticsearch.ingest.geoip": "trace"
  }
}

The result in the log (as mentioned in the initial post) is:

[2024-01-17T12:47:50,093][TRACE][o.e.i.g.DatabaseNodeService] [ingest_1] Not checking databases because geoip databases index does not exist

And yes, I followed the documentation you mentioned.

ahhhhh OK here is one thing.

The databases DO NOT load on ingest-only nodes... as the data is stored in an index, and ingest-only nodes do not hold data, i.e., they do not have indices.

GeoIP databases will ONLY load on Data Nodes... because that is where the data lives and thus the geoip index.
Example

health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size dataset.size
green  open   .geoip_databases ZYBOAS9_SqiNx5Z3GxYAgQ   1   1         42           42     84.6mb         42.3mb       42.3mb

I know that is confusing but that is how it works...

So the ingest pipeline lives on an ingest node, but it calls a data node to do the GeoIP lookup...

I actually have a thread on this as well somewhere I will see if I can find it...

So, you need to enable the loading of the databases on the data nodes...

Or on the cluster as a whole through the cluster settings...

Is the loading enabled on the Data Nodes?

Maybe look at this thread

Hello @stephenb ,

Like I said in an earlier post, all data nodes are able to download from my endpoint, and all 14 nodes have the setting "ingest.geoip.downloader.enabled: true". But the thing is, the error message about the .geoip_databases index being missing is accurate: that index is not present.

I tried to resolve it following this article, but without success: How to delete .geoip_databases index

Here is the result of "GET _cat/indices/.geoip_databases":

{
  "error": {
    "root_cause": [
      {
        "type": "index_not_found_exception",
        "reason": "no such index [.geoip_databases]",
        "index_uuid": "_na_",
        "resource.type": "index_or_alias",
        "resource.id": ".geoip_databases",
        "index": ".geoip_databases"
      }
    ],
    "type": "index_not_found_exception",
    "reason": "no such index [.geoip_databases]",
    "index_uuid": "_na_",
    "resource.type": "index_or_alias",
    "resource.id": ".geoip_databases",
    "index": ".geoip_databases"
  },
  "status": 404
}

Thank you for all your help !
Dominic

Not sure what to tell you ....

Can you try using the normal / non custom endpoint?

See if it works... would then narrow down

There should be other errors on the data nodes while they try to load; perhaps take a look there.

Hello @stephenb ,

I wanted to share an update on our situation. After consulting with Elastic Support and investigating further, we identified SSL handshake errors on the Elasticsearch node responsible for downloading the GeoIP database. The resolution involved adding our CA certificate to the truststore of the JVM used by Elasticsearch. Initially, I assumed that all SSL-related queries or API calls within Elasticsearch would utilize the SSL configurations specified in elasticsearch.yml. However, this experience taught me otherwise—an enlightening moment indeed!
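
For anyone who hits the same thing, the import looked roughly like this (a sketch; the JDK path, alias, and certificate file are illustrative and vary by installation, and "changeit" is the stock cacerts password):

```shell
# Add the internal CA to the truststore of the JVM Elasticsearch runs on,
# then restart the node. /usr/share/elasticsearch/jdk is the bundled JDK
# location on package installs; adjust paths for your setup.
/usr/share/elasticsearch/jdk/bin/keytool -importcert -trustcacerts \
  -alias internal-ca -file /path/to/internal-ca.crt \
  -keystore /usr/share/elasticsearch/jdk/lib/security/cacerts \
  -storepass changeit -noprompt
```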

With the truststore now updated across all Elasticsearch nodes, the GeoIP database began to download as expected. Success at last! But, there's a twist...

Despite the progress, I encountered an issue where the "_geoip_database_unavailable_GeoLite2-City.mmdb" tag persisted in indices utilizing the geoip processor. To address this, I followed the instructions in the "Use a custom endpoint" section. Notably, before proceeding with step 3, I renamed our GeoIP2-City.mmdb database file to GeoLite2-City.mmdb. This proved effective, yet it feels like a workaround that ideally shouldn't be necessary, since it forces us to replicate the default filename.
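
Concretely, the rename before repackaging was just (paths from my earlier listing; the second command is step 3 of the docs again):

```shell
# The geoip processor looks for GeoLite2-City.mmdb by default, so publish
# the commercial database under that name, then rebuild the archive:
mv /geoip/GeoIP2-City.mmdb /geoip/GeoLite2-City.mmdb
./bin/elasticsearch-geoip -s /geoip/ -t /geoip/
```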

What are your thoughts on this matter?

1 Like

Wow good find on the SSL I have not run into that before

With respect to the file name... I would report your findings via the support ticket and tell them that it looks like a bug.

I have opened an issue on GitHub: GEOIP Database Update Issue: Documentation Followed, Databases Not Updating on Ingest Nodes · Issue #104484 · elastic/elasticsearch (github.com)

1 Like

@Zabulon

Curious if you tried setting your database name in the ingest processor, or did that not make any difference?

PUT _ingest/pipeline/geoip
{
  "description" : "Add geoip info",
  "processors" : [
    {
      "geoip" : {
        "field" : "ip",
        "target_field" : "geo",
        "database_file" : "GEOIP2-City.mmdb"   <--- here
      }
    }
  ]
}

@stephenb

Certainly, setting the database name in the ingest processor does work as expected. However, this approach doesn't align with our overarching goal. We're aiming to replace the default Elastic GeoIP database processing mechanism in a manner that doesn't require us to add specific configuration lines to every GeoIP processor. More importantly, we want to avoid the need to create or modify custom pipelines for all the Fleet integrations we utilize. Our objective is to streamline this process for efficiency and scalability.

1 Like

Understood. I think the experience could be better for sure, but on the other hand, if you're trying to replace the default databases, I'm not surprised that the DBs need to have the default names.

@stephenb

Could it be that I've misunderstood the documentation? I followed this doc, "Use a custom endpoint", and it says nothing about a specific database name.

Yeah, I think it is implied but not explicit... I agree it could be clearer, but here is how I see it when I re-read it:

You can create a service that mimics the Elastic GeoIP endpoint. You can then get automatic updates from this service.

Mimic would mean the default database files, because that is all we provide.

  1. Download your .mmdb database files from the MaxMind site.

The link is to the GeoLite2 databases etc

Then there is the "Load your custom files" section, which is what you are doing, because yours are not the default GeoLite databases:

  1. In your geoip processors, configure the database_file parameter to use a custom database file.

So really you are doing 2 things

Providing a custom endpoint and custom database files...

Still, I agree it could be easier to do and debug... but now I think I understand...