GeoIP challenges - Custom-City.mmdb

I am new to the forum, but not new to the Elastic Stack. I have been using it for about 7 years, starting with version 5 and upgrading to 8 over time.
Sorry for the long post, but I felt some foundation was necessary.

My current Elastic Stack install is version 8.11.0.
The Logstash geoip filter, using a custom mmdb file,
is working perfectly on 3 pipelines.
BUT
in Kibana, adding a geoip processor to an ingest pipeline fails
when using the same custom mmdb.

Testing with curl using the _simulate API.

The following test works:

curl --cacert http_ca.crt -X POST -u user:pass "https://localhost:9200/_ingest/pipeline/_simulate?pretty" -H 'Content-Type: application/json' -d'
{
  "pipeline" :
  {
    "description": "_GEO",
    "processors": [
      {
        "geoip": {
          "field": "ip",
          "target_field": "geo"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "foo": "bar",
        "ip": "10.18.106.44"
      }
    }
  ]
}
'

Result:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "index",
        "_version" : "-3",
        "_id" : "id",
        "_source" : {
          "geo" : {
            "continent_name" : "North America",
            "country_name" : "United States",
            "location" : {
              "lon" : -97.822,
              "lat" : 37.751
            },
            "country_iso_code" : "US"
          },
          "foo" : "bar",
          "ip" : "10.18.106.44"
        },
        "_ingest" : {
          "timestamp" : "2024-03-11T15:02:42.258728904Z"
        }
      }
    }
  ]
}

The following test fails:

curl --cacert http_ca.crt -X POST -u user:pass "https://localhost:9200/_ingest/pipeline/_simulate?pretty" -H 'Content-Type: application/json' -d'
{
  "pipeline" :
  {
    "description": "_GEO",
    "processors": [
      {
        "geoip": {
          "field": "ip",
          "target_field": "geo",
          "database_file": "Custom-City.mmdb"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "foo": "bar",
        "ip": "10.18.106.44"
      }
    }
  ]
}
'

Result:

{
  "docs" : [
    {
      "error" : {
        "root_cause" : [
          {
            "type" : "runtime_exception",
            "reason" : "java.lang.NullPointerException: Cannot invoke \"Object.getClass()\" because \"parameters[index]\" is null"
          }
        ],
        "type" : "runtime_exception",
        "reason" : "java.lang.NullPointerException: Cannot invoke \"Object.getClass()\" because \"parameters[index]\" is null",
        "caused_by" : {
          "type" : "null_pointer_exception",
          "reason" : "Cannot invoke \"Object.getClass()\" because \"parameters[index]\" is null"
        }
      }
    }
  ]
}

In the Logstash filter, this is working perfectly:

      # GeoIP location services for source
      geoip {
        database => "/etc/elasticsearch/ingest-geoip/Custom-City.mmdb"
        source => "[sourceIP]"
        target => "source"
      } # geoip

      # GeoIP location services for destination
      geoip {
        database => "/etc/elasticsearch/ingest-geoip/Custom-City.mmdb"
        source => "[destinationIP]"
        target => "destination"
      } # geoip

There is only one Custom-City.mmdb on my system.
It contains my local IP subnets and is regenerated and deployed nightly.
The GeoLite2 files are updated weekly.
They are all located in /etc/elasticsearch/ingest-geoip:

-rw-r--r--.  1 xx xx   137333 Mar 11 03:43 Custom-City.mmdb
-rw-r--r--.  1 xx xx  8414134 Mar  8 10:34 GeoLite2-ASN.mmdb
-rw-r--r--.  1 xx xx 63829032 Mar  8 16:12 GeoLite2-City.mmdb
-rw-r--r--.  1 xx xx  6381855 Mar  8 16:14 GeoLite2-Country.mmdb

I use Go code to create the custom mmdb file from a MySQL database.
If you need to see the Go code, let me know.

Please let me know about the code used in Logstash versus the ingest pipeline geoip code. Are they different? Can the ingest processor be updated to work the same way the Logstash geoip filter works?

Thanks
Steve B

in Kibana, adding a geoip processor to an ingest pipeline fails

Sorry, but I don't see where Kibana comes into the picture in this question. It sounds like you tracked the error to a specific API call in Elasticsearch. I feel confident that the Elasticsearch area of this forum could give better help. See https://discuss.elastic.co/c/elastic-stack/elasticsearch/6

Hi @SteveB1963, welcome to the community.

The Logstash code and the ingest processor that runs in Elasticsearch are similar but not exactly the same... offhand I cannot tell you the differences, but they are 2 separate code bases.

All the code is public in the 2 Git repos...

Here is the geoip processor for Elasticsearch.

You can share your code, though I am not sure whether anyone can help debug it.

I would make sure your custom mmdb matches exactly the format/schema etc. of the MaxMind mmdb.

Thanks for the reply.

The Custom-City.mmdb works fine in the Logstash filter.

It causes a pipeline failure when used in the ingest pipeline from the Kibana interface.

The same exact file works and does not work...

2 different code bases: one is good for me, one is not.

I am not a programmer and would probably fail trying to find a solution in that code. My Go code is taken from a posting on the MaxMind site; I copied it and changed the source of the data. It works great in Logstash.

Steve B

I am using the Kibana interface to edit the ingest pipeline, and I added the geoip processor there. The failure is immediate. To debug, I worked out the command line that reproduces the failure. So it starts in Kibana for me; that is why I reference Kibana.

Thanks

Steve B

When you are in Kibana -> Dev Tools,

that just runs raw Elasticsearch REST API calls against Elasticsearch. That is what the commands above are doing, which is why Tim said this is not Kibana related. Don't worry about that; I have put this in the correct category.

Clearly the ingest pipeline fails when you call Elasticsearch directly, as you showed:

curl --cacert http_ca.crt -X POST -u user:pass "https://localhost:9200/_ingest/pipeline/_simulate?pretty" -H 'Content-Type: application/json' -d'
{
...

I am not sure what else to tell you; Logstash and Elasticsearch are 2 completely separate code bases...

Your custom .mmdb is not compatible... I would suggest comparing the official one with yours.
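
One way to compare is to decode the same record out of both files into a generic map and eyeball the field names and value types. A minimal sketch using the third-party github.com/oschwald/maxminddb-golang (v1) reader; the paths and lookup IPs here are assumptions, so adjust to your environment:

package main

import (
  "fmt"
  "log"
  "net"

  "github.com/oschwald/maxminddb-golang"
)

// dump decodes the record for ip into a generic map and prints it with
// %#v so the Go value types (string vs float64, etc.) are visible.
func dump(path string, ip net.IP) {
  db, err := maxminddb.Open(path)
  if err != nil {
    log.Fatal(err)
  }
  defer db.Close()

  var record map[string]interface{}
  if err := db.Lookup(ip, &record); err != nil {
    log.Fatal(err)
  }
  fmt.Printf("%s: %#v\n", path, record)
}

func main() {
  // Assumed paths and IPs for illustration.
  dump("/etc/elasticsearch/ingest-geoip/GeoLite2-City.mmdb", net.ParseIP("8.8.8.8"))
  dump("/etc/elasticsearch/ingest-geoip/Custom-City.mmdb", net.ParseIP("10.18.106.44"))
}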

So if I analyze the standard *-City.mmdb and make mine look the same, I should be good? I will try to do that.

In the meantime, this is my Go code. Any feedback on it is welcome.

package main

import (
  "log"
  "net"
  "os"
  "fmt"

  "github.com/maxmind/mmdbwriter"
  "github.com/maxmind/mmdbwriter/mmdbtype"

  "database/sql"
  _ "github.com/go-sql-driver/mysql"
)

// Define Tag struct
/*
 * Tag... - a very simple struct
*/
type Tag struct {
  Subnet string `json:"adcn"`
  Continent_Code string `json:"xlcontinent_code"`
  Continent_Name string `json:"xlcontinent_name"`
  Country_Code string `json:"xlcountry_code3"`
  Country_Name string `json:"xlcountry_name"`
  Subdivision_Code string `json:"xlsubdivision_code"`
  Subdivision_Name string `json:"xlsubdivision_name"`
  City_Name string `json:"xlcity_name"`
  Building string `json:"xlbuilding"`
  Time_Zone string `json:"xltime_zone"`
  Latitude string `json:"xllatitude"`
  Longitude string `json:"xllongitude"`
}

func main() {
  //make new instance of mmdb writer
  writer, err := mmdbwriter.New(
    mmdbwriter.Options{
      DatabaseType:            "GeoLite2-City",
      RecordSize:              28,
      IPVersion:               6,
      IncludeReservedNetworks: true,
    },
  )
  if err != nil {
    log.Fatal(err)
  }

  // Open the connection to MySQL
  // Info -->                     user:password             DB Location:port /DB Name
  db, err := sql.Open("mysql", "sanitized:sanitized@tcp(some-IP-here:3306)/my-db")
  if err != nil {
    panic(err.Error())
  }
  defer db.Close()

  // Execute the query
  results, err := db.Query("SELECT adcn, xlcontinent_code, xlcontinent_name, xlcountry_code3, xlcountry_name, xlsubdivision_code, xlsubdivision_name, xlcity_name, xlbuilding, xltime_zone, xllatitude, xllongitude FROM ADSubnets_adxrt ORDER BY adcn ")
  if err != nil {
    panic(err.Error())
  }

  // Look at the results
  for results.Next() {
    var tag Tag
    // for each row scan the results into the tag composite object
    err = results.Scan(
      &tag.Subnet,
      &tag.Continent_Code,
      &tag.Continent_Name,
      &tag.Country_Code,
      &tag.Country_Name,
      &tag.Subdivision_Code,
      &tag.Subdivision_Name,
      &tag.City_Name,
      &tag.Building,
      &tag.Time_Zone,
      &tag.Latitude,
      &tag.Longitude)
    if err != nil {
      panic(err.Error())
    }

    // MMDB Section
    // ==== range
    _, network, err := net.ParseCIDR(tag.Subnet)
    if err != nil {
      log.Fatal(err)
    }

    // all the records are in this map
    record := mmdbtype.Map{

      // ==== continent
      "continent": mmdbtype.Map {
        "code": mmdbtype.String(tag.Continent_Code),
        "names": mmdbtype.Map{
          "en": mmdbtype.String(tag.Continent_Name),
        },
      },

      // ==== country
      "country": mmdbtype.Map{
        "iso_code": mmdbtype.String(tag.Country_Code),
        "names": mmdbtype.Map{
          "en": mmdbtype.String(tag.Country_Name),
        },
      },
      // ==== city
      "city": mmdbtype.Map {
        "names": mmdbtype.Map{
          "en": mmdbtype.String(tag.City_Name),
        },
      },

      // ==== location
      "location": mmdbtype.Map{
        "latitude": mmdbtype.String(tag.Latitude),
        "longitude": mmdbtype.String(tag.Longitude),
        "time_zone": mmdbtype.String(tag.Time_Zone),
      },
    }

    // Write the record and loop back
    err = writer.Insert(network, record)
    if err != nil {
      fmt.Println(err)
      return
    }

  }

  // data is acquired so now write all the records to the file
  fh, err := os.Create("../Custom-City.mmdb")
  if err != nil {
    fmt.Println(err)
    return
  }
  defer fh.Close()

  _, err = writer.WriteTo(fh)
  if err != nil {
    log.Fatal(err)
  }

  // State the obvious - we are done
  fmt.Println("The End")
}

Thanks

Steve B

Is there a stack trace for the NullPointerException in the logs of any of the elasticsearch nodes?

I just checked this on all my servers. When I run
"_ingest/pipeline/_simulate" (the complete syntax is in a previous post),
no logs seem to notice. I checked all the files in /var/log/elasticsearch while running the command. There was no reaction at all; not one new line was added to any of the logs.

Thanks

Steve B

Unfortunately, it looks like we might not actually log the stack trace for simulate. We do log it at debug level, so one option is to set the log level for org.elasticsearch.action.bulk to debug and then actually run the processor using the indexing or _bulk API. If that's not an option, you can't share that custom database, can you?
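
If it helps, that logger can be raised dynamically through the cluster settings API; a sketch reusing the curl credentials from the earlier commands:

curl --cacert http_ca.crt -X PUT -u user:pass "https://localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "logger.org.elasticsearch.action.bulk": "debug"
  }
}
'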

I can create a subset of the database and test to make sure it works in Logstash and fails in the ingest pipeline. Then I can send you that mmdb file.
It should be easy if I change the SQL query a little. I hope.

Thanks

Steve B

I have created a small mmdb file that has 4 subnets. I copied it to /etc/elasticsearch/ingest-geoip.

It is named Custom-City.mmdb and was created using the code I posted, but with a small SQL modification, like this:

results, err := db.Query("SELECT adcn, xlcontinent_code, xlcontinent_name, xlcountry_code3, xlcountry_name, xlsubdivision_code, xlsubdivision_name, xlcity_name, xlbuilding, xltime_zone, xllatitude, xllongitude FROM ADSubnets_adxrt WHERE adcn LIKE '10.103.212%' OR adcn LIKE '10.103.23%' OR adcn LIKE '10.103.61%' OR adcn LIKE '10.104.128%' ORDER BY adcn ")

So a lookup for hosts on these subnets should return geo data:

10.103.212.0/24
10.103.23.0/24
10.103.61.0/24
10.104.128.0/24

This is the syntax I used, once again with the same result:

curl --cacert /etc/elasticsearch/certs/http_ca.crt -X POST -u username:userpass "https://localhost:9200/_ingest/pipeline/_simulate?pretty" -H 'Content-Type: application/json' -d'
{
  "pipeline" :
  {
    "description": "_GEO",
    "processors": [
      {
        "geoip": {
          "field": "ip",
          "target_field": "geo",
          "database_file": "Custom-City.mmdb"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "foo": "bar",
        "ip": "10.103.61.44"
      }
    }
  ]
}
'

As I stated before, this works perfectly in Logstash filters.

Hmmm... how can I get the mmdb file uploaded? This forum does not like mmdb attachments. Please advise.

Thank you for your help!

Steve B

I figured out how you can make the mmdb file yourself.
From all my postings, I believe many people can benefit from this work.
GeoIP mapping of my internal IP space is very helpful in debugging activities.

See below:

  1. new Go code that reads a CSV file,
  2. the CSV file,
  3. the simulate syntax.

Go Source Code

(main.go)

package main

import (
  "encoding/csv"
  "io"
  "log"
  "net"
  "os"
  "fmt"

  "github.com/maxmind/mmdbwriter"
  "github.com/maxmind/mmdbwriter/mmdbtype"
)

func main() {
  writer, err := mmdbwriter.New(
    mmdbwriter.Options{
      DatabaseType:            "GeoLite2-City",
      RecordSize:              28,
      IPVersion:               6,
      IncludeReservedNetworks: true,
    },
  )
  if err != nil {
    log.Fatal(err)
  }

  // Open the CSV file; check the error before deferring Close
  file, err := os.Open("Custom-City.csv")
  if err != nil {
    log.Fatal("Error while opening the file: ", err)
  }
  defer file.Close()

  r := csv.NewReader(file)
  // skip the header line
  if _, err := r.Read(); err != nil {
    log.Fatal(err)
  }

  // start looping thru the csv
  for {
    // check that there are csv records or are we at the end
    row, err := r.Read()
    if err == io.EOF {
      break
    }
    if err != nil {
      log.Fatal(err)
    }
    // verify that there are 11 columns
    if len(row) != 11 {
      log.Fatalf("unexpected CSV rows: %v", row)
    }

    //  0 geoipip
    //  1 geoipcontinent_code
    //  2 geoipcontinent_name
    //  3 geoipcountry_code3
    //  4 geoipcountry_name
    //  5 geoipregion_code
    //  6 geoipregion_name
    //  7 geoipcity_name
    //  8 geoiptime_zone
    //  9 geoiplatitude
    // 10 geoiplongitude

    // MMDB Section
    // ==== range
    _, network, err := net.ParseCIDR(row[0])
    if err != nil {
      log.Fatal(err)
    }

    // all the records are in this map
    record := mmdbtype.Map{

      // ==== continent
      "continent": mmdbtype.Map {
        "code": mmdbtype.String(row[1]),
        "names": mmdbtype.Map{
          "en": mmdbtype.String(row[2]),
        },
      },

      // ==== country
      "country": mmdbtype.Map{
        "iso_code": mmdbtype.String(row[3]),
        "names": mmdbtype.Map{
          "en": mmdbtype.String(row[4]),
        },
      },

      // ==== city
      "city": mmdbtype.Map {
        "names": mmdbtype.Map{
          "en": mmdbtype.String(row[7]),
        },
      },

      // ==== location
      "location": mmdbtype.Map{
        "latitude":  mmdbtype.String(row[9]),
        "longitude": mmdbtype.String(row[10]),
        "time_zone": mmdbtype.String(row[8]),
      },
    }

    // Write the record and loop back
    err = writer.Insert(network, record)
    if err != nil {
      fmt.Println(err)
      return
    }
  }

  // data is looped, so now write all the records to the file
  fh, err := os.Create("Custom-City.mmdb")
  if err != nil {
    log.Fatal(err)
  }
  defer fh.Close()

  _, err = writer.WriteTo(fh)
  if err != nil {
    log.Fatal(err)
  }
  // State the obvious - we are done
  fmt.Println("The End")
}

CSV File

(Custom-City.csv)
no blank lines please, and yes, it has a header:

geoipip,geoipcontinent_code,geoipcontinent_name,geoipcountry_code3,geoipcountry_name,geoipregion_code,geoipregion_name,geoipcity_name,geoiptime_zone,geoiplatitude,geoiplongitude
10.103.212.0/24,NA,North America,USA,United States,0,0,Jacksonville,America/New_York,30.33218400,-81.65564700
10.103.23.0/24,EU,Europe,DEU,Germany,0,0,Hamburg,Europe/Berlin,53.55108600,9.99368200
10.103.61.0/24,NA,North America,USA,United States,0,0,Gloucester,America/New_York,35.10807800,-80.78914600
10.104.128.0/24,EU,Europe,GBR,United Kingdom,ENG,England,Marston Green,Europe/London,50.99800000,-2.64920000
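
To build and run the generator, the standard Go module workflow should work (the module name here is arbitrary):

go mod init custom-city
go mod tidy
go run main.go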

Elasticsearch command-line simulate syntax:

curl --cacert /etc/elasticsearch/certs/http_ca.crt -X POST -u user:pass "https://localhost:9200/_ingest/pipeline/_simulate?pretty" -H 'Content-Type: application/json' -d'
{
  "pipeline" :
  {
    "description": "_GEO",
    "processors": [
      {
        "geoip": {
          "field": "ip",
          "target_field": "geo",
          "database_file": "Custom-City.mmdb"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "foo": "bar",
        "ip": "10.103.212.44"
      }
    }
  ]
}
'

All code advice is welcome. Please see if you can reproduce the failure as well.

Thank You !!

Steve B

I was able to reproduce the problem. Here's the stack trace (I had to get this through a debugger because the MaxMind code loses it when it runs into a second exception while trying to handle this one):

java.lang.IllegalArgumentException: argument type mismatch
	at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:65)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
	at com.maxmind.db@3.1.0/com.maxmind.db.Decoder.decodeMapIntoObject(Decoder.java:441)
	at com.maxmind.db@3.1.0/com.maxmind.db.Decoder.decodeMap(Decoder.java:341)
	at com.maxmind.db@3.1.0/com.maxmind.db.Decoder.decodeByType(Decoder.java:162)
	at com.maxmind.db@3.1.0/com.maxmind.db.Decoder.decode(Decoder.java:151)
	at com.maxmind.db@3.1.0/com.maxmind.db.Decoder.decodeMapIntoObject(Decoder.java:434)
	at com.maxmind.db@3.1.0/com.maxmind.db.Decoder.decodeMap(Decoder.java:341)
	at com.maxmind.db@3.1.0/com.maxmind.db.Decoder.decodeByType(Decoder.java:162)
	at com.maxmind.db@3.1.0/com.maxmind.db.Decoder.decode(Decoder.java:151)
	at com.maxmind.db@3.1.0/com.maxmind.db.Decoder.decode(Decoder.java:76)
	at com.maxmind.db@3.1.0/com.maxmind.db.Reader.resolveDataPointer(Reader.java:411)
	at com.maxmind.db@3.1.0/com.maxmind.db.Reader.getRecord(Reader.java:185)
	at com.maxmind.geoip2@4.2.0/com.maxmind.geoip2.DatabaseReader.get(DatabaseReader.java:280)
	at com.maxmind.geoip2@4.2.0/com.maxmind.geoip2.DatabaseReader.getCity(DatabaseReader.java:365)
	at com.maxmind.geoip2@4.2.0/com.maxmind.geoip2.DatabaseReader.tryCity(DatabaseReader.java:359)
	at org.elasticsearch.ingest.geoip@8.14.0-SNAPSHOT/org.elasticsearch.ingest.geoip.DatabaseReaderLazyLoader.lambda$getResponse$2(DatabaseReaderLazyLoader.java:194)
	at org.elasticsearch.ingest.geoip@8.14.0-SNAPSHOT/org.elasticsearch.ingest.geoip.GeoIpCache.putIfAbsent(GeoIpCache.java:64)
	at org.elasticsearch.ingest.geoip@8.14.0-SNAPSHOT/org.elasticsearch.ingest.geoip.DatabaseReaderLazyLoader.getResponse(DatabaseReaderLazyLoader.java:192)
	at org.elasticsearch.ingest.geoip@8.14.0-SNAPSHOT/org.elasticsearch.ingest.geoip.DatabaseReaderLazyLoader.getCity(DatabaseReaderLazyLoader.java:157)
	at org.elasticsearch.ingest.geoip@8.14.0-SNAPSHOT/org.elasticsearch.ingest.geoip.GeoIpProcessor.retrieveCityGeoData(GeoIpProcessor.java:208)
	at org.elasticsearch.ingest.geoip@8.14.0-SNAPSHOT/org.elasticsearch.ingest.geoip.GeoIpProcessor.getGeoData(GeoIpProcessor.java:172)
	at org.elasticsearch.ingest.geoip@8.14.0-SNAPSHOT/org.elasticsearch.ingest.geoip.GeoIpProcessor.execute(GeoIpProcessor.java:132)
	at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.ingest.CompoundProcessor.innerExecute(CompoundProcessor.java:165)
	at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:141)
	at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.ingest.Pipeline.execute(Pipeline.java:129)
	at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.ingest.IngestDocument.executePipeline(IngestDocument.java:848)
	at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.action.ingest.SimulateExecutionService.executeDocument(SimulateExecutionService.java:56)
	at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.action.ingest.SimulateExecutionService.lambda$execute$3(SimulateExecutionService.java:81)
	at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.action.ActionRunnable$4.doRun(ActionRunnable.java:100)
	at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
	at org.elasticsearch.server@8.14.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: java.lang.ClassCastException: Cannot cast java.lang.String to java.lang.Double
	at java.base/java.lang.Class.cast(Class.java:4090)
	at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
	... 35 more

The problem appears to be that the lat/lon values in your .mmdb file are strings rather than doubles. I was hoping we'd be able to log a more useful error message from Elasticsearch, but I don't think we will be able to, because it all happens within MaxMind code that we don't own.
I think you just need to modify your Go code to write out doubles for lat/lon.

I modified your Go script to use mmdbtype.Float64 instead of mmdbtype.String, and now it's working as expected. I'm actually curious how this works in Logstash, because I would expect that it would be using com.maxmind.geoip2.DatabaseReader as well.
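
For reference, a minimal sketch of that change against the CSV version above. It assumes adding "strconv" to the imports and parsing the CSV strings before building the record; this is not the exact diff, just one way to do it:

// buildLocation parses the CSV lat/lon strings into float64 so the MMDB
// stores doubles, which the MaxMind Java reader expects for location.
func buildLocation(latStr, lonStr, tz string) mmdbtype.Map {
  lat, err := strconv.ParseFloat(latStr, 64)
  if err != nil {
    log.Fatal(err)
  }
  lon, err := strconv.ParseFloat(lonStr, 64)
  if err != nil {
    log.Fatal(err)
  }
  return mmdbtype.Map{
    "latitude":  mmdbtype.Float64(lat),
    "longitude": mmdbtype.Float64(lon),
    "time_zone": mmdbtype.String(tz),
  }
}

In the loop, the "location" entry then becomes "location": buildLocation(row[9], row[10], row[8]).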


THANK YOU!!!
This is an awesome way to start the weekend.
I will update my code and put it in production.
My 10,000 internal IPs thank you too!

Steve B


I created a MaxMind issue to track this: Docoder NullPointerException masks underlying exception · Issue #164 · maxmind/MaxMind-DB-Reader-java · GitHub.

