How to emit a value for a runtime field of type geo_point?

I am trying to execute a query for a geo_point value with a runtime mapping, which according to the docs should work, but it is failing. Keyword and long values work fine. For example...

Where src.city.name is a keyword...

"city": {
  "type": "object",
  "properties": {
    "name": {
      "type": "keyword"
    }
  }
},

the following works...

{
  "runtime_mappings": {
    "city.name": {
      "type": "keyword",
      "script": {
        "source": "if (doc.containsKey('src.city.name') && !doc['src.city.name'].empty) {emit(doc['src.city.name'].value)}"
      }
    }
  },
  "fields": [
    "city.name"
  ],
  "size": 1,
  "query": {
    "match": {
      "server.ipaddr": "192.0.2.11"
    }
  }
}

Result...

"fields": {
  "city.name": [
    "Macroom"
  ]
}

However, using the same method for a geo_point fails.

Where src.loc.coord is a geo_point...

"loc": {
  "type": "object",
  "properties": {
    "coord": {
      "type": "geo_point"
    }
  }
},

the following query fails...

{
  "runtime_mappings": {
    "loc.coord": {
      "type": "geo_point",
      "script": {
        "source": "if (doc.containsKey('src.loc.coord') && !doc['src.loc.coord'].empty) {emit(doc['src.loc.coord'].value)}"
      }
    }
  },
  "fields": [
    "loc.coord"
  ],
  "size": 1,
  "query": {
    "match": {
      "server.ipaddr": "192.0.2.11"
    }
  }
}

The error given is...

{
  "error": {
    "root_cause": [
      {
        "type": "script_exception",
        "reason": "compile error",
        "script_stack": [
          "... src.loc.coord'].empty) {emit(doc['src.loc.coord ...",
          "                            ^---- HERE"
        ],
        "script": "if (doc.containsKey('src.loc.coord') && !doc['src.loc.coord'].empty) {emit(doc['src.loc.coord'].value)}",
        "lang": "painless",
        "position": {
          "offset": 88,
          "start": 63,
          "end": 113
        }
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "test_index-2021.w17",
        "node": "mDbh3hqhPRi8x0WMRhf-fB",
        "reason": {
          "type": "script_exception",
          "reason": "compile error",
          "script_stack": [
            "... src.loc.coord'].empty) {emit(doc['src.loc.coord ...",
            "                            ^---- HERE"
          ],
          "script": "if (doc.containsKey('src.loc.coord') && !doc['src.loc.coord'].empty) {emit(doc['src.loc.coord'].value)}",
          "lang": "painless",
          "position": {
            "offset": 88,
            "start": 63,
            "end": 113
          },
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Unknown call [emit] with [[org.elasticsearch.painless.node.EDot@356bd657]] arguments."
          }
        }
      }
    ]
  },
  "status": 400
}

How must this query be modified to work with geo_point fields?

I believe emit for geo_points takes two numbers as parameters.

There are five ways to express a geo_point:

  • as an object, with lat and lon keys.
  • as a string with the format: "lat,lon" .
  • as a geohash.
  • as an array with the format: [ lon , lat ]
  • as a Well-Known Text POINT with the format: "POINT(lon lat)"
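For illustration, the five forms listed above could look like this in a document. This is only a sketch: the field names (as_object, as_string, and so on) are hypothetical, each is assumed to be mapped as geo_point, and the coordinate and geohash values are the sample values used in the Elasticsearch geo_point documentation. All five are meant to describe roughly the same location.

```json
{
  "as_object":  { "lat": 41.12, "lon": -71.34 },
  "as_string":  "41.12,-71.34",
  "as_geohash": "drm3btev3e86",
  "as_array":   [ -71.34, 41.12 ],
  "as_wkt":     "POINT (-71.34 41.12)"
}
```

Note that the string form is lat,lon while the array and WKT forms are lon,lat.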

The field in question is using the string expression. For example:

"loc": {
  "coord": "51.9017,-8.9492"
},

The docs say - "When accessing the value of a geo-point in a script, the value is returned as a GeoPoint object". So when emit() is passed the value doc['src.loc.coord'].value, it is being passed a GeoPoint object already.

However, if you are correct that "emit for geo_points takes two numbers as parameters", is this documented anywhere? What is the correct order: lat, lon or lon, lat?

Sorry. I wasn't involved with the docs for this one. Looks like it is lat, lon. https://github.com/elastic/elasticsearch/blob/50152589cab00fbb984f9f40d68173addee5970f/server/src/main/java/org/elasticsearch/script/GeoPointFieldScript.java#L59

After reviewing the painless source (thanks for the link @nik9000 ) I put together the following reference for emit().

emit()

Type        Function                        Multiple Calls
boolean     emit(boolean v)                 NO
date        emit(long v)                    NO
double      emit(double v)                  YES
geo_point   emit(double lat, double lon)    NO
ip          emit(String v)                  YES
keyword     emit(String v)                  YES
long        emit(long v)                    YES

Emitting geo_point values

For most runtime field types, the value of a document field of the same type can be passed directly to emit(). For example:

emit(doc['host.name'].value)

However, for geo_point fields, emit() doesn't accept a GeoPoint object. Instead, latitude and longitude must be passed as two separate parameters, in that order. For example:

def geopoint = doc['location.coordinates'].value;
def lat = geopoint.lat;
def lon = geopoint.lon;
emit(lat, lon)
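
Applied to the failing query from the question, the same pattern might look like the sketch below. It reuses the src.loc.coord field and the server.ipaddr filter from the original query; the local variable name p is arbitrary, and the request has not been run against the original index.

```json
{
  "runtime_mappings": {
    "loc.coord": {
      "type": "geo_point",
      "script": {
        "source": "if (doc.containsKey('src.loc.coord') && !doc['src.loc.coord'].empty) { def p = doc['src.loc.coord'].value; emit(p.lat, p.lon) }"
      }
    }
  },
  "fields": [
    "loc.coord"
  ],
  "size": 1,
  "query": {
    "match": {
      "server.ipaddr": "192.0.2.11"
    }
  }
}
```

The only change from the failing query is that emit() receives the latitude and longitude as two doubles instead of the GeoPoint object itself.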

Calling emit() multiple times in a script

For some field types emit() may be called multiple times in a script. Doing so will result in an array of values up to the maximum number of values allowed. The table above indicates which values support multiple calls.
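
As a rough illustration of multiple calls, the sketch below defines a keyword runtime field that emits one value per entry of a multi-valued source field. The field names src.tags and tags.upper are hypothetical and not part of the dataset discussed here; the script simply loops over the doc values and calls emit() once per value.

```json
{
  "runtime_mappings": {
    "tags.upper": {
      "type": "keyword",
      "script": {
        "source": "if (doc.containsKey('src.tags')) { for (def v : doc['src.tags']) { emit(v.toUpperCase()) } }"
      }
    }
  },
  "fields": [
    "tags.upper"
  ],
  "size": 1
}
```

If src.tags holds several values for a document, the fields section of the response would be expected to contain an array with one upper-cased entry per value.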

The following example shows how this limit is enforced in the Painless source; note the call to checkMaxSize():

public final void emit(long v) {
    checkMaxSize(count);
    if (values.length < count + 1) {
        values = ArrayUtil.grow(values, count + 1);
    }
    values[count++] = v;
}

NOTE! The maximum number of values that can be emitted is 100. This is constant and cannot be overridden (public static final int MAX_VALUES = 100;).

String values are limited not only by the number of values (checkMaxSize() is still called) but also by the total number of characters. This is enforced in the Painless source as follows; note the check for chars > MAX_CHARS:

public final void emit(String v) {
    checkMaxSize(results.size());
    chars += v.length();
    if (chars > MAX_CHARS) {
        throw new IllegalArgumentException(
            String.format(
                Locale.ROOT,
                "Runtime field [%s] is emitting [%s] characters while the maximum number of values allowed is [%s]",
                fieldName,
                chars,
                MAX_CHARS
            )
        );
    }
    results.add(v);
}

NOTE! The maximum number of characters that can be emitted is 1048576. This is constant and cannot be overridden (public static final long MAX_CHARS = 1024 * 1024;).

I believe they all support multiple calls. I could have sworn we had a table like this somewhere but I sure don't see it.

When I reviewed the underlying Java code, it didn't look like there was any special handling for those marked NO, i.e. nothing keeping track of the count or checking it against MAX_VALUES. However, Java is not my thing, so I may have missed something.

The good news is that we are able to reduce the storage requirement for our dataset by ~44% (measured by feeding in data --> stopping ingestion --> force merging --> checking index stats API).

Up next is benchmarking ingest performance and query performance.

Nice! I double checked the logic and they all allow emitting multiple values. boolean counts true and false values and doesn't actually store them at all so it never checks against the limit because multiple calls to emit don't cost any space. long checks the limit, like you saw. date and geo_point use long's limit tracking code.

I asked around and we don't have a table like you described. I'm going to add something here: https://www.elastic.co/guide/en/elasticsearch/painless/7.12/painless-runtime-fields-context.html