Geo-distance exception failure and message

I am getting a failure from a geo_distance query. It's returning hits but
also returning a failure status and message:

"failures" : [ {
"index" : "lidb",
"shard" : 4,
"status" : 500,
"reason" : "QueryPhaseExecutionException[[lidb][4]:
query[filtered(ConstantScore(NotDeleted(+cache(_type:locality)
+GeoDistanceFilter(location, ARC, 155.3427980593335, 32.0,
-117.0))))->cache(_type:locality)],from[0],size[10]: Query Failed [Failed
to execute main query]]; nested: StringIndexOutOfBoundsException[String
index out of range: -1]; "
} ]

Details follow:

I read the documentation and various examples and elasticsearch community
posts.

I had an index named "lidb" with about 297 documents, which I was using to
explore the cool behavior of the snowball analyzer, phrase matching, and my
own table-based synonym facility on top of ES (that's another, and also very
happy, story). All done so far using the Java API. But as I've discovered,
the toString method of the query builder and filter builder classes emits
pretty JSON, so it's surprisingly easy to write the Java code and quickly
check it against the JSON-based API documentation.

First off, my local ElasticSearch cluster (one node: my local laptop) has
the following configured analyzer. I imagine adding the "location" field
in one or more types to this default configuration:

index:
  analysis:
    analyzer:
      # set stemming analyzer with no stop words as the default
      default:
        type: snowball
        stopwords: none
    filter:
      stopWordsFilter:
        type: stop
        stopwords: none

Based on an example I found, I deleted the index (and all documents in
it???), then recreated the index and added the mapping for the "locality"
type (to hold cities, towns, and so on). All returned with success:

curl -XDELETE 'http://localhost:9200/lidb'
curl -XPUT 'http://localhost:9200/lidb'
curl -XPUT 'http://localhost:9200/lidb/locality/_mapping' -d '{ "locality"
: { "properties" : { "location" : {"type" : "geo_point"}}}}'

Next, I bulk-loaded the following tiny subset of data with "location": [
lon, lat ] according to the GeoJSON recommendations. The latitude and
longitude fields are left over from a conversion artifact; I'll be removing
them. But otherwise, the bulk load was successful (no surprises there):

{ "index" : { "_index" : "lidb", "_type" : "subscriber", "_id" :
"8004441616" } }
{ "telno" : "8004441616", "cnam" : "SUNNY SUNGLASS", "o" : "Sunny's
Sunglasses", "city" : "San Dimas", "state" : "CA", "latitude" : 34.102908,
"longitude" : -117.816249, "location" : [ -117.816249, 34.102908 ] }
{ "index" : { "_index" : "lidb", "_type" : "subscriber", "_id" :
"8004442626" } }
{ "telno" : "8004442626", "cnam" : "RAINY SUNGLASS", "o" : "Rainy's
Sunglasses", "city" : "San Dimas", "state" : "CA", "latitude" : 34.102908,
"longitude" : -117.816249, "location" : [ -117.816249, 34.102908 ] }
{ "index" : { "_index" : "lidb", "_type" : "locality", "_id" : "1" } }
{ "city" : "Abbeville", "state" : "AL", "latitude" : 31.566367, "longitude"
: -85.251300, "location" : [ -85.251300, 31.566367 ] }
{ "index" : { "_index" : "lidb", "_type" : "locality", "_id" : "2" } }
{ "city" : "San Carlos", "state" : "CA", "latitude" : 37.499187,
"longitude" : -122.263278, "location" : [ -122.263278, 37.499187 ] }
{ "index" : { "_index" : "lidb", "_type" : "locality", "_id" : "3" } }
{ "city" : "San Clemente", "state" : "CA", "latitude" : 33.437828,
"longitude" : -117.620397, "location" : [ -117.620397, 33.437828 ] }
{ "index" : { "_index" : "lidb", "_type" : "locality", "_id" : "4" } }
{ "city" : "Sand City", "state" : "CA", "latitude" : 36.614759, "longitude"
: -121.850060, "location" : [ -121.850060, 36.614759 ] }
{ "index" : { "_index" : "lidb", "_type" : "locality", "_id" : "5" } }
{ "city" : "San Diego", "state" : "CA", "latitude" : 32.779541, "longitude"
: -117.146344, "location" : [ -117.146344, 32.779541 ] }
{ "index" : { "_index" : "lidb", "_type" : "locality", "_id" : "6" } }
{ "city" : "San Diego Country Estates", "state" : "CA", "latitude" :
33.002636, "longitude" : -116.799005, "location" : [ -116.799005, 33.002636
] }
{ "index" : { "_index" : "lidb", "_type" : "locality", "_id" : "7" } }
{ "city" : "San Dimas", "state" : "CA", "latitude" : 34.102908, "longitude"
: -117.816249, "location" : [ -117.816249, 34.102908 ] }
{ "index" : { "_index" : "lidb", "_type" : "locality", "_id" : "8" } }
{ "city" : "San Fernando", "state" : "CA", "latitude" : 34.287251,
"longitude" : -118.438836, "location" : [ -118.438836, 34.287251 ] }
{ "index" : { "_index" : "lidb", "_type" : "locality", "_id" : "9" } }
{ "city" : "San Francisco", "state" : "CA", "latitude" : 37.759881,
"longitude" : -122.437392, "location" : [ -122.437392, 37.759881 ] }
{ "index" : { "_index" : "lidb", "_type" : "locality", "_id" : "10" } }
{ "city" : "San Gabriel", "state" : "CA", "latitude" : 34.094176,
"longitude" : -118.098449, "location" : [ -118.098449, 34.094176 ] }
{ "index" : { "_index" : "lidb", "_type" : "locality", "_id" : "11" } }
{ "city" : "New York Mills", "state" : "MN", "latitude" : 46.519423,
"longitude" : -95.373026, "location" : [ -95.373026, 46.519423 ] }
{ "index" : { "_index" : "lidb", "_type" : "locality", "_id" : "12" } }
{ "city" : "West New York", "state" : "NJ", "latitude" : 40.788400,
"longitude" : -74.013090, "location" : [ -74.013090, 40.788400 ] }
{ "index" : { "_index" : "lidb", "_type" : "locality", "_id" : "13" } }
{ "city" : "Albuquerque", "state" : "NM", "latitude" : 35.110703,
"longitude" : -106.609991, "location" : [ -106.609991, 35.110703 ] }
{ "index" : { "_index" : "lidb", "_type" : "locality", "_id" : "14" } }
{ "city" : "New York", "state" : "NY", "latitude" : 40.704234, "longitude"
: -73.917927, "location" : [ -73.917927, 40.704234 ] }
{ "index" : { "_index" : "lidb", "_type" : "locality", "_id" : "15" } }
{ "city" : "New York Mills", "state" : "NY", "latitude" : 43.102569,
"longitude" : -75.292105, "location" : [ -75.292105, 43.102569 ] }
{ "index" : { "_index" : "lidb", "_type" : "locality", "_id" : "16" } }
{ "city" : "Niagara Falls", "state" : "NY", "latitude" : 43.094305,
"longitude" : -79.017339, "location" : [ -79.017339, 43.094305 ] }
{ "index" : { "_index" : "lidb", "_type" : "locality", "_id" : "17" } }
{ "city" : "Yoder", "state" : "WY", "latitude" : 41.917560, "longitude" :
-104.295060, "location" : [ -104.295060, 41.917560 ] }

So I tried out my very first geo_distance query. It returned 3 hits. I have
to go back and manually calculate all the distances to verify the accuracy,
but for now I trust ES (of course).
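
A minimal sketch of that manual check, assuming a spherical Earth with a
mean radius of 6371 km and the haversine formula (both are my assumptions;
ES may compute its ARC distance differently, so values near the 250km edge
could disagree slightly). The coordinates are copied from the bulk data
above, and the names haversine_km and within are just illustrative:

```python
import math

# Assumption: mean Earth radius in km; ES may use a different constant.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Query center from the geo_distance filter, as (lat, lon).
CENTER = (32.0, -117.0)

# A few localities from the bulk load, as (lat, lon). Note that the
# "location" arrays in the documents are in the opposite order: [lon, lat].
LOCALITIES = {
    "Abbeville": (31.566367, -85.251300),
    "San Clemente": (33.437828, -117.620397),
    "San Diego": (32.779541, -117.146344),
    "San Diego Country Estates": (33.002636, -116.799005),
    "San Dimas": (34.102908, -117.816249),
}


def within(radius_km):
    """Names of the sample localities within radius_km of CENTER, sorted."""
    return sorted(
        name for name, (lat, lon) in LOCALITIES.items()
        if haversine_km(CENTER[0], CENTER[1], lat, lon) <= radius_km
    )


if __name__ == "__main__":
    for name, (lat, lon) in sorted(LOCALITIES.items()):
        print("%-28s %8.1f km" % (name, haversine_km(CENTER[0], CENTER[1], lat, lon)))
```

Any locality that this reports inside the radius but that is missing from
the hits would be worth checking against the failed shard.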

But what is odd is that it also returned a failure exception:

curl -XGET 'http://localhost:9200/lidb/locality/_search?pretty=true' -d '{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"geo_distance": {
"distance": "250km",
"locality.location": {
"lat": 32,
"lon": -117
}
}
}
}
}
}'

And here is the response:

{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 4,
"failed" : 1,
"failures" : [ {
"index" : "lidb",
"shard" : 4,
"status" : 500,
"reason" : "QueryPhaseExecutionException[[lidb][4]:
query[filtered(ConstantScore(NotDeleted(+cache(_type:locality)
+GeoDistanceFilter(location, ARC, 155.3427980593335, 32.0,
-117.0))))->cache(_type:locality)],from[0],size[10]: Query Failed [Failed
to execute main query]]; nested: StringIndexOutOfBoundsException[String
index out of range: -1]; "

} ]
},
"hits" : {
"total" : 3,
"max_score" : 1.0,
"hits" : [ {
"_index" : "lidb",
"_type" : "locality",
"_id" : "5",
"_score" : 1.0, "_source" : { "city" : "San Diego", "state" : "CA",
"latitude" : 32.779541, "longitude" : -117.146344, "location" : [
-117.146344, 32.779541 ] }
}, {
"_index" : "lidb",
"_type" : "locality",
"_id" : "6",
"_score" : 1.0, "_source" : { "city" : "San Diego Country Estates",
"state" : "CA", "latitude" : 33.002636, "longitude" : -116.799005,
"location" : [ -116.799005, 33.002636 ] }
}, {
"_index" : "lidb",
"_type" : "locality",
"_id" : "7",
"_score" : 1.0, "_source" : { "city" : "San Dimas", "state" : "CA",
"latitude" : 34.102908, "longitude" : -117.816249, "location" : [
-117.816249, 34.102908 ] }
} ]
}
}
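
Since a response like this carries hits and a shard failure at the same
time, it's easy to miss the failure when only looking at "hits". A minimal
sketch of a check on the parsed response, assuming it has already been
decoded into a Python dict (the function name summarize_shards and the
trimmed sample below are just illustrative):

```python
def summarize_shards(response):
    """Return (is_partial, messages) for a decoded search response dict.

    is_partial is True when _shards.failed is greater than zero, i.e. the
    hits may be incomplete even though some were returned.
    """
    shards = response.get("_shards", {})
    failed = shards.get("failed", 0)
    messages = [
        "[%s][%s] status %s" % (f.get("index"), f.get("shard"), f.get("status"))
        for f in shards.get("failures", [])
    ]
    return failed > 0, messages


# Trimmed-down version of the response shown above.
sample = {
    "_shards": {
        "total": 5,
        "successful": 4,
        "failed": 1,
        "failures": [
            {
                "index": "lidb",
                "shard": 4,
                "status": 500,
                "reason": "StringIndexOutOfBoundsException[String index out of range: -1]",
            }
        ],
    },
    "hits": {"total": 3},
}

if __name__ == "__main__":
    partial, messages = summarize_shards(sample)
    if partial:
        print("partial results, shard failures:", ", ".join(messages))
```
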

But the health is normal. I expect yellow: this is running on a local
MacBook (a single node) with the default of 5 shards and 1 replica, so the
replicas have nowhere to be assigned. There are also a few twitter documents
that were added during my exploration of the examples; those are in another
index and weren't modified:

$ curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
"cluster_name" : "brian-exploration",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 10,
"active_shards" : 10,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 10
}

Of course, I'll completely erase the data subdirectory and restart and try
this again from a clean start. But this particular failure might indicate a
problem, and I'll leave my setup alone if you need more details from it.

--

When working with ES, I always seem to end up finding a solution. But the
answer is typically empirical. In this case, the problem seems to be that
I had two types (subscriber and locality) with a "location" field in both,
but the field was mapped to geo_point only in the locality type. Once I
added the same mapping for the subscriber type, all my locality
geo_distance queries worked. And geo_distance queries also work across
types (e.g. find localities and subscribers within some radius).

curl -XDELETE 'http://localhost:9200/lidb'
curl -XPUT 'http://localhost:9200/lidb'
curl -XPUT 'http://localhost:9200/lidb/locality/_mapping' -d '{ "locality"
: { "properties" : { "location" : {"type" : "geo_point"}}}}'
curl -XPUT 'http://localhost:9200/lidb/subscriber/_mapping' -d '{
"subscriber" : { "properties" : { "location" : {"type" : "geo_point"}}}}'

No more error status values in the response.
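
Until there's a server-level default, the per-type mapping commands can at
least be generated rather than hand-written, so that no type with a
"location" field gets missed. A minimal sketch, assuming the same host,
index, and field name as above (the helper names geo_point_mapping and
mapping_commands are made up for illustration, and it only prints the curl
commands rather than sending anything):

```python
import json


def geo_point_mapping(type_name, field="location"):
    """Mapping body that declares `field` as geo_point for one type."""
    return {type_name: {"properties": {field: {"type": "geo_point"}}}}


def mapping_commands(index, type_names, host="http://localhost:9200"):
    """One curl PUT _mapping command per type, in the order given."""
    commands = []
    for type_name in type_names:
        body = json.dumps(geo_point_mapping(type_name))
        commands.append(
            "curl -XPUT '%s/%s/%s/_mapping' -d '%s'" % (host, index, type_name, body)
        )
    return commands


if __name__ == "__main__":
    for command in mapping_commands("lidb", ["locality", "subscriber"]):
        print(command)
```
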

So my goal is to keep things simple:

  1. The "location" field in all types is always a GeoJSON array of [lon, lat]
  2. Configured at the server level by default (avoid the need to explicitly
    PUT the mapping each time)

So, just as the index.analysis.analyzer.default rules in the
elasticsearch.yml configuration file can apply the snowball analyzer with
no stopwords to any field in any type in any index, how can I just as
easily apply the geo_point mapping to the "location" field in any type in
any index?

Thanks!

--