Hi,
I'd really appreciate some help here.
My problem is very simple: I have documents with 2 numeric fields,
startField and endField, that define a range (none of the ranges in
the documents overlap, btw). I need to provide an arbitrary number and
return the document(s) that have:
startField <= number <= endField
I used ScriptFilter to do this:
QueryBuilder queryBuilder = QueryBuilders.filteredQuery(
QueryBuilders.matchAllQuery(),
FilterBuilders.scriptFilter(
"doc['startIpNumber'].value <= ipNumber"
+ " && doc['endIpNumber'].value >= ipNumber" )
.addParam( "ipNumber", 1506723642 )
.cache( true ) );
SearchResponse response = getClient()
.prepareSearch( "geos" )
.setSearchType( SearchType.DFS_QUERY_THEN_FETCH )
.setQuery( queryBuilder )
.setSize( 1000 )
.execute()
.actionGet();
It works, but the performance is terrible: the query takes anywhere
from 4 to 7 seconds to execute.
I think the issue is the use of the match_all query and/or the script
filter. The reason I think this is that I can look up a specific
document really fast.
Using a simple term query, it takes 1 millisecond:
$ curl -XGET 'http://localhost:9200/geos/geo/_search?pretty=true' -d '
{
"query" : {
"term" : {
"startIpNumber" : 1816601600
}
}
}'
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "geos",
"_type" : "geo",
"_id" : "nMao4GGgSUuVlEChGoOaRA",
"_score" : 1.0, "_source" : {
"geo" : {
"startIpNumber" : "1816601600",
"endIpNumber" : "1816601855",
"country" : "US",
"region" : "CA",
"city" : "Garden Grove",
"postalCode" : "",
"latitude" : "33.7751",
"longitude" : "-117.9704",
"dmaCode" : "803",
"areaCode" : "714"
}
}
} ]
}
}
However, if I do "essentially" the same thing using match_all and a
script filter, it takes 9051 milliseconds!
$ curl -XGET 'http://localhost:9200/geos/geo/_search?pretty=true' -d '
{
"query" : {
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"script" : {
"script" : "doc[\"startIpNumber\"].value == ipNumber",
"params" : { "ipNumber" : 1816601600 }
}
}
}
}
}'
{
"took" : 9051,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "geos",
"_type" : "geo",
"_id" : "nMao4GGgSUuVlEChGoOaRA",
"_score" : 1.0, "_source" : {
"geo" : {
"startIpNumber" : "1816601600",
"endIpNumber" : "1816601855",
"country" : "US",
"region" : "CA",
"city" : "Garden Grove",
"postalCode" : "",
"latitude" : "33.7751",
"longitude" : "-117.9704",
"dmaCode" : "803",
"areaCode" : "714"
}
}
} ]
}
}
What should I use to do "startField <= number <= endField", but much
faster?
I would use 2 term queries inside a boolean query, but with term
queries you can't specify >= or <=.
I was thinking of using a range query, but I think there I must provide
a range, which it then matches against a single specified field. (I
need to provide a specific value, and it needs to fall within the range
defined by 2 fields.)
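To make the range-query idea concrete, here is a rough, untested sketch of the request body I imagine it would take: a bool query with two one-sided range clauses, one per field. The field names come from my mapping below; the class and method names are made up for illustration, and I am assuming the range query accepts "lte"/"gte" for one-sided bounds in my version (it may need "from"/"to" with the include flags instead).

```java
// Sketch only: builds the JSON body for a bool query with two one-sided
// range clauses. IpRangeQuerySketch and buildQuery are hypothetical names;
// "lte"/"gte" support on this ES version is an assumption.
class IpRangeQuerySketch {

    // Expresses: startIpNumber <= ip AND endIpNumber >= ip
    static String buildQuery(long ip) {
        return "{\n"
             + "  \"query\" : {\n"
             + "    \"bool\" : {\n"
             + "      \"must\" : [\n"
             + "        { \"range\" : { \"startIpNumber\" : { \"lte\" : " + ip + " } } },\n"
             + "        { \"range\" : { \"endIpNumber\" : { \"gte\" : " + ip + " } } }\n"
             + "      ]\n"
             + "    }\n"
             + "  }\n"
             + "}";
    }

    public static void main(String[] args) {
        // Prints a body that could be passed to curl with -d
        System.out.println(buildQuery(1506723642L));
    }
}
```

Each clause bounds one field on one side only, so the single number I provide is checked against both fields without a script.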
I would appreciate any help I can get.
Thanks,
Hovanes
My environment details:
ES: 0.16.1 (I know it is old, planning to migrate to 0.18.7 soon)
OS: Dev: Windows 7 Pro, Prod/QA/Int/Stg: CentOS 5.2
Java: 1.6.0_27
5 shards, 1 replica, about 5.5 million documents.
Mapping:
geo: {
properties: {
startIpNumber: { null_value: 0, type: long },
endIpNumber: { null_value: 0, type: long },
region: { type: string },
postalCode: { type: string },
areaCode: { type: string },
longitude: { type: string },
latitude: { type: string },
dmaCode: { type: string },
country: { type: string },
city: { type: string }
}
}