ScriptFilter very slow, need to do: startField <= number <= endField

Hi,

I'd really appreciate some help here.

My problem is very simple: I have documents with 2 numeric fields,
startField and endField, that define a range (none of the ranges in
the documents overlap, btw). I need to provide an arbitrary number and
return the document(s) that have:

startField <= number <= endField

I used ScriptFilter to do this:

QueryBuilder queryBuilder = QueryBuilders.filteredQuery(
		QueryBuilders.matchAllQuery(),
		FilterBuilders.scriptFilter(
				"doc['startIpNumber'].value <= ipNumber && doc['endIpNumber'].value >= ipNumber" )
			.addParam( "ipNumber", 1506723642 )
			.cache( true ) );

SearchResponse response = getClient()
		.prepareSearch( "geos" )
		.setSearchType( SearchType.DFS_QUERY_THEN_FETCH )
		.setQuery( queryBuilder )
		.setSize( 1000 )
		.execute()
		.actionGet();

It works, but the performance is terrible: the query takes anywhere
from 4 to 7 seconds to execute.

I think the issue is the use of the match_all query and/or script
filters. The reason I think this is that I can look up a specific
document really fast.

Using a simple term query, it takes 1 millisecond:

$ curl -XGET 'http://localhost:9200/geos/geo/_search?pretty=true' -d '
{
  "query" : {
    "term" : {
      "startIpNumber" : 1816601600
    }
  }
}'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "geos",
      "_type" : "geo",
      "_id" : "nMao4GGgSUuVlEChGoOaRA",
      "_score" : 1.0,
      "_source" : {
        "geo" : {
          "startIpNumber" : "1816601600",
          "endIpNumber" : "1816601855",
          "country" : "US",
          "region" : "CA",
          "city" : "Garden Grove",
          "postalCode" : "",
          "latitude" : "33.7751",
          "longitude" : "-117.9704",
          "dmaCode" : "803",
          "areaCode" : "714"
        }
      }
    } ]
  }
}

However, if I do "essentially" the same thing using match_all and a
script filter, it takes 9051 milliseconds!

$ curl -XGET 'http://localhost:9200/geos/geo/_search?pretty=true' -d '
{
  "query" : {
    "filtered" : {
      "query" : {
        "match_all" : {}
      },
      "filter" : {
        "script" : {
          "script" : "doc[\"startIpNumber\"].value == ipNumber",
          "params" : { "ipNumber" : 1816601600 }
        }
      }
    }
  }
}'
{
  "took" : 9051,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "geos",
      "_type" : "geo",
      "_id" : "nMao4GGgSUuVlEChGoOaRA",
      "_score" : 1.0,
      "_source" : {
        "geo" : {
          "startIpNumber" : "1816601600",
          "endIpNumber" : "1816601855",
          "country" : "US",
          "region" : "CA",
          "city" : "Garden Grove",
          "postalCode" : "",
          "latitude" : "33.7751",
          "longitude" : "-117.9704",
          "dmaCode" : "803",
          "areaCode" : "714"
        }
      }
    } ]
  }
}

What should I use to do "startField <= number <= endField" but much
faster?

I would use 2 term queries inside a boolean query, but with term
queries you can't specify >= or <=.

I was thinking of using a range query, but I think there I must
provide a range, and it matches that range against a single specified
field. (I need to provide a specific value, and it needs to fall
within the range defined by 2 fields.)

I would appreciate any help I can get.

Thanks,

Hovanes

My environment details:

ES: 0.16.1 (I know it is old, planning to migrate to 0.18.7 soon)
OS: Dev: Windows 7 Pro, Prod/QA/Int/Stg: CentOS 5.2
Java: 1.6.0_27
5 shards, 1 replica, about 5.5 million documents.

Mapping:

geo: {
	properties: {
		startIpNumber: { null_value: 0, type: long },
		endIpNumber: { null_value: 0, type: long },
		region: { type: string },
		postalCode: { type: string },
		areaCode: { type: string },
		longitude: { type: string },
		latitude: { type: string },
		dmaCode: { type: string },
		country: { type: string },
		city: { type: string }
	}
}

When you have a script filter, that filter is evaluated against every doc that the query matches. If you use match_all, it is evaluated against all documents in the index. It's a brute-force approach: the field values are loaded into memory and the script is run against them. You can use a range query / filter instead, in which case the check works against the indexed data.
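
As a rough analogy for the difference described above, here is a minimal plain-Java sketch. It does not use Elasticsearch at all: the `Geo` class, the sample ranges (6-10, 11-15, 16-20) and the method names are made up for illustration. `scan` evaluates the condition on every document, the way a script filter over match_all has to, while `lookup` uses a sorted structure and relies on the ranges not overlapping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Analogy only: plain collections standing in for "check every doc" vs. "use sorted, indexed data".
class RangeLookupSketch {

    // Hypothetical stand-in for a document with the two numeric fields.
    static final class Geo {
        final long startIpNumber;
        final long endIpNumber;
        final String label;

        Geo(long startIpNumber, long endIpNumber, String label) {
            this.startIpNumber = startIpNumber;
            this.endIpNumber = endIpNumber;
            this.label = label;
        }
    }

    // Made-up sample data with non-overlapping ranges.
    static List<Geo> sampleDocs() {
        List<Geo> docs = new ArrayList<Geo>();
        docs.add(new Geo(6, 10, "doc-6-10"));
        docs.add(new Geo(11, 15, "doc-11-15"));
        docs.add(new Geo(16, 20, "doc-16-20"));
        return docs;
    }

    // Sorted view keyed by the start of each range.
    static TreeMap<Long, Geo> indexByStart(List<Geo> docs) {
        TreeMap<Long, Geo> byStart = new TreeMap<Long, Geo>();
        for (Geo g : docs) {
            byStart.put(g.startIpNumber, g);
        }
        return byStart;
    }

    // Brute force: the condition is evaluated once per document.
    static Geo scan(List<Geo> docs, long ipNumber) {
        for (Geo g : docs) {
            if (g.startIpNumber <= ipNumber && g.endIpNumber >= ipNumber) {
                return g;
            }
        }
        return null;
    }

    // Sorted lookup: with non-overlapping ranges, only the entry with the
    // greatest start <= ipNumber can match, so one probe is enough.
    static Geo lookup(TreeMap<Long, Geo> byStart, long ipNumber) {
        Map.Entry<Long, Geo> entry = byStart.floorEntry(ipNumber);
        if (entry == null) {
            return null;
        }
        Geo candidate = entry.getValue();
        return candidate.endIpNumber >= ipNumber ? candidate : null;
    }

    public static void main(String[] args) {
        List<Geo> docs = sampleDocs();
        TreeMap<Long, Geo> byStart = indexByStart(docs);
        System.out.println(scan(docs, 7).label);
        System.out.println(lookup(byStart, 7).label);
    }
}
```

Both methods return the same document; the difference is how much work each one does as the number of documents grows.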

Thanks a lot for your response.

I was thinking of using a range query, but wasn't sure how to do it.
After searching further here, I think I got it.

Targeting the following document:

geo: {
	startIpNumber: 6
	endIpNumber: 10
	...
}

I think the following bool query with 2 range queries should work:

"query" : {
    "bool" : {
        "must" : {
            "range" : {
                "startIpNumber" : { "lte" : 7 }
            }
        },
        "must" : {
            "range" : {
                "endIpNumber" : { "gte" : 7 }
            }
        }
    }
}

However this query returns all documents, including:

geo: {
	startIpNumber: 11
	endIpNumber: 15
	...
},
geo: {
	startIpNumber: 16
	endIpNumber: 20
	...
}

It seems like the 2 range queries are OR-ed rather than AND-ed. I
thought that by using "must" I would force both of them to be true. No?

Am I doing something wrong?

Any suggestions?

PS.

What is interesting is that when I use the Java API, it works as
expected (returning the expected 1 document).

BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
		.must( QueryBuilders.rangeQuery( "startIpNumber" ).lte( 7 ) )
		.must( QueryBuilders.rangeQuery( "endIpNumber" ).gte( 7 ) );

SearchResponse response = index()
		.getClient()
		.prepareSearch( index().name() )
		.setSearchType( SearchType.DFS_QUERY_THEN_FETCH )
		.setQuery( boolQueryBuilder )
		.execute()
		.actionGet();

The structure of the bool clause is wrong: you need an array of must clauses, "must" : [{},{}]. Also, are you sure the range is what you are after? You ask for less than or equal to 7 and greater than or equal to 7. Also, it probably makes sense to have the range checks as a filter in a filtered query.
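
A small plain-Java sketch of why repeating the "must" key is unreliable. It models a JSON object as a Map, which can hold only one value per key. This is an analogy under that assumption, not a description of how Elasticsearch actually parses the request; the class and method names are hypothetical, and the clause strings are just labels.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: a JSON object modelled as a Map (one value per key).
// This is NOT Elasticsearch's parser, just an analogy for repeated keys.
class MustClauseSketch {

    // Two separate "must" keys: the second put() replaces the first value.
    static Map<String, Object> repeatedMustKeys() {
        Map<String, Object> bool = new LinkedHashMap<String, Object>();
        bool.put("must", "range startIpNumber lte 7");
        bool.put("must", "range endIpNumber gte 7");
        return bool;
    }

    // One "must" key holding an array: both clauses are kept.
    static Map<String, Object> mustArray() {
        List<String> clauses = new ArrayList<String>();
        clauses.add("range startIpNumber lte 7");
        clauses.add("range endIpNumber gte 7");
        Map<String, Object> bool = new LinkedHashMap<String, Object>();
        bool.put("must", clauses);
        return bool;
    }

    public static void main(String[] args) {
        System.out.println(repeatedMustKeys());
        System.out.println(mustArray());
    }
}
```

In the Map model only the endIpNumber clause survives the repeated key, which would be consistent with the 11-15 and 16-20 documents matching for the value 7. Whether that is what happened on the server is not confirmed here.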

The structure of the bool clause is wrong, you need to have an array of must clauses "must" : [{},{}].

I see. Fixed it. This seems to be working as expected. Thanks!

"query" : {
    "bool" : {
        "must" : [
            { "range" : { "startIpNumber" : { "lte" : 7 } } },
            { "range" : { "endIpNumber" : { "gte" : 7 } } }
        ]
    }
}

How is this different in Java API? I had before

BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
        .must( QueryBuilders.rangeQuery( "startIpNumber" ).lte( 7 ) )
        .must( QueryBuilders.rangeQuery( "endIpNumber" ).gte( 7 ) );

It seems this will be wrong then (though it seems to return correct
results). It seems that there must be only 1 must method with 2 range
queries. But I am not sure how to combine

BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
        .must( <2 range queries> );

What goes in <2 range queries>? How do I combine them in an array in
Java API?

Also, are you sure the range is what you are after, you ask for lessThenEquals 7, and greaterThenEquals 7.

Yes, because I am not comparing the input to the same field; I am
comparing it to 2 different fields in the document (startIpNumber <=
input <= endIpNumber).

Also, it probably make sense to have the range checks as a filter in a filtered query.

You mean something like this?

"query" : {
    "filtered" : {
        "query" : {
            <query>
        },
        "filter" : {
            "bool" : {
                "must" : [
                    { "range" : { "startIpNumber" : { "lte" : 1816601601 } } },
                    { "range" : { "endIpNumber" : { "gte" : 1816601601 } } }
                ]
            }
        }
    }
}

What should the <query> be then? match_all? Wouldn't that have the
undesired effect of forcing a check on all the documents instead of
using the index?

Thanks again for your help.

On Tuesday, March 6, 2012 at 11:53 PM, Hovanes wrote:

How is this different in Java API? I had before

BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
        .must( QueryBuilders.rangeQuery( "startIpNumber" ).lte( 7 ) )
        .must( QueryBuilders.rangeQuery( "endIpNumber" ).gte( 7 ) );

It seems this will be wrong then (though it seems to return correct
results). It seems that there must be only 1 must method with 2 range
queries. But I am not sure how to combine

BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
        .must( <2 range queries> );

What goes in <2 range queries>? How do I combine them in an array in
Java API?

It's not different; the Java API builds the correct structure.

Also, are you sure the range is what you are after, you ask for lessThenEquals 7, and greaterThenEquals 7.

Yes, because I am not comparing the input to the same field; I am
comparing it to 2 different fields in the document (startIpNumber <=
input <= endIpNumber).

Got you, missed that.

Also, it probably make sense to have the range checks as a filter in a filtered query.

You mean something like this?

"query" : {
    "filtered" : {
        "query" : {
            <query>
        },
        "filter" : {
            "bool" : {
                "must" : [
                    { "range" : { "startIpNumber" : { "lte" : 1816601601 } } },
                    { "range" : { "endIpNumber" : { "gte" : 1816601601 } } }
                ]
            }
        }
    }
}

What should the <query> be then? match_all? Wouldn't that have the
undesired effect of forcing a check on all the documents instead of
using the index?

If you don't have a query part, then yes, a match_all. It gets optimized in this case into running only on the results of the filters.
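
Putting the pieces of this thread together, here is a minimal sketch in plain Java that builds the request body for the filtered query: match_all as the query part, and the two range checks in a bool filter with a "must" array. The class and method names are hypothetical, and the body is assembled by simple string concatenation purely for illustration; with the Java API you would use the query and filter builders instead.

```java
// Sketch only: assembles the JSON body discussed in this thread as a string.
// RangeFilterBody and filteredRangeQuery are illustrative names, not part of any API.
class RangeFilterBody {

    // Builds: match_all query + bool filter with two range clauses in one "must" array.
    static String filteredRangeQuery(long ipNumber) {
        return "{ \"query\" : { \"filtered\" : {"
             + " \"query\" : { \"match_all\" : {} },"
             + " \"filter\" : { \"bool\" : { \"must\" : ["
             + " { \"range\" : { \"startIpNumber\" : { \"lte\" : " + ipNumber + " } } },"
             + " { \"range\" : { \"endIpNumber\" : { \"gte\" : " + ipNumber + " } } }"
             + " ] } } } } }";
    }

    public static void main(String[] args) {
        // Prints a body that could be passed to curl with -d.
        System.out.println(filteredRangeQuery(1816601601L));
    }
}
```

The same ipNumber is used in both clauses because the input is compared against 2 different fields (startIpNumber <= input <= endIpNumber).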

Thanks again for your help.

On Mar 6, 12:34 am, Shay Banon <kim...@gmail.com (http://gmail.com)> wrote:

The structure of the bool clause is wrong, you need to have an array of must clauses "must" : [{},{}]. Also, are you sure the range is what you are after, you ask for lessThenEquals 7, and greaterThenEquals 7. Also, it probably make sense to have the range checks as a filter in a filtered query.

On Tuesday, March 6, 2012 at 12:05 AM, Hovanes wrote:

Thanks a lot for your response.

I was thinking of using range query, but wasn't sure how to do it.
After doing further search here I think I got it:

Targetting the following document:

geo: {
startIpNumber: 6
endIpNumber: 10
...
}

I think the following bool query with 2 range queries should work:

"query" : {
    "bool" : {
        "must" : {
            "range" : {
                "startIpNumber" : { "lte" : 7 }
            }
        },

"must" : {
"range" : {
"endIpNumber" : { "gte" : 7 }
}
}
}
}

However this query returns all documents, including:

geo: {
startIpNumber: 11
endIpNumber: 15
...
},
geo: {
startIpNumber: 16
endIpNumber: 20
...
}

Is seems like 2 range queries are OR-ed rather than AND-ed. I thought
by using "must" I will force both of them to be true. No?

Am I doing something wrong?

Any suggestions?

PS.

What is intersting is that when I use Java API, it works as expected
(returning expected 1 document).

BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
.must( QueryBuilders.rangeQuery( "startIpNumber" ).lte( 7 ) )
.must( QueryBuilders.rangeQuery( "endIpNumber" ).gte( 7 ) );

SearchResponse response = index()
.getClient()
.prepareSearch( index().name() )
.setSearchType( SearchType.DFS_QUERY_THEN_FETCH )
.setQuery( boolQueryBuilder )
.execute()
.actionGet();

On Mar 5, 12:38 pm, Shay Banon <kim...@gmail.com (http://gmail.com)> wrote:

When you have a script filter, that filter will be checked against each doc that needs to be checked against. If you use match_all, it will be checked against all documents in that index. Its a brute force approach, loading the values to memory, and running the scrip against them. You can use range query / filter, in which case they will work against the indexed data.


I followed all your suggestions and everything works much faster now.

Thanks a lot for your help.

On Mar 7, 3:31 am, Shay Banon kim...@gmail.com wrote:

On Tuesday, March 6, 2012 at 11:53 PM, Hovanes wrote:

The structure of the bool clause is wrong, you need to have an array of must clauses "must" : [{},{}].

I see. Fixed it. This seems to be working as expected. Thanks!

"query" : {
    "bool" : {
        "must" : [
            { "range" : { "startIpNumber" : { "lte" : 7 } } },
            { "range" : { "endIpNumber" : { "gte" : 7 } } }
        ]
    }
}

How is this different in Java API? I had before

BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
        .must( QueryBuilders.rangeQuery( "startIpNumber" ).lte( 7 ) )
        .must( QueryBuilders.rangeQuery( "endIpNumber" ).gte( 7 ) );

It seems this will be wrong then (though it seems to return correct
results). It seems that there must be only 1 must method with 2 range
queries. But I am not sure how to combine

BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
        .must( <2 range queries> );

What goes in <2 range queries>? How do I combine them in an array in
Java API?

It's not different, the Java API builds the correct structure.

Also, are you sure the range is what you are after? You ask for lessThanEquals 7 and greaterThanEquals 7.

Yes, because I am not comparing the input to the same field; I am
comparing it to 2 different fields in the document (startIpNumber <=
input <= endIpNumber).

Got you, missed that.

Also, it probably makes sense to have the range checks as a filter in a filtered query.

You mean something like this?

"query" : {
    "filtered" : {
        "query" : {
            <query>
        },
        "filter" : {
            "bool" : {
                "must" : [
                    { "range" : { "startIpNumber" : { "lte" : 1816601601 } } },
                    { "range" : { "endIpNumber" : { "gte" : 1816601601 } } }
                ]
            }
        }
    }
}

What should the <query> be then? match_all? Wouldn't that have the
undesired effect of forcing a check on all the documents instead of
the index?

If you don't have a query part, then yes, a match_all. It gets optimized in this case into running only on the results of the filters.
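For anyone who builds the request body by hand rather than through the builders, the filtered form discussed above can be sketched in plain Java as follows. This is only string assembly; `FilteredRangeBody` and `build` are made-up names, and the JSON layout simply mirrors the structure quoted in this thread (a `filtered` query with `match_all` and a `bool` filter holding an array of two `must` range clauses).

```java
// Hypothetical helper; it only assembles the JSON text shown in the thread.
class FilteredRangeBody {

    /** Builds the body for: startIpNumber <= ipNumber <= endIpNumber. */
    static String build(long ipNumber) {
        return "{\n"
             + "  \"query\" : {\n"
             + "    \"filtered\" : {\n"
             + "      \"query\" : { \"match_all\" : {} },\n"
             + "      \"filter\" : {\n"
             + "        \"bool\" : {\n"
             // "must" is an array, so both range clauses have to match.
             + "          \"must\" : [\n"
             + "            { \"range\" : { \"startIpNumber\" : { \"lte\" : " + ipNumber + " } } },\n"
             + "            { \"range\" : { \"endIpNumber\" : { \"gte\" : " + ipNumber + " } } }\n"
             + "          ]\n"
             + "        }\n"
             + "      }\n"
             + "    }\n"
             + "  }\n"
             + "}";
    }

    public static void main(String[] args) {
        // Prints a body that can be passed to curl with -d.
        System.out.println(build(1816601601L));
    }
}
```

The same number goes into both clauses because a single input value is compared against two different fields.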

Thanks again for your help.

On Mar 6, 12:34 am, Shay Banon <kim...@gmail.com (http://gmail.com)> wrote:

The structure of the bool clause is wrong, you need to have an array of must clauses "must" : [{},{}]. Also, are you sure the range is what you are after? You ask for lessThanEquals 7 and greaterThanEquals 7. Also, it probably makes sense to have the range checks as a filter in a filtered query.

On Tuesday, March 6, 2012 at 12:05 AM, Hovanes wrote:

Thanks a lot for your response.

I was thinking of using range query, but wasn't sure how to do it.
After doing further search here I think I got it:

Targeting the following document:

geo: {
startIpNumber: 6
endIpNumber: 10
...
}

I think the following bool query with 2 range queries should work:

"query" : {
    "bool" : {
        "must" : {
            "range" : {
                "startIpNumber" : { "lte" : 7 }
            }
        },
        "must" : {
            "range" : {
                "endIpNumber" : { "gte" : 7 }
            }
        }
    }
}

However this query returns all documents, including:

geo: {
startIpNumber: 11
endIpNumber: 15
...
},
geo: {
startIpNumber: 16
endIpNumber: 20
...
}

It seems like the 2 range queries are OR-ed rather than AND-ed. I thought
that by using "must" I would force both of them to be true. No?

Am I doing something wrong?

Any suggestions?


Hi,
I am also using a script filter, to match a whole sentence in a
particular field. The filter works fine locally, but it is very slow
once published. Any suggestions, please?

Sample Query

{
    "from" : 0, "size" : 10,
    "fields" : ["id", "title", "status", "condition", "intervention"],
    "query" : { "query_string" : { "query" : "test" } },
    "filter" : {
        "script" : {
            "script" : "_source.status == \"Test match\""
        }
    }
}

Thanks,
Franklin


--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi,
Sometimes I also need to use an "or" condition in the same query.

Sample Query

{
    "from" : 0, "size" : 10,
    "fields" : ["id", "title", "status", "condition", "intervention"],
    "query" : { "query_string" : { "query" : "test" } },
    "filter" : {
        "script" : {
            "script" : "_source.status == \"Test match\" || _source.status == \"Test\""
        }
    }
}


Sample Query

{
    "from" : 0, "size" : 10,
    "fields" : ["id", "title", "status", "condition", "intervention"],
    "query" : { "query_string" : { "query" : "test" } },
    "filter" : {
        "script" : {
            "script" : "_source.status == \"Test match\" || _source.status == \"Test\""
        }
    }
}

You're using a script query to do text matching? That's wrong :)

I'm assuming your 'status' field should be an exact match, ie it is not
"full text", but an exact string (eg 'Foo' != 'foo')

You should set the field to be 'not_analyzed':

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
    "mappings" : {
        "test" : {
            "properties" : {
                "status" : {
                    "index" : "not_analyzed",
                    "type" : "string"
                },
                "title" : {
                    "type" : "string"
                }
            }
        }
    }
}
'

Then we can index some data:

curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1' -d '
{
    "status" : "Test match",
    "title" : "This is a Test!"
}
'

And do a search, filtering on the exact values 'Test match' or 'Test':

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1' -d '
{
    "query" : {
        "filtered" : {
            "filter" : {
                "terms" : {
                    "status" : [
                        "Test match",
                        "Test"
                    ]
                }
            },
            "query" : {
                "match" : {
                    "title" : "test"
                }
            }
        }
    }
}
'

{
    "hits" : {
        "hits" : [
            {
                "_source" : {
                    "status" : "Test match",
                    "title" : "This is a Test!"
                },
                "_score" : 0.30685282,
                "_index" : "test",
                "_id" : "baLaG960QBSnAd-EHjqa6A",
                "_type" : "test"
            }
        ],
        "max_score" : 0.30685282,
        "total" : 1
    },
    "timed_out" : false,
    "_shards" : {
        "failed" : 0,
        "successful" : 5,
        "total" : 5
    },
    "took" : 5
}
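As a rough illustration of why the mapping matters, the plain-Java sketch below contrasts an analyzed field with a not_analyzed one. It is a simplification under stated assumptions: `analyzedTerms` only lowercases and splits on non-alphanumeric characters, which approximates but does not reproduce a real analyzer, and all class and method names are made up. The `termsFilter` method mimics the "any of these exact values" behaviour of the terms filter used above, which also covers the "or" case asked about earlier.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; a real analyzer does more than this.
class ExactMatchSketch {

    /** Simplified analyzed field: lowercase, then split into word terms. */
    static Set<String> analyzedTerms(String value) {
        Set<String> terms = new HashSet<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** not_analyzed field: the whole value is kept as a single term. */
    static Set<String> notAnalyzedTerms(String value) {
        return Collections.singleton(value);
    }

    /** Terms-filter analogue: true if any wanted value is an indexed term. */
    static boolean termsFilter(Set<String> indexedTerms, List<String> wanted) {
        for (String w : wanted) {
            if (indexedTerms.contains(w)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> wanted = Arrays.asList("Test match", "Test");
        String status = "Test match";
        System.out.println("analyzed terms: " + analyzedTerms(status));
        System.out.println("matches analyzed field: "
                + termsFilter(analyzedTerms(status), wanted));
        System.out.println("matches not_analyzed field: "
                + termsFilter(notAnalyzedTerms(status), wanted));
    }
}
```

With the analyzed variant the stored terms are lowercase single words, so neither exact value is found; with the not_analyzed variant the full string is stored as one term and the filter matches without running a script per document.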
