Very slow filter query

Hi everyone,

I'm new to Elasticsearch, so please be forgiving if I'm missing something
obvious. :wink: I'm comparing it with Solr for performance in a job which will
require large Boolean filters. However, my initial tests with trivial
filters have had very poor performance. The index size is 25M records
(65GB, un-optimised). For comparison, a similar Solr index is 102GB.

The following request takes about 300ms uncached, which reduces to about
200ms after repeated requests:

$ curl localhost:9200/core/_search -d '{
    "filter": {"term": {"publication_acronym": "EEN"}},
    "size": 0
}'

{"took":182,"timed_out":false,"_shards":{"total":1,"successful":1,"failed":0},"hits":{"total":246929,"max_score":1.0,"hits":[]}}

A similar Solr search takes about 900ms at first, then reduces to 2-3ms. So
I'm guessing that ES is not caching the filter effectively(?) I'm giving
both ES and Solr 16GB heap (out of 24GB total).

I'll paste my "core" setup at the end of this message. Does anyone have any
idea of what might be wrong, and how I can fix it?

Many thanks,
Tom

{
    "settings" : {
        "number_of_shards" : 1
    },

    "analysis" : {
        "analyzer" : {
            "standard" : { "type" : "standard" },
            "english" : { "type" : "english" }
        }
    },

    "mappings" : {
        "article" : {
            "properties" : {
                "id" : { "type" : "string", "index" : "not_analyzed" },
                "publication_name" : { "type" : "string", "index" : "not_analyzed" },
                "publication_acronym" : { "type" : "string", "index" : "not_analyzed" },
                "publication_subsource" : { "type" : "string", "index" : "not_analyzed" },
                "edition" : { "type" : "string", "index" : "not_analyzed" },
                "region" : { "type" : "string", "index" : "not_analyzed" },
                "day" : { "type" : "string", "index" : "not_analyzed" },
                "page_section" : { "type" : "string", "index" : "not_analyzed" },
                "author" : { "type" : "string", "index" : "not_analyzed" },
                "author_t" : { "type" : "string", "analyzer" : "standard" },
                "headline" : { "type" : "string", "analyzer" : "standard" },
                "subheadline" : { "type" : "string", "analyzer" : "standard" },
                "byline" : { "type" : "string", "analyzer" : "standard" },
                "caption" : { "type" : "string", "analyzer" : "standard" },
                "spellcheck" : { "type" : "string", "analyzer" : "standard" },
                "body" : { "type" : "string", "analyzer" : "english" },
                "para1" : { "type" : "string", "analyzer" : "english" },
                "wordcount" : { "type" : "integer" },
                "restriction" : { "type" : "integer" },
                "page_numbers" : { "type" : "integer" },
                "publication_date" : { "type" : "date" },
                "in_last_edition" : { "type" : "boolean" }
            }
        }
    }
}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi Tom,

Currently you are placing your filter in the top-level filter element, which
acts as a post filter: it is applied to the hits after the query has run. If
you wrap your filter in a filtered query instead, I expect a faster execution
time:
http://www.elasticsearch.org/guide/reference/query-dsl/filtered-query/
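
A minimal sketch of that suggestion, written as Python that only builds and prints the request body, so it needs no running cluster. The field and value come from the original request; using match_all as the inner query is an assumption (the original request has no query part), and the helper name filtered_search_body is made up for illustration.

```python
import json


def filtered_search_body(field, value, size=0):
    """Wrap a term filter in a filtered query instead of a top-level filter."""
    return {
        "query": {
            "filtered": {
                # Assumption: match_all stands in for the missing query part.
                "query": {"match_all": {}},
                "filter": {"term": {field: value}},
            }
        },
        "size": size,
    }


body = filtered_search_body("publication_acronym", "EEN")
print(json.dumps(body, indent=2))
```

The printed JSON could then be sent as the -d payload of the same curl command.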

Martijn

(In reply to Tom Mortimer's message of 24 April 2013, 17:33.)

--
Kind regards,

Martijn van Groningen


Hi Martijn,

Thanks for your reply. However, I'm having trouble getting it working.
Here's the query:

{
    "filtered" : {
        "query": {"term": {"body": "news"}},
        "filter": {"term": {"publication_acronym": "EEN"}}
    }
}

and the response:

{"error":"SearchPhaseExecutionException[Failed to execute phase
[query_fetch], total failure; shardFailures
{[8r6jDnnVQ-G5Jc1aGCAuDA][nla][0]: SearchParseException[[nla][0]:
from[-1],size[-1]: Parse Failure [Failed to parse source [{ "filtered"
: { "query": {"term": {"body": "news"}}, "filter":
{"term": {"publication_acronym": "EEN"}} }}]]]; nested:
SearchParseException[[nla][0]: from[-1],size[-1]: Parse Failure [No parser
for element [filtered]]]; }]","status":500}

Could this be because I'm using version 0.90? Should I switch to 0.20.6? I
wanted to use the spelling suggester in 0.90, though.

cheers,
Tom

(In reply to Martijn van Groningen's message of Wednesday, April 24, 2013, 5:27:41 PM UTC+1.)

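The "filtered" element has to sit inside a top-level "query" element; the
search body has no parser for a bare "filtered" element. It should be:

{
    "query": {
        "filtered" : {
            "query": {"term": {"body": "news"}},
            "filter": {"term": {"publication_acronym": "EEN"}}
        }
    }
}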

-Eric

(In reply to Tom Mortimer's message of Wednesday, April 24, 2013, 3:17:08 PM UTC-4.)

Awesome. That takes 490ms on the first iteration, and about 10ms
subsequently. Still a bit slower than Solr, but that's in a naive and
unoptimised case. Thanks for the help!

Tom

(In reply to egaumer's message of Wednesday, April 24, 2013, 8:39:05 PM UTC+1.)

OK, now I'm trying some more complex filters like I'll need in the
application, and ES is performing much worse than Solr (after a couple of
runs to allow caching). The filters vary in length, but are essentially of
the form:

((term1 OR term2 OR ...) AND (a numeric range) AND (a datetime range))
OR (similar) OR ...

I've attached a Solr and an ES example to this post. Solr is giving me an
average response of 0.01s, while ES is 0.5s. I imagine this is due to
filter caching, which in Solr will cache the whole filter expression
results. For the sake of this comparison I tried to do the same in ES by
using the "_cache": true flag on the overall OR list, but I've also tried
using it at various subnodes in the filter (this is why I want to use ES -
for the flexibility of filter caching).
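
For readers without the attachments, here is a rough sketch of the shape such a filter might take, written as Python that only builds the request body. It is not the attached file. The field names are taken from the mapping earlier in the thread; the values, the helper names and_group and cached_or, and the use of match_all as the query are all assumptions made for illustration.

```python
import json


def and_group(acronyms, min_words, max_words, date_from, date_to):
    """One group: (term OR term ...) AND numeric range AND date range."""
    return {
        "and": [
            {"or": [{"term": {"publication_acronym": a}} for a in acronyms]},
            {"range": {"wordcount": {"gte": min_words, "lte": max_words}}},
            {"range": {"publication_date": {"gte": date_from, "lte": date_to}}},
        ]
    }


def cached_or(groups):
    """OR the groups together, with the _cache flag on the overall OR list."""
    return {"or": {"filters": groups, "_cache": True}}


# Hypothetical values; the real filters are described as much longer.
groups = [
    and_group(["EEN", "AAA"], 100, 500, "2012-01-01", "2012-06-30"),
    and_group(["BBB"], 200, 800, "2011-01-01", "2011-12-31"),
]
body = {
    "query": {"filtered": {"query": {"match_all": {}}, "filter": cached_or(groups)}},
    "size": 0,
}
print(json.dumps(body, indent=2))
```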

The index is 25M docs (65GB for ES, 102GB for Solr). I'm giving each a heap
size of 16GB (out of 25GB available), and using the default cache settings.
Solr version is 4.2.1, ES is 0.90.0RC2.

If I can get ES performing close to Solr, I'd much rather use it. Can
anyone suggest any way to speed it up?

Thanks,
Tom


You need to clean up your query. First, replace every
OR(term, term, term, ...) with a single terms filter
(http://www.elasticsearch.org/guide/reference/query-dsl/terms-filter/).
A terms filter will give you much better performance. You should also
use a bool filter instead of all the and/or filters
(http://www.elasticsearch.org/guide/reference/query-dsl/bool-filter/).
Don't worry about the bool filter itself not being cached; the inner
filters (terms, etc.) do get cached. Give something like this a try:

boolean
    must
        terms filter
        numeric range
        datetime range
    should
        similar
        ....
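
One possible reading of that outline, sketched as Python that only builds the request body. It assumes each AND group becomes a bool filter with must clauses and that the groups are combined under an outer bool with should clauses. The field names come from the mapping earlier in the thread; the values and the helper names must_group and any_of are hypothetical.

```python
import json


def must_group(acronyms, min_words, max_words, date_from, date_to):
    """One AND group as a bool filter: every clause under must has to match."""
    return {
        "bool": {
            "must": [
                # A single terms filter replaces the chain of OR'ed term filters.
                {"terms": {"publication_acronym": acronyms}},
                {"range": {"wordcount": {"gte": min_words, "lte": max_words}}},
                {"range": {"publication_date": {"gte": date_from, "lte": date_to}}},
            ]
        }
    }


def any_of(groups):
    """Combine groups so that a document matching any one of them passes."""
    return {"bool": {"should": groups}}


# Hypothetical values for illustration only.
groups = [
    must_group(["EEN", "AAA"], 100, 500, "2012-01-01", "2012-06-30"),
    must_group(["BBB"], 200, 800, "2011-01-01", "2011-12-31"),
]
body = {
    "query": {"filtered": {"query": {"match_all": {}}, "filter": any_of(groups)}},
    "size": 0,
}
print(json.dumps(body, indent=2))
```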

Thanks,
Matt Weber

(In reply to Tom Mortimer's message of Thu, Apr 25, 2013, 8:05 AM.)

I assume you run a single node on a single machine, not a multi-node
cluster? Do you use the default of 5 shards, and a replica level of 1?

You may see shorter response times in your particular case with a single
shard, no replicas, and an optimized index. Comparing ES with Solr is like
comparing apples with oranges unless you set ES up in a kind of Solr-like
"standalone" mode (single node, no replicas, a single shard). And as you
already know, the caching and filter implementations are completely
different.

Note that ES also has index warmers, which may help:
http://www.elasticsearch.org/guide/reference/api/admin-indices-warmers/
I don't know whether you use Solr autowarming.
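
A sketch of what registering a warmer could look like, written as Python that only assembles the request and sends nothing. The /<index>/_warmer/<name> path follows the page linked above; the warmer name een_filter, the helper name warmer_request, and the choice of which search to warm are assumptions.

```python
import json


def warmer_request(index, name, search_body):
    """Return the HTTP method, path and JSON payload for registering a warmer."""
    path = "/%s/_warmer/%s" % (index, name)
    return "PUT", path, json.dumps(search_body)


# Assumption: warm the same filter that the searches will use later.
search = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {"term": {"publication_acronym": "EEN"}},
        }
    }
}
method, path, payload = warmer_request("core", "een_filter", search)
print(method, path)
print(payload)
```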

Are you on 64-bit Linux? If so, setting the ES index store type to mmapfs
together with bootstrap.mlockall: true is a trick that helps to keep the
index data in main memory as much as possible. With a large heap like
16 GB, watch out for swapping/paging (and GC).
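
The index-level part of these suggestions could be sketched as follows, as Python that builds an index-creation body without sending it. The values (one shard, no replicas, mmapfs store) are the ones named above; the helper name standalone_index_settings is made up. Note that bootstrap.mlockall is a node-level setting, so it is shown only as the line that would go into elasticsearch.yml.

```python
import json


def standalone_index_settings():
    """Index settings for a Solr-like single-shard, no-replica setup."""
    return {
        "settings": {
            "index": {
                "number_of_shards": 1,
                "number_of_replicas": 0,
                "store": {"type": "mmapfs"},
            }
        }
    }


# Node-level setting: belongs in elasticsearch.yml, not in the index settings.
NODE_CONFIG_LINE = "bootstrap.mlockall: true"

settings = standalone_index_settings()
print(json.dumps(settings, indent=2))
print(NODE_CONFIG_LINE)
```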

Jörg

(In reply to Tom Mortimer's message of 25 April 2013, 17:05.)

Thanks for all the replies. Unfortunately, nothing is making much of a
difference.

Another odd thing is that this search:

{"query": {"match_all": {}}, "size": 0}

consistently takes 300+ ms

{"took":324,"timed_out":false,"_shards":{"total":1,"successful":1,"failed":0},"hits":{"total":25510327,"max_score":1.0,"hits":[]}}

whereas I would expect it to be nearly instant. Does this point to any
problems with my index or configuration?

cheers,
Tom

(In reply to Jörg Prante's message of Thursday, April 25, 2013, 4:39:07 PM UTC+1.)

Hi Tom,

Some questions:
What Java version are you using?
Are Solr and ES running on the same box at the same time?
Are you running multiple ES nodes per machine?
On what type of disk are you putting the ES data directory?

In general you shouldn't allocate more than ~50% of your available memory
to the Java heap. So if you're running one ES node per machine, you
shouldn't allocate more than ~12GB to the heap. The reason is that Lucene
relies heavily on the file system cache. If you don't use facets or sort by
a field, you can give ES a much smaller share of the available memory. In
that case the default heap size (1GB; it is set with ES_HEAP_SIZE) is
enough, and you can give it a few GB more for the filter cache. By giving
less memory to the ES node, you leave more for the file system cache, which
speeds up your searches in general (even uncached queries and filters).
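
The arithmetic behind that rule of thumb, as a small sketch. The 24GB and 16GB figures are the ones given earlier in the thread; the 50% fraction is the guideline above, not a hard limit, and the function names are made up for illustration.

```python
def max_heap_gb(total_ram_gb, fraction=0.5):
    """Upper bound for the JVM heap under the ~50% rule of thumb."""
    return total_ram_gb * fraction


def left_for_fs_cache_gb(total_ram_gb, heap_gb):
    """Memory not claimed by the heap, which the OS can use for the file system cache."""
    return total_ram_gb - heap_gb


total = 24  # GB of RAM on the machine
heap_cap = max_heap_gb(total)
with_current_heap = left_for_fs_cache_gb(total, 16)
with_capped_heap = left_for_fs_cache_gb(total, heap_cap)
print("heap cap (GB):", heap_cap)
print("left for fs cache with a 16GB heap (GB):", with_current_heap)
print("left for fs cache at the cap (GB):", with_capped_heap)
```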

Martijn

On 30 April 2013 10:50, Tom Mortimer admin@flax.co.uk wrote:

Thanks for all the replies. Unfortunately, nothing is making much of a
difference.

Another odd thing is that this search:

{"query": {"match_all": {}}, "size": 0}

consistently takes 300+ ms

{"took":324,"timed_out":false,"_shards":{"total":1,"successful":1,"failed":0},"hits":{"total":25510327,"max_score":1.0,"hits":[]}}

whereas I would expect it to be nearly instant. Does this point to any
problems with my index or configuration?
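
One way to check whether repeated runs get faster is to print only the "took" value of each response. The sketch below assumes the node listens on localhost:9200 and the index is called "core", as in the requests earlier in the thread. extract_took and time_search are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: pull the "took" value (milliseconds) out of a
# search response with sed, so no JSON tool is required.
extract_took() {
    printf '%s\n' "$1" | sed -n 's/.*"took":\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: send the same request body several times to the
# "core" index and print "took" for each run.
time_search() {
    body=$1
    runs=${2:-5}
    i=0
    while [ "$i" -lt "$runs" ]; do
        extract_took "$(curl -s 'localhost:9200/core/_search' -d "$body")"
        i=$(( i + 1 ))
    done
}

# Usage (needs a running node):
#   time_search '{"query": {"match_all": {}}, "size": 0}'
```

If the numbers stay flat across runs, caching is probably not what is limiting the request.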

cheers,
Tom

On Thursday, April 25, 2013 4:39:07 PM UTC+1, Jörg Prante wrote:

I assume you use a single machine for your node, not multi node? Do you
use the default shard number of 5? And replica level 1?

You may see shorter response times in your particular case by using a
single shard, no replicas, and an optimized index. Comparing ES with Solr
is like comparing apples with oranges unless you know how to run ES in
a kind of Solr-like "standalone" mode (single node, no replicas, one
shard). And as you already know, the caching and filter implementations
are quite different.

Note that the ES index warmer API may also help
(http://www.elasticsearch.org/guide/reference/api/admin-indices-warmers/),
since I don't know whether you use Solr auto-warming.
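
As a rough sketch of what registering a warmer could look like for the term filter from the first post: the body is an ordinary search request, stored under a name on the index. The warmer name passed in and the ES_URL variable are assumptions made for this example.

```shell
#!/bin/sh
# Assumed node address; override by setting ES_URL in the environment.
ES_URL="${ES_URL:-http://localhost:9200}"

# The warmer body is a normal search request. This one runs the
# publication_acronym term filter from the first post.
warmer_body() {
    cat <<'EOF'
{
  "query": { "match_all": {} },
  "filter": { "term": { "publication_acronym": "EEN" } }
}
EOF
}

# Register the warmer on the "core" index: PUT /core/_warmer/<name>
register_warmer() {
    curl -s -XPUT "$ES_URL/core/_warmer/$1" -d "$(warmer_body)"
}

# Usage (needs a running node; "een_filter" is a made-up name):
#   register_warmer een_filter
```

Warmers run when new segments become searchable, so they mainly help the first request after a refresh rather than steady-state latency.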

Are you on 64-bit Linux? If so, using the mmapfs index store type
together with bootstrap.mlockall: true (which keeps the JVM's memory
from being swapped out) is a trick that helps to keep the index data in
main memory as much as possible. If you have a large heap like 16 GB,
watch out for swapping/paging (and GC).
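
For reference, the two settings mentioned above could be written out as follows. This is only a sketch: es_memory_settings is a hypothetical helper, and the config path in the usage comment is an assumption about the install layout.

```shell
#!/bin/sh
# Hypothetical helper: print the two settings named above in the form
# they would take in elasticsearch.yml.
es_memory_settings() {
    cat <<'EOF'
index.store.type: mmapfs
bootstrap.mlockall: true
EOF
}

# Usage (the path is an assumption, adjust to your install):
#   es_memory_settings >> config/elasticsearch.yml
# mlockall only takes effect if the user running ES may lock memory,
# e.g. by running this in the same shell before starting the node:
#   ulimit -l unlimited
```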

Jörg

On 25.04.13 17:05, Tom Mortimer wrote:

OK, now I'm trying some more complex filters like I'll need in the
application, and ES is performing much worse than Solr (after a couple
of runs to allow caching). The filters vary in length, but are
essentially of the form:

((term1 OR term2 OR ...) AND (a numeric range) AND (a datetime
range) OR (similar) ...)

I've attached a Solr and an ES example to this post. Solr is giving me
an average response of 0.01s, while ES is 0.5s. I imagine this is due
to filter caching, which in Solr will cache the whole filter
expression results. For the sake of this comparison I tried to do the
same in ES by using the "_cache": true flag on the overall OR list,
but I've also tried using it at various subnodes in the filter (this
is why I want to use ES - for the flexibility of filter caching).
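
Since the attached examples are not reproduced here, the following is a minimal sketch of a filter of that shape with "_cache": true on the outer OR. Only publication_acronym and the value "EEN" come from this thread. The field names word_count and published, the value "XYZ" and the range bounds are made up for illustration.

```shell
#!/bin/sh
# Hypothetical request body: one OR branch consisting of
# (terms) AND (numeric range) AND (date range), cached as a whole.
complex_filter_body() {
    cat <<'EOF'
{
  "size": 0,
  "filter": {
    "or": {
      "filters": [
        {
          "and": [
            { "terms": { "publication_acronym": ["EEN", "XYZ"] } },
            { "range": { "word_count": { "gte": 100, "lte": 500 } } },
            { "range": { "published": { "gte": "2012-01-01", "lt": "2013-01-01" } } }
          ]
        }
      ],
      "_cache": true
    }
  }
}
EOF
}

# Usage (needs a running node):
#   curl -s 'localhost:9200/core/_search' -d "$(complex_filter_body)"
```

Moving "_cache" from the outer "or" to individual sub-filters is the flexibility mentioned above: parts that recur across requests can be cached separately from parts that change every time.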

The index is 25M docs (65GB for ES, 102GB for Solr). I'm giving each a
heap size of 16GB (out of 25GB available), and using the default cache
settings. Solr version is 4.2.1, ES is 0.90.0RC2.

If I can get ES performing close to Solr, I'd much rather use it. Can
anyone suggest any way to speed it up?

Thanks,
Tom

--
Kind regards,

Martijn van Groningen

Hi Martijn,

$ java -version
java version "1.6.0_27"
OpenJDK Runtime Environment (IcedTea6 1.12.3) (6b27-1.12.3-0ubuntu1~10.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

I'm not running Solr and ES at the same time. Bigdesk reports a pretty low
system load, low heap use and little GC activity (I can screen grab if this
would be helpful). I'm running a single ES node, and the index has one
shard. I've tried setting the heap size to 8, 12 and 16GB, which makes no
noticeable difference. The disk is a local ext4 filesystem (also used for
my Solr tests).

The odd thing about the match_all search is that it's slow even under no
load. How long should a match_all of 25M docs (returning 0) take?

cheers,
Tom

Please avoid running OpenJDK 1.6. It is nothing but an early OpenJDK 7
with many undiscovered and unresolved bugs (alpha quality), only trimmed
to pass the JDK 6 compatibility tests.

I recommend switching to the latest Java 7 (Oracle version).

Jörg

On 30.04.13 13:32, Tom Mortimer wrote:

java version "1.6.0_27"
OpenJDK Runtime Environment (IcedTea6 1.12.3)
(6b27-1.12.3-0ubuntu1~10.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

Thanks for the tip, Jörg. Unfortunately, in this case, it doesn't make any
difference to the ES performance.

Tom
