Hi,
I'm trying to improve the performance of my facet queries. My typical facet query is a bit more complicated (two facets with a regex and a more complex query), but I have reduced it to a very simple example. There are a lot of documents, and my index is 23 GiB in size. The query takes ~200 ms, but it takes < 5 ms without the facet. The question is: how can I improve its performance? It should be around 10 ms.
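A request of that shape, reduced to a single terms facet, looks roughly like this (the index name and the facet name are only placeholders, not my real ones):

curl -XGET 'localhost:9200/myindex/_search?pretty' -d '{
  "query": { "match_all": {} },
  "facets": {
    "tokens": {
      "terms": { "field": "_tokens._all._facet", "size": 10 }
    }
  }
}'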
(1) I am wondering whether rewriting my query could improve things. "_tokens._all._facet" is a non-analyzed string field, whereas "_tokens._all._text.ngram" is ICU tokenized, ICU folded and ngram analyzed. There are several values per document. Is there anything there I should reconsider?
(2) I don't know whether it is possible to "index" or "cache" the values of "_tokens._all._facet" somehow. It is the field the facets use all the time, so it gets accessed constantly.
(3) If I use a cluster, could a higher number of nodes and shards improve performance? Would Elasticsearch facet in parallel across the data nodes?
(4) Finally, can you give me advice on how to test whether I am having problems with data access (like I/O blocking)?
I am curious: is this 200 ms the first time the query runs, or 200 ms every time? The first run should always be slower, and subsequent ones should be fast once the index is warmed. If you're trying to optimize the warming case, I recommend using index warmers.
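Registering a warmer that runs the facet is simple; a rough sketch, with placeholder index and warmer names, would be:

curl -XPUT 'localhost:9200/myindex/_warmer/facet_warmer' -d '{
  "query": { "match_all": {} },
  "facets": {
    "tokens": {
      "terms": { "field": "_tokens._all._facet" }
    }
  }
}'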
Facets currently pull all the values for the field into memory (the field cache). I would make sure you have enough RAM allocated to Elasticsearch to hold all of these values in memory.
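On a standard install that usually just means raising the heap before starting the node, for example:

# give the JVM a larger heap before starting Elasticsearch
# (keep roughly half the machine's RAM free for the OS page cache)
export ES_HEAP_SIZE=6g
./bin/elasticsearch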
To run through your items:
I don't think rewriting the query will help on the facet side.
Yeah, this is how things work. Just ensure you are using the resident field cache type so that your facet values stay in memory (see the settings sketch below).
I would make sure your machine isn't swapping, and especially isn't swapping Elasticsearch (check out the details on mlockall in the Elasticsearch documentation). If you're good there, you can check iostat to make sure disk performance is as expected.
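For reference, the relevant settings go in elasticsearch.yml and look something like this (on 0.90 the field data setting is index.fielddata.cache; older releases used index.cache.field.type instead):

# keep field data resident in memory instead of letting it be evicted
index.fielddata.cache: resident

# lock the heap into RAM so the OS can never swap it out
# (also raise the memlock ulimit for the elasticsearch user)
bootstrap.mlockall: true

While the facet query is running, "iostat -x 2" on the box will show whether the disks are actually the bottleneck.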
Best Regards,
Paul
Hi, Paul,
thanks for your response.
The time is an average of 5 requests after the first one, so it would be the warmed case, but I don't know if warmers could help me here: in the normal scenario the queries vary a lot, though it is true that I am always faceting on the same field. I also thought that my problem had to do with the RAM allocated, but then I ran bigdesk and saw that the memory never gets used above about 60% of the amount I've reserved (currently 6 GiB). I will try to tweak the field cache.
Thank you very much for the tips about (3) and (4). Maybe there's something going wrong there!
Best Regards,
Guillermo
So, I checked a few things, but I am still lost...
Now I only have one node. I can see with bigdesk that the heap memory never goes over 5 GiB of the available 6 GiB.
"_nodes/stats/indices/fielddata/_tokens._all._facet" tells me that I have a fielddata cache of about 500 MiB for the field I'm using in my facet.
Still, when I run the tests, I can see Elasticsearch performing I/O operations (I used iotop for this). So, what can I be doing wrong? By the way, just to be sure, I took the facets out of the query and it takes only a few milliseconds, so it really is the faceting that's taking so long.
Can someone help me? Paul?
Hey,
I played around with faceting and was able to get very similar results to yours. A regex facet vs. a normal terms facet didn't seem to make a big difference. I didn't do any formal testing, but after running a few different types of facet queries I came to the conclusion that facet run time is basically:
num result docs * num distinct terms for the field
So, if you have a specific query or a low-cardinality field, things should be fast. Outside of that, I think you're looking at distributing this across more shards/CPUs to speed things up, as sketched below.
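Since each shard is faceted independently, more primary shards means more cores can work on a single request. The shard count is fixed when an index is created, so this implies reindexing; roughly (the index name and the counts are just an example):

curl -XPUT 'localhost:9200/myindex_v2' -d '{
  "settings": {
    "number_of_shards": 8,
    "number_of_replicas": 1
  }
}'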
Best Regards,
Paul
Hi, Paul,
I was doing the same kind of tests, but I found something very interesting: even when I limit my query to 0 results ( "query": { "filtered": { "filter": { "limit": { "value": 0 } }, "query": {...} } } ), it still takes over 200 ms! But if I take out the facets, it takes just 2 ms. I am going to test in a cluster soon. I hope I'll be able to find out what's happening.
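The full test request looks roughly like this (the index name is again a placeholder, and I've put match_all in place of my real inner query just for illustration):

curl -XGET 'localhost:9200/myindex/_search?pretty' -d '{
  "query": {
    "filtered": {
      "filter": { "limit": { "value": 0 } },
      "query": { "match_all": {} }
    }
  },
  "facets": {
    "tokens": {
      "terms": { "field": "_tokens._all._facet", "size": 10 }
    }
  }
}'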
I've started a second node on the same server and the performance goes up... and I think I've discovered why. Elasticsearch is aware that my server has 8 CPUs, so it reserves a search thread pool of 8 threads. When I launch the search, only one of those threads is working, but if I start two nodes, two threads are working. That results in better usage of the server's resources. So, having more CPUs helps with handling more concurrent requests, but it doesn't increase the performance of a single query. Am I right?
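This is easy to check with the nodes hot threads API while a facet request is in flight (assuming your version has it):

# dump the busiest threads on every node while the facet query runs
curl 'localhost:9200/_nodes/hot_threads?threads=8'

Each shard is searched by a single thread, so the parallelism one request gets on a node is bounded by the number of shards that node holds.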