Query Execution Time, Performance

Hi Group,
I've read all the info on the net about performance tuning of
elasticsearch, but I'm still not satisfied with the query execution time
of our index.
We have the following:
Hardware:

  • 2 bare-metal AMD machines, each with 6 cores at 3 GHz; one has 16 GB RAM,
    the other 32 GB
  • 1 Gbit network hardware; at least 100 MB/s is supported.

Elasticsearch:

  • version 0.19.1
  • 8 GB RAM dedicated to elasticsearch: ulimit -n 100000, ulimit -l unlimited,
    ES_HEAP_SIZE=8g, bootstrap.mlockall: true (see the sketch after this list).
    The memory is locked by elasticsearch; I check this with
    cat /proc/<pid>/status | grep VmLck - the result is "VmLck: 8864712 kB"
  • 2 shards - 1 shard on each server
  • no replicas
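For reference, this is roughly how those settings fit together in one place - a sketch only, assuming a standard 0.19.x install (the config file path and the startup script are whatever your installation uses):

  # elasticsearch.yml
  bootstrap.mlockall: true

  # environment and limits set before starting the node
  export ES_HEAP_SIZE=8g
  ulimit -n 100000
  ulimit -l unlimited

  # verify that the heap really is locked (replace <pid> with the elasticsearch pid)
  cat /proc/<pid>/status | grep VmLck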

Documents:

  • 10 million documents - average size 2 kB
  • each document has 30 string fields, not_analyzed and not stored
  • average field size - 30 chars
  • 5 fields are string arrays - average size 10
  • _all field is disabled
  • _source.compress: true

Query:

  • 30 facets, no facet filters, start: 0, size: 10 (roughly the shape sketched
    below). Query execution time: 5 - 9 sec after the first query.
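For the general shape of such a request (the full query, with attribute names changed, is linked further down the thread), it looks roughly like this - the index name, the field names f1/f2 and the facet size are placeholders, and only 2 of the 30 facets are shown:

  curl -XPOST 'http://localhost:9200/myindex/_search?pretty=true' -d '{
    "query": { "match_all": {} },
    "from": 0,
    "size": 10,
    "facets": {
      "f1": { "terms": { "field": "f1", "size": 50 } },
      "f2": { "terms": { "field": "f2", "size": 50 } }
    }
  }'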

The only query execution time improvement we achieved was from the
transport.tcp.compress: true option, which saved us about 1.5 sec; it was
6.5 - 10 sec before that.
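For anyone who wants to try the same thing, it is a one-line setting in elasticsearch.yml on each node, picked up after a node restart:

  transport.tcp.compress: true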

These times are still OK, since this is for management reporting, but I really
hope to be able to improve them.
Has anybody achieved better query time performance, and if so, how?

Kind Regards,
Ridvan

Can you share the full search request you execute?


Hi Shay, here is the full query, with some attribute names changed to
f1...fn: Query on the whole index · GitHub

In the meantime we noticed that if we remove the terms_stats facet on
the userId field we get some 30% improvement. The userId field has 1 million
distinct test values, but only one per document.

Kind Regards


Those are quite a lot of facets. Do the times you mentioned happen when
data is not being indexed, and if not, can you check? (I want to get the
field cache loading out of the picture for a sec.)

Also, one thing that I would do, if you have a dashboard-like system, is to
simply use AJAX and multiple search requests, one for each facet, and
display the results for each specific search/facet as they come. Let's see
how that helps things. You can also test the same search request with just one
facet and see how long it takes.
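To illustrate the idea, splitting the one big request into per-facet requests could look roughly like this (index and field names are placeholders; using search_type=count to skip fetching the 10 hits in the per-facet calls is an extra assumption on top of the suggestion above):

  # one small request per facet, fired in parallel from the dashboard
  curl -XPOST 'http://localhost:9200/myindex/_search?search_type=count' -d '{
    "query": { "match_all": {} },
    "facets": { "f1": { "terms": { "field": "f1", "size": 50 } } }
  }'

  curl -XPOST 'http://localhost:9200/myindex/_search?search_type=count' -d '{
    "query": { "match_all": {} },
    "facets": { "f2": { "terms": { "field": "f2", "size": 50 } } }
  }'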

Also, I find it strange that you got such a search perf improvement from
compressing the transport, since you have a 1 Gbit link. Are the facets big?
(You do get 50 entries from each one, but it is still strange...)


Hi Shay,

Regarding your question whether the times happen when data is not being
indexed (to take the field cache loading out of the picture): I don't know
how to check that. Can I check it with the SPM monitor from Sematext, or is
there another way to do this?

Thanks for the help and Kind Regards,
Ridvan



Just execute the query several times; then the facet data will be in
memory. Only once you have done that, take measurements.
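One simple way to do that from the shell - a sketch assuming the request body is saved in query.json and "myindex" stands in for the real index name - is to run the search a few times and only read the "took" value of the later runs:

  for i in 1 2 3 4 5; do
    curl -s -XPOST 'http://localhost:9200/myindex/_search' -d @query.json \
      | grep -o '"took":[0-9]*'
  done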


Which API should I use to measure the "field cache loading"?

:)


Hi Ridvan,

I don't think "field cache loading" (time?) is captured anywhere. If it
is, I'd love to know.
Like Shay said, FC is loaded with field values when you first facet or sort
on a given field. If you facet on field X, when you do it for the first
time, all values from X will be loaded into FC. If you then later facet or
sort on field Y, all values from Y will be loaded into FC when you do that.

Otis

Performance Monitoring for Solr / Elasticsearch / HBase - Sematext Monitoring | Infrastructure Monitoring Service


Hi Otis,
thanks for the hint.
Do you know an API which returns info about the field cache status?

I tried things like:

curl -XGET 'http://localhost:9200/_status' > status.log
grep field status.log
curl localhost:9200/_stats > stats.log
grep field stats.log

Or do you have other strategies for improving the performance of a query
with 50 terms facets, either via the API or via SPM from Sematext?


The node stats API returns the size of the field cache on each node. What I am
saying is: simply run your searches to warm it and then check perf. In
0.19.4, there will be log messages printed when it gets loaded.
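For example, the field cache size and evictions show up in the node stats output, so something like this shows them per node (host and port assumed; the key names match the response pasted later in this thread):

  curl -s 'http://localhost:9200/_nodes/stats?pretty=true' | grep -E 'field_size|field_evictions'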


Ah yes, I missed the node stats.
Yes, I run the query 3-5 times before the test.

Below is the info node stats returns me. I am a bit surprised by
"total_docs": 61 million and "total_size": "57.5gb" - actually I see only
5 million docs in my index?
...
"cache": {
  "field_evictions": 0,
  "field_size": "4.6gb",
  "field_size_in_bytes": 4991690858,
  "filter_count": 398,
  "filter_evictions": 0,
  "filter_size": "227.5mb",
  "filter_size_in_bytes": 238637804
},
"merges": {
  "current": 0,
  "current_docs": 0,
  "current_size": "0b",
  "current_size_in_bytes": 0,
  "total": 171024,
  "total_time": "2.2h",
  "total_time_in_millis": 8259533,
  "total_docs": 61440652,
  "total_size": "57.5gb",
  "total_size_in_bytes": 61796669360
},
"refresh": {
  "total": 1592612,
  "total_time": "4.9h",
  "total_time_in_millis": 17825972
},
"flush": {
  "total": 1511,
  "total_time": "1.9h",
  "total_time_in_millis": 6935800
}


Where did you see the docs count? Can you share it as well?


Hi Shay,
It is in the pasted JSON response from the node stats in my message above.
Or:

curl -XGET 'http://localhost:9200/_nodes/stats' | tr "," "\n" | grep total_docs
"total_docs":61440652   (61 million)

After restarting the server, the same curl call now gives me a result
which makes sense:
"total_docs":8652280   (8 million)

I have something like 4 million docs, which I check from the indices stats:

curl localhost:9200/_stats | tr "{" "-" | tr "}" "\n" | grep total | grep docs
,"total":-"docs":-"count":4265820,"deleted":1281712

I index something like 50 docs per second with a TTL of 1 day.

Kind Regards
