Query Execution Time, Performance


(Ridvan Gyundogan) #1

Hi Group,
I've read all the information on the net about performance tuning of
elasticsearch, but I'm still not satisfied with the query execution time
of our index.
We have the following:
Hardware:

  • 2 bare-metal AMD machines, each 6-core 3GHz; one with 16GB RAM, the
    other with 32GB
  • 1Gb network hardware; at least 100MB/s is supported.

Elasticsearch:

  • version 0.19.1
  • 8GB RAM dedicated to elasticsearch: ulimit -n 100000, ulimit -l
    unlimited, ES_HEAP_SIZE=8g, bootstrap.mlockall: true. The memory is
    locked by elasticsearch; I check this with "cat /proc/<pid>/status
    | grep VmLck" - the result is: "VmLck: 8864712 kB"
  • 2 shards - 1 shard on each server
  • no replicas
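
The mlockall check above can be scripted. This is only a sketch: the /proc path and the ES_PID variable are assumptions you would fill in for your own node.

```shell
# Extract the VmLck value (locked memory, in kB) from a /proc status file.
# Pass the full path, e.g. /proc/$ES_PID/status (ES_PID is a placeholder).
vmlck_kb() {
    grep '^VmLck:' "$1" | awk '{print $2}'
}

# Usage against a live node (uncomment and set ES_PID first):
# vmlck_kb "/proc/$ES_PID/status"    # e.g. prints 8864712
```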

Documents:

  • 10 million documents - average size 2KB
  • each document has 30 string fields, not_analyzed and not stored
  • average field size - 30 chars
  • 5 fields are string arrays - average size 10
  • _all field is disabled
  • _source.compress : true

Query:

  • 30 facets, no facet filters, start: 0, size: 10
    Query execution time: 5-9 sec after the first query.

The only query execution time improvement we achieved came from the
transport.tcp.compress: true option, which gained us about 1.5 sec; it
was 6.5-10 sec before that.

These times are still OK, as this is for management reporting, but I
really hope to be able to improve them.
Has anybody got better query time performance, and how?

Kind Regards,
Ridvan


(Shay Banon) #2

Can you share the full search request you execute?

On Thu, Apr 26, 2012 at 11:52 AM, Ridvan Gyundogan ridvansg@gmail.com wrote:



(Ridvan Gyundogan) #3

Hi Shay, here is the full query, with some attribute names changed to
f1...fn: https://gist.github.com/2513068

In the meantime we noticed that if we remove the terms_stats facet on
the userId field, we get about a 30% improvement. userId has 1 mln
distinct test values, but only one per document.

Kind Regards

On Apr 27, 1:18 pm, Shay Banon kim...@gmail.com wrote:



(Shay Banon) #4

Those are quite a lot of facets. Do the times you mentioned happen when
data is not being indexed, and if not, can you check? (I want to take the
field cache loading out of the picture for a sec.)

Also, one thing I would do, if you have a dashboard-like system, is to
simply use AJAX and multiple search requests, one for each facet, and
display the results for each specific search/facet as they come. Let's see
how that helps things. You can also test the same search request with just
one facet and see how long it takes.

Also, I find it strange that you got such a search perf improvement from
compressing the transport - you have a 1Gb link. Are the facets big? (You
do get 50 entries from each one; still strange though...)
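
Shay's one-request-per-facet suggestion could look roughly like the sketch below. The index name "myindex", the field names, and the per-facet size of 50 are placeholders, and the request body uses the pre-1.0 "facets" API discussed in this thread.

```shell
# Build a single-facet search body (pre-1.0 "facets" API, hits suppressed).
facet_body() {
    printf '{"size":0,"facets":{"%s":{"terms":{"field":"%s","size":50}}}}' "$1" "$1"
}

# Fire one request per facet in parallel and let the dashboard render each
# result as it arrives (f1..f3 and myindex are placeholders):
# for f in f1 f2 f3; do
#     curl -s -XPOST 'localhost:9200/myindex/_search' -d "$(facet_body "$f")" &
# done
# wait
```

Splitting the request this way also makes it easy to time each facet separately and find the expensive one.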

On Fri, Apr 27, 2012 at 11:59 PM, Ridvan Gyundogan ridvansg@gmail.com wrote:



(Ridvan Gyundogan) #5

Hi Shay,

> Do the times you mentioned happen when data is not being indexed, and
> if not, can you check (I want to take the field cache loading out of
> the picture for a sec)?

I don't know how to check those. Can I check with the SPM monitor from
Sematext, or is there another way to do this?

Thanks for the help and Kind Regards,
Ridvan

On Apr 29, 8:07 pm, Shay Banon kim...@gmail.com wrote:



(Shay Banon) #6

On Wed, May 2, 2012 at 1:13 PM, Ridvan Gyundogan ridvansg@gmail.com wrote:

> Hi Shay,
>
> I don't know how to check those. Can I check with the SPM monitor from
> Sematext, or is there another way to do this?

Just execute the query several times, so the facet data gets loaded into
memory. Only once you have done that, take measurements.
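
A warm-then-measure loop might look like this. It is only a sketch: the URL, index name, and query.json file are placeholders, and the timing helper assumes GNU date (for millisecond resolution).

```shell
# Time a command in milliseconds and print the elapsed time.
measure() {
    start=$(date +%s%3N)
    "$@" >/dev/null 2>&1
    end=$(date +%s%3N)
    echo "$((end - start))"
}

# Warm the field cache with a few throwaway runs, then measure for real:
# for i in 1 2 3; do
#     curl -s 'localhost:9200/myindex/_search' -d @query.json >/dev/null
# done
# measure curl -s 'localhost:9200/myindex/_search' -d @query.json
```

The first run pays the field cache loading cost; only the later runs reflect steady-state query time.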



(Ridvan Gyundogan) #7

Which API should I use for measuring the "field cache
loading"?

:)

On May 2, 7:42 pm, Shay Banon kim...@gmail.com wrote:



(Otis Gospodnetić) #8

Hi Ridvan,

I don't think "field cache loading" (time?) is captured anywhere. If it
is, I'd love to know.
As Shay said, the FC is loaded with field values when you first facet or
sort on a given field. If you facet on field X for the first time, all
values from X will be loaded into the FC. If you later facet or sort on
field Y, all values from Y will be loaded into the FC at that point.

Otis

Performance Monitoring for Solr / ElasticSearch / HBase -
http://sematext.com/spm

On Wednesday, May 2, 2012 2:27:30 PM UTC-4, Ridvan Gyundogan wrote:



(Ridvan Gyundogan) #9

Hi Otis,
thanks for the hint.
Do you know an API which returns info about the field cache status?

I tried things like:

# curl -XGET 'http://localhost:9200/_status' > status.log
# grep field status.log
# curl localhost:9200/_stats > stats.log
# grep field stats.log

Or do you have other strategies to improve the performance of a query
with 50 term facets, using the API or SPM from Sematext?

On May 3, 8:00 am, Otis Gospodnetic otis.gospodne...@gmail.com
wrote:



(Shay Banon) #10

The node stats API returns the size of the field cache on each node. What
I am saying is: simply run your searches to warm it, and then check perf.
In 0.19.4, there will be log messages printed when it gets loaded.
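
Pulling the field cache size out of a node-stats dump can be scripted roughly like this. The stats endpoint path is the 0.19-era one; treat both it and the file names as assumptions.

```shell
# Extract the first field_size_in_bytes value from a node-stats JSON dump.
field_cache_bytes() {
    grep -o '"field_size_in_bytes":[0-9]*' "$1" | head -1 | cut -d: -f2
}

# Usage against a live node:
# curl -s 'localhost:9200/_cluster/nodes/stats' > nodestats.json
# field_cache_bytes nodestats.json    # e.g. 4991690858 (about 4.6GB)
```

Comparing this number before and after the first query shows how much heap the facet fields pull into the cache.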

On Thu, May 3, 2012 at 8:55 AM, Ridvan Gyundogan ridvansg@gmail.com wrote:



(Ridvan Gyundogan) #11

Ah yes, I missed the node stats.
Yes, I run the query 3-5 times before the test.

Below is the info node stats returns. I am a bit surprised by
"total_docs": 61 mln and "total_size": "57.5gb" - actually I see only
5 mln docs in my index?
...
"cache": {
    "field_evictions": 0,
    "field_size": "4.6gb",
    "field_size_in_bytes": 4991690858,
    "filter_count": 398,
    "filter_evictions": 0,
    "filter_size": "227.5mb",
    "filter_size_in_bytes": 238637804
},
"merges": {
    "current": 0,
    "current_docs": 0,
    "current_size": "0b",
    "current_size_in_bytes": 0,
    "total": 171024,
    "total_time": "2.2h",
    "total_time_in_millis": 8259533,
    "total_docs": 61440652,
    "total_size": "57.5gb",
    "total_size_in_bytes": 61796669360
},
"refresh": {
    "total": 1592612,
    "total_time": "4.9h",
    "total_time_in_millis": 17825972
},
"flush": {
    "total": 1511,
    "total_time": "1.9h",
    "total_time_in_millis": 6935800
}



(Shay Banon) #12

Where did you see the docs count? Can you share it as well?



(Ridvan Gyundogan) #13

Hi Shay,
It is on the pasted json response from the node stats message above.
Or:
#curl -XGET 'http://localhost:9200/_nodes/stats' | tr "," "\n" | grep
total_docs
"total_docs":61440652 (61mln)

After restarting the server, this same curl call now gives me a result
which makes sense:
"total_docs": 8652280 (8mln)

I have something like 4 mln docs, which I check from the indices stats:
#curl localhost:9200/_stats |tr "{" "-" | tr "}" "\n" | grep total |
grep docs
,"total":-"docs":-"count":4265820,"deleted":1281712

I index something like 50 docs per second with TTL 1 day.

Kind Regards
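The deleted count above matters here: documents expired by TTL are marked deleted but stay in the Lucene segments until a merge reclaims their space, so the segment-level document total is live plus deleted. A small sketch with the numbers from the _stats output above:

```python
# Values from the _stats output: live docs and deleted-but-unmerged docs.
live_docs = 4265820
deleted_docs = 1281712

# Lucene segments still carry deleted documents until a merge expunges
# them, so on disk the index holds live + deleted documents.
lucene_docs = live_docs + deleted_docs
deleted_ratio = deleted_docs / lucene_docs
print(f"segment docs: {lucene_docs}, deleted: {deleted_ratio:.0%}")
```

With 50 docs/sec indexed under a 1-day TTL, deletes accrue at the same rate, so a sizeable deleted fraction between merges is expected.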



(system) #14