Elasticsearch performance (problem with requests timeout)

I am fairly new to elasticsearch, and I have some questions related to
performance. My use case is an autocomplete system, so I am only
really searching one field, and so far I only have about 1000
documents indexed.

My problem is that es becomes unresponsive (requests timed out) after
only a few search GET requests in quick succession. I am using the
default configuration (running locally on a dev machine), my query is
a text query with phrase_prefix. I was surprised that the system
would become unresponsive with so little load.

Can you guys offer me some tips to improve performance? What am I
doing wrong, or what kind of settings should I be looking at to
improve performance?

I looked at the nodes stats, and in particular, the search time is
1.9s, which is really slow. But the pattern seems to be very quick
responses to a few queries in a short period of time, followed by one
query which times out, etc. Why is this?

indices: {
  store: {
    size: 343.6kb
    size_in_bytes: 351942
  }
  ...
  search: {
    query_total: 465
    query_time: 1.9s
    query_time_in_millis: 1912
    query_current: 0
    fetch_total: 227
    fetch_time: 121ms
    fetch_time_in_millis: 121
    fetch_current: 0
  }
  cache: {
    field_evictions: 0
    field_size: 0b
    field_size_in_bytes: 0
    filter_count: 0
    filter_evictions: 0
    filter_size: 0b
    filter_size_in_bytes: 0
    ...
  }

On Feb 7, 2:11 pm, quain quaintena...@gmail.com wrote:

I am fairly new to elasticsearch, and I have some questions related to
performance. My use case is an autocomplete system, so I am only
really searching one field, and so far I only have about 1000
documents indexed.

My problem is that es becomes unresponsive (requests timed out) after
only a few search GET requests in quick succession. I am using the
default configuration (running locally on a dev machine), my query is
a text query with phrase_prefix. I was surprised that the system
would become unresponsive with so little load.

Can you guys offer me some tips to improve performance? What am I
doing wrong, or what kind of settings should I be looking at to
improve performance?

The query_time is the total time spent in the query phase across all queries, not the time of a single query. So the average is 1912 / 465 ≈ 4.1 milliseconds per query.
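
As a quick check of that arithmetic, here is a minimal Python sketch that derives per-request averages from the cumulative counters in the stats posted above. The dictionary simply copies those numbers by hand; the helper name `average_millis` is made up for illustration and is not part of any Elasticsearch client.

```python
# Cumulative search counters copied by hand from the node stats in this thread.
search_stats = {
    "query_total": 465,
    "query_time_in_millis": 1912,
    "fetch_total": 227,
    "fetch_time_in_millis": 121,
}


def average_millis(total_millis, count):
    """Average time per request in milliseconds (0.0 if nothing ran yet)."""
    return total_millis / count if count else 0.0


avg_query_ms = average_millis(
    search_stats["query_time_in_millis"], search_stats["query_total"]
)
avg_fetch_ms = average_millis(
    search_stats["fetch_time_in_millis"], search_stats["fetch_total"]
)

print(f"average query phase: {avg_query_ms:.1f} ms")
print(f"average fetch phase: {avg_fetch_ms:.2f} ms")
```

Both averages are far below a second, which suggests the occasional timeouts come from somewhere other than query execution itself.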

Are you using persistent HTTP connections (keep-alive), or are you opening a new connection for each request? If you open a new one each time, are you sure you close them? The OS will throttle new connections if many are opened in a short period of time.
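
To make the keep-alive point concrete, here is a rough Python sketch using only the standard library. It does not talk to Elasticsearch: a tiny local stub server stands in for it so the example runs anywhere, and the names `StubSearchHandler`, `run_queries` and `demo`, as well as the request paths, are assumptions made up for illustration. The idea is that one `http.client.HTTPConnection` is reused for every request, so the server sees a single client socket instead of one per query.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class StubSearchHandler(BaseHTTPRequestHandler):
    """Local stand-in for a search server; it only records client ports."""

    protocol_version = "HTTP/1.1"  # lets the stub keep connections alive
    seen_client_ports = set()

    def do_GET(self):
        # Each new TCP connection arrives from a different client port.
        self.seen_client_ports.add(self.client_address[1])
        body = json.dumps({"hits": []}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def run_queries(host, port, paths):
    """Send all requests over one persistent connection."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    statuses = []
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            response.read()  # drain the body so the socket can be reused
            statuses.append(response.status)
    finally:
        conn.close()  # always release the socket when done
    return statuses


def demo():
    """Run 20 requests and report how many connections the server saw."""
    StubSearchHandler.seen_client_ports = set()
    server = ThreadingHTTPServer(("127.0.0.1", 0), StubSearchHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        paths = [f"/_search?q=name:a{i}" for i in range(20)]
        statuses = run_queries(host, port, paths)
    finally:
        server.shutdown()
        server.server_close()
    return statuses, len(StubSearchHandler.seen_client_ports)
```

Calling `demo()` returns the list of response statuses and the number of distinct client connections the stub observed. If each request instead created its own connection and never closed it, that count would grow with every query, which is the situation in which the OS may start throttling.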

Hello,

I have a few questions regarding scoring (tf-idf):

  1. when I execute a DFS Q then F
    a) are the globally calculated TFs cached at each node that would result
    in a more accurate Q then F the next time?

For example,

i) If I execute a Q then F, the TFs are local to each node, so the
TF results are inaccurate in the current state of the ES architecture
ii) If I then run a DFS Q then F, I force recalculation of TFs among all
nodes
iii) If I then run a Q then F again, are the globally calculated TFs
stored at each node for better search results?

Of course we are concerned with the number of times a term occurs in all
documents.

  2. When looking at a distributed model for ES:
    a) in the presence of a Master -> LoadBalancer -> {nodes}:n
    i) Are intermediate score calculations stored at the LoadBalancer or are
    they propagated up to the Master?
    ii) Are TF and other associated globally computed values for scoring
    distributed among the nodes without touching the LB or Master?
    iii) Are results or values cached at the LB or Master?
    iv) Under what architectural constraints would it be beneficial to
    include a LB?

Thank you,

David

Hi,
Can you check the log at ES_FOLDER/logs/<your_cluster>.log to see what happened? My guess is that it might have run out of memory, since you said you are running ES with the default configuration.
If that's the case, you need to allocate more memory to ES using the -Xmx parameter, as you would for any other Java program.
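
If the log is long, a small script can do the checking. This is only a sketch: it assumes an out-of-memory condition shows up in the log as the usual JVM exception name `OutOfMemoryError`, and the function name `find_oom_lines` is made up for illustration. Pass it the path to your own log file.

```python
def find_oom_lines(log_path):
    """Return (line_number, text) pairs for log lines mentioning OutOfMemoryError."""
    matches = []
    # errors="replace" keeps the scan going even if the log has odd bytes.
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if "OutOfMemoryError" in line:
                matches.append((number, line.rstrip("\n")))
    return matches
```

An empty result does not prove memory is fine, but any hit is a strong hint that the heap size is worth raising.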

Regards,
LTVP

On Wednesday, February 8, 2012 at 1:52 AM, David Cheperdak wrote:

Hello,

I have a few questions regarding scoring (tf-idf):

  1. when I execute a DFS Q then F
    a) are the globally calculated TFs cached at each node that would result
    in a more accurate Q then F the next time?

For example,

i) If I execute a Q then F, the TFs are local to each node, so the
TF results are inaccurate in the current state of the ES architecture
ii) If I then run a DFS Q then F, I force recalculation of TFs among all
nodes
iii) If I then run a Q then F again, are the globally calculated TFs
stored at each node for better search results?

Of course we are concerned with the number of times a term occurs in all
documents.

The relevant terms that match the query are extracted and their statistics aggregated on every request; that's the DFS phase. So there is no caching, since both the query and the data can change between requests.
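
To see why the aggregation matters, here is a toy Python sketch. It is not how Elasticsearch is implemented: the "shards" are plain lists of strings, tokenizing is a whitespace split, and the function names are made up for illustration. The idf formula is the classic Lucene TF-IDF one, 1 + ln(numDocs / (docFreq + 1)). In a real request, the global variant is selected with search_type=dfs_query_then_fetch.

```python
import math
from collections import Counter


def doc_freqs(shard_docs, terms):
    """For one shard, count how many documents contain each query term."""
    counts = Counter()
    for doc in shard_docs:
        tokens = set(doc.lower().split())
        for term in terms:
            if term in tokens:
                counts[term] += 1
    return counts


def idf(doc_freq, num_docs):
    """Classic Lucene TF-IDF idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def local_idf(shards, terms):
    """Plain query-then-fetch: every shard scores with its own statistics."""
    result = []
    for shard in shards:
        freqs = doc_freqs(shard, terms)
        result.append({t: idf(freqs[t], len(shard)) for t in terms})
    return result


def global_idf(shards, terms):
    """DFS phase: sum the per-shard statistics, then score with the totals."""
    total_docs = sum(len(shard) for shard in shards)
    total_freqs = Counter()
    for shard in shards:
        total_freqs.update(doc_freqs(shard, terms))
    return {t: idf(total_freqs[t], total_docs) for t in terms}
```

With unevenly distributed terms, `local_idf` gives a different weight per shard for the same term, while `global_idf` gives one weight for all shards. Because the totals depend on both the query terms and the current index contents, they have to be recomputed per request.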

  2. When looking at a distributed model for ES:
    a) in the presence of a Master -> LoadBalancer -> {nodes}:n
    i) Are intermediate score calculations stored at the LoadBalancer or are
    they propagated up to the Master?
    ii) Are TF and other associated globally computed values for scoring
    distributed among the nodes without touching the LB or Master?
    iii) Are results or values cached at the LB or Master?
    iv) Under what architectural constraints would it be beneficial to
    include a LB?

There is no "node" caching of results; a request is sent to the shards and executed there. There might be some caching at the shard level, for example when using filters. Note, though, that the filter cache works well with new documents and deletes.

Thank you,

David