Elasticsearch Performance Issues

I am encountering a performance issue with Elasticsearch on Amazon EC2.
Currently I am getting at most 500 requests per second (rps) using ab, the
Apache HTTP server benchmarking tool, with a very simple query. Our goal is
to get to at least 1000 rps, but it seems unlikely unless we throw more
hardware at it. Any advice would be greatly appreciated.
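
For reference, the benchmark is invoked along these lines (the index name
and query string below are placeholders, not our actual ones):

ab -n 10000 -c 100 "http://$ES_HOST:9200/myindex/_search?q=title:foo"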

Configuration:
8-node cluster - m1.xlarge - 4 are EBS-optimized
Default memory settings (5 GB heap)
Java version: 1.7.0_03
OS: Ubuntu 12.04.1

Active shards: 4
Replicas: 1

Number of documents: 1.5 million
Each document has about 400 searchable attributes of varying data types.
Here are the analyzers applied to all string fields:

{
  "settings" : {
    "index" : {
      "number_of_shards" : 4,
      "number_of_replicas" : 1
    },
    "analysis" : {
      "char_filter" : {
        "my_mapping" : {
          "type" : "mapping",
          "mappings" : ["(=>", ")=>", "[=>", "]=>", "™=>", "®=>"]
        }
      },
      "analyzer" : {
        "default" : {
          "type" : "custom",
          "tokenizer" : "whitespace",
          "filter" : ["lowercase", "pattern_replace"],
          "char_filter" : ["my_mapping"],
          "stopwords" : "none"
        },
        "lowercase_only_alphanum" : {
          "type" : "custom",
          "tokenizer" : "keyword",
          "filter" : ["lowercase", "pattern_replace"],
          "char_filter" : ["my_mapping"]
        }
      },
      "filter" : {
        "pattern_replace" : {
          "type" : "pattern_replace",
          "pattern" : "\u00A0",
          "replacement" : " "
        }
      }
    }
  }
}
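
(As a sanity check of the analysis chain, the _analyze API shows the tokens
an analyzer produces; "myindex" below stands in for our index name:)

curl -XGET "http://$ES_HOST:9200/myindex/_analyze?analyzer=default" -d 'Some (Sample) Text®'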

Thanks,

JM

--

Since you are primarily interested in query performance, you should optimize
your index, e.g.:

curl -XPOST "http://$ES_HOST:9200/_optimize?max_num_segments=1"

Have you tried this?
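
Note that _optimize forces each shard's Lucene segments to be merged down to
at most the given number; it is I/O-heavy, so it is best run once after bulk
indexing rather than routinely. You can also limit it to a single index;
assuming yours is called "myindex":

curl -XPOST "http://$ES_HOST:9200/myindex/_optimize?max_num_segments=1"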

--

Thanks, RKM, but yes, I have applied this to all nodes.

JM

--

Yes, well, as you speculate, you may simply end up needing more memory. But
first, here is a list of "levers". Try them one by one and see which of them
work for you (rough sketches of each follow the list):

  1. mlockall : true
  2. restrict the number of docs returned to 5, or even 1 if possible
  3. mmapFS
  4. raise replicas from 4/1 to 4/2, or even 4/3. This allows each query to be
    answered by fewer nodes and (if your nodes have spare capacity) it will
    increase throughput
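
Rough sketches of each lever; the YAML keys and the index name "myindex" are
from memory, so verify them against the docs for your version:

# 1. in config/elasticsearch.yml: lock the JVM heap into RAM so it never swaps
#    (the elasticsearch user also needs "ulimit -l unlimited")
bootstrap.mlockall: true

# 2. in the search request itself: return only a handful of documents
curl -XPOST "http://$ES_HOST:9200/myindex/_search" -d '{
  "size" : 5,
  "query" : { "match_all" : { } }
}'

# 3. also in config/elasticsearch.yml: memory-map index files
index.store.type: mmapfs

# 4. raise the replica count on the live index (no reindexing needed)
curl -XPUT "http://$ES_HOST:9200/myindex/_settings" -d '{
  "index" : { "number_of_replicas" : 2 }
}'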

Use bigdesk to understand what your current constraint (memory / CPU / disk
I/O) is likely to be. Make sure that all nodes are at roughly the same CPU
utilization, i.e. that the query workload is balanced across your cluster.
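
If you prefer raw JSON over bigdesk, the same numbers are exposed by the
nodes stats API (the path below is the pre-1.0 one used by these releases):

curl -XGET "http://$ES_HOST:9200/_cluster/nodes/stats?pretty=true"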

--

It would be nice if you could give us (a quick way to pull some of these
numbers is sketched after the list):

  • the size of your index, since maybe it does not fit into memory
  • the queries you are executing, and how distinct your field values are,
    since faceting, sorting, and caching contribute a lot to performance
  • information about what client you are using, since throughput depends on
    how fast your client can process results
  • whether all nodes are participating in responding to the client, or just
    a single node
  • some monitoring facts about how much of your CPU, RAM, and network
    bandwidth is used, to find out which resource is saturated
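
For the first and last points, the stats API is a quick start; a sketch
(double-check the path against your version's docs):

curl -XGET "http://$ES_HOST:9200/_stats?pretty=true"

It reports per-index store size and document counts, which tells you whether
the index can fit in the filesystem cache.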

I assume you set the JVM heap to 5 GB? Did you change any other JVM settings?
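
(For reference, the usual way is the ES_HEAP_SIZE environment variable,
which the startup script turns into matching -Xms/-Xmx flags:)

# set in the environment before starting each node
export ES_HEAP_SIZE=5g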

Thanks,

Jörg

--