I used 4 nodes of the Amazon m1.xlarge instance type. I use the bulk API with Apache Bench, and nginx sits in front of ES for client load balancing, etc.
Before changing the index refresh interval, ES could index 7000 TPS; after changing it, ES could index 12000 TPS.
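The refresh interval is the per-index index.refresh_interval setting, changed through the update-settings API (PUT /{index}/_settings). A minimal sketch that only builds the request, assuming a hypothetical host and index name and an illustrative value of "30s" (the thread does not say which value was used; "-1" disables refresh entirely):

```python
import json
import urllib.request


def refresh_interval_request(host: str, index: str, interval: str) -> urllib.request.Request:
    """Build (but do not send) a PUT that changes index.refresh_interval."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host/index; "30s" is only an example value.
req = refresh_interval_request("http://localhost:9200", "logs", "30s")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# Throughput ratio from the numbers reported above.
print(f"speedup: {12000 / 7000:.2f}x")
```

Send it with urllib.request.urlopen(req) once the host is real; a common pattern is to raise the interval during a bulk load and set it back afterwards.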
Consider using m2.4xlarge instances, which have provisioned IOPS. The
faster the disks, the better the performance.
Field cache type (use soft for stability) -> (performance went down a little) => I don't know why. Does anyone know?
You don't want this. This causes the field data cache to be reloaded
every time you run a query. That's a huge impact on search.
Thread pool size => (it will be fixed; I still need to find the sweet spot, not done yet)
Routing (not needed yet) -> it can sometimes be slow because the shards don't hold similar amounts of data. I think it causes unbalanced data volume.
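The imbalance follows from how routing picks a shard: shard = hash(routing) % number_of_shards, so every document with the same routing value lands on the same shard. A toy model, assuming a made-up skewed workload and using zlib.crc32 as a stand-in for Elasticsearch's own hash function:

```python
import zlib
from collections import Counter


def shard_for(routing: str, num_shards: int) -> int:
    # Stand-in hash for illustration; Elasticsearch uses its own hash function.
    return zlib.crc32(routing.encode("utf-8")) % num_shards


# Hypothetical workload: one routing key produces 90% of the documents.
docs_per_key = {"app-a": 9000, "app-b": 500, "app-c": 300, "app-d": 200}
num_shards = 4

load = Counter()
for key, count in docs_per_key.items():
    load[shard_for(key, num_shards)] += count

for shard in range(num_shards):
    print(f"shard {shard}: {load[shard]} docs")
```

Whatever the hash, the shard that receives "app-a" holds at least 9000 of the 10000 docs, which is the kind of uneven volume described above.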
For log data, you probably want to go for index-per-day/week/month/hour
(whatever suits your loads) rather than using routing.
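With index-per-period, the target index is just a function of the document's timestamp. A sketch, assuming a hypothetical "logs" prefix and a dotted date suffix (the naming scheme is an illustration, not something the thread specifies):

```python
from datetime import datetime


def index_for(ts: datetime, prefix: str = "logs", period: str = "day") -> str:
    """Return a time-based index name such as logs-2013.05.20 (hypothetical scheme)."""
    if period == "week":
        year, week, _ = ts.isocalendar()  # ISO year and ISO week number
        return f"{prefix}-{year}.w{week:02d}"
    formats = {"hour": "%Y.%m.%d.%H", "day": "%Y.%m.%d", "month": "%Y.%m"}
    return f"{prefix}-{ts.strftime(formats[period])}"


ts = datetime(2013, 5, 20, 14, 30)
for period in ("hour", "day", "week", "month"):
    print(index_for(ts, period=period))
```

Each document goes to the index for its period, so shards of the current index fill evenly without relying on a routing key.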
The data are bulk docs. Each doc is 264 bytes, and one bulk request contains 300,000 docs.
I recommend making smaller bulk requests. The entire bulk request has
to be held in memory, which takes away from the memory available for
(e.g.) searching.
In my tests, the sweet spot for bulk is 1000-5000 docs at a time. The
actual number will depend on your doc size etc, but I found that
performance fell off after a certain size of request.
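For scale: 300,000 docs × 264 bytes is about 79.2 MB of source per request, before any overhead. A sketch of splitting the stream into smaller bulk bodies, assuming the newline-delimited bulk format with an action line carrying _index and _type (as used by the ES versions of that era); the index name, type name and batch size are hypothetical:

```python
import json
from typing import Iterable, Iterator, List

DOC_BYTES = 264          # doc size reported in the thread
DOCS_PER_BULK = 300_000  # bulk size reported in the thread


def chunks(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `size` docs."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_body(batch: List[dict], index: str, doc_type: str) -> str:
    """Build an NDJSON bulk body: one action line, then one source line per doc."""
    lines = []
    for doc in batch:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


print(f"one 300,000-doc bulk ~ {DOC_BYTES * DOCS_PER_BULK / 1e6:.1f} MB of source")

# Hypothetical stream of 12,000 docs split into 5,000-doc bulks.
docs = ({"n": i} for i in range(12_000))
for batch in chunks(docs, 5000):
    body = bulk_body(batch, "logs", "event")
    print(len(batch), "docs,", len(body), "bytes")
```

The batch size is the knob to tune; the 1000-5000 range above is a starting point, not a rule.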
I have one more question.
As far as I know, the default setting is to query a shard or replica at random.
If that specific shard or replica doesn't reply, will ES ask another shard or replica?
If not, in my opinion it would be better to ask another shard or replica again.
If a shard fails, it is removed and reallocated somewhere else. Whether
a search request will retry on another shard, I'm not sure.
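The retry policy proposed in the question can be sketched as a toy model: try the copies of a shard in random order and fall through to the next copy when one does not reply. This is an illustration of the proposal only, not a description of what ES does internally; the copy names and the fake_query function are hypothetical, and None stands in for "no reply":

```python
import random
from typing import Callable, List, Optional


def search_with_retry(copies: List[str],
                      query: Callable[[str], Optional[str]],
                      rng: random.Random) -> Optional[str]:
    """Try shard copies in random order until one replies (toy model)."""
    order = copies[:]
    rng.shuffle(order)
    for copy in order:
        result = query(copy)
        if result is not None:  # None means this copy did not reply
            return result
    return None  # every copy failed


# Hypothetical copies of one shard; replica "r1" is down.
copies = ["primary", "r1", "r2"]
down = {"r1"}


def fake_query(copy: str) -> Optional[str]:
    return None if copy in down else f"hits from {copy}"


print(search_with_retry(copies, fake_query, random.Random(0)))
```

A request only fails outright when every copy is unavailable.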
So if we want to prevent that, is it better to set the GET operation preference to _primary?
Yes
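The preference is passed as a query-string parameter. A minimal sketch that builds such a GET URL, assuming hypothetical host, index, type and id values; preference=_primary was accepted by the ES versions discussed here, but newer releases have removed it:

```python
from urllib.parse import urlencode


def get_url(host: str, index: str, doc_type: str, doc_id: str,
            preference: str = "_primary") -> str:
    """Build a document GET URL that asks for the primary shard copy."""
    query = urlencode({"preference": preference})
    return f"{host}/{index}/{doc_type}/{doc_id}?{query}"


# Hypothetical names for illustration.
url = get_url("http://localhost:9200", "logs", "event", "1")
print(url)
```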
In that case we can't use multiple replicas to improve search performance.
Well, search will only see the document after a refresh anyway, so this
is probably less important.
Either you don't understand my question, or I don't understand how ES works.
If we set the preference to _primary, every request goes to the primary shard first, so replicas are not queried for search.
That means adding more replicas cannot improve performance.
Using async means that a document is indexed on primary, but the request
returns before we confirm that a document has also been indexed on the
replicas.
So this affects GET requests for a document. However, search only refreshes once
every second anyway. So if you index a document, then immediately
search on the primary, you probably won't see the doc. So for search,
it really doesn't matter whether you're searching primaries or replicas.
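The visibility argument can be shown with a toy model of a single shard copy: a newly indexed document is visible to a realtime GET straight away, but search only sees it after a refresh (about once per second by default). This is a simplified illustration, not ES internals; the ToyShard class is hypothetical:

```python
from typing import Dict, List, Optional


class ToyShard:
    """Toy near-real-time shard: GET sees new docs, search sees them after refresh."""

    def __init__(self) -> None:
        self.buffer: Dict[str, str] = {}      # indexed but not yet refreshed
        self.searchable: Dict[str, str] = {}  # visible to search

    def index(self, doc_id: str, text: str) -> None:
        self.buffer[doc_id] = text

    def get(self, doc_id: str) -> Optional[str]:
        # Realtime GET: also looks at documents that are not refreshed yet.
        return self.buffer.get(doc_id, self.searchable.get(doc_id))

    def search(self, term: str) -> List[str]:
        # Search only looks at refreshed documents.
        return sorted(d for d, t in self.searchable.items() if term in t)

    def refresh(self) -> None:
        # In ES this happens about once per second by default.
        self.searchable.update(self.buffer)
        self.buffer.clear()


primary = ToyShard()
primary.index("1", "hello world")
print(primary.get("1"))         # hello world
print(primary.search("hello"))  # []
primary.refresh()
print(primary.search("hello"))  # ['1']
```

Until the refresh, a search on the primary misses the new doc just as a search on a lagging replica would, which is why the choice of copy matters little for search.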
clint