Shard dividing advice

nitram · July 28, 2015, 5:06pm

Hello,
I'm about to move my test system, which has been running for 4 months, into production. My only concern is whether I'm making the right decisions on the initial mapping and shard allocation. My plan is to use Logstash to load CSV logs from approximately 15 sources (applications) into Elasticsearch. That comes to approximately 30 million documents per month in total (about 2 million per source). I would like to keep the data available for one year, and after that just close the indices and reopen them when needed.
I use Kibana 4 with heavy aggregations (unique count etc.), and I don't have a particularly powerful cluster (only 70 GB RAM, 10 nodes).

My questions:

1. What would be the best shard and replica settings for search performance: one index per month containing all sources and replicated 10 times, or one index per source per month, split into multiple shards?
2. I'm planning to use doc_values for fields that don't need to be analyzed. Can I set doc_values on the .raw field, so that I keep the analyzed field and get doc_values on the non-analyzed one?
3. I'm planning to have 2 master nodes with half of their RAM given to the heap (4+4 GB); the rest will be data nodes with all of their memory given to the heap (approx. 8-16 GB each). Is that suitable?
4. Logstash will run on only one master, because all the CSV files will be stored there.
5. In terms of speed, is it worth removing fields such as _path, _source, _message etc. that are generated automatically by Logstash?

Thanks for any kind of advice
nitraM

warkolm · July 28, 2015, 10:09pm

1. Better to have multiple shards with a single replica. Have a shard per node.
2. Yep.
3. Are your masters holding data?
4. OK.
5. Depends. Removing _source means you cannot reindex without the original data.
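
For the doc_values question, a mapping along these lines is one way to keep an analyzed field for search while giving its non-analyzed .raw sub-field doc_values. This is only a sketch: it assumes Elasticsearch 1.x mapping syntax (the thread does not state a version), the field name "app" is hypothetical, and the body is only built and printed here, not sent to a cluster.

```python
import json

# Index template body, assuming Elasticsearch 1.x syntax (an assumption;
# the thread does not name a version). The field name "app" is hypothetical.
template = {
    "template": "logstash-*",  # applies to every index matching this pattern
    "mappings": {
        "_default_": {
            "properties": {
                "app": {
                    "type": "string",  # analyzed by default, used for search
                    "fields": {
                        "raw": {
                            "type": "string",
                            "index": "not_analyzed",  # exact values for aggregations
                            "doc_values": True,       # keep field data on disk, not in heap
                        }
                    },
                }
            }
        }
    },
}

print(json.dumps(template, indent=2))
```

Aggregations in Kibana would then target app.raw, while full-text queries keep using app.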

nitram · July 29, 2015, 5:41am

I see. So, assuming 10 nodes in total, I will set this on the master node:
shards: 5
replicas: 1
and have a single index per month. Will splitting the indices by source make no difference?
logstash-%yy-%mm compared to logstash-%app-%yy-%mm,
with logstash-* used in Kibana.
Yes, the master will hold data. Or is it better not to?
Thanks
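
As a rough check on this layout, the shard arithmetic can be sketched as follows. All numbers come from the posts above (10 nodes, 5 shards, 1 replica, about 30 million documents per month, one year retained); the variable names are only illustrative, and an even spread of shards across nodes is assumed.

```python
# Figures taken from the thread; even shard distribution is an assumption.
nodes = 10
primary_shards = 5
replicas = 1
months_retained = 12
docs_per_month = 30_000_000

# Each primary has `replicas` copies, so one monthly index holds this many shard copies.
copies_per_index = primary_shards * (1 + replicas)

# Shard copies of a single monthly index that land on each node.
copies_per_node_per_index = copies_per_index / nodes

# Shard copies across all monthly indices kept open for a year.
total_copies = copies_per_index * months_retained

# Documents held by each primary shard of a monthly index.
docs_per_primary = docs_per_month // primary_shards

print("shard copies per monthly index:", copies_per_index)
print("shard copies per node per index:", copies_per_node_per_index)
print("shard copies over the retention period:", total_copies)
print("documents per primary shard:", docs_per_primary)
```

With these inputs, each monthly index puts exactly one shard copy on each of the 10 nodes, which matches the "shard per node" suggestion.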

warkolm · July 29, 2015, 5:48am

Don't set that in config, better to define it when you create the index. You can create multiple indices, just don't go crazy.
Better not to, but depends on if you can afford dedicated master nodes.
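
Defining the shard and replica counts at index creation, rather than in the node config, could look roughly like this. It is a minimal sketch: the host http://localhost:9200 and the index name logstash-2015-07 are assumptions, the helper name is made up for illustration, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_create_index_request(base_url, index_name, shards, replicas):
    """Build (but do not send) a PUT request that creates an index with explicit settings."""
    body = {
        "settings": {
            "number_of_shards": shards,      # fixed once the index exists
            "number_of_replicas": replicas,  # can still be changed later
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Host and index name are assumptions for illustration only.
req = build_create_index_request("http://localhost:9200", "logstash-2015-07", 5, 1)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually create the index: urllib.request.urlopen(req)
```

Since Logstash normally creates each new monthly index by itself, the same settings block could instead go into an index template matching logstash-*, so that every new index picks it up.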
