Shard dividing advice


(nitraM) #1

Hello,
I'm about to turn my 4-month-old test system into production, and my only concern is whether I'm deciding correctly on the initial mapping and shard allocation. My plan is to use Logstash to load logs from approximately 15 sources (applications) from CSV into Elasticsearch. Each month that adds up to approximately 30 million documents altogether (about 2 million per source). I would like to keep the data available for one year (after that I would just close the index and reopen it when needed).
I use Kibana 4 with huge aggregations (unique count etc.) and I don't have an especially powerful cluster (only 70 GB RAM, 10 nodes).

My questions:

  1. What would be the best shard and replica settings for search performance: one index per month containing all sources and replicated 10 times, or one index per source per month, split into multiple shards?
  2. I'm planning to use doc_values for fields that don't need to be analyzed. Can I set doc_values on the .raw field, so that I keep the analyzed field and get doc_values on the not-analyzed one?
  3. I'm planning to have 2 master nodes with half of the RAM given to heap (4+4 GB); the rest will be slaves with all memory given to heap (approx. 8-16 GB each). Is that suitable?
  4. Logstash will run on only one master, because all the CSV files will be stored there.
  5. In terms of speed, is it worth removing fields such as _path, _source, _message, etc. that are generated automatically by Logstash?

Thanks for any kind of advice
nitraM


(Mark Walkom) #2
  1. Better to have multiple shards with a single replica. Have a shard per node.
  2. Yep.
  3. Are your masters holding data?
  4. Ok
  5. Depends. Removing _source means you cannot reindex without the original data.
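
To illustrate point 2, here is a minimal sketch of what such a mapping could look like, written as a Python dict that would be sent as the mapping body. It assumes the `string` type syntax of the Elasticsearch versions used alongside Kibana 4; the type name `logs` and the field name `application` are made up for the example.

```python
import json


def raw_subfield_mapping(field_name):
    """Analyzed string field plus a not_analyzed .raw sub-field with doc_values."""
    return {
        field_name: {
            "type": "string",  # analyzed main field, used for full-text search
            "fields": {
                "raw": {
                    "type": "string",
                    "index": "not_analyzed",  # exact value, for aggregations and sorting
                    "doc_values": True,       # stored on disk rather than in heap fielddata
                }
            },
        }
    }


# "logs" and "application" are hypothetical names, not taken from the thread.
mapping = {"logs": {"properties": raw_subfield_mapping("application")}}
print(json.dumps(mapping, indent=2))
```

In Kibana the unique-count aggregation would then point at `application.raw` rather than at the analyzed `application` field.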

(nitraM) #3
  1. I see. So, assuming I have 10 nodes in total, I will set this on the master node:
    shards: 5
    replicas: 1
    and have a single index per month. Would splitting the indices by source make no difference?
    logstash-%yy-%mm compared to logstash-%app-%yy-%mm,
    In Kibana I would use logstash-*
  2. Yes, the masters will hold data. Or is it better that they don't?
    Thanks
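
A quick back-of-the-envelope check of the two layouts, as a rough sketch. It uses the numbers from this thread (10 nodes, 15 sources, about 30 million documents per month) and assumes that shards are spread evenly and that per-source indices would also get 5 primaries and 1 replica, which is only an assumption for comparison.

```python
def shard_copies(primaries, replicas):
    """Total shard copies of one index: each primary plus its replicas."""
    return primaries * (1 + replicas)


NODES = 10
SOURCES = 15
DOCS_PER_MONTH = 30_000_000

# Layout A: one index per month, 5 primaries with 1 replica each.
monthly_copies = shard_copies(5, 1)
copies_per_node = monthly_copies / NODES
docs_per_primary = DOCS_PER_MONTH // 5

# Layout B: one index per source and month, same settings (assumed).
per_source_copies = SOURCES * shard_copies(5, 1)

print(monthly_copies, copies_per_node, docs_per_primary, per_source_copies)
# prints: 10 1.0 6000000 150
```

With one index per month, 5 primaries and 1 replica give exactly one shard copy per node; the per-source layout multiplies the shard count by 15 every month.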

(Mark Walkom) #4
  1. Don't set that in config, better to define it when you create the index. You can create multiple indices, just don't go crazy.
  2. Better not to, but depends on if you can afford dedicated master nodes.
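
As a sketch of point 1, defining the settings when the index is created means sending them in the body of the create-index request. The helper names below are made up, and the `logstash-YYYY-MM` naming is only my reading of the `logstash-%yy-%mm` pattern above; the code builds the request but does not send it.

```python
import json
from datetime import date


def monthly_index_name(day, app=None):
    """Build logstash-YYYY-MM, or logstash-<app>-YYYY-MM when an app is given."""
    parts = ["logstash"]
    if app:
        parts.append(app)
    parts.append(day.strftime("%Y-%m"))
    return "-".join(parts)


def create_index_request(day, shards=5, replicas=1, app=None):
    """Return (method, path, body) for a create-index call with explicit settings."""
    body = {
        "settings": {
            "number_of_shards": shards,      # fixed once the index exists
            "number_of_replicas": replicas,  # can still be changed later
        }
    }
    return "PUT", "/" + monthly_index_name(day, app), json.dumps(body)


# The date and the "billing" app name are arbitrary examples.
method, path, body = create_index_request(date(2015, 6, 1))
print(method, path, body)
print(monthly_index_name(date(2015, 6, 1), app="billing"))
```

Because the number of primaries cannot be changed after creation, setting it per index keeps the option of choosing a different value for later months.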
