Hello all.
I am using Elasticsearch (ES) to index and query millions of documents, and I would like to know the correct way to configure the elasticsearch.yml file.
I am using Elastica, the PHP client library.
Here is my Elastica elasticsearch.yml:
index.number_of_shards: 2
index.number_of_replicas: 0
# Don't write data to HDD in tests
index.store.type: memory
# Required plugins
plugin.mandatory: mapper-attachments, geocluster-facet, transport-thrift, transport-memcached, image
# For bulk tests
bulk.udp.enabled: true
bulk.udp.bulk_actions: 5
# For script tests
script.inline: on
script.indexed: on
script.engine.groovy.file: on
# Disable dynamic memory allocation
bootstrap.mlockall: true
# Don't accept connections from anything other than localhost
#network.host: "127.0.0.1"
# Limit thread pools by setting the number of available processors to 1
# Without this, Travis builds fail with an OutOfMemory error
processors: 1
# All nodes will be called Elastica
node.name: Elastica
# Ports config
http.port: 9200
transport.tcp.port: 9300
thrift.port: 9500
memcached.port: 11211
# Added for snapshot tests
path.repo: ["/tmp/backups"]
And here is my ES elasticsearch.yml:
index.number_of_shards: 10
index.number_of_replicas: 1
bootstrap.mlockall: true
indices.recovery.max_bytes_per_sec: 200mb
indices.store.throttle.max_bytes_per_sec: 200mb
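For reference, Elasticsearch never allocates a replica on the same node as its primary, so on a single node (which I assume the 2 GB Vagrant VM is) index.number_of_replicas: 1 leaves every replica unassigned. The Python sketch below just does that arithmetic for the settings above; the function name shard_layout is hypothetical and the node count of 1 is an assumption.

```python
# Sketch: how many shard copies one index needs versus how many can be
# placed, given that copies of the same shard must live on distinct nodes.

def shard_layout(number_of_shards, number_of_replicas, number_of_nodes):
    """Return (total_copies, assignable_copies, unassigned_copies)."""
    copies_per_shard = 1 + number_of_replicas  # primary + replicas
    total = number_of_shards * copies_per_shard
    # At most number_of_nodes copies of any single shard can be assigned.
    assignable_per_shard = min(copies_per_shard, number_of_nodes)
    assignable = number_of_shards * assignable_per_shard
    return total, assignable, total - assignable


if __name__ == "__main__":
    # Settings from my ES elasticsearch.yml, on one node (assumed).
    total, assigned, unassigned = shard_layout(10, 1, 1)
    print(f"total={total} assigned={assigned} unassigned={unassigned}")
```

With one node this prints total=20 assigned=10 unassigned=10, i.e. half of the configured shard copies can never be placed.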
Finally, in /etc/default/elasticsearch I have these set:
MAX_OPEN_FILES=65535
MAX_LOCKED_MEMORY=unlimited
ES_HEAP_SIZE=2g
START_DAEMON=true
ES_USER=elasticsearch
ES_GROUP=elasticsearch
LOG_DIR=/var/log/elasticsearch
DATA_DIR=/var/lib/elasticsearch
WORK_DIR=/tmp/elasticsearch
CONF_DIR=/etc/elasticsearch
CONF_FILE=/etc/elasticsearch/elasticsearch.yml
RESTART_ON_UPGRADE=true
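bootstrap.mlockall: true only takes effect if the process is actually allowed to lock memory (hence MAX_LOCKED_MEMORY=unlimited), so it may be worth confirming it worked. In the ES versions that use this setting, GET /_nodes/process reports a per-node process.mlockall flag. The Python sketch below fetches that endpoint with the standard library and lists nodes where locking failed; the localhost:9200 address and the helper names are assumptions.

```python
import json
import urllib.request

# Assumed address of the node; adjust for the Vagrant VM.
NODES_PROCESS_URL = "http://localhost:9200/_nodes/process"


def nodes_without_mlockall(nodes_response):
    """Return names of nodes whose process.mlockall flag is not true.

    nodes_response is the parsed JSON body of GET /_nodes/process,
    which has a "nodes" object keyed by node id.
    """
    failed = []
    for node_id, node in nodes_response.get("nodes", {}).items():
        locked = node.get("process", {}).get("mlockall", False)
        if not locked:
            failed.append(node.get("name", node_id))
    return failed


def fetch_nodes_process(url=NODES_PROCESS_URL):
    """Fetch and parse GET /_nodes/process from a running node."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling nodes_without_mlockall(fetch_nodes_process()) against the running node should return an empty list if the heap was locked.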
The heap size is low because the Vagrant VM has only 2 GB of RAM; if I set it any higher, Elasticsearch won't start.
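A commonly cited Elasticsearch guideline is to give the JVM heap no more than half of the machine's RAM, leaving the rest to the operating system's filesystem cache that Lucene relies on, and to stay below roughly 32 GB so the JVM can keep using compressed object pointers. With ES_HEAP_SIZE=2g on a 2 GB VM, the heap alone likely leaves almost nothing for the OS and the JVM's off-heap memory. The Python sketch below applies that rule of thumb; the 50% ratio and the 31 GB cap are the usual guideline values, not measurements from my setup.

```python
# Rule-of-thumb heap sizing: at most half of physical RAM, capped just
# below the ~32 GB compressed-oops threshold. Values are guidelines.

MAX_HEAP_MB = 31 * 1024  # stay under the compressed object pointer limit


def recommended_heap_mb(total_ram_mb, ratio=0.5):
    """Return a suggested heap size in megabytes."""
    return min(int(total_ram_mb * ratio), MAX_HEAP_MB)


def es_heap_size_value(total_ram_mb):
    """Format the suggestion as a JVM size string, e.g. '1024m'."""
    return f"{recommended_heap_mb(total_ram_mb)}m"


if __name__ == "__main__":
    # The 2 GB Vagrant VM from the question.
    print("ES_HEAP_SIZE=" + es_heap_size_value(2048))
```

For the 2 GB VM this prints ES_HEAP_SIZE=1024m.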
I don't think it's configured correctly: ES crashes often for no apparent reason, especially when I index the full set of millions of documents, and it crashes before all of them are indexed.
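Since the crashes happen while indexing the whole data set, one thing worth ruling out is sending too much at once. The _bulk endpoint takes newline-delimited JSON (an action line followed by a source line per document, ending with a newline), so documents can be sent in fixed-size batches. The Python sketch below only builds those batched bodies to illustrate the request format; it is not Elastica code, and the index name, type name, and batch size of 1000 are hypothetical.

```python
import json


def chunked(items, size):
    """Yield consecutive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_body(docs, index, doc_type):
    """Build one NDJSON _bulk body from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def bulk_bodies(docs, index, doc_type, batch_size=1000):
    """Split a large document stream into several _bulk request bodies."""
    for batch in chunked(docs, batch_size):
        yield bulk_body(batch, index, doc_type)
```

Each body would then be POSTed to /_bulk one at a time, waiting for the response before sending the next, so the amount of data in flight stays bounded.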
In my PHP code I create the index and the type manually, but I have no idea how to configure the cluster, shards, etc. in my .yml file to handle this volume of data.
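As far as I understand, number_of_shards and number_of_replicas in elasticsearch.yml only act as defaults for new indices; they can also be set per index in the body of the create-index request, which is where my PHP code already creates the index. The Python sketch below builds such a body and prepares the PUT /<index> request without sending it; the index name, the shard and replica counts, and the localhost URL are placeholders, not tested recommendations.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed node address


def create_index_body(number_of_shards, number_of_replicas):
    """Build the JSON body for PUT /<index> with per-index settings."""
    return {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }


def create_index_request(index, body, base_url=ES_URL):
    """Prepare (but do not send) the PUT request that creates the index."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}",
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

Passing the prepared request to urllib.request.urlopen would actually create the index; the same settings object is what the create-index call in Elastica would need to carry.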
Thanks!