I am using Elasticsearch (ES) to index and query millions of records, and I would like to know the correct way to configure the yml file.
I am using Elastica, the PHP client for Elasticsearch.
Here is my Elastica elasticsearch.yml:
    index.number_of_shards: 2
    index.number_of_replicas: 0

    # Dont write data to hdd in tests
    index.store.type: memory

    # Required plugins
    plugin.mandatory: mapper-attachments, geocluster-facet, transport-thrift, transport-memcached, image

    # For bulk tests
    bulk.udp.enabled: true
    bulk.udp.bulk_actions: 5

    # For script tests
    script.inline: on
    script.indexed: on
    script.engine.groovy.file: on

    # Disable dynamic memory allocation
    bootstrap.mlockall: true

    # Dont accept connections not from localhost
    #network.host: "127.0.0.1"

    # Limit threadpool by set number of available processors to 1
    # Without this, travis builds will be failed with OutOfMemory error
    processors: 1

    # All nodes will be called Elastica
    node.name: Elastica

    # Ports config
    http.port: 9200
    transport.tcp.port: 9300
    thrift.port: 9500
    memcached.port: 11211

    # Added for snapshot tests
    path.repo: ["/tmp/backups"]
And here is my ES elasticsearch.yml:

    index.number_of_shards: 10
    index.number_of_replicas: 1
    bootstrap.mlockall: true
    indices.recovery.max_bytes_per_sec: 200mb
    indices.store.throttle.max_bytes_per_sec: 200mb
And finally, in my /etc/default/elasticsearch I have these settings:
    MAX_OPEN_FILES=65535
    MAX_LOCKED_MEMORY=unlimited
    ES_HEAP_SIZE=2g
    START_DAEMON=true
    ES_USER=elasticsearch
    ES_GROUP=elasticsearch
    LOG_DIR=/var/log/elasticsearch
    DATA_DIR=/var/lib/elasticsearch
    WORK_DIR=/tmp/elasticsearch
    CONF_DIR=/etc/elasticsearch
    CONF_FILE=/etc/elasticsearch/elasticsearch.yml
    RESTART_ON_UPGRADE=true
The heap size is low because the Vagrant VM has only 2 GB of RAM; if I set it any higher, Elasticsearch won't start.
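As a rough sanity check on that heap value: the commonly cited guideline is to give Elasticsearch about half of the machine's RAM and leave the rest to the operating system's file cache. Below is a minimal sketch of that arithmetic. The function name `suggested_heap_mb` and the 31 GB cap are illustrative assumptions, not something taken from my setup.

```python
def suggested_heap_mb(total_ram_mb, cap_mb=31 * 1024):
    """Return a heap size in MB following the 'half of RAM' rule of thumb.

    Hypothetical helper: the cap keeps the heap below roughly 32 GB,
    which is the usual upper bound recommended for the JVM heap.
    """
    half = total_ram_mb // 2
    return min(half, cap_mb)


# Assumption: the Vagrant VM has 2 GB (2048 MB) of RAM, as described above.
vm_ram_mb = 2048
print(f"ES_HEAP_SIZE={suggested_heap_mb(vm_ram_mb)}m")
```

By this rule of thumb, a 2 GB VM would get about a 1 GB heap rather than the 2g currently set, which may matter given that bootstrap.mlockall is also enabled.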
I don't think it is configured correctly, because ES often crashes for no apparent reason, especially when I try to index all of the millions of records: it crashes before I can finish indexing them.
In my PHP code I create the index and the type manually, but I have no idea how to configure clusters, shards, etc. in my .yml file to fit this very large data set.
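For context, shard and replica counts can also be passed per index at creation time, rather than only as defaults in the yml file. Here is a minimal sketch of the request body for the create-index REST call, written in Python instead of Elastica to keep it self-contained. The helper names, the index name `my_index`, and the `http://localhost:9200` URL are hypothetical placeholders, and the values mirror the ones from my ES elasticsearch.yml.

```python
import json
import urllib.request


def index_settings_body(shards, replicas):
    """Build the settings body for creating an index with explicit sharding."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }


def create_index(base_url, name, body):
    """Send PUT <base_url>/<name> with the settings body (needs a running node)."""
    request = urllib.request.Request(
        url=f"{base_url}/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Same values as in the yml above: 10 shards, 1 replica.
body = index_settings_body(shards=10, replicas=1)
print(json.dumps(body, indent=2))

# Not executed here because it needs a live node (hypothetical URL and name):
# create_index("http://localhost:9200", "my_index", body)
```

Note that the number of shards is fixed once the index is created, while the number of replicas can be changed later.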