Help configuring a multi-process ES setup

Jesus_Sanz_Marcos · August 8, 2012, 4:35pm

Hi,

I am trying to build an ES-based service. The idea is to have 16 ES
processes on the same machine (one per core) that do not communicate with
each other.

I will route the PUT inserts to the 16 servers using the first character of
the MD5 of {index/type/id}, so I can have 16 independent ingestion
processes.
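The routing described above can be sketched roughly as follows. This is a minimal Python sketch, not the poster's C++ code. It assumes the key is the string "index/type/id" encoded as UTF-8, and that the instances listen on HTTP ports 9000 to 9015 (the range given later in the thread). The function names are hypothetical. The first hex digit of an MD5 digest takes exactly 16 values, so it maps one-to-one onto 16 instances.

```python
import hashlib

# Assumptions (hypothetical, not from the thread): instances listen on
# http.port 9000..9015, and the routing key is "index/type/id" in UTF-8.
BASE_PORT = 9000


def instance_port(index: str, doc_type: str, doc_id: str) -> int:
    """Return the HTTP port of the instance that owns this document."""
    key = f"{index}/{doc_type}/{doc_id}".encode("utf-8")
    # The first hex digit of an MD5 digest is one of '0'..'f' (16 values),
    # so it selects one of the 16 instances.
    first_hex_digit = hashlib.md5(key).hexdigest()[0]
    return BASE_PORT + int(first_hex_digit, 16)


def put_url(host: str, index: str, doc_type: str, doc_id: str) -> str:
    """Build the URL for a PUT of this document to its owning instance."""
    port = instance_port(index, doc_type, doc_id)
    return f"http://{host}:{port}/{index}/{doc_type}/{doc_id}"


if __name__ == "__main__":
    print(put_url("localhost", "docs", "page", "42"))
```

Because the same key always hashes to the same digit, a document is always sent to the same instance. A GET by id can therefore be routed the same way, while searches still have to fan out to all 16 instances and be merged in the upper layer.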

I will take care of the mapReduce operation in an upper layer.

The question is how I should configure each ES instance so that they don't
talk to each other and don't share or shard the documents.

Thank you in advance and best regards,

Jesús

dadoonet · August 8, 2012, 4:54pm

You can disable multicast. See comments in elasticsearch.yml file.

HTH
David
Twitter : @dadoonet / @elasticsearchfr


olof · August 9, 2012, 7:48am

Couldn't he also give them different cluster names? Though I suppose
disabling the discovery has the same effect.


dadoonet · August 9, 2012, 8:23am

Sure he can. But if you have different versions of ES on the same LAN, let's say
0.17.5 and 0.19.8, you will often get exceptions because the nodes try to connect
to each other but can't.

I have already seen it in the past so I often disable multicast and define all
my nodes in the unicast config.

My 2 cents
David.

--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Jesus_Sanz_Marcos · August 9, 2012, 10:07am

Hi!
Thanks for your answers. I have configured the instances with separate folders and binaries, with configuration file like:

http.port: 90XX   # one per instance, from 9000 to 9015
index.number_of_shards: 1
discovery.zen.ping.multicast.enabled: false

But I still find documents from other instances. If I PUT a doc into the instance on port 9010, I can find it when I POST a search to the instance on port 9005, for example. So the servers are still talking to each other...

Thanks again

Jesús


dadoonet · August 9, 2012, 11:24am

Did you delete all the previous documents and reindex everything?


Jesus_Sanz_Marcos · August 9, 2012, 6:16pm

Hi,

I had already removed the data folders, but I also re-installed all the instances
from the tar.gz file using the following configuration:

http.port: 9013
cluster.name: elasticsearch_9013
index.number_of_shards: 1
discovery.zen.ping.multicast.enabled: false

And now the documents are isolated in each instance. THANKS
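Writing 16 nearly identical files by hand is error-prone, so the working configuration above could be generated instead. This is a minimal Python sketch. It assumes 16 instances on HTTP ports 9000 to 9015, and it assumes a hypothetical layout of one install directory per instance named es-<port>/ under a common root. The settings themselves are the four lines from the post above, with the port and cluster name varied per instance.

```python
from pathlib import Path

# Assumptions (hypothetical): 16 instances on http.port 9000..9015 and one
# install directory per instance, named es-<port>/, under a common root.
NUM_INSTANCES = 16
BASE_HTTP_PORT = 9000


def instance_config(port: int) -> str:
    """Return the elasticsearch.yml contents for one isolated instance."""
    return "\n".join([
        f"http.port: {port}",
        # A unique cluster name per instance is what kept the documents apart.
        f"cluster.name: elasticsearch_{port}",
        "index.number_of_shards: 1",
        "discovery.zen.ping.multicast.enabled: false",
    ]) + "\n"


def all_configs(n: int = NUM_INSTANCES) -> dict[int, str]:
    """Map each instance's HTTP port to its config text."""
    return {BASE_HTTP_PORT + i: instance_config(BASE_HTTP_PORT + i) for i in range(n)}


def write_configs(root: Path, n: int = NUM_INSTANCES) -> None:
    """Write <root>/es-<port>/config/elasticsearch.yml for every instance."""
    for port, text in all_configs(n).items():
        path = root / f"es-{port}" / "config" / "elasticsearch.yml"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)
```

Giving every instance its own cluster.name means that no two instances will join each other's cluster, even if some discovery mechanism is still active.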

The EC2 machine has 8 cores and I am using 16 ES instances. Each instance
is receiving PUT requests from a separate C++ application running on the
same machine. CPU usage hardly reaches 25%. Do you think I/O is the
bottleneck here, or could I send more requests in parallel to each
instance? Have you ever seen ES reach 100% CPU during ingestion?

Thanks a lot!