Azure Windows VM Multiple Node Cluster


(Ashwin Sathya) #1

Below is the discussion thread with David.

I have a few follow-up questions, so I thought it better to post them to the mailing list, as he suggested.

Please help me understand/design for my scenario with the topology of indices.

I am looking at various data sources for my users (call them user1, user2, user3, ... and source1, source2, ...).

From my understanding of the Elasticsearch topology, it makes sense to map users to individual indices and sources to corresponding types. My question is as follows.

The data from these sources is chronological, and searches range over a timestamp, where I have to aggregate data from the sources for a given user. Given this requirement, it seems to make sense to split the data by time (days, hours, etc.) and assign each split to a type, so my new design model would become:

user1 -> source1_T1, source2_T2, source1_T2.

...

...

Is there a better design? How do people solve this problem? Is there a memory bottleneck tied to the number of types loaded under a particular index? (i.e., I don't want to load the complete index just to search data within a day.)

Hey Ashwin,

Answers inline

--

David Pilato | Technical Advocate | Elasticsearch.com

@dadoonet | @elasticsearchfr | @scrutmydocs

On 6 Sept. 2013, at 09:05, R Ashwin Sathya ashwin.sathya@outlook.com wrote:

Hi dadoonet,

I was determinant_ on IRC. Thanks a lot for your guidance in setting up Elasticsearch on Azure.

FYI, I was able to successfully set up a 3-node cluster in Azure using Windows VMs. However, I had to hack through the firewall to let the nodes talk to each other, so I thought I would try to understand the internals a bit better.

I think your instances are not all behind the same endpoint, right?

What does the following command output?

azure vm list

Which ports do the nodes use to communicate with each other? Is this also the channel used for replicating data?

The reason I am asking the first question is to identify a minimal set of ports to open in the firewall, so that I can create an automated solution.

Ports 9300-9399. If you run only one node per machine (recommended), then only 9300 (the default) is needed.
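For anyone automating the firewall setup, here is a minimal sketch (not part of the original exchange) of how one might verify that the transport port is reachable between nodes once the rule is in place. The function names and the host addresses in the usage note are hypothetical; only the default port 9300 comes from the answer above.

```python
import socket

# Default transport port from the answer above (one node per machine).
TRANSPORT_PORT = 9300


def port_is_open(host, port=TRANSPORT_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_nodes(hosts, port=TRANSPORT_PORT, timeout=2.0):
    """Map each host to whether its transport port accepted a connection."""
    return {host: port_is_open(host, port, timeout) for host in hosts}
```

Calling `check_nodes(["10.0.0.4", "10.0.0.5", "10.0.0.6"])` from each VM (the addresses are placeholders for your own internal IPs) would show which peers are blocked. This only tests TCP reachability; it does not confirm that the nodes have actually joined the cluster.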

I am also looking into Java Service Wrapper/ESWrapper to run ES as a service, so that Azure-related restarts are taken care of.

Sadly, on 64-bit instances, JSW is not free. Have a look at: https://github.com/elasticsearch/elasticsearch-servicewrapper
You should know that we are working on an elasticsearch-install.exe file (or something similar), which will simplify this and make it much easier to run ES as a service on Windows.

I don't have any ETA on that.

Hope this helps

BTW, you should ask your questions on the mailing list as it could help other users who are looking for the same information.

Feel free to copy this thread there.

David.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(David Pilato) #2

Not sure I fully understand your use case here.
BTW, your thread title is not related to your question. You should open a new thread.

That said, I don't think you should have one index per user, but rather a global index for all users.
Then use the routing feature to route each user's documents to the same shard.

On top of that, you can use an alias per user, which holds the routing key for you.
The question is then whether you need to split the index per period. IMHO, it depends on whether you want to clean up old data or not (think rolling indexes).

Note that the routing key could be a concatenation of user and date, for example.
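To make the suggestion above more concrete, here is a rough sketch of what the pieces could look like: a per-period index name, a routing key built from user and date, and the body of an `_aliases` request that adds one filtered, routed alias per user. The index prefix `events`, the field name `user`, the key format and all helper names are assumptions for illustration; the snippet only builds the JSON and does not contact a cluster.

```python
import json
from datetime import date


def daily_index_name(prefix, day):
    """Name of a per-day (rolling) index, e.g. events-2013.09.06."""
    return f"{prefix}-{day:%Y.%m.%d}"


def routing_key(user, day=None):
    """Routing key: the user alone, or user and date concatenated."""
    return user if day is None else f"{user}_{day.isoformat()}"


def user_alias_action(index, user):
    """One 'add' action: an alias named after the user that carries the
    routing value and filters on a hypothetical 'user' field."""
    return {
        "add": {
            "index": index,
            "alias": user,
            "routing": routing_key(user),
            "filter": {"term": {"user": user}},
        }
    }


def aliases_request_body(index, users):
    """Body for a POST to the _aliases endpoint, one action per user."""
    return {"actions": [user_alias_action(index, u) for u in users]}


if __name__ == "__main__":
    index = daily_index_name("events", date(2013, 9, 6))
    print(json.dumps(aliases_request_body(index, ["user1", "user2"]), indent=2))
```

Searching through the `user1` alias would then apply both the routing value and the filter without the client having to pass them. The term filter assumes the `user` field is indexed as an exact, non-analyzed value. With rolling indexes, the aliases would need to be added again each time a new period's index is created.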

Does that help?
If not, could you give more details about your documents and the problem you are trying to solve (in another thread :slight_smile: )?

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs



(Ashwin Sathya) #3

Sorry. Will close this and create another thread.

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Azure-Windows-VM-Multiple-Node-Cluster-tp4040731p4040743.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



(Ashwin Sathya) #4

Apologies. N00b to mailing lists here.
I will clear this and create a new thread.

Thanks,
Ashwin Sathya

From: david@pilato.fr
Subject: Re: Azure Windows VM Multiple Node Cluster
Date: Fri, 6 Sep 2013 15:38:40 +0200
To: elasticsearch@googlegroups.com

Not sure I fully understand your use case here.
BTW your Thread title is not related to your question. You should open a new thread.

That said, I don't think you should have one index per user. But a global index for all users.
Then use routing feature to route user's documents to the same shard.

You can use on top of that an alias per user which hold the routing key for you.
Question is then, do I need to split my index per period. It depends IMHO if you want to clean old data or not (think rolling indexes).

Note that the routing key could be a concatenation between user and date for example.

Does it help?
If not, could you give more details about your documents and what problem you are trying to solve (in another thread :slight_smile: ).


