Index server on remote machine


(IronMike) #1

I would like some clarification on this topic. Assume I am in the cloud with
two instances (instance1 and instance2).
instance1 is where I process all my files and convert them to JSON.
instance2 is where the actual Elasticsearch server runs.

So on instance1, I can run my application, which opens a transport client
to instance2's IP address, goes through my documents, and pushes them to a
BulkProcessor.
My question: wouldn't the BulkProcessor still have to push the data
remotely to instance2, where the actual ES server is? For each bulk of, say,
1000 docs, it has to push all 1000 docs over the network. Isn't this a huge
bottleneck? What's the ideal way to handle such a scenario? Am I going about
this the wrong way?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/cbde68ef-0305-4fbd-afcd-5e1132b73688%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Ivan Brusic) #2

Correct. You will be network bound when pushing content remotely. However,
how is this different from any other storage engine? If instance2 were running
MySQL, you would still be pushing data over the network.

Perhaps the most efficient solution is to run your process as a river, but
you lose some flexibility in terms of process control. Overall, I think
remote indexing is a good solution.
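
To see why bulking keeps the network cost manageable, here is a minimal sketch of the batching idea in plain Java. It is not the Elasticsearch BulkProcessor API: the `Batcher` class, the `countRequests` helper, and the dummy JSON documents are hypothetical names made up for illustration, and the sender only counts requests instead of talking to a server. The assumption is that each flush corresponds to one network request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for a bulk indexer; NOT the Elasticsearch BulkProcessor API. */
final class BatchingSketch {

    /** Buffers documents and hands them to a sender once batchSize is reached. */
    static final class Batcher {
        private final int batchSize;
        private final Consumer<List<String>> sender;
        private final List<String> buffer = new ArrayList<>();

        Batcher(int batchSize, Consumer<List<String>> sender) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.batchSize = batchSize;
            this.sender = sender;
        }

        /** Adds one document; triggers a flush when the buffer is full. */
        void add(String jsonDoc) {
            buffer.add(jsonDoc);
            if (buffer.size() >= batchSize) {
                flush();
            }
        }

        /** Sends whatever is buffered as a single batch (assumed: one network request). */
        void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            sender.accept(new ArrayList<>(buffer)); // copy so the sender owns its batch
            buffer.clear();
        }
    }

    /** Feeds docCount dummy documents through a Batcher and returns how many batches were sent. */
    static int countRequests(int docCount, int batchSize) {
        int[] requests = {0};
        Batcher batcher = new Batcher(batchSize, batch -> requests[0]++);
        for (int i = 0; i < docCount; i++) {
            batcher.add("{\"id\":" + i + "}");
        }
        batcher.flush(); // send any remainder smaller than batchSize
        return requests[0];
    }

    public static void main(String[] args) {
        System.out.println("batch size 1000: " + countRequests(2500, 1000) + " requests");
        System.out.println("batch size 1:    " + countRequests(2500, 1) + " requests");
    }
}
```

With a batch size of 1000, the 2500 documents go out in 3 requests rather than 2500, so the per-request overhead is paid far less often. The document bytes themselves still have to cross the network either way.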

Cheers,

Ivan

On Thu, Apr 24, 2014 at 11:55 AM, IronMan2014 sabdalla80@gmail.com wrote:


(IronMike) #3

Thanks Ivan. Yes, it's no different; I just wanted to make sure I am going
about it the right way. The only thing that concerns me is the bulks not
keeping up because of the network: pushing and indexing are heavy on I/O, while
the storage scenario you mentioned may not be as time-sensitive an operation (I
think).
I thought about running the index server on the same machine and just making
that machine a powerful one, but I am not sure what the drawbacks are.
I can think of one: indexing remotely lets me set up a cluster of machines
to index to, which a single machine would not.
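
One way to judge whether bulks can keep up is a back-of-the-envelope estimate of the network time per bulk. The sketch below is only that: every number in `main` (document size, bandwidth, round-trip time) is an assumption chosen for illustration, not a measurement, and the `BulkTransferEstimate` class and its methods are hypothetical names. It covers network transfer only and ignores the time the server spends indexing.

```java
/** Rough network-only estimate for one bulk request; all inputs are assumptions. */
final class BulkTransferEstimate {

    /** Seconds to move one bulk over the wire: one round trip plus payload transfer time. */
    static double secondsPerBulk(int docsPerBulk, int bytesPerDoc,
                                 double bandwidthBytesPerSec, double roundTripSec) {
        double payloadBytes = (double) docsPerBulk * bytesPerDoc;
        return roundTripSec + payloadBytes / bandwidthBytesPerSec;
    }

    /** Documents per second the network alone could carry at that rate. */
    static double docsPerSecond(int docsPerBulk, int bytesPerDoc,
                                double bandwidthBytesPerSec, double roundTripSec) {
        return docsPerBulk
                / secondsPerBulk(docsPerBulk, bytesPerDoc, bandwidthBytesPerSec, roundTripSec);
    }

    public static void main(String[] args) {
        int docsPerBulk = 1000;             // bulk size from the question
        int bytesPerDoc = 2_000;            // assumed average JSON document size
        double bandwidth = 100_000_000.0;   // assumed 100 MB/s between the two instances
        double roundTrip = 0.001;           // assumed 1 ms round trip

        System.out.println("seconds per bulk: "
                + secondsPerBulk(docsPerBulk, bytesPerDoc, bandwidth, roundTrip));
        System.out.println("docs per second (network only): "
                + docsPerSecond(docsPerBulk, bytesPerDoc, bandwidth, roundTrip));
    }
}
```

Plugging in measured values for your own instances and comparing the result with how fast instance1 can produce JSON should show whether the network or the processing is the limiting side.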

On Thursday, April 24, 2014 3:46:59 PM UTC-4, Ivan Brusic wrote:

