What kind of data, what keys and values, are you fetching?
Note that if you have new fields, ES needs time to create the dynamic mapping.
Likewise, if you index into a new index, ES needs time to create the index.
Another point is the rendezvous: when you start a client, it needs time to
connect to all the nodes of the cluster over the network and to detect a
master (if it's a node client). 200ms would be very fast for this.
All of this can be prepared beforehand; after that, you can fetch data from
PostgreSQL or from any other source.
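For example, with the elasticsearch-ruby gem you could create the index and
an explicit mapping up front, so the first bulk request does not pay that
cost. A rough sketch; the index name, type, and fields below are placeholders
for your own schema:

    require 'elasticsearch'

    client = Elasticsearch::Client.new(url: 'http://localhost:9200')

    # Create the index with an explicit mapping before the first bulk
    # request, instead of relying on dynamic mapping at index time.
    unless client.indices.exists(index: 'listings')
      client.indices.create(index: 'listings', body: {
        mappings: {
          listing: {
            properties: {
              title: { type: 'string' },
              price: { type: 'double' }
            }
          }
        }
      })
    end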
Websockets have nothing to do with the challenge. Websockets provide a
bidirectional interaction style; they have no impact on the internals of
bulk indexing.
With the right client, you can already bulk index concurrently, at maximum
speed. I admit that, apart from Java/JVM-based implementations using the
transport layer protocol, I do not know of any concurrent bulk indexing
implementations yet.
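That said, in Ruby you could approximate concurrent bulk indexing yourself
with a few worker threads draining a bounded queue. A rough sketch with the
elasticsearch-ruby gem; the thread count, queue size, URL, and the batches
source are arbitrary placeholders:

    require 'elasticsearch'

    queue = SizedQueue.new(4)   # bounded queue keeps memory constant

    workers = 4.times.map do
      Thread.new do
        # one client per thread, so no connection is shared across threads
        client = Elasticsearch::Client.new(url: 'http://localhost:9200')
        while (batch = queue.pop)
          client.bulk(body: batch)  # batch is an array of bulk actions
        end
      end
    end

    # producer side: push batches, then send one stop signal per worker
    batches.each { |batch| queue << batch }   # batches: your batch source
    4.times { queue << nil }
    workers.each(&:join)

The SizedQueue blocks the producer once four batches are queued, so the
memory footprint stays bounded no matter how fast Postgres delivers rows.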
Jörg
On Mon, Aug 11, 2014 at 2:50 AM, AbhijitPratap Singh <
singhabhijitpratap@gmail.com> wrote:
There is an initial cost for obtaining the first batch of 1,000 rows from a
result set of, let's say, 50,000 rows; after that it just uses cursors and
fetches 1,000-row batches in almost no time.
First batch fetch: 200ms
Second batch onwards: 30ms
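The cursor loop looks roughly like this with the pg gem (simplified; the
database, table, and cursor names below are placeholders):

    require 'pg'

    conn = PG.connect(dbname: 'housing')

    # cursors must live inside a transaction
    conn.transaction do
      conn.exec('DECLARE rows_cur CURSOR FOR SELECT * FROM listings')
      loop do
        res = conn.exec('FETCH 1000 FROM rows_cur')
        break if res.ntuples.zero?
        # hand the 1,000 rows off to the bulk indexer here
      end
    end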
We don't have our ES cluster on the same node as Postgres, so I agree that
network latency is a factor. But barring the first fetch, the bulk response
time from ES is always much higher than the time to fetch 1,000 rows from
Postgres.
I agree that a concurrent push would be the ideal solution. ES can surely
handle a lot of docs concurrently, but it increases the memory footprint of
our Ruby code.
I looked at the websocket transport protocol available for ES. Could it be of
some help in this situation?
On Saturday, August 9, 2014 1:15:17 AM UTC+5:30, abhiji...@housing.com
wrote:
Hello everyone,
I wanted to know if it is possible to index the docs through a stream
which pushes data to the Elasticsearch cluster.
Our current problem is indexing a huge set of data from Postgres into
Elasticsearch while processing the data in between. We have been able to
stream data out of Postgres, which lets us use constant memory in our Ruby
code, but there is a significant delay when posting these docs in batches
to ES through the Bulk API.
I think it would be ideal if there were a mechanism to push our docs
continuously into the ES cluster, thereby reducing the bottleneck currently
created by the bulk call.
Ideally I would also have wanted to post all the batches of docs in a
different thread, but that would create memory issues, so I thought streaming
would be a good alternative.
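For example, one streaming-like approach would be to consume the Postgres
enumerator in fixed-size slices and flush each slice to the Bulk API as it
fills up, keeping only one batch in memory at a time. A sketch; rows, client,
the batch size, and the index/type names are placeholders:

    # rows: an enumerator over the streamed Postgres rows
    # client: an Elasticsearch::Client instance
    rows.each_slice(1000) do |slice|
      actions = slice.map do |row|
        { index: { _index: 'listings', _type: 'listing', data: row } }
      end
      client.bulk(body: actions)  # each_slice holds only 1,000 rows at once
    end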
I apologise that this is a somewhat subjective question, but please do ask
for the code if you want to know more.