We have an Elasticsearch cluster whose clients aren't on the same
LAN. Because of this there is a considerable delay before the
client receives the response.
The index uses the default file-system-based storage. Is there a
mechanism for caching the results of queries (not filters)? I
think this might improve performance.
Another option would be to put the clients and the cluster on the same
LAN. For this, is it possible to take the index from our cluster (the
index directories it created) and move it to another ES cluster that
will automatically detect it? I think this would be a nice
feature, given that a Lucene index directory is platform
independent. If one needs the index on another cluster, then instead of
scanning the old index and reindexing it on the new cluster, one would
only have to copy some index directories to the new cluster. Even
this approach isn't perfect, though, because we would have to gather
the complete index structure from all the machines in the cluster.
Yes, the performance on the LAN is acceptable. I think the connection is
indeed the problem, because it is too slow. That's why, in this case, I need
a way to export an index from one cluster to another: I cannot index
directly on the cluster where the client is (I cannot give more details),
and I don't want to scan the index and reindex it on the new cluster.
I also have another question: what is the best way for a remote client to
connect to an Elasticsearch cluster, the TransportClient or the REST API?
If one uses the TransportClient, are the connections kept open for the whole
session? Is this a good idea? Is there a connection pool? How many
connections will there be: one for each node in the cluster, if I add them
all to the list of transport addresses?
You can use the scan API to index data from one index to another.
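To make the suggestion above concrete, here is a rough sketch of a scan-and-reindex copy over the REST API, using only the Python standard library. It assumes an Elasticsearch version that still supports `search_type=scan` with scrolling, and that documents are stored with `_source` enabled. The function names (`to_bulk_body`, `http_json`, `copy_index`), the 5-minute scroll timeout and the page size are my own choices for illustration, not anything from this thread. It copies documents only; mappings and settings would have to be created on the target cluster separately.

```python
import json
import urllib.request


def to_bulk_body(hits, target_index):
    """Turn search hits into a newline-delimited bulk request body."""
    lines = []
    for hit in hits:
        action = {"index": {"_index": target_index,
                            "_type": hit["_type"],
                            "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(hit["_source"]))
    # The bulk API expects a newline after the last line as well.
    return "\n".join(lines) + "\n" if lines else ""


def http_json(method, url, body=None):
    """Hypothetical helper: send a request and decode the JSON response."""
    data = body.encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def copy_index(source_url, target_url, source_index, target_index,
               http=http_json, page_size=100):
    """Copy all documents between clusters; return how many were copied."""
    # Opening a scan returns only a scroll id; hits arrive on later calls.
    first = http("POST",
                 f"{source_url}/{source_index}/_search"
                 f"?search_type=scan&scroll=5m&size={page_size}",
                 json.dumps({"query": {"match_all": {}}}))
    scroll_id = first["_scroll_id"]
    copied = 0
    while True:
        # The scroll id is sent as the raw request body.
        page = http("POST", f"{source_url}/_search/scroll?scroll=5m", scroll_id)
        hits = page["hits"]["hits"]
        if not hits:
            break  # an empty page means the scan is exhausted
        http("POST", f"{target_url}/_bulk", to_bulk_body(hits, target_index))
        copied += len(hits)
        scroll_id = page["_scroll_id"]
    return copied
```

Something like `copy_index("http://old-cluster:9200", "http://new-cluster:9200", "myindex", "myindex")` would then run the copy (host and index names made up). The `http` parameter is there only so the loop can be exercised without a live cluster.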
The TransportClient manages its own connections, and yes, keeps them open.
I have a question regarding the TransportClient. How many connections does it manage? Can I configure the number of connections?
I am wondering whether it is reasonable to create my own TransportClientConnectionPool, or whether it is sufficient to leave this to the TransportClient. I ask because of performance concerns.
There is no need for a connection pool with the TransportClient; it can safely (and optimally) be used by multiple threads concurrently.
On Tuesday, February 28, 2012 at 11:53 AM, Marian wrote:
Hello,
I have question regarding TransportClient. How many connections does the
TransportClient manage? Can I set up the number of connections?
I am thinking if it is reasonable to create my own
TransportClientConnectionPool or it is sufficient to leave it on
TransportClient. Due to performance issue.