Java API TransportClient question


(Pavel Baranov) #1

Hello there,

Setup:
4-node cluster
1 index (tweets): 1.1 billion documents indexed, 5 shards
Using the Java API

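Java client:

import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

// The cluster name must match the cluster the client connects to.
Settings settings = ImmutableSettings.settingsBuilder()
    .put("cluster.name", "tweets")
    .build();

// One transport address per node; hosts and ports are redacted as "#".
Client client = new TransportClient(settings)
    .addTransportAddress(new InetSocketTransportAddress("#", #))
    .addTransportAddress(new InetSocketTransportAddress("#", #))
    .addTransportAddress(new InetSocketTransportAddress("#", #))
    .addTransportAddress(new InetSocketTransportAddress("#", #));

...
...
...
// Count the tweets whose tweet_date lies between dt1 and dt2 (size 0 = no hits returned).
QueryBuilder qb = QueryBuilders.rangeQuery("tweet_date").gte(dt1).lte(dt2);
SearchRequestBuilder srb = client.prepareSearch("tweets");
srb.setTypes("tweet");
srb.setSearchType(SearchType.DFS_QUERY_THEN_FETCH);
SearchResponse sr = srb.setQuery(qb).setSize(0).execute().actionGet();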

For some reason, when I execute a date range query, the response reports
only some of the shards as "successful", e.g.:

"took" : 27119,
"timed_out" : false,
"_shards" : {
  "total" : 5,
  "successful" : 1,
  "failed" : 0
},
"hits" : {
  "total" : 224795578,
  "max_score" : 0.0,
  "hits" : [ ]
},

Sometimes it's:

"took" : 31076,
"timed_out" : false,
"_shards" : {
  "total" : 5,
  "successful" : 3,
  "failed" : 0
},
"hits" : {
  "total" : 674358780,
  "max_score" : 0.0,
  "hits" : [ ]
},

Is there a way to make sure all shards return all the information? Or am I
doing something wrong?

Thank you!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/651cf463-fbd1-4fa7-ac74-b9ca1520fcb5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

In the response, you can see that not all shards responded. Something is
wrong with the shards. Given the very high response time, I assume they are
short on resources such as memory. At least they dropped out of the overall
search results without a timeout. There may be something in the server node
logs.

Jörg

(Replying to rookie7799's post of Fri, Aug 8, 2014, 9:20 PM.)


(Pavel Baranov) #3

You're right, it was a memory issue in the end. However, it's strange that
the Java API never complained about it.

Thank you for the reply!

(Replying to Jörg Prante's post of Friday, August 8, 2014, 4:41:04 PM UTC-4.)


(Jörg Prante) #4

The reason is that Elasticsearch is designed to keep delivering results
even when parts of the cluster are failing. That is why it is important to
check, at the application level, how many shards responded. If that number
does not match the total number of shards in the index, the search must be
considered incomplete, and the cluster is operating in a degraded mode. In
that case, some apps may want to carry on without notifying anyone, while
other apps prefer to bail out. Elasticsearch does not enforce a decision and
leaves it to the application programmer.
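
The application-level check described above could be sketched roughly as follows. This is only an illustration, not code from the thread: `ShardCheck` and its methods are hypothetical names, and the helper works on plain shard counts so that it stays independent of any particular client version. It treats a search as complete only when every shard reported success, and implements the "bail out" policy by throwing.

```java
/**
 * Hypothetical helper (not part of Elasticsearch) that decides whether a
 * search result covers every shard, based on the counts in "_shards".
 */
final class ShardCheck {
    private ShardCheck() {}

    /** True only when every shard of the search reported success. */
    static boolean isComplete(int totalShards, int successfulShards) {
        return successfulShards == totalShards;
    }

    /** Shards that neither succeeded nor reported an explicit failure. */
    static int unaccounted(int totalShards, int successfulShards, int failedShards) {
        return totalShards - successfulShards - failedShards;
    }

    /** "Bail out" policy: refuse to work with a partial result. */
    static void requireComplete(int totalShards, int successfulShards, int failedShards) {
        if (!isComplete(totalShards, successfulShards)) {
            throw new IllegalStateException(String.format(
                "Incomplete search: %d of %d shards succeeded, %d failed, %d unaccounted for",
                successfulShards, totalShards, failedShards,
                unaccounted(totalShards, successfulShards, failedShards)));
        }
    }
}
```

With the `SearchResponse sr` from the original post, the counts should be available through `sr.getTotalShards()`, `sr.getSuccessfulShards()` and `sr.getFailedShards()`, so a call might look like `ShardCheck.requireComplete(sr.getTotalShards(), sr.getSuccessfulShards(), sr.getFailedShards())`. For the first response in the thread (5 total, 1 successful, 0 failed) this would throw and report 4 shards unaccounted for. An app that prefers to carry on could call `isComplete` instead and just log the result.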

Jörg

(Replying to rookie7799's post of Fri, Aug 8, 2014, 10:58 PM.)


(AsyncAwait) #5

Just to give my two cents.

Always check wait_for_clusterstate before running queries. I'm not sure how that is done in the Java API, but in the past I have seen it work consistently.


