ES 1.3.4 scrolling never ends

Hi all,

I'm seeing strange behavior when executing a search scroll on a single
ES 1.3.4 node with the Java client.

The scenario is as follows:

  1. Start a single node of version 1.3.4

  2. Add a snapshot repository pointing to the version 1.1.1 snapshots

  3. Restore the version 1.1.1 snapshot to the 1.3.4 node

  4. Execute a search on an index with:

    client.prepareSearch("my_index*")
      .setQuery(QueryBuilders.filteredQuery(QueryBuilders.matchAllQuery(),
        FilterBuilders.queryFilter(QueryBuilders.queryString(
          s"$terms AND snapshotNo:[${mdp.fromSnapshot} TO ${mdp.toSnapshot}]"))))
      .addFields(OBFields.values.map(_.toString).toList: _*)
      .setSize(pageSize)
      .addSort(OBFields.updateNo.toString, SortOrder.ASC)
      .setScroll(TimeValue.timeValueMinutes(3))
      .execute().actionGet()

  5. Execute the following search scroll:

    client.prepareSearchScroll(scrollId)
      .setScroll(TimeValue.timeValueMinutes(3))
      .execute().actionGet()

I have a loop iterating over #5 that provides the same scrollId every time and
checks for (result.getHits().getHits().length == 0) to terminate.
I keep getting the same result 'page' with the same number of hits.

Any idea?

Thanks,
Yarden

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/be0f385c-9d46-492b-a818-9bb04c92b214%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

You need to get the scroll ID from each response and use that one in the
subsequent scroll request. You cannot simply reuse the same scroll ID.
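Brian's point can be illustrated without a cluster. The following is a minimal, self-contained Java simulation; `LatestScrollIdDemo`, `FakeScrollServer`, `Page` and `drain` are hypothetical names. The fake deliberately models each scroll ID as a cursor into the result set, which is an assumption made for illustration and not a description of how Elasticsearch stores scroll state. The only point is the client-side contract: send the scroll ID from the latest response.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LatestScrollIdDemo {

    /** Hypothetical response page: the hits of one response plus the scroll ID to send next. */
    public static final class Page {
        private final List<String> hits;
        private final String scrollId;

        Page(List<String> hits, String scrollId) {
            this.hits = hits;
            this.scrollId = scrollId;
        }

        public List<String> hits() { return hits; }
        public String scrollId() { return scrollId; }
    }

    /**
     * Hypothetical in-memory "server" (NOT Elasticsearch). Every response carries a fresh
     * scroll ID that remembers where that response ended; resending an old ID replays from there.
     */
    public static final class FakeScrollServer {
        private final List<String> docs;
        private final int pageSize;
        private final Map<String, Integer> cursors = new HashMap<>();
        private int counter = 0;

        public FakeScrollServer(List<String> docs, int pageSize) {
            this.docs = docs;
            this.pageSize = pageSize;
        }

        /** Initial search: returns the first page and opens the scroll. */
        public Page search() {
            return pageFrom(0);
        }

        /** Scroll request: continues from the position recorded for the given ID. */
        public Page scroll(String scrollId) {
            Integer position = cursors.get(scrollId);
            if (position == null) {
                throw new IllegalArgumentException("unknown scroll id: " + scrollId);
            }
            return pageFrom(position);
        }

        private Page pageFrom(int start) {
            int end = Math.min(start + pageSize, docs.size());
            String id = "scroll-" + counter++;
            cursors.put(id, end);
            return new Page(new ArrayList<>(docs.subList(start, end)), id);
        }
    }

    /** Client loop: collects every page, always sending the scroll ID from the latest response. */
    public static List<String> drain(FakeScrollServer server) {
        List<String> all = new ArrayList<>();
        Page page = server.search();
        all.addAll(page.hits());
        while (!page.hits().isEmpty()) {
            page = server.scroll(page.scrollId()); // latest ID, never the first one
            all.addAll(page.hits());
        }
        return all;
    }
}
```

With five documents and a page size of 2, `drain` returns all five in order, while calling `scroll` twice with the ID from the initial search returns the same page both times, which is the symptom described at the top of the thread.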

Brian

I'll try that and report back.

Thanks,
Yarden

(In reply to Yarden Bar, Wednesday, November 5, 2014 2:48:46 PM UTC+2.)

Update:
The scroll only runs to completion when I set the SearchType to something
other than QUERY_AND_FETCH.

Any idea why QUERY_THEN_FETCH (the default) sends me into an endless loop?

The full code is:

// ESClientFactory, ESNode and OBFields are application-specific helpers, not part of Elasticsearch
import org.elasticsearch.action.search.{SearchResponse, SearchType}
import org.elasticsearch.common.unit.TimeValue
import org.elasticsearch.index.query.{FilterBuilders, QueryBuilders}
import org.elasticsearch.search.Scroll
import org.elasticsearch.search.sort.SortOrder

val client = ESClientFactory.createByNode(ESNode.Builder, cluster = "test_acm_es")

val query = QueryBuilders.filteredQuery(QueryBuilders.matchAllQuery(), FilterBuilders.queryFilter(
  QueryBuilders.queryString("((market:2 AND feed:55) OR (market:2 AND feed:32)) AND snapshotNo:[12614 TO 12627]")))

// Initial search: opens the scroll context and returns the first page
var result: SearchResponse = client.prepareSearch("orderbook-2014.10.21")
  .setQuery(query)
  .addFields(OBFields.values.map(_.toString).toList: _*)
  .setSearchType(SearchType.QUERY_THEN_FETCH)
  .setSize(1000)
  .addSort(OBFields.updateNo.toString, SortOrder.ASC)
  .setScroll(new Scroll(TimeValue.timeValueMinutes(5)))
  .execute().actionGet()

println(s"Result total hits:${result.getHits.totalHits()}")
println(s"Result hits:${result.getHits.getHits().length}")

var itr = 0 // iteration counter
do {
  // result = new SearchScrollRequestBuilder(client, result.getScrollId).setScroll(TimeValue.timeValueMinutes(2)).execute().actionGet()
  // Each scroll request uses the scroll ID of the previous response
  result = client.prepareSearchScroll(result.getScrollId).setScroll(TimeValue.timeValueMinutes(2)).execute().actionGet()
  println(s"Iteration=$itr, scrollResult=${result.getHits.getHits.length}")
  itr += 1
} while (result.getHits.getHits.length != 0)

Thanks for any ideas.
Yarden

(In reply to Yarden Bar, Wednesday, November 5, 2014 5:52:25 PM UTC+2.)

One issue I identified was that the heap was too small for the query. I've
increased the heap memory, and the circuit breaker exceptions stopped happening.

But scrolling still returns the SAME result.

An updated code example is below:
import org.elasticsearch.action.search.SearchType
import org.elasticsearch.client.transport.TransportClient
import org.elasticsearch.common.settings.ImmutableSettings
import org.elasticsearch.common.transport.InetSocketTransportAddress
import org.elasticsearch.common.unit.TimeValue
import org.elasticsearch.index.query.{FilterBuilders, QueryBuilders}
import org.elasticsearch.search.Scroll
import org.elasticsearch.search.sort.SortOrder

// The TransportClient sniffing setting is "client.transport.sniff"
val es_settings = ImmutableSettings.settingsBuilder()
  .put("client.transport.sniff", true)
  .put("cluster.name", "test_acm_es")
  .build()
val client = new TransportClient(es_settings)
  .addTransportAddress(new InetSocketTransportAddress("myServer", 9300))
val query = QueryBuilders.filteredQuery(QueryBuilders.matchAllQuery(),
  FilterBuilders.queryFilter(
    QueryBuilders.queryString("((market:2 AND feed:55) OR (market:2 AND feed:32))")))
// Initial search: opens the scroll context and returns the first page
var result = client.prepareSearch("orderbook-2014.11.03")
  .setTypes("level")
  .setQuery(query)
  .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
  .setSize(10000)
  .addSort("updateNo", SortOrder.ASC)
  .setScroll(new Scroll(TimeValue.timeValueMinutes(5)))
  .get()
var scrollId = ""
var itr = 0
do {
  // Take the scroll ID from the most recent response
  scrollId = result.getScrollId
  result = client.prepareSearchScroll(scrollId)
    .setScroll(TimeValue.timeValueMinutes(3))
    .get()
  println(s"Iteration=$itr, scrollResult=${result.getHits.getHits.length}")
  // println("------------------------------------")
  // result.getHits.getHits.foreach(h => println(h.getId))
  // println("------------------------------------")
  itr += 1
} while (result.getHits.getHits.length != 0)

Enabling the commented-out print block shows that the SearchHit array is the
same on every iteration.
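While debugging, a silent endless loop can be turned into a visible failure by comparing each page's hit IDs with those of the previous page. A minimal Java sketch; `PageRepeatGuard` is a hypothetical helper, and in the loop above the IDs would come from calling `getId()` on each `SearchHit` of the page (converted to a `java.util.List`).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical scroll-loop guard: flags a page whose hit IDs equal the previous page's. */
public class PageRepeatGuard {
    private List<String> previous = null;

    /**
     * Returns true if this non-empty page has exactly the same IDs, in the same order,
     * as the page seen on the previous call. Empty pages are never reported as repeats.
     */
    public boolean isRepeat(List<String> pageIds) {
        boolean repeat = previous != null && !pageIds.isEmpty() && previous.equals(pageIds);
        previous = new ArrayList<>(pageIds); // defensive copy of the current page
        return repeat;
    }
}
```

Inside the loop, something like `if (guard.isRepeat(ids)) throw new IllegalStateException("scroll returned the same page twice")` stops after the first repeated page instead of spinning indefinitely.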

Thanks,
Yarden

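You must initiate scan/scroll with search type SCAN. The scan/scroll
pattern looks like this:

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;

TimeValue keepAlive = TimeValue.timeValueMinutes(5); // scroll keep-alive (example value)
SearchRequest searchRequest = new SearchRequestBuilder(client)
        .setQuery(QueryBuilders.matchAllQuery()).request();
searchRequest.searchType(SearchType.SCAN).scroll(keepAlive);
SearchResponse searchResponse = client.search(searchRequest).actionGet();
// get total hits here before entering the loop (the first SCAN response carries no hits)
long totalHits = searchResponse.getHits().getTotalHits();
while (searchResponse.getScrollId() != null) {
    // always continue from the scroll ID of the latest response
    searchResponse = client.prepareSearchScroll(searchResponse.getScrollId())
            .setScroll(keepAlive).execute().actionGet();
    long hits = searchResponse.getHits().getHits().length;
    if (hits == 0) {
        break; // scroll exhausted
    }
    // process hits of a scroll here
}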

Jörg

(In reply to Yarden Bar, Mon, Nov 10, 2014 at 1:27 PM.)

Hi Jorg,

I can't use the scan type because I need the documents sorted ascending on a field; scan returns the documents in the order they were indexed.

Thanks

Scan does not really return the docs in the order they were indexed (the
order depends on how the index segments in the shards return the docs).

But anyway, you cannot scroll over a sorted result set.

Jörg

(In reply to Yarden Bar, Mon, Nov 10, 2014 at 3:12 PM.)

A while back, I wrote my own post-query response sorting so that I could
handle cases that Elasticsearch didn't. One case was sorting a scan query.
I used a Java TreeSet class and could also limit it to the top 'N'
(configurable) items. It is very, very quick, pretty much adding no
overhead to the existing scan logic. And it supports an arbitrarily complex
compound sort key, much like an SQL ORDER BY clause, and it's very easy to
construct.

Probably not useful for a normal user query, but it is very useful for an
ad-hoc query in which I wish to scan across an indeterminately large result
set but still sort the results.

One of these days, it might make a good plug-in candidate. But I am not
sure how to integrate it with the scan API, so for now it's just part of
the Java client layer.
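Brian did not post his code, so the following is only a guess at the general idea, not his implementation: a Java `TreeSet` ordered by a compound `Comparator` and trimmed with `pollLast()` whenever it grows past N. `TopNSorter`, `Hit` and the example sort key (`market`, `updateNo`) are hypothetical. Because a `TreeSet` treats elements that compare as equal as duplicates, the comparator ends with the document ID as a tie-breaker.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical client-side sorter: keeps the first N hits under a compound sort order. */
public class TopNSorter<T> {
    private final TreeSet<T> set;
    private final int limit;

    public TopNSorter(Comparator<? super T> order, int limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be >= 1");
        }
        this.set = new TreeSet<>(order);
        this.limit = limit;
    }

    /** Offer one hit (for example, every hit of every scan page); O(log N) per call. */
    public void offer(T hit) {
        set.add(hit);
        if (set.size() > limit) {
            set.pollLast(); // drop whichever element now sorts last
        }
    }

    /** The retained hits, in sort order. */
    public List<T> result() {
        return new ArrayList<>(set);
    }

    /** Example hit type; the fields are illustrative only. */
    public static final class Hit {
        public final String id;
        public final String market;
        public final long updateNo;

        public Hit(String id, String market, long updateNo) {
            this.id = id;
            this.market = market;
            this.updateNo = updateNo;
        }
    }

    /** Roughly ORDER BY market ASC, updateNo ASC, id ASC (id breaks ties). */
    public static final Comparator<Hit> BY_MARKET_THEN_UPDATE =
            Comparator.comparing((Hit h) -> h.market)
                    .thenComparingLong(h -> h.updateNo)
                    .thenComparing(h -> h.id);
}
```

Memory stays bounded by N no matter how many hits the scan returns, which fits Brian's use case of sorting an indeterminately large ad-hoc result set.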

Brian

Finally, the issue was solved.

I forgot to mention that I had a Logstash output connected, and its protocol
(http://logstash.net/docs/1.4.2/outputs/elasticsearch#protocol) was set to
'node', meaning that Logstash was part of my cluster.
Once I set the protocol to 'transport', scrolling was perfect!!

Credit to my team leader for the guidance.

Thanks everyone for your help
Yarden

(In reply to Brian, Monday, November 10, 2014 7:06:52 PM UTC+2.)