Update document while scrolling


(Arinto Murdopo) #1

Hi all,

I plan to update my data while scrolling through it. Is this a valid use case
for Elasticsearch, and is there a best practice for it?

What I plan to do is something like this (using the Java API): while iterating
through the hits of each scroll response, I issue an Elasticsearch update request for each hit.

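```java
import java.util.Map;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;

// Method of a class that provides esRelationshipClient, logger, someIndex, someType,
// scrollTimeWindow, scrollTimeUnit and sizePerShard.
public void updateWhileScrolling() {
    // With SearchType.SCAN the initial response carries no hits, only a scroll ID,
    // and setSize() applies per shard.
    SearchResponse scrollResp = esRelationshipClient
            .prepareSearch(someIndex)
            .setTypes(someType)
            .setSearchType(SearchType.SCAN)
            .setScroll(new TimeValue(scrollTimeWindow, scrollTimeUnit))
            .setQuery(QueryBuilders.matchAllQuery())
            .setSize(sizePerShard)
            .execute().actionGet();

    while (true) {
        // Fetch the next page and keep the scroll context alive for another window.
        scrollResp = esRelationshipClient
                .prepareSearchScroll(scrollResp.getScrollId())
                .setScroll(new TimeValue(scrollTimeWindow, scrollTimeUnit))
                .execute().actionGet();

        for (SearchHit hit : scrollResp.getHits()) {
            Map<String, Object> source = hit.getSource();
            if (source != null) {
                // update source using the Update API
            } else {
                logger.error("source is null for {}", hit.toString());
            }
        } // end for loop for processing each hit

        // Break condition: no hits are returned
        if (scrollResp.getHits().getHits().length == 0) {
            break;
        }
    }
}
```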
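The loop's control flow can be checked without a cluster. This is a minimal in-memory sketch: `FakeScroll` and `ScrollLoopSketch` are hypothetical names that stand in for successive `prepareSearchScroll(...)` round trips and are not part of the Elasticsearch API, and the `updated` field is an invented placeholder for the real update. Because a `SearchType.SCAN` search returns only a scroll ID and no hits, the loop above fetches a scroll page before processing anything, and it stops at the first empty page.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a scan/scroll cursor: each nextPage() call
// plays the role of one prepareSearchScroll(...) round trip.
class FakeScroll {
    private final List<List<Map<String, Object>>> pages;
    private int next = 0;

    FakeScroll(List<List<Map<String, Object>>> pages) {
        this.pages = pages;
    }

    // Returns the next page of hit sources, or an empty list once the cursor is exhausted.
    List<Map<String, Object>> nextPage() {
        return next < pages.size() ? pages.get(next++) : new ArrayList<Map<String, Object>>();
    }
}

class ScrollLoopSketch {
    // Mirrors the loop above: fetch a page, update every hit whose source is present,
    // and stop when a page comes back empty. Returns the number of hits updated.
    static int processAll(FakeScroll scroll) {
        int processed = 0;
        while (true) {
            List<Map<String, Object>> hits = scroll.nextPage();
            if (hits.isEmpty()) {
                break; // same condition as scrollResp.getHits().getHits().length == 0
            }
            for (Map<String, Object> source : hits) {
                if (source != null) {
                    source.put("updated", Boolean.TRUE); // placeholder for the real update
                    processed++;
                }
            }
        }
        return processed;
    }
}
```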

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/dd91574a-f8aa-4aef-959a-1ea10e7b3a3c%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(David Pilato) #2

I think it makes sense to do it like this.
The only comment I have is that you should use BulkProcessor to send your new documents.
I'm not sure I would use the Update API, because you already have the full _source in the response hits.
So, modifying the documents on the client side and reindexing them could make sense.
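BulkProcessor's job is to group many index requests into a few bulk round trips instead of one request per document. The sketch below imitates only its action-count threshold in plain Java, so the idea runs without a cluster; `IndexAction`, `BulkBuffer` and `ClientSideUpdate` are hypothetical names, and the `processed` field is an invented example change. The real `BulkProcessor` is created with `BulkProcessor.builder(client, listener)` and can also flush by size (`setBulkSize`), by interval (`setFlushInterval`) and with concurrent requests.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// One full-document "index" action: the document id plus its complete, modified _source.
class IndexAction {
    final String id;
    final Map<String, Object> source;

    IndexAction(String id, Map<String, Object> source) {
        this.id = id;
        this.source = source;
    }
}

// Hypothetical, single-threaded stand-in for BulkProcessor's action-count threshold:
// actions are buffered and handed to the sender in batches of at most bulkActions.
class BulkBuffer implements AutoCloseable {
    private final int bulkActions;
    private final Consumer<List<IndexAction>> sender;
    private List<IndexAction> pending = new ArrayList<>();

    BulkBuffer(int bulkActions, Consumer<List<IndexAction>> sender) {
        this.bulkActions = bulkActions;
        this.sender = sender;
    }

    void add(IndexAction action) {
        pending.add(action);
        if (pending.size() >= bulkActions) {
            flush();
        }
    }

    void flush() {
        if (!pending.isEmpty()) {
            sender.accept(pending);      // one bulk round trip for the whole batch
            pending = new ArrayList<>(); // start a fresh batch; the sent list is not reused
        }
    }

    @Override
    public void close() {
        flush(); // send whatever is left after the last scroll page
    }
}

class ClientSideUpdate {
    // Modifies a copy of the hit's _source on the client so the complete new
    // document can be reindexed; the original map is left untouched.
    static Map<String, Object> updatedSource(Map<String, Object> source) {
        Map<String, Object> copy = new HashMap<>(source);
        copy.put("processed", Boolean.TRUE); // hypothetical field change
        return copy;
    }
}
```

In the scroll loop, each hit would become one `add(...)` call built from `hit.getId()` and the modified source, and `close()` would run once the loop ends.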

If you don't want to send all the docs over the wire again, then the Update API is OK. In that case I'd probably avoid fetching _source in prepareSearch by requesting only the fields you need.
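With the Update API, a partial document is merged into the stored one, so only the changed fields need to travel over the wire. A minimal sketch, assuming top-level fields only, of computing that partial document in plain Java; `PartialDoc.diff` is a hypothetical helper, and in the 1.x Java API its result would be passed to something like `prepareUpdate(index, type, id).setDoc(...)`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class PartialDoc {
    // Returns only the top-level fields whose values are new or differ between the
    // original document and the desired one. A merge-style partial update cannot
    // delete fields, so keys missing from `desired` are simply not represented here.
    static Map<String, Object> diff(Map<String, Object> original, Map<String, Object> desired) {
        Map<String, Object> partial = new HashMap<>();
        for (Map.Entry<String, Object> e : desired.entrySet()) {
            String key = e.getKey();
            boolean isNew = !original.containsKey(key);
            if (isNew || !Objects.equals(original.get(key), e.getValue())) {
                partial.put(key, e.getValue());
            }
        }
        return partial;
    }
}
```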

My 2 cents

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

In reply to Arinto Murdopo (arinto@gmail.com), 10 February 2014 at 09:23:53.

(Arinto Murdopo) #3

Thank you very much, David!

Noted :slight_smile:

Best regards,

Arinto

In reply to David Pilato, Monday, February 10, 2014 4:38:13 PM UTC+8.
