Copy docs from one index to another index


(ganeshbabu) #1

Hi Team,

I have two indexes (let's say es_item and es_item1) with the same mapping and the same index type. The es_item index contains 1000 docs and the es_item1 index contains 2000 docs.

I want to copy or move the docs from es_item to es_item1.
Is it possible to do that?

Thanks,
Ganeshbabu R


(Nik Everett) #2

Right now there isn't anything built especially for that. You'd have to
scroll from one index and bulk into the other index.
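
As a rough illustration of that scroll-then-bulk loop, here is a minimal Python sketch. It is not an official tool: the function names hits_to_bulk_body and copy_index are made up for this example, and the two callables fetch_page and send_bulk are hypothetical placeholders for whatever HTTP client you use to call the scroll and bulk endpoints.

```python
import json


def hits_to_bulk_body(hits, target_index):
    """Turn one page of scroll hits into a newline-delimited bulk body."""
    lines = []
    for hit in hits:
        # Action line: reuse the original type and id so that re-running
        # the copy overwrites documents instead of duplicating them.
        action = {"index": {"_index": target_index,
                            "_type": hit["_type"],
                            "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        # Source line: the document itself, unchanged.
        lines.append(json.dumps(hit["_source"]))
    if not lines:
        return ""
    # The bulk API expects every line, including the last, to end in "\n".
    return "\n".join(lines) + "\n"


def copy_index(first_scroll_id, fetch_page, send_bulk, target_index):
    """Drain a scroll and bulk-index every page into target_index.

    fetch_page(scroll_id) -> (next_scroll_id, hits)  # hypothetical callable
    send_bulk(body) -> None                          # hypothetical callable
    Returns the number of documents sent.
    """
    copied = 0
    scroll_id = first_scroll_id
    while True:
        # Each scroll response carries the id to use for the next request.
        scroll_id, hits = fetch_page(scroll_id)
        if not hits:
            # An empty page means the scroll is exhausted.
            break
        send_bulk(hits_to_bulk_body(hits, target_index))
        copied += len(hits)
    return copied
```

In practice fetch_page would wrap a request to /_search/scroll with the current scroll id and return the "_scroll_id" and "hits.hits" fields of the response, and send_bulk would POST the body to /_bulk. Keeping the HTTP calls outside the loop is only a design choice for this sketch so the logic can be checked without a cluster.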


(ganeshbabu) #3

Thanks for your response @nik9000

I tried to use scan and scroll for the reindex, and I executed the command below to get a scroll id:

GET /es_item/_search?search_type=scan&scroll=1m
{
"query": { "match_all": {}},
"size": 1000
}

I got the response below:

{
"_scroll_id": "c2Nhbjs1Ozc3MzE6R3FRUUxURGpUUE9rdjQxaGZoUWJBQTs3NzQwOlJoUWhVNWxIVFpXOV8xTmpLUGR6NUE7NzczMjpHcVFRTFREalRQT2ZmhRYkFBOzc3NDE6UmhRaFU1bEhUWlc5XzFOaktQZHo1QTs3NzMzOkdxUVFMVERqVFBPa3Y0MWhmaFFiQUE7MTt0b3RhbF9oaXRzOjEyMTY1Njs=",
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 121656,
"max_score": 0,
"hits": []
}
}

I am new to Elasticsearch scan and scroll. Can you tell me how to bulk index using the scroll id?

Please point me to any sample code.

Thanks,
Ganeshbabu R


(David Pilato) #4

I also wrote a blog post about this subject: http://david.pilato.fr/blog/2015/05/20/reindex-elasticsearch-with-logstash/


(Nik Everett) #5

I was going to suggest reading the Perl client because I know it has a reindex. The Logstash approach is likely to be less complicated, though.


(ganeshbabu) #6

Thanks for your response @dadoonet


(ganeshbabu) #7

Thanks for your response @nik9000


(ganeshbabu) #8

Hi David,

I tried to copy docs from one index to another index using Logstash.

Below is the logstash.conf file I created:

input {
  # We read from the "old" index
  elasticsearch {
    hosts => [ "10.7.148.21" ]
    port => "9200"
    index => "es_item"
    user => "esadmin"
    password => "password"
    size => 500
    scroll => "5m"
    docinfo => true
  }
}

output {
  # We write to the "new" index
  elasticsearch {
    host => "10.7.148.21"
    port => "9200"
    protocol => "http"
    user => "esadmin"
    password => "password"
    index => "es_item1"
    index_type => "item"
    document_id => "16636281"
  }

  # We print dots to see it in action
  stdout {
    codec => "dots"
  }
}

When I executed the command below:
bin/logstash -f logstash.conf

I got the following response:

You are using a deprecated config setting "index_type" set in elasticsearch. Deprecated settings will continue to work, but are scheduled for removal from logstash in the future. If you have any questions about this, please visit the #logstash channel on freenode irc. {:name=>"index_type", :plugin=><LogStash::Outputs::ElasticSearch --->, :level=>:warn}

Logstash startup completed

Exception in thread "Ruby-0-Thread-7: /opt/esadmin/elasticsearch/bin/logstash-1.5.0/vendor/bundle/jruby/1.9/gems/stud-0.0.19/lib/stud/buffer.rb:92" java.lang.UnsupportedOperationException
at java.lang.Thread.stop(Thread.java:869)
at org.jruby.RubyThread.exceptionRaised(RubyThread.java:1221)
at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:112)
at java.lang.Thread.run(Thread.java:745)

Is this the right way to write the Logstash config?

Please guide me to fix this issue.

Thanks,
Ganeshbabu R


(David Pilato) #9

First guess: add an empty filter section.

If it does not work, can you open a new thread in the Logstash group? And if it works, I would open an issue in the Logstash project.


(ganeshbabu) #10

David, @dadoonet

I tried adding an empty filter section, but I am still facing the same error.
Can you give me the link to the Logstash group so that I can open a new thread?

I am using Elasticsearch 1.7.2 but I downloaded Logstash 1.5.0. Could that be the reason for this exception?

Note: I see the doc count increase in the target index, but because of the exception below, not all docs are copied to the target index.

Exception in thread "Ruby-0-Thread-7: /opt/esadmin/elasticsearch/bin/logstash-1.5.0/vendor/bundle/jruby/1.9/gems/stud-0.0.19/lib/stud/buffer.rb:92" java.lang.UnsupportedOperationException
at java.lang.Thread.stop(Thread.java:869)
at org.jruby.RubyThread.exceptionRaised(RubyThread.java:1221)
at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:112)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "<elasticsearch" java.lang.UnsupportedOperationException
at java.lang.Thread.stop(Thread.java:869)
at org.jruby.RubyThread.exceptionRaised(RubyThread.java:1221)
at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:112)
at java.lang.Thread.run(Thread.java:745)
Logstash shutdown completed

Please let me know any suggestions. It would be very helpful.

Thanks,
Ganeshbabu R


(David Pilato) #11

(ganeshbabu) #12

Thanks for your quick response @dadoonet


(Lee Drengenberg) #13

I'm an es newbie, but I used https://github.com/taskrabbit/elasticsearch-dump to export some test data and re-import it. It might be useful to you.


(ganeshbabu) #14

Thanks for your response @LeeDr. Sure, I will do some testing in the DEV environment and let you know how it goes.


(ganeshbabu) #15

Hi @dadoonet,

I upgraded ES from version 1.7.2 to 1.7.3, installed Logstash 1.5.5, and am trying to copy docs from one index to another index.

Below is the sample logstash.conf file:

input {
  elasticsearch {
    hosts => ["10.7.148.21:9200"]
    user => "esadmin"
    password => "password"
    index => "item_demo"
    size => 500
    scroll => "5m"
    docinfo => true
  }
}

filter {

}

output {
  elasticsearch {
    host => "10.7.148.21:9200"
    protocol => "http"
    user => "esadmin"
    password => "password"
    index => "item_test"
  }
  stdout {
    codec => "dots"
  }
}

When I executed the command bin/logstash -f logstash.conf

I got the following error response:

failed action with response of 404, dropping action: ["index", {:id=>nil, :index=>"item_demo_test", :type=>"logs", :routing=>nil}, #@metadata_accessors=#@store={"index"=>"itemdemo", "type"=>"item", "id"=>"9151301", "retry_count"=>0}, @lut={}>, @cancelled=false, @data={"ITEM_ID"=>9151301, "ITEM_CODE"=>"9151301", "ITEM_DSCR"=>"KITCHEN & TABLEWARE", "ITEM_SPECIFICITY_REF_ID"=>186, "ITEM_TYPE"=>"SGI", "IS_SHARED_IND"=>"Y", "HAS_IMAGE_IND"=>"Y", "HAS_HIST_IND"=>"Y", "MOD_CHR_VAL_ID"=>12574160, "MOD_DSCR"=>"KITCHEN & TABLEWARE", "SG_CHR_VAL_ID"=>5439993, "PG_CHR_VAL_ID"=>17428187, "ITEM_EU_NAN_CODE"=>"8378994558", "ITEM_EU_NAN_KEY"=>"4422653", "ITEM_CA_CNAN"=>nil, "ITEM_US_PRDC_ID"=>nil, "RELATIONSHIP"=>[{"REL_TYP_REF_ID"=>-999, "REL_TYPE"=>"NO RELATIONSHIP", "REL_CTGRY"=>"NIL"}], "ITEM_MISUSED_GTIN_FLG"=>"N", "CRT_DTTM"=>"2004-09-25 22:00:00", "UPD_DTTM"=>"2014-06-01 04:01:30", "DIST"=>[{"RGN_ID"=>4, "RGN_NM"=>"DE", "DSTN_STRT_DT"=>"2007-08-22 20:24:28", "DSTN_END_DT"=>"9999-12-31 00:00:00", "ITEM_GLBL_CODE_ST_REF_ID"=>720, "ITEM_LCL_CODE_ST_REF_ID"=>720, "IS_FFU_IND"=>"Y", "FIRST_FFU_DT"=>"2012-07-06 16:17:18", "HAS_LPV_IND"=>"N", "FIRST_DATA_DT"=>nil, "LAST_DATA_DT"=>nil, "HAS_HIST_IND"=>"Y", "LAST_CHG_DT"=>"2004-09-26 10:02:29", "IS_PREMVMNT_IND"=>"N", "PRE_MOVEMENT_DT"=>nil}

Logstash 1.5.5 worked well with ES version 1.7.2.

Please guide me to resolve this issue. It would be very helpful.

Thanks,
Ganeshbabu R


(David Pilato) #16

I'd open another thread in logstash forum.
But apparently you have a 404 error which means that your index does not exist.

Check GET 10.7.148.21:9200/item_demo_test
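
The same existence check can be scripted. Below is a minimal Python sketch using only the standard library; the function name index_exists and its opener parameter are invented for this example, and it assumes a cluster that answers a HEAD request on the index URL with 200 when the index exists and 404 when it does not. It does not handle authentication, so on a secured cluster like the one above you would need to add credentials yourself.

```python
import urllib.error
import urllib.request


def index_exists(base_url, index, opener=urllib.request.urlopen):
    """Return True if a HEAD request on the index answers 200, False on 404.

    opener is injectable (hypothetical parameter) so the function can be
    exercised without a running cluster.
    """
    url = "%s/%s" % (base_url.rstrip("/"), index)
    request = urllib.request.Request(url, method="HEAD")
    try:
        response = opener(request)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # Same meaning as the 404 in the Logstash log: no such index.
            return False
        # Anything else (401, 500, ...) is a different problem; surface it.
        raise
    return response.status == 200
```

For example, index_exists("http://localhost:9200", "item_demo_test") would query a local node; the host and port here are placeholders.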


(ganeshbabu) #17

Thanks for your response @dadoonet

I do get a response when I use the curl GET command:

[esadmin@dayrhezhmd001 elasticsearch-1.7.3]$ curl --user esadmin:password -XGET 'http://10.7.148.21:9200/item_demo_test/?pretty=true'

Sample:-
{
"item_demo_test" : {
"aliases" : { },
"mappings" : {
"item" : {
"_all" : {
"enabled" : false
},
"properties" : {
"CRT_DTTM" : {
"type" : "date",
"doc_values" : true,
"format" : "yyyy-MM-dd HH:mm:ss"
},
"DIST" : {
"properties" : {
"DSTN_END_DT" : {
"type" : "date",
"doc_values" : true,
"format" : "yyyy-MM-dd HH:mm:ss"
},
"DSTN_STRT_DT" : {
"type" : "date",
"doc_values" : true,
"format" : "yyyy-MM-dd HH:mm:ss"
},
}
}
}
}

Let me know your feedback.

Thanks,
Ganeshbabu R


(David Pilato) #18

I have no idea. I'd open this discussion in logstash group.


(ganeshbabu) #19

Hi @dadoonet

I tried copying docs from the old index to the new index. I ran Logstash with nohup in the background, and the doc count in the new index keeps on increasing.

Below is the Logstash conf file:

input {
  # We read from the "old" index
  elasticsearch {
    hosts => ["10.8.42.121:9200"]
    user => "esadmin"
    password => "Gemmot08"
    index => "es_item"
    size => 1000
    scroll => "1m"
    docinfo => true
  }
}

filter {

}

output {
  # We write to the "new" index
  elasticsearch {
    host => "10.8.42.121:9200"
    protocol => "http"
    user => "esadmin"
    password => "xxxxx"
    index => "es_item2"
    document_type => "item"
  }

  # We print dots to see it in action
  stdout {
    codec => "dots"
  }
}

Logstash run command:
nohup ./bin/logstash -f logstash.conf &

Logstash never stops when running as a background process. It keeps on adding docs to the new index, which makes the index very large.

How can I terminate Logstash once the doc copy is completed?

Is there any other way to run Logstash in the background?

Thanks,
Ganeshbabu R


(David Pilato) #20

Kill -9 ?

Why are you running it with nohup and &?