Rails: Tire import taking too long


(Karan Verma) #1

Hi

I want to index close to 900,000 documents, but it is taking quite long. I
have defined a mapping that contains indexes along the lines of:

indexes :residencies_with_year, type: 'string', :as => 'residencies_obj.map{|r| ExpertProfile.residency_with_year_to_s(r)}'

Thus I believe quite a few database queries run for each model as it is
indexed. I'm using the default configuration: number of nodes = 1, number of
shards = 5, and number of replicas = 1.
Each batch of 1,000 documents is taking about 15 minutes.

How can I speed it up?

rake environment tire:import CLASS='Expert' FORCE=true

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Karel Minarik) #2

Hi,

I've got a couple of questions:

1/ What kind of database are you reading the data from (SQLite, PostgreSQL,
etc)? Locally or in production?

2/ Assuming you're using ActiveRecord, can you enable logging and check
whether you're being hit with n+1 queries? Just put something like this in
your Rakefile or the Tire initializer:

ActiveRecord::Base.logger = Logger.new(STDOUT)
ActiveRecord::Base.clear_active_connections!

Indexing 1,000 documents should usually take seconds at most.
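
To make the n+1 pattern concrete, here is a minimal pure-Ruby sketch. It does not use ActiveRecord or Tire; `FakeStore`, `residencies_for`, `residencies_for_all` and the two `build_docs_*` helpers are hypothetical names invented for illustration, and the store only counts round trips instead of talking to a real database.

```ruby
# Hypothetical in-memory stand-in for a remote table; it only counts round trips.
class FakeStore
  attr_reader :queries

  def initialize(rows)
    @rows = rows # { expert_id => [residency, ...] }
    @queries = 0
  end

  # One round trip per expert: this is the n+1 pattern.
  def residencies_for(expert_id)
    @queries += 1
    @rows.fetch(expert_id, [])
  end

  # One round trip for a whole batch of experts.
  def residencies_for_all(expert_ids)
    @queries += 1
    expert_ids.each_with_object({}) { |id, out| out[id] = @rows.fetch(id, []) }
  end
end

# Builds documents with one lookup per record.
def build_docs_naive(store, ids)
  ids.map { |id| { id: id, residencies: store.residencies_for(id) } }
end

# Builds the same documents from a single batched lookup.
def build_docs_batched(store, ids)
  lookup = store.residencies_for_all(ids)
  ids.map { |id| { id: id, residencies: lookup[id] } }
end

rows = (1..1000).each_with_object({}) { |id, h| h[id] = ["Residency #{id}"] }
ids = rows.keys

naive = FakeStore.new(rows)
build_docs_naive(naive, ids)

batched = FakeStore.new(rows)
build_docs_batched(batched, ids)

puts "per-record lookups: #{naive.queries} round trips"
puts "batched lookup:     #{batched.queries} round trip"
```

Both helpers produce identical documents; only the number of round trips differs (1,000 versus 1 for a batch of 1,000). With a remote database, each round trip also pays network latency, which is why the log is worth checking first.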

Karel



(Karan Verma) #3

Hi Karel

  1. Data is being read from MySQL and DynamoDB, and both are remote
    environments.

  2. Initially I was creating the index on an ES server running on localhost. I
    have now set one up on a separate EC2 instance. I enabled logging, and
    loading one Expert from the model takes about 74 ms. Thus 1,000
    documents should take roughly 1.25 minutes, but it's taking 3 minutes, which is
    still better than before. I checked for the n+1 issue and there are some
    optimizations possible there, but I guess 74 ms for one record is not that bad?
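
To put those figures in perspective for the full data set, here is a small back-of-the-envelope calculation in Ruby. The numbers (about 900,000 documents, 74 ms per record, 3 minutes per batch of 1,000) are the ones quoted in this thread; the constant and method names are made up for the sketch, and it assumes the per-record cost stays constant over the whole run.

```ruby
# Figures quoted in the thread.
TOTAL_DOCS = 900_000
LOAD_MS_PER_DOC = 74          # time to load one Expert, from the log
BATCH_SIZE = 1_000
OBSERVED_MIN_PER_BATCH = 3    # measured wall-clock time per batch

# Hours needed if loading records were the only cost.
def hours_from_load_time(docs, ms_per_doc)
  docs * ms_per_doc / 1000.0 / 3600.0
end

# Hours needed at a measured per-batch rate.
def hours_from_batches(docs, batch_size, minutes_per_batch)
  (docs.to_f / batch_size) * minutes_per_batch / 60.0
end

load_only = hours_from_load_time(TOTAL_DOCS, LOAD_MS_PER_DOC)
observed = hours_from_batches(TOTAL_DOCS, BATCH_SIZE, OBSERVED_MIN_PER_BATCH)

puts format("loading alone: %.1f hours", load_only)
puts format("at observed batch rate: %.1f hours", observed)
```

This prints 18.5 hours for loading alone and 45.0 hours at the observed batch rate, so a single sequential import would run for well over a day.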

However, for 900,000 documents this will still take quite some time. Is there
a way to index in parallel?

How about:

rake environment tire:import CLASS="Expert.where('id % 10 = 9')"

without the FORCE=true option, which I guess wouldn't cause the index to
be deleted, so I could run 10 rake tasks in parallel. Would this cause any
problems? Is there a better way?
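
For illustration, the modulo split can be sketched in plain Ruby. The command string is copied from the example above with only the remainder varied; `import_commands` and `slices` are hypothetical helper names, and the sketch only checks the partitioning arithmetic on sample ids. It does not run rake or confirm how Tire handles the `CLASS` expression.

```ruby
WORKERS = 10

# Builds one command per remainder class (0 through workers - 1).
def import_commands(workers)
  (0...workers).map do |k|
    "rake environment tire:import CLASS=\"Expert.where('id % #{workers} = #{k}')\""
  end
end

# Groups ids by remainder, mirroring what the WHERE clause selects.
def slices(ids, workers)
  ids.group_by { |id| id % workers }
end

puts import_commands(WORKERS)

sample_ids = (1..10_000).to_a
parts = slices(sample_ids, WORKERS)

# Every id lands in exactly one slice, and the slices are evenly sized.
puts parts.values.map(&:size).inspect
puts parts.values.flatten.sort == sample_ids
```

The slices are disjoint and together cover every id, so no document would be indexed twice. Note that all ten processes still read from the same MySQL and DynamoDB backends, so the combined read rate on those rises roughly tenfold.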


--
Best,
Karan

Life saving Ninja & Software Engineer

Karan pronounced Ka (http://tiny.cc/0lu61w) + Run



(Karan Verma) #4

Update:

The above method doesn't seem to work. The trace is included below.


1000/98754 | 1%
rake aborted!

The level of configured provisioned throughput for the table was exceeded.
Consider increasing your provisioning level with the UpdateTable API
/Users/karan/.rvm/gems/ruby-1.9.3-p448/gems/tire-0.5.3/lib/tire/model/search.rb:184:in `instance_eval'
/Users/karan/.rvm/gems/ruby-1.9.3-p448/gems/aws-sdk-1.5.8/lib/aws/core/client.rb:376:in `client_request'
(eval):3:in `get_item'
/Users/karan/.rvm/gems/ruby-1.9.3-p448/gems/aws-sdk-1.5.8/lib/aws/dynamo_db/attribute_collection.rb:437:in `get_item'
/Users/karan/.rvm/gems/ruby-1.9.3-p448/gems/aws-sdk-1.5.8/lib/aws/dynamo_db/attribute_collection.rb:426:in `to_h'
/Users/karan/.rvm/gems/ruby-1.9.3-p448/gems/activesupport-3.1.12/lib/active_support/core_ext/object/try.rb:32:in `try'
/Users/karan/ht-webapp12/app/models/expert/nosql.rb:141:in `get_expert_profile_nosql'
/Users/karan/ht-webapp12/app/models/expert/nosql.rb:71:in `specialties'
/Users/karan/ht-webapp12/app/models/expert.rb:3360:in `specialty_names'
/Users/karan/ht-webapp12/app/models/expert.rb:686:in `relevancy_tags'
(eval):1:in `block in to_indexed_json'
/Users/karan/.rvm/gems/ruby-1.9.3-p448/gems/tire-0.5.3/lib/tire/model/search.rb:184:in `instance_eval'
/Users/karan/.rvm/gems/ruby-1.9.3-p448/gems/tire-0.5.3/lib/tire/model/search.rb:184:in `block in to_indexed_json'
/Users/karan/.rvm/gems/ruby-1.9.3-p448/gems/tire-0.5.3/lib/tire/model/search.rb:181:in `each'
/Users/karan/.rvm/gems/ruby-1.9.3-p448/gems/tire-0.5.3/lib/tire/model/search.rb:181:in `to_indexed_json'
/Users/karan/.rvm/gems/ruby-1.9.3-p448/gems/tire-0.5.3/lib/tire/model/search.rb:312:in `to_indexed_json'
/Users/karan/.rvm/gems/ruby-1.9.3-p448/gems/tire-0.5.3/lib/tire/index.rb:432:in `convert_document_to_json'
/Users/karan/.rvm/gems/ruby-1.9.3-p448/gems/tire-0.5.3/lib/tire/index.rb:138:in `block in bulk'
/Users/karan/.rvm/gems/ruby-1.9.3-p448/gems/activerecord-3.1.12/lib/active_record/relation.rb:15:in `map'
/Users/karan/.rvm/gems/ruby-1.9.3-p448/gems/activerecord-3.1.12/lib/active_record/relation.rb:15:in `map'
/Users/karan/.rvm/gems/ruby-1.9.3-p448/gems/tire-0.5.3/lib/tire/index.rb:114:in `bulk'
/Users/karan/.rvm/gems/ruby-1.9.3-p448/gems/tire-0.5.3/lib/tire/index.rb:178:in `bulk_store'
/Users/karan/.rvm/gems/ruby-1.9.3-p448/gems/tire-0.5.3/lib/tire/index.rb:194:in `import'
/Users/karan/.rvm/gems/ruby-1.9.3-p448/gems/tire-0.5.3/lib/tire/tasks.rb:86:in `block (3 levels) in <top (required)>'
/Users/karan/.rvm/gems/ruby-1.9.3-p448/gems/tire-0.5.3/lib/tire/tasks.rb:72:in `block (2 levels) in <top (required)>'
/Users/karan/.rvm/gems/ruby-1.9.3-p448/bin/ruby_executable_hooks:15:in `eval'
/Users/karan/.rvm/gems/ruby-1.9.3-p448/bin/ruby_executable_hooks:15:in `<main>'
Tasks: TOP => tire:import
(See full trace by running task with --trace)
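
The trace shows that building each document (`relevancy_tags` → `specialty_names` → `get_expert_profile_nosql`) ends in a DynamoDB `get_item` call, and that call was throttled. One common way to cope with throttling is to retry with exponential backoff. The sketch below is plain Ruby under stated assumptions: `ThroughputExceeded` is a hypothetical stand-in for whatever error class the SDK raises, and `with_backoff` is an invented helper, not part of aws-sdk or Tire.

```ruby
# Hypothetical error class standing in for the SDK's throttling error.
class ThroughputExceeded < StandardError; end

# Retries the block with exponentially growing pauses when it is throttled.
# `sleeper` is injectable so pauses can be recorded or skipped in tests.
def with_backoff(max_attempts: 5, base_delay: 0.5, sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue ThroughputExceeded
    raise if attempt >= max_attempts # give up and surface the error
    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end

# Usage: a fake lookup that is throttled twice before succeeding.
pauses = []
calls = 0
profile = with_backoff(sleeper: ->(s) { pauses << s }) do
  calls += 1
  raise ThroughputExceeded, "provisioned throughput exceeded" if calls < 3
  { specialties: ["cardiology"] }
end

puts profile.inspect
puts pauses.inspect
```

Here the lookup succeeds on the third attempt after pauses of 0.5 and 1.0 seconds. Backoff only smooths out bursts; if ten parallel imports steadily read more than the table's provisioned capacity, the options are fewer DynamoDB reads per document or a higher provisioning level, as the error message suggests.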


