Fast populate index using elastic search


(jimmy) #1

I am a beginner with Symfony2 and Elasticsearch integration.

My current setup is simple: 5 shards on 1 node (a single Ubuntu machine),
with Symfony and Elasticsearch installed and running.
How can I make indexing faster? I have millions of pictures and I am
indexing them by their "caption" field.
I run

php app/console fos:elastica:populate --no-reset --index=search --type=picture --no-debug --batch-size=1000

and it takes too long (around 2 hours).

I have also tried the bulk API, since I want to index millions of items, but the
documentation refers to "doc", "source", "id", etc.

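  1. I am using Perl code:

    use strict;
    use warnings;
    use Search::Elasticsearch;

    my $es   = Search::Elasticsearch->new;
    my $bulk = $es->bulk_helper(
        index => 'search',
        type  => 'picture',
    );

    # Index docs (the caption value here is only a placeholder):
    $bulk->index({ source => { caption => 'example caption' } });

    # Send anything still buffered:
    $bulk->flush;

Helpful link:
https://metacpan.org/pod/Search::Elasticsearch::Bulk#flush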

  2. I also use a curl request:

curl -s -XPOST localhost:9200/search/picture/_bulk --data-binary @requests; echo

where the "requests" file contains:
{ "index" : { "_index" : "search", "_type" : "picture" } }
{ "mappings" : "caption" }

How can I populate the whole index quickly?
Neither of the two methods above works. I am probably making a mistake with the Perl bulk library or with the _bulk curl request.
Can somebody please help?

Thanks


(Mark Walkom) #2

Try setting the refresh interval (index.refresh_interval) to -1 for the index while you bulk load, and also increase your bulk size.

The speed of indexing will also depend on your node setup and sizing.


(jimmy) #3

I assume you mean setting these in the elasticsearch.yml file:

index.refresh_interval: -1
indices.memory.index_buffer_size: 30%


(Magnus Bäck) #4

elasticsearch.yml changes require a restart which isn't practical, and setting index.refresh_interval there would (I imagine) only change the default setting for new indexes. The refresh interval setting can be changed on the fly for any index. Not sure about indices.memory.index_buffer_size.
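
As an illustration of changing the refresh interval on the fly, here is a minimal sketch that uses the index update-settings endpoint. The host (localhost:9200) and index name (search) are taken from this thread. The helper names `refresh_body` and `set_refresh` are made up for the example, and the Content-Type header is only needed on newer Elasticsearch versions.

```shell
#!/bin/sh
# Host and index name as used in this thread; override via environment if needed.
ES_HOST="${ES_HOST:-localhost:9200}"
INDEX="${INDEX:-search}"

# Hypothetical helper: build the JSON body for the update-settings call.
refresh_body() {
  printf '{"index":{"refresh_interval":"%s"}}' "$1"
}

# Hypothetical helper: apply the setting to the live index (no restart involved).
set_refresh() {
  curl -s -XPUT "$ES_HOST/$INDEX/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(refresh_body "$1")"
  echo
}

# Typical sequence around a bulk load (commented out so nothing is sent
# just by sourcing this file):
#   set_refresh -1    # disable periodic refresh
#   ... run the bulk indexing ...
#   set_refresh 1s    # restore a refresh interval afterwards
```

Restoring the interval after the load matters, because documents only become searchable once a refresh happens.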


(jimmy) #5

Using the curl bulk API, I get the response below, which makes me think it is not indexing anything.

curl -s -XPOST localhost:9200/search/picture/_bulk --data-binary @requests; echo

Response:
{"took":2,"errors":false,"items":[{"create":{"_index":"search","_type":"picture","_id":"AU3cN_GH8glW2H0tbkIz","_version":1,"status":201}}]}

This does not make sense to me, as I don't think any of the indexes are populated.
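
For reference, a bulk response with "errors":false and an item status of 201 means that one document was created. A rough sketch for pulling the per-item status codes out of a response like the one above follows. The function name `bulk_statuses` is made up for the example, and the grep-based parsing assumes the compact JSON layout shown in this thread; a proper JSON parser would be more robust.

```shell
#!/bin/sh
# Hypothetical helper: print one HTTP-style status code per bulk item.
# Reads the bulk response on stdin; relies on the compact "status":NNN form.
bulk_statuses() {
  grep -o '"status":[0-9]*' | cut -d: -f2
}

# The response quoted in this thread:
response='{"took":2,"errors":false,"items":[{"create":{"_index":"search","_type":"picture","_id":"AU3cN_GH8glW2H0tbkIz","_version":1,"status":201}}]}'

printf '%s\n' "$response" | bulk_statuses

# To see how many documents the type holds (assumes the same host, index and
# type as above; documents show up only after a refresh):
#   curl -s 'localhost:9200/search/picture/_count'; echo
```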

My "requests" file contains:
{ "index" : { "_index" : "search", "_type" : "picture" } }
{ "mappings" : "caption" }
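
Note that in a bulk file the second line of each pair is the document itself, so the file above indexes a single document whose only field is named "mappings". Below is a minimal sketch of generating action/document pairs instead, assuming the intent is to store each picture's text under a `caption` field. The function name `make_bulk` and the sample captions are made up, and the captions are assumed to contain no characters that need JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper: read one caption per line on stdin and print a bulk body,
# i.e. an action line followed by the document source line for each caption.
# Assumes captions contain no double quotes, backslashes or control characters.
make_bulk() {
  while IFS= read -r caption; do
    printf '%s\n' '{ "index" : { "_index" : "search", "_type" : "picture" } }'
    printf '{ "caption" : "%s" }\n' "$caption"
  done
}

# Example usage (commented out so nothing is written or sent on its own):
#   printf '%s\n' 'sunset over the bay' 'dog on a beach' | make_bulk > requests
#   curl -s -XPOST localhost:9200/search/picture/_bulk --data-binary @requests; echo
```

The bulk body must end with a newline, which the final printf in the loop takes care of.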

Also, this is the fos_elastica configuration in my Symfony2 config.yml file:
fos_elastica:
    clients:
        default: { host: localhost, port: 9200 }
    indexes:
        search:
            types:
                picture:
                    mappings:
                        caption: ~
                        deletedAt: ~
                        tags:

And this is the cluster health report:
{
"cluster_name" : "elasticsearch",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 10,
"active_shards" : 10,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 10,
"number_of_pending_tasks" : 0
}

Please advise
Thanks

