I frequently have to index huge volumes of data for research purposes.
One of my recent tasks was indexing 60,000,000 docs. Fortunately, the
docs are very small, so the bulk index file for all 60 M docs is only
11 GB in total (under 200 bytes per doc on average).
I used the following command for Solr to avoid memory errors and get
high performance, and it worked well.
For this I wrote a multithreaded writer. It reads a file, bundles n
(usually 500) documents into a chunk, and queues the chunks. Writer
threads pick the chunks up and bulk index them over HTTP, round robin
across all my cluster nodes.
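For anyone who wants to try the same pattern, here is a minimal Python sketch of it. It is not the writer described above, only an illustration under assumptions: the input is taken to be one document per line, and `post_bulk`, its content type, and the node URLs are hypothetical placeholders that you would adapt to your cluster's actual bulk API.

```python
import itertools
import queue
import threading
import urllib.request


def chunked(lines, size):
    """Yield lists of at most `size` items from an iterable."""
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def post_bulk(url, docs, content_type="application/json"):
    """Hypothetical sender: POST one newline-joined batch to `url`.

    The body layout and content type are assumptions, not a real bulk API.
    """
    body = ("\n".join(docs) + "\n").encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()


def bulk_index(lines, node_urls, send=post_bulk, batch_size=500, workers=4):
    """Bundle docs into batches and send them from worker threads.

    Each batch goes to the next node in round-robin order.
    Returns the list of exceptions raised by `send` (empty on success).
    """
    # Bounded queue so the reader cannot run far ahead of the writers.
    work = queue.Queue(maxsize=workers * 2)
    nodes = itertools.cycle(node_urls)
    nodes_lock = threading.Lock()
    errors = []

    def worker():
        while True:
            batch = work.get()
            if batch is None:  # sentinel: no more work for this thread
                break
            with nodes_lock:  # pick the next node, round robin
                url = next(nodes)
            try:
                send(url, batch)
            except Exception as exc:  # record and keep draining the queue
                errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for batch in chunked(lines, batch_size):
        work.put(batch)
    for _ in threads:
        work.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return errors
```

Usage would look something like `bulk_index(open("docs.txt"), ["http://node1:8983/...", "http://node2:8983/..."])`, where the file name and URLs are made up. The sketch has no retries or backoff; failed batches are only collected in the returned list.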
Is it open-sourced somewhere?
No, it's not, sorry... this code is just a part of another project. It
wouldn't be a bad idea to make this piece generic and open-source it.
It's in Ruby. If you're still interested, I'll see what I can do.