Configuration of elasticsearch to index 300 Million documents


(hitesh shekhada) #1

Hi,

  1. My goal is to index 300 million documents/products from a MS SQL Server
    database.
  2. To get all the fields for one document I need to join 14 different tables.
  3. The total data size of the 300 million documents is 300GB.
  4. There are 70 fields in one document.
  5. One document is about 0.8 KB in size.
  6. I need to update the values of 10 fields (out of 70) for almost 90 million
    documents every night.

I need to know ...

  1. Has anybody indexed such a large amount of data into an Elasticsearch
    server?
  2. How many clusters/nodes would I need to handle it?

Please let me know if someone has used elasticsearch for this amount of
data.
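The nightly update of 10 fields can be done with the bulk API's `update` action, which sends only the changed fields instead of reindexing whole documents. A minimal sketch of building such a bulk (NDJSON) body; the `products` index name, the `id` column, and the field names here are made-up examples, not from the thread:

```python
import json

def build_bulk_update(index, rows, fields):
    """Build an NDJSON _bulk body that partially updates `fields`
    of each row's document (document id taken from row["id"])."""
    lines = []
    for row in rows:
        # Action line: partial update of an existing document by id.
        lines.append(json.dumps({"update": {"_index": index, "_id": row["id"]}}))
        # Payload line: only the changed fields, not the whole 70-field document.
        lines.append(json.dumps({"doc": {f: row[f] for f in fields}}))
    return "\n".join(lines) + "\n"

rows = [{"id": 1, "price": 9.99, "stock": 12}]
body = build_bulk_update("products", rows, ["price", "stock"])
print(body)
```

In practice you would POST a body like this to the `_bulk` endpoint (or use the official client's bulk helper), batching a few thousand updates per request rather than all 90 million at once.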

Thanks,
Hitesh

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/acc7bbd4-7abc-4451-bfd5-b21ab725e02e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #2

1 - Yes, that's not a lot of data for ES :slight_smile:
2 - It depends; you should be able to do that on a single node with a ~16GB
heap, but you should test it yourself.
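As a back-of-the-envelope check on that single-node suggestion: a common community rule of thumb (an assumption, not stated in this thread) is to keep individual shards in the tens-of-GB range, so 300GB of primary data might be split across roughly 6-10 primary shards:

```python
# Rough primary-shard estimate for ~300GB of data, assuming the common
# community rule of thumb of roughly 30-50GB per shard.
total_gb = 300
target_shard_gb = 40  # assumed mid-range target; tune after your own testing
num_shards = max(1, round(total_gb / target_shard_gb))
print(num_shards)  # → 8
```

The right number ultimately depends on query load and hardware, which is why testing with your own data is the real answer.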

On 23 January 2015 at 22:37, hitesh shekhada shekhada@gmail.com wrote:




(Vani Aravinda) #3

Hi @warkolm,

I want to index millions of records from a SQL Server table. Can you please suggest which approach I should follow, and how I can also handle records that get updated in the table?

A reply as soon as possible would be very helpful.

Thanks,
Vani Aravinda


(Jason Wee) #4

I concur with warkolm's statement: Elasticsearch should be able to handle the requirements you specified. At the company I work for, we run five production nodes with 340M docs and an index size of 863GB.

hth


(system) #5