Configuration of elasticsearch to index 300 Million documents


(hitesh shekhada) #1

Hi,

  1. My goal is to index 300 million documents/products from a MS SQL Server
    database.
  2. To get all the fields for one document I need to join 14 different tables.
  3. The total data size of the 300 million documents is 300GB.
  4. There are 70 fields in one document.
  5. One document is about 0.8 KB in size.
  6. I need to update the values of 10 fields (out of 70) for almost 90 million
    documents every night.

I need to know ...

  1. Has anybody indexed such a large amount of data into an Elasticsearch
    server?
  2. How many clusters/nodes would I need to handle it?

Please let me know if someone has used elasticsearch for this amount of
data.
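The nightly update of 10 fields can be done with the bulk API's `update` action, which sends only the changed fields instead of reindexing whole documents. A minimal sketch of building such a bulk (NDJSON) body; the `products` index name, the `id` column, and the field names here are made-up examples, not from the thread:

```python
import json

def build_bulk_update(index, rows, fields):
    """Build an NDJSON _bulk body that partially updates `fields`
    of each row's document (document id taken from row["id"])."""
    lines = []
    for row in rows:
        # Action line: partial update of an existing document by id.
        lines.append(json.dumps({"update": {"_index": index, "_id": row["id"]}}))
        # Payload line: only the changed fields, not the whole 70-field document.
        lines.append(json.dumps({"doc": {f: row[f] for f in fields}}))
    return "\n".join(lines) + "\n"

rows = [{"id": 1, "price": 9.99, "stock": 12}]
body = build_bulk_update("products", rows, ["price", "stock"])
print(body)
```

In practice you would POST a body like this to the `_bulk` endpoint (or use the official client's bulk helper), batching a few thousand updates per request rather than all 90 million at once.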

Thanks,
Hitesh

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/acc7bbd4-7abc-4451-bfd5-b21ab725e02e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #2

1 - Yes, that's not a lot of data for ES :slight_smile:
2 - It depends; you should be able to do that on a single node with a ~16GB
heap, but you should test it yourself.
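As a back-of-the-envelope check on that single-node suggestion: a common community rule of thumb (an assumption, not stated in this thread) is to keep individual shards in the tens-of-GB range, so 300GB of primary data might be split across roughly 6-10 primary shards:

```python
# Rough primary-shard estimate for ~300GB of data, assuming the common
# community rule of thumb of roughly 30-50GB per shard.
total_gb = 300
target_shard_gb = 40  # assumed mid-range target; tune after your own testing
num_shards = max(1, round(total_gb / target_shard_gb))
print(num_shards)  # → 8
```

The right number ultimately depends on query load and hardware, which is why testing with your own data is the real answer.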

On 23 January 2015 at 22:37, hitesh shekhada shekhada@gmail.com wrote:




(Vani Aravinda) #3

Hi @warkolm,

I want to index millions of records from a SQL Server table. Can you please suggest which approach I should follow, and how I can also handle records that get updated in the table?

A reply as soon as possible would be very helpful.

Thanks,
Vani Aravinda


(Jason Wee) #4

I concur with warkolm's statement: Elasticsearch should be able to handle the requirements you specified. At the company I work for, we run five production nodes with 340M docs and an index size of 863GB.

hth


(system) #5