Here is our problem statement. We want to load around 50 million email addresses into Elasticsearch in an initial load (probably split into several chunks). Later on we will be adding a few thousand every day. The email address is the only field in each document. Can you please suggest how many primary shards and replica shards we need? Also, what would be an appropriate id and document structure?
For example:
contacts/email/hello@gmail.com
{"email":"hello@gmail.com"}
or
contacts/email
{"email":"hello@gmail.com"}
The existence of these email addresses will be checked by real-time traffic.
We think the first approach, using the email address as the id, will give faster retrieval than the second one. Any suggestions?
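To make the two options concrete, here is a rough Python sketch of the requests each one would involve. It only builds the request paths and bodies and does not talk to a cluster. The function names (`bulk_bodies`, `exists_path`, `term_query`) and the chunk size of 5000 are made up for illustration. The index and type names come from the example paths above; on newer Elasticsearch versions, where mapping types have been removed, the path would look like `/contacts/_doc/<id>` instead. The term query assumes the `email` field is mapped as a non-analyzed (keyword) field.

```python
import json
from itertools import islice
from urllib.parse import quote

INDEX = "contacts"   # index name from the example paths in the question
DOC_TYPE = "email"   # type name from the example paths in the question


def bulk_bodies(emails, chunk_size=5000, use_email_as_id=True):
    """Yield NDJSON bodies for the _bulk API, one body per chunk of addresses.

    chunk_size is an arbitrary illustrative value, not a tuned recommendation.
    """
    it = iter(emails)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        lines = []
        for email in chunk:
            action = {"_index": INDEX, "_type": DOC_TYPE}
            if use_email_as_id:
                # Approach 1: the email address is the document id.
                action["_id"] = email
            # Approach 2 leaves _id out so Elasticsearch generates one.
            lines.append(json.dumps({"index": action}))
            lines.append(json.dumps({"email": email}))
        # The bulk API expects newline-delimited JSON ending with a newline.
        yield "\n".join(lines) + "\n"


def exists_path(email):
    """Approach 1 lookup: path for a HEAD request on the document id."""
    return "/{}/{}/{}".format(INDEX, DOC_TYPE, quote(email, safe=""))


def term_query(email):
    """Approach 2 lookup: search body matching the exact email value."""
    return {"size": 0, "query": {"term": {"email": email}}}
```

With the first approach the existence check is a single `HEAD` on the path from `exists_path`, which returns 200 or 404. With the second approach it is a search using the body from `term_query`, and the caller has to inspect the hit count in the response.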