Hello,
I want to store millions of email addresses in ES and check whether a given address is among them. The problem I'm facing is choosing the best data model.
Solution 1
Split each address into its username and its email domain. This can be done in two ways:
N indexes -> 1 type (Solution 1A)
Each email domain gets its own index, and I store just the username.
Checking whether an address is stored would then be a fast lookup within a single index. For example, for foobar@gmail.com I would match 'foobar' in the 'gmail.com' index.
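A minimal sketch of the Solution 1A lookup, assuming each domain index holds documents with a single hypothetical 'username' field and that addresses are lowercased before indexing (local parts are technically case-sensitive, so lowercasing is a design choice, not a requirement). It only builds the request; the body would be sent to the _search endpoint of the domain's index, e.g. GET /gmail.com/_search.

```python
def split_email(address: str) -> tuple[str, str]:
    """Split 'user@domain' into (username, domain), normalized to lowercase."""
    # rpartition splits on the last '@', which is where the domain starts.
    username, sep, domain = address.strip().lower().rpartition("@")
    if not sep or not username or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return username, domain


def lookup_request_1a(address: str) -> tuple[str, dict]:
    """Solution 1A: return (index name, search body) for one address."""
    username, domain = split_email(address)
    # The index is named after the domain; 'username' is a hypothetical field name.
    body = {"query": {"term": {"username": username}}}
    return domain, body


index, body = lookup_request_1a("foobar@gmail.com")
print(index)  # gmail.com
print(body)   # {'query': {'term': {'username': 'foobar'}}}
```

Lowercasing the domain is not optional here, because ES index names must be lowercase.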
Two big problems:
- I do not think ES scales well to a very large number of indexes (one per domain)
- Space is wasted when an index holds only a few usernames
So I would assume this solution is a bad one.
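To put a rough number on the first problem: Solution 1A needs one index per distinct domain, and every index carries at least one primary shard. The addresses below are made up, and shards_per_index is an assumption (Elasticsearch defaulted to 5 primary shards per index before 7.0 and to 1 since then).

```python
def indexes_needed_1a(addresses: list[str], shards_per_index: int = 1) -> tuple[int, int]:
    """Solution 1A cost: (number of indexes, total primary shards)."""
    # One index per distinct domain; lowercased because ES index names must be lowercase.
    domains = {a.strip().lower().rpartition("@")[2] for a in addresses}
    return len(domains), len(domains) * shards_per_index


# Made-up sample addresses, for illustration only.
sample = [
    "foobar@gmail.com",
    "alice@gmail.com",
    "bob@example.org",
    "carol@tiny-company.net",
]
print(indexes_needed_1a(sample, shards_per_index=5))  # (3, 15)
```

The 'tiny-company.net' index in the sample would hold a single username, which is the second problem in miniature.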
1 big index -> N types (Solution 1B)
Each email domain is a type and, as in Solution 1A, only the username is stored.
There is less overhead than in the previous solution, but search times may grow as the single index gets larger, so it seems suited only to "small" datasets.
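A sketch of the Solution 1B lookup. Since a type behaves like a field that ES filters on, this version swaps the type for an explicit 'domain' keyword field in one big index. That substitution is an assumption, but it also works on Elasticsearch 7.x and later, where mapping types were deprecated (7.0) and then removed (8.0). The field names 'username' and 'domain' are hypothetical.

```python
def document_1b(address: str) -> dict:
    """Solution 1B document: the 'domain' field plays the role of the type."""
    username, _, domain = address.strip().lower().rpartition("@")
    return {"username": username, "domain": domain}


def lookup_body_1b(address: str) -> dict:
    """Exact match on the username, restricted to the address's domain."""
    doc = document_1b(address)
    return {
        "query": {
            "bool": {
                # Filter context: yes/no matching, no relevance scoring needed.
                "filter": [
                    {"term": {"username": doc["username"]}},
                    # Restricting to one domain is what filtering on a type did.
                    {"term": {"domain": doc["domain"]}},
                ]
            }
        }
    }


print(lookup_body_1b("foobar@gmail.com"))
```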
What are the implications if I split the index into several smaller ones and put them behind an alias?
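One way to sketch the split-plus-alias variant, using a hypothetical naming scheme (emails-000, emails-001, ...): route each domain to a fixed index with a stable hash, and register all indexes under one alias through the POST /_aliases API. A search against the alias fans out to every index behind it, so the split only narrows a single lookup if the client computes the target index itself, as index_for_domain does here.

```python
import zlib


def index_for_domain(domain: str, prefix: str = "emails", n_indexes: int = 4) -> str:
    """Map a domain to one of n_indexes split indexes (hypothetical naming scheme)."""
    # crc32 gives the same bucket in every process, unlike Python's salted hash() for str.
    bucket = zlib.crc32(domain.strip().lower().encode("utf-8")) % n_indexes
    return f"{prefix}-{bucket:03d}"


def alias_actions(alias: str, prefix: str = "emails", n_indexes: int = 4) -> dict:
    """Body for POST /_aliases that puts every split index behind one alias."""
    return {
        "actions": [
            {"add": {"index": f"{prefix}-{i:03d}", "alias": alias}}
            for i in range(n_indexes)
        ]
    }


print(index_for_domain("gmail.com"))  # deterministic: the same index on every run
print(alias_actions("emails-all", n_indexes=2))
```

Changing n_indexes later would move domains to different buckets, so the number of split indexes is effectively fixed once data is loaded.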
Solution 2
Store the email address as it is, analyzed with the standard tokenizer. Is this solution effectively the same as 1B?
According to the ES documentation, a type is simply an additional field (_type) that ES applies a filter on.
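For Solution 2, note that the standard tokenizer does not keep an address whole: it breaks at the '@', so foobar@gmail.com is indexed as the tokens foobar and gmail.com, and a default match query for that address would also hit every other gmail.com address. A common alternative for exact membership checks, sketched here as one option under assumptions, is a keyword field (ES 5.0+), which indexes the whole address as one term, queried with a term query. The field name 'email' and client-side lowercasing are assumptions; the mapping uses the typeless 7.x+ syntax, whereas on 6.x and earlier the properties sit under a type name.

```python
def mapping_solution_2(field: str = "email") -> dict:
    """Body for PUT /<index>: index each whole address as a single term."""
    return {
        "mappings": {
            "properties": {
                # keyword fields are not tokenized, unlike text fields using the standard analyzer.
                field: {"type": "keyword"}
            }
        }
    }


def lookup_body_2(address: str, field: str = "email") -> dict:
    """Exact-match check; normalize exactly as the addresses were normalized at index time."""
    return {"query": {"term": {field: address.strip().lower()}}}


print(lookup_body_2(" FooBar@Gmail.com"))  # {'query': {'term': {'email': 'foobar@gmail.com'}}}
```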
What are your thoughts?
Thanks