Hello all,
I wanted to ask for people's thoughts on whether to use the nested datatype in a particular scenario.
The scenario:
I want to index the contents of files, and each file contains email addresses (along with other textual content). I am extracting all the email addresses with a regex because I want to run aggregation queries on them. For each email address I store three fields: the full address, the domain, and the name before the @ (e.g. "bob" from "bob@email.com").
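For reference, the extraction step looks roughly like the minimal Python sketch below. It assumes a deliberately simple regex (not full RFC 5322 syntax) and hypothetical field names address, domain and name. Lowercasing the domain, so that "Email.com" and "email.com" aggregate together, is my own assumption rather than a requirement.

```python
import re

# Deliberately simple pattern (an assumption, not full RFC 5322 syntax):
# local part, "@", then a domain ending in a dot and at least two letters.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return one dict per address found, using hypothetical field names
    "address", "domain" and "name" (the part before the @)."""
    results = []
    for match in EMAIL_RE.finditer(text):
        local_part, domain = match.group(0).rsplit("@", 1)
        # Domains are case-insensitive; normalise them so aggregations group correctly.
        domain = domain.lower()
        results.append({
            "address": f"{local_part}@{domain}",
            "domain": domain,
            "name": local_part,
        })
    return results


if __name__ == "__main__":
    sample = "Contact bob@email.com or Alice.Smith@Example.org for details."
    for email in extract_emails(sample):
        print(email)
```

Each dict then becomes one email object (three fields) in whichever storage option I choose.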
There may be thousands of email addresses in a single file. I can store the email address details in a nested field on the document being indexed; however, I have noticed that Elasticsearch limits the number of nested objects a single document can contain (index.mapping.nested_objects.limit). Alternatively, I could create a separate index for the email address objects, with a field that stores the ID of the document holding the file they came from. However, my understanding is that the nested datatype is essentially doing something very similar behind the scenes.
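To make the two options concrete, here is a rough sketch of the index bodies each would need, written as Python dicts. The field names, the file_id link field, and the limit value of 50000 are illustrative assumptions; only the index.mapping.nested_objects.limit setting itself (default 10000) comes from Elasticsearch.

```python
import json

# Option 1: a nested "emails" field on the file document (field names are hypothetical).
FILE_INDEX_BODY = {
    "settings": {
        # Default is 10000 nested objects per document; 50000 is an arbitrary example.
        "index.mapping.nested_objects.limit": 50000,
    },
    "mappings": {
        "properties": {
            "content": {"type": "text"},
            "emails": {
                "type": "nested",
                "properties": {
                    # keyword fields so terms aggregations can run on them
                    "address": {"type": "keyword"},
                    "domain": {"type": "keyword"},
                    "name": {"type": "keyword"},
                },
            },
        }
    },
}

# Option 2: a separate index holding one small document per email address.
EMAIL_INDEX_BODY = {
    "mappings": {
        "properties": {
            "file_id": {"type": "keyword"},  # hypothetical link back to the file document
            "address": {"type": "keyword"},
            "domain": {"type": "keyword"},
            "name": {"type": "keyword"},
        }
    }
}


def email_docs_for_file(file_id, emails):
    """Build one standalone document per email, each pointing back to its file."""
    return [dict(email, file_id=file_id) for email in emails]


if __name__ == "__main__":
    emails = [{"address": "bob@email.com", "domain": "email.com", "name": "bob"}]
    print(json.dumps(email_docs_for_file("file-1", emails), indent=2))
```

One practical difference: with the nested option, aggregations on the email fields have to be wrapped in a nested aggregation, whereas with the separate index a plain terms aggregation on domain works directly, at the cost of keeping file_id in sync myself.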
The question:
So my question (after all that): do I go with the nested datatype and simply raise index.mapping.nested_objects.limit to something "crazy", or do I manage a separate index myself?
Thanks in advance for the advice.
Matt