A Question on whether to use a Nested Datatype

Hello All.

I wanted to ask for people's thoughts on whether to use nested datatypes in a particular scenario.

The scenario:

I want to index the contents of files. Each file contains references to email addresses (along with other textual content). I am extracting all the email addresses with a regex because I want to run aggregation queries on them. For each email address I store three fields: the full address, the domain, and the name before the @ (e.g. "bob" from "bob@email.com").
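As a rough illustration of the extraction step described above, here is a minimal Python sketch. The regex is deliberately simple and will not match every valid address, and the names `extract_emails`, `address`, `local_part` and `domain` are made up for this example rather than taken from the original setup.

```python
import re

# Simplified pattern for illustration only; real-world addresses can be more exotic.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return one dict per address found in text, in order of appearance."""
    results = []
    for match in EMAIL_RE.finditer(text):
        address = match.group(0)
        # Split on the last "@" so the domain is everything after it.
        local_part, _, domain = address.rpartition("@")
        results.append(
            {"address": address, "local_part": local_part, "domain": domain}
        )
    return results


if __name__ == "__main__":
    for email in extract_emails("Ask bob@email.com or alice@example.org about it"):
        print(email)
```

Duplicates are kept as found; whether to deduplicate or lowercase before indexing depends on what the aggregations should count.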

Now there may be 1000s of email addresses in a file I am indexing. I can store the email address details as a nested datatype in the same index as the file document. However, I have noted that Elastic places a limit on the number of nested objects that can be stored in a single document (index.mapping.nested_objects.limit). Alternatively, I can create a separate index to store the email address objects and include a field holding the ID of the document that contains the file I indexed. It is my understanding, though, that the nested datatype is in essence already doing something very similar behind the scenes.
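To make the two options concrete, here is a hedged Python sketch that only builds the request bodies as plain dicts; it does not talk to a cluster. The field names (`content`, `emails`, `address`, `local_part`, `domain`, `file_id`) and the helper names are assumptions for illustration. The setting `index.mapping.nested_objects.limit` is the one mentioned above, and its default is 10000.

```python
def nested_index_body(limit=10000):
    """Option 1: emails stored as nested objects inside the file document."""
    email_fields = {
        "address": {"type": "keyword"},
        "local_part": {"type": "keyword"},
        "domain": {"type": "keyword"},
    }
    return {
        # Raising this limit allows more nested objects per document.
        "settings": {"index.mapping.nested_objects.limit": limit},
        "mappings": {
            "properties": {
                "content": {"type": "text"},
                "emails": {"type": "nested", "properties": email_fields},
            }
        },
    }


def separate_index_docs(file_id, emails):
    """Option 2: one document per email, pointing back at the file by ID."""
    return [dict(email, file_id=file_id) for email in emails]


if __name__ == "__main__":
    print(nested_index_body(limit=50000))
    sample = [{"address": "bob@email.com", "local_part": "bob", "domain": "email.com"}]
    print(separate_index_docs("file-1", sample))
```

With option 2 the link between an email and its file has to be maintained by the application, for example when a file is re-indexed or deleted.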

The question:

So my question (after all that) is: do I go with the nested datatype approach and simply increase the limit (index.mapping.nested_objects.limit) to something "crazy", or do I go with the manual approach of managing a separate index?

Thanks in advance for the advice.

Matt

Hello, Matthew.
It sounds like you don't need to bother with nested. Those three fields should be sufficient for the associated emails.

Sorry Mikhail, just confirming: are you saying that I am better off NOT using NESTED DATATYPES and instead going with a separate index that I manage?

I’ve shared this flowchart before to help with the decision process for nested:


It is better to NOT use NESTED DATATYPES and to get along with just the three fields.
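One reading of this advice is to keep three plain multi-valued keyword fields on the file document itself. The Python sketch below builds such a document and a terms aggregation body under that assumption; the field names (`email_address`, `email_local_part`, `email_domain`) and helper names are hypothetical. Flat arrays lose the pairing between values from the same object, but here the domain and local part are both derived from the address, so aggregating on each field separately still works.

```python
def flat_file_doc(content, emails):
    """Build a file document with three flat, multi-valued email fields."""

    def unique(values):
        # Drop duplicates while keeping first-seen order.
        return list(dict.fromkeys(values))

    return {
        "content": content,
        "email_address": unique(e["address"] for e in emails),
        "email_local_part": unique(e["local_part"] for e in emails),
        "email_domain": unique(e["domain"] for e in emails),
    }


def top_values_query(field, size=10):
    """Search body for a terms aggregation over one keyword field."""
    return {
        "size": 0,  # only the aggregation buckets are wanted, not the hits
        "aggs": {"top_values": {"terms": {"field": field, "size": size}}},
    }


if __name__ == "__main__":
    emails = [
        {"address": "bob@email.com", "local_part": "bob", "domain": "email.com"},
        {"address": "amy@email.com", "local_part": "amy", "domain": "email.com"},
    ]
    print(flat_file_doc("file text", emails))
    print(top_values_query("email_domain"))
```

No nested objects are created this way, so index.mapping.nested_objects.limit does not come into play.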