What ES mapping relationship should I use in this case?


#1

I am trying to create an Elasticsearch mapping to index emails. I've read the ES documentation on mapping relationships, but I am unsure which one to use to represent recipients.

Each email recipient is simple, consisting of a displayName and an emailAddress.

The ES documentation strongly warns against using the Parent-Child relationship unless it is really needed and all other options are exhausted. In particular, it says to use Parent-Child when there are a few parents with many children.

Most emails have few recipients (fewer than 50), so my first instinct was to use Nested-Object. However, once in a while there are those "all hands" emails where the recipients can number in the thousands.

So my dilemma is this: my general case seems ideal for Nested-Object, but my edge case seems ideal for a Parent-Child relationship. If there are experienced Elasticsearch users out there who have been through this, I would love to know which mapping relationship you used and your reasoning.


(Mark Walkom) #2

To me, it doesn't really make sense to relate things in this manner. Just store each email as a single doc (aka time based index structures), then analyse.

What are you trying to do with the data?


#3

Thanks for replying!

I am trying to make the data searchable, and I am hoping to search by recipient name. For example: search for all mail where one recipient's name contains both "john" and "doe".

Each email is already pre-processed and has its text extracted. The email data that I will be receiving will look like this:


Title=email1
Subject=this is a test email

SenderDisplayName=me
SenderEmailAddress=me@mycompany.com

RecipientDisplayName=Mr. John
RecipientDisplayAddress=mrB@hiscompany.com

RecipientDisplayName=Mr. Doe
RecipientDisplayAddress=mrC@hiscompany.com

Content=This is the content of the email


Note: mostly there are only a few recipients, but sometimes there are thousands of them.

So I was hoping to ingest it like this, so that I can search recipients while keeping the relation between display name and email address intact.

{
  "Title" : "email1",
  "Subject" : "this is a test email",
  "SenderDisplayName" : "me",
  "SenderEmailAddress" : "me@mycompany.com",
  "Recipients" : {
    "Recipient1" : {
      "DisplayName" : "Mr. John",
      "EmailAddress" : "mrB@hiscompany.com"
    },
    "Recipient2" : {
      "DisplayName" : "Mr. Doe",
      "EmailAddress" : "mrC@hiscompany.com"
    }
    ....
  },
  "Content" : "This is the content of the email"
}

In this case, if I search for all emails where a recipient's name contains both "John" and "Doe", the sample above should not be returned as a match.

I hope this clarifies things.
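
To make the requirement above concrete, here is a small plain-Python simulation (not Elasticsearch code) of the two ways the recipient names could be matched. The `tokens` helper is a hypothetical stand-in for a text analyzer, and the two matching functions are only rough models: one pools every recipient's name into a single multi-valued field, the other checks each recipient separately.

```python
import re

def tokens(text):
    """Lowercase word tokens; a rough stand-in for a text analyzer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

recipients = [
    {"DisplayName": "Mr. John", "EmailAddress": "mrB@hiscompany.com"},
    {"DisplayName": "Mr. Doe", "EmailAddress": "mrC@hiscompany.com"},
]

def matches_flattened(recipients, terms):
    """All names pooled into one field: terms may come from different recipients."""
    pooled = set()
    for recipient in recipients:
        pooled |= tokens(recipient["DisplayName"])
    return set(terms) <= pooled

def matches_per_recipient(recipients, terms):
    """Each recipient checked on its own: all terms must appear in one name."""
    return any(set(terms) <= tokens(r["DisplayName"]) for r in recipients)

print(matches_flattened(recipients, ["john", "doe"]))      # True
print(matches_per_recipient(recipients, ["john", "doe"]))  # False
```

The pooled version matches the sample because "john" and "doe" both occur somewhere among the recipients, which is the false positive I want to avoid. The per-recipient version is the behaviour I am after.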


(Christian Dahlqvist) #4

Parent-child is very useful when you have parents with a large number of children and the parent is updated frequently, as it allows just the parent to be reindexed. As you are modelling e-mails, I assume you are not going to update them, and since you want to be able to search on combinations of DisplayName and EmailAddress, I would recommend using a nested structure.
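
As a rough sketch of what a nested structure could look like, the request bodies below are built as Python dictionaries. The field types (`text`, `keyword`) and the helper name `recipient_name_query` are assumptions for illustration and do not come from this thread. Where the mapping sits inside the index-creation request depends on your Elasticsearch version.

```python
import json

# Assumed mapping: the field types are illustrative guesses.
mapping = {
    "properties": {
        "Title": {"type": "text"},
        "Subject": {"type": "text"},
        "SenderDisplayName": {"type": "text"},
        "SenderEmailAddress": {"type": "keyword"},
        "Recipients": {
            "type": "nested",  # each recipient is indexed as its own hidden document
            "properties": {
                "DisplayName": {"type": "text"},
                "EmailAddress": {"type": "keyword"},
            },
        },
        "Content": {"type": "text"},
    }
}

def recipient_name_query(words):
    """Hypothetical helper: require all words in a single recipient's DisplayName."""
    return {
        "query": {
            "nested": {
                "path": "Recipients",
                "query": {
                    "match": {
                        "Recipients.DisplayName": {
                            "query": " ".join(words),
                            "operator": "and",
                        }
                    }
                },
            }
        }
    }

print(json.dumps(recipient_name_query(["john", "doe"]), indent=2))
```

Because the `match` clause is wrapped in a `nested` query, both words have to occur in the same recipient object, so an email sent to "Mr. John" and "Mr. Doe" should not match a search for "john" and "doe".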


#5

Thanks for the advice Christian.

Nested-Object is great for the general case. However, once in a while I get those "all-hands" emails where the recipients are essentially everyone in the company, which runs into the thousands (or even tens of thousands).

Based on my reading, Nested-Object is limited to 1000 by default, so I am afraid that Nested-Object won't be able to handle this edge case.

Here is a link to the limitation: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html#mapping-limit-settings

look for:
index.mapping.total_fields.limit


(Christian Dahlqvist) #6

The limit you specified is for the number of fields in an index. If you model your documents with an array of recipients as follows I think you should be able to avoid this:

{
  "Title" : "email1",
  "Subject" : "this is a test email",
  "SenderDisplayName" : "me",
  "SenderEmailAddress" : "me@mycompany.com",
  "Recipients" : [
    {
      "DisplayName" : "Mr. John",
      "EmailAddress" : "mrB@hiscompany.com"
    },
    {
      "DisplayName" : "Mr. Doe",
      "EmailAddress" : "mrC@hiscompany.com"
    }
    ....
  ],
  "Content" : "This is the content of the email"
}
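
To illustrate why the array form helps with the field limit, here is a small plain-Python sketch. The helpers `recipients_to_array` and `field_paths` are hypothetical names introduced for this example. Counting distinct leaf paths is only a rough proxy for how Elasticsearch counts mapped fields, but it shows the trend: keyed recipients add new field names per recipient, while array items share the same ones.

```python
def recipients_to_array(doc):
    """Return a copy of doc with the keyed Recipients object turned into a list."""
    converted = dict(doc)
    converted["Recipients"] = list(doc["Recipients"].values())
    return converted

def field_paths(value, prefix=""):
    """Collect distinct leaf field paths; list items share their parent's path."""
    if isinstance(value, dict):
        paths = set()
        for key, child in value.items():
            paths |= field_paths(child, f"{prefix}.{key}" if prefix else key)
        return paths
    if isinstance(value, list):
        paths = set()
        for item in value:
            paths |= field_paths(item, prefix)
        return paths
    return {prefix}

keyed = {
    "Title": "email1",
    "Recipients": {
        "Recipient1": {"DisplayName": "Mr. John", "EmailAddress": "mrB@hiscompany.com"},
        "Recipient2": {"DisplayName": "Mr. Doe", "EmailAddress": "mrC@hiscompany.com"},
    },
}
as_array = recipients_to_array(keyed)

print(len(field_paths(keyed)))     # 5
print(len(field_paths(as_array)))  # 3
```

With the keyed form the path count grows with every extra recipient, whereas with the array form it stays at three no matter how many recipients an email has.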

#7

Thanks, this is certainly encouraging. However, just to be extra sure, there is also this setting: index.mapping.nested_fields.limit

Does the limit mean I can have at most 50 entries in "Recipients", or does it mean I can have another 49 fields like "Recipients"?

I feel so stupid now. It seems I was misunderstanding the documentation :cry:

