Need help for a mapping definition


#1

Hi,

I'm using Elasticsearch 1.5, here is a sample of what I'm trying to index:

{
  "Id":"000000a6aOO4T2iA",
  "CreatedDate":"2016-01-12T14:13:33.285Z",
  "ModifiedDate":"2016-01-12T14:13:33.285Z",
  "Segment":"not-applicable",
  "TenantId":"101",
  "Attributes":{
    "PhoneNumber_1452608013247":"{\"Id\":\"PhoneNumber_1452608013247\",\"Name\":\"PhoneNumber\",\"StrValue\":\"1452608013247\",\"Description\":\"a nice PhoneNumber\",\"MimeType\":null}",
    "FirstName_Shara":"{\"Id\":\"FirstName_Shara\",\"Name\":\"FirstName\",\"StrValue\":\"Shara\",\"Description\":\"a nice FirstName\",\"MimeType\":null}",
    "LastName_Conor":"{\"Id\":\"LastName_Conor\",\"Name\":\"LastName\",\"StrValue\":\"Conor\",\"Description\":\"a nice LastName\",\"MimeType\":null}",
    "PhoneNumber_145260801324722":"{\"Id\":\"PhoneNumber_145260801324722\",\"Name\":\"PhoneNumber\",\"StrValue\":\"145260801324722\",\"Description\":\"a nice PhoneNumber\",\"MimeType\":null}"
  },
  "PrimaryAttributes":{
    "FirstName":"Shara",
    "PhoneNumber":"1452608013247",
    "LastName":"Conor"
  },
  "IndexationDate":"2016-01-12T14:13:33.309Z"
}

The Attributes object contains keys built from the attribute type and its StrValue; each value is itself a JSON document that I'd like to map.

The goal is to be able to search either in PrimaryAttributes only, or across all attribute StrValues, both primary and non-primary. Types such as LastName or PhoneNumber are configurable by the end user, so I can't assume any particular value there.

Here is the resulting Lucene document I'd expect:

{
"Id":"000000a6aOO4T2iA",
"CreatedDate":"2016-01-12T14:13:33.285Z",
"ModifiedDate":"2016-01-12T14:13:33.285Z",
"Segment":"not-applicable",
"TenantId":"101",

"PrimaryAttributes.FirstName":"Shara",
"PrimaryAttributes.PhoneNumber":"1452608013247",
"PrimaryAttributes.LastName":"Conor",
"PhoneNumber":"1452608013247",
"PhoneNumber":"145260801324722",
"FirstName":"Shara",
"LastName":"Conor",

"IndexationDate":"2016-01-12T14:13:33.309Z"
}
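
For illustration only, the target shape above can also be produced on the client side before indexing, independently of any mapping feature. The following is a minimal Python sketch, not a mapping-based solution; the function name flatten_contact and the trimmed sample document are hypothetical, and it assumes attribute names never collide with the fixed root fields.

```python
import json


def flatten_contact(doc):
    """Return a copy of doc with one root-level field per attribute Name."""
    # Keep every field except Attributes as-is. PrimaryAttributes stays an
    # object, which Elasticsearch indexes as PrimaryAttributes.<Name>.
    flat = {k: v for k, v in doc.items() if k != "Attributes"}

    # Group every StrValue under its Name; a list models a multi-valued field.
    # Assumption: attribute names never collide with the fixed root fields.
    for raw in doc.get("Attributes", {}).values():
        # Values may be escaped JSON strings or already-parsed objects.
        attr = json.loads(raw) if isinstance(raw, str) else raw
        flat.setdefault(attr["Name"], []).append(attr["StrValue"])
    return flat


# Trimmed, hypothetical version of the sample document.
sample = {
    "Id": "000000a6aOO4T2iA",
    "TenantId": "101",
    "Attributes": {
        "PhoneNumber_1452608013247":
            '{"Id":"PhoneNumber_1452608013247","Name":"PhoneNumber","StrValue":"1452608013247"}',
        "PhoneNumber_145260801324722":
            '{"Id":"PhoneNumber_145260801324722","Name":"PhoneNumber","StrValue":"145260801324722"}',
        "LastName_Conor":
            '{"Id":"LastName_Conor","Name":"LastName","StrValue":"Conor"}',
    },
    "PrimaryAttributes": {"PhoneNumber": "1452608013247", "LastName": "Conor"},
}

flat = flatten_contact(sample)
print(json.dumps(flat, indent=2))
```

The two PhoneNumber entries end up in a single list, which corresponds to the repeated PhoneNumber field in the expected document.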

I find mappings very difficult to debug. I'm currently using breakpoints in the Elasticsearch code; is there a better way?
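
One lighter option than breakpoints may be to index a test document and then ask Elasticsearch for the mapping it actually stored, using the get-mapping endpoint (GET /<index>/_mapping/<type>). Below is a small Python sketch of that idea; the helper names, the node address and the index and type names in the usage comment are assumptions for illustration.

```python
import json
from urllib.request import urlopen


def mapping_url(base_url, index, doc_type):
    """Build the get-mapping URL for one index and one document type."""
    return "{}/{}/_mapping/{}".format(base_url.rstrip("/"), index, doc_type)


def fetch_mapping(base_url, index, doc_type):
    """Fetch the stored mapping as a Python dict (requires a running node)."""
    with urlopen(mapping_url(base_url, index, doc_type)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical usage against a local node:
#   stored = fetch_mapping("http://localhost:9200", "ucs_index", "Contact")
#   print(json.dumps(stored, indent=2))
```

Comparing the returned mapping with the one that was submitted shows which fields were created dynamically and which settings were silently dropped.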

I tried many different mappings. Here is one example, which does not work:

{
  "Contact": {
    "dynamic": "true",
    "properties": {
      "Id": {
        "type": "string",
        "analyzer": "keyword"
      },
      "TenantId": {
        "type": "long",
        "analyzer": "keyword"
      },
      "Segment": {
        "type": "string",
        "analyzer": "keyword"
      },
      "CreatedDate": {
        "type": "date",
        "format": "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'",
        "analyzer": "standard"
      },
      "ModifiedDate": {
        "type": "date",
        "format": "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'",
        "analyzer": "standard"
      },
      "PrimaryAttributes": {
        "dynamic": "false",
        "type": "nested",
        "fields": {
          "EmailAddress": {
            "type": "string",
            "index": "analyzed",
            "analyzer": "standard"
          },
          "{name}": {
            "type": "string",
            "index": "analyzed",
            "analyzer": "standard"
          }
        }
      }
    },
    "dynamic_templates": [
      {
        "ContactAttributes": {
          "path_match": "Attributes.*",
          "mapping": {
            "type": "nested",
            "properties": {
              "StrValue": {
                "type": "string",
                "index": "analyzed",
                "copy_to": "{name}.Name"
              },
              "{name}": {
                "index": "no"
              }
            }
          }
        }
      }
    ]
  }
}

Thank you for the help!
JHB


(Mark Walkom) #2

Does not work how/why?


(Christian Dahlqvist) #3

As far as I can see there are a number of issues both with your data and your mapping:

Having this type of dynamic field is bad practice and will lead to a mapping explosion as you scale out. Please see this blog post for further details.

Your TenantId value in the document is a string, so it will not work if the field is defined as long. Either remove the quotes in the document, or perhaps change the field into a not_analyzed string. If you choose to keep it as long, also remove the analyser specification, as this only applies to string fields.

Date fields are not analysed either, so remove the analyser specification here as well.

I have not tried it out, so there may very well be other issues as well. I would recommend reading the documentation on mapping as well as the sections on mapping in Elasticsearch: The Definitive Guide.
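
To make the point about dynamic fields concrete: one commonly used way to avoid creating a new field per attribute key is to reshape the keyed object into a list of name/value pairs before indexing. The Python sketch below shows that reshaping under stated assumptions; the function name attributes_to_pairs is hypothetical, and this is an alternative document layout, not something the mapping does for you.

```python
import json


def attributes_to_pairs(doc):
    """Replace the keyed Attributes object with a list of Name/StrValue pairs."""
    out = dict(doc)  # shallow copy; the input document is left untouched
    pairs = []
    for raw in doc.get("Attributes", {}).values():
        # Values may be escaped JSON strings or already-parsed objects.
        attr = json.loads(raw) if isinstance(raw, str) else raw
        pairs.append({"Name": attr["Name"], "StrValue": attr["StrValue"]})
    out["Attributes"] = pairs
    return out


# Trimmed, hypothetical document with already-parsed attribute values.
doc = {
    "Id": "000000a6aOO4T2iA",
    "Attributes": {
        "FirstName_Shara": {"Id": "FirstName_Shara", "Name": "FirstName", "StrValue": "Shara"},
        "LastName_Conor": {"Id": "LastName_Conor", "Name": "LastName", "StrValue": "Conor"},
    },
}

reshaped = attributes_to_pairs(doc)
print(json.dumps(reshaped, indent=2))
```

With this shape only Attributes.Name and Attributes.StrValue need to be mapped, however many types end users configure. The trade-off is that a search such as PhoneNumber:1452608013247 becomes a query on both Name and StrValue (a nested query if the list is mapped as nested).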


#4

I fixed a few issues. The first one is that the JSON value in my map was escaped; I can now go through the values correctly with the mapper. I also fixed the wrong analysers and the long type. I checked the provided links, but I had already read them.

Here is the mapping I currently have:

{
  "ucs_index" : {
    "mappings" : {
      "Contact" : {
        "dynamic" : "true",
        "dynamic_date_formats" : [ "yyyy-MM-dd", "dd-MM-yyyy", "date_optional_time", "yyyy-MM-dd'T'HH:mm:ss.SSSZZ" ],
        "dynamic_templates" : [ {
          "nested_ContactAttributes" : {
            "index" : "no",
            "mapping" : {
              "type" : "nested"
            },
            "match" : "Attributes"
          }
        }, {
          "ContactAttributes" : {
            "mapping" : {
              "type" : "multi_field",
              "fields" : {
                "Name" : {
                  "type" : "string",
                  "index" : "no"
                },
                "StrValue" : {
                  "type" : "string",
                  "index" : "analyzed",
                  "copy_to" : "{Name}"
                },
                "{name}" : {
                  "type" : "string",
                  "index" : "no"
                }
              }
            },
            "match_mapping_type" : "string",
            "path_match" : "Attributes.*"
          }
        } ],
        "properties" : {
          "CreatedDate" : {
            "type" : "date",
            "format" : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
          },
          "Id" : {
            "type" : "string",
            "analyzer" : "keyword"
          },
          "IndexationDate" : {
            "type" : "date",
            "format" : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
          },
          "ModifiedDate" : {
            "type" : "date",
            "format" : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
          },
          "PrimaryAttributes" : {
            "dynamic" : "false",
            "properties" : {
              "EmailAddress" : {
                "type" : "string",
                "analyzer" : "standard",
                "copy_to" : [ "{name}" ]
              },
              "{name}" : {
                "type" : "string",
                "analyzer" : "standard",
                "copy_to" : [ "{name}" ]
              }
            }
          },
          "Segment" : {
            "type" : "string",
            "analyzer" : "keyword"
          },
          "TenantId" : {
            "type" : "string",
            "index" : "not_analyzed"
          }
        }
      }
    }
  }
}

If I focus on the problematic part:

"Attributes":{
	"PhoneNumber_1452608013247": {
		"Id":"PhoneNumber_1452608013247",
		"Name":"PhoneNumber",
		"StrValue":"1452608013247",
		"Description":"a nice PhoneNumber",
		"MimeType":null
	}
}

I don't want to index the content of "Attributes" apart from the value. I want to copy the value stored in "StrValue" to the root of the document, into a field named after the value of "Name". For the above doc:
PhoneNumber:1452608013247

Something like:
"copy_to" : "..{Name}"

I hope I won't have to use Groovy scripting; I'm worried about the performance.

Thanks


#5

I was able to copy the values to a static field by replacing [Name] with any fixed string, but that's not what I want. I also enabled Groovy, but it does not seem to work either. The mapping below is accepted. While debugging in Eclipse, I can see that at some point it seems to interpret [Name] but does not find a copy_to instruction, whereas it does find one with a static string.

{ 
	"Contact" : { 
		"dynamic" : "true",
		"properties" : { 
			"Id" : { 
				"type" : "string",
				"analyzer" : "keyword"
			},
			"TenantId" : { 
				"type" : "string",
				"index" : "not_analyzed"
			},
			"Segment" : { 
				"type" : "string",
				"analyzer" : "keyword"
			},
			"CreatedDate" : { 
				"type" : "date",
				"format" : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
			},
			"ModifiedDate" : { 
				"type" : "date",
				"format" : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
			},
			"IndexationDate" : { 
				"type" : "date",
				"format" : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
			},
			"PrimaryAttributes" : { 
				"dynamic" : "false",
				"type" : "object",
				"properties" : { 
					"EmailAddress" : { 
						"type" : "string",
						"index" : "analyzed",
						"analyzer" : "standard",
						"copy_to" : [ 
							"{name}"
						]
					},
					"{name}" : { 
						"type" : "string",
						"index" : "analyzed",
						"analyzer" : "standard",
						"copy_to" : [ 
							"{name}"
						]
					}
				}
			}
		},
		"dynamic_templates" : [ 
			{ 
				"nested_ContactAttributes" : { 
					"match" : "Attributes",
					"index" : "no",
					"mapping" : { 
						"type" : "nested"
					}
				}
			},
			{ 
				"ContactAttributes" : { 
					"path_match" : "Attributes.*",
					"match_mapping_type" : "string",
					"mapping" : { 
						"type" : "multi_field",
						"transform": {
							"script": "ctx._source[typeName] = ctx._source['StrValue'] ctx",
							"params" : {
								"typeName" : "ctx._source['Name']"
							},
							"lang": "groovy"
						},
						"fields" : {
							"StrValue" : {
								"match": "StrValue",
								"type" : "string",
								"index" : "analyzed",
								"copy_to" : "[Name]"
							}
						}
					}
				}
			}
		]
	}
}

#6

To be clear, I'd like to be able to run searches such as:
PhoneNumber:1452608013247
or
PhoneNumber:145260801324722
or
PrimaryAttributes.PhoneNumber:1452608013247
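
For reference, searches written in this field:value form can be sent to the search API as a query_string query. The Python sketch below only builds the request bodies; the helper name query_string_body is hypothetical, and it assumes values that contain no characters needing escaping in the query syntax (plain digits, as here).

```python
import json


def query_string_body(field, value):
    """Build a search request body for a field:value query_string query."""
    # Assumption: value has no reserved query-syntax characters to escape.
    return {"query": {"query_string": {"query": "{}:{}".format(field, value)}}}


bodies = [
    query_string_body("PhoneNumber", "1452608013247"),
    query_string_body("PhoneNumber", "145260801324722"),
    query_string_body("PrimaryAttributes.PhoneNumber", "1452608013247"),
]

for body in bodies:
    print(json.dumps(body))
```

Each body would be posted to the index's _search endpoint; the first two only match if a root-level PhoneNumber field actually exists in the indexed documents.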

