Multi_field field versus creating 2 separate fields?


(datadev) #1

While trying to resolve an issue with querying multi_field fields
(https://groups.google.com/group/elasticsearch/t/1246b63c8a867d), I
implemented a workaround by 'flattening' the sub-fields of the
multi_field into top-level fields. E.g., instead of
{
  "tweet" : {
    "properties" : {
      "name" : {
        "type" : "multi_field",
        "fields" : {
          "name" : {"type" : "string", "index" : "analyzed"},
          "untouched" : {"type" : "string", "index" : "not_analyzed"}
        }
      }
    }
  }
}

I now have 2 top-level fields:

{
  "tweet" : {
    "properties" : {
      "name" : {"type" : "string", "index" : "analyzed"},
      "name_untouched" : {"type" : "string", "index" : "not_analyzed"}
    }
  }
}

My question: does it actually make a difference, in terms of
underlying storage efficiency in elasticsearch or runtime query
performance, whether I use the multi_field representation or the 2
separate fields representation? E.g., does ES perform any optimizations
that make multi_field preferable when it is semantically appropriate?
And if the answer is no (there is no difference in
performance/efficiency), then under what circumstances should
multi_field be used?


(Shay Banon) #2

The main difference with what you specified, with two explicit mappings for
name and name_untouched, is that you need to repeat the "name" value twice
in the json, which makes it bigger. The multi_field type option reuses the
same name value in the json.
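
To make the size point concrete, here is a minimal Python sketch that compares the JSON a client would send under each mapping. The sample value and the payload_size helper are made up for illustration and are not part of any Elasticsearch API; the sketch measures only the request body, not what ES does internally.

```python
import json

# Hypothetical field value, used only for illustration.
name_value = "some user name"

# With the multi_field mapping the client sends "name" once; the mapping
# indexes it both analyzed ("name") and not_analyzed ("untouched").
multi_field_doc = {"name": name_value}

# With two explicit top-level fields the client must send the value twice.
flattened_doc = {"name": name_value, "name_untouched": name_value}


def payload_size(doc):
    """Return the size in bytes of the compact JSON encoding of doc."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


multi_size = payload_size(multi_field_doc)
flat_size = payload_size(flattened_doc)

# The overhead is the extra key plus a second copy of the value, so it
# grows with the length of the field value.
print("multi_field doc:", multi_size, "bytes")
print("flattened doc:  ", flat_size, "bytes")
print("overhead:       ", flat_size - multi_size, "bytes")
```

The overhead is paid on every indexed document, so it matters most when the duplicated field holds long values.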

On Tue, Oct 18, 2011 at 8:53 AM, datadev nji@adinfocenter.com wrote:

