While trying to resolve an issue with querying on multi_field fields
(https://groups.google.com/group/elasticsearch/t/1246b63c8a867d), I
implemented a workaround by 'flattening' the sub-fields of the
multi_field into top-level fields. E.g., instead of
{
  "tweet" : {
    "properties" : {
      "name" : {
        "type" : "multi_field",
        "fields" : {
          "name" : {"type" : "string", "index" : "analyzed"},
          "untouched" : {"type" : "string", "index" : "not_analyzed"}
        }
      }
    }
  }
}
My question is: does it actually make a difference, in terms of
underlying storage efficiency in elasticsearch or runtime query
performance, whether I use the multi_field representation or the two
separate fields representation? E.g., does ES perform any optimizations
that make multi_field preferable when it is semantically appropriate?
And, if the answer is no (there is no difference in
performance/efficiency), then under what circumstances should
multi_field be used?
The main difference with what you specified, two explicit mappings for
name and name_untouched, is that you need to repeat the "name" value twice
in the JSON, which makes it bigger. The multi_field type reuses the
same name value in the JSON.
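
To make the point about document size concrete, here is a minimal Python sketch. The flattened mapping is an assumption reconstructed from the field names mentioned in the reply (name and name_untouched), and build_document and the sample value are hypothetical helpers for illustration only. It compares the JSON a client would have to send under each mapping and does not talk to a real cluster.

```python
import json

# Variant A: the multi_field mapping quoted in the question.
multi_field_mapping = {
    "tweet": {
        "properties": {
            "name": {
                "type": "multi_field",
                "fields": {
                    "name": {"type": "string", "index": "analyzed"},
                    "untouched": {"type": "string", "index": "not_analyzed"},
                },
            }
        }
    }
}

# Variant B: two explicit top-level fields (assumed shape of the workaround).
flattened_mapping = {
    "tweet": {
        "properties": {
            "name": {"type": "string", "index": "analyzed"},
            "name_untouched": {"type": "string", "index": "not_analyzed"},
        }
    }
}


def build_document(mapping, value):
    """Build the document body needed to populate every top-level property."""
    properties = mapping["tweet"]["properties"]
    # Each top-level property needs its own copy of the value in the JSON.
    return {field: value for field in properties}


value = "Some Example Name"  # hypothetical sample value
multi_field_doc = build_document(multi_field_mapping, value)
flattened_doc = build_document(flattened_mapping, value)

multi_field_size = len(json.dumps(multi_field_doc))
flattened_size = len(json.dumps(flattened_doc))

print(json.dumps(multi_field_doc), multi_field_size)
print(json.dumps(flattened_doc), flattened_size)
```

With the multi_field mapping the document carries the value once, under "name". With the flattened mapping the client has to send it twice, so the request body grows with the length of the value.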