For the sake of performance (both server side and browser side) I'm
limiting the number of fields returned by using
SearchRequestBuilder.addFields(), and I then read the values from the
results using SearchHit.field().
The strange thing is that most of my fields are being returned even
though I haven't marked them as "stored":true in my explicit mapping.
Currently, I'm going through my list of fields one by one, checking
whether each one is returned by SearchHit.field(), and I will try
putting "stored":true explicitly on these fields to see if they
suddenly get returned by the call.
Yes, I do have _source enabled by default, but for performance reasons
I don't want to fetch the _source and parse the fields.
This is usually a false economy. Lucene needs to do a disk seek for each
stored field that it returns, as opposed to just one for the _source
field. Usually the only time it makes sense to use separately stored
fields instead of the _source is when you have very large docs and you
only want (e.g.) a last_modified date out of them.
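To put rough numbers on the seek argument, here is a toy model in Java. It is a simplification under stated assumptions: it counts only disk seeks, assumes nothing is cached by the OS or file system, and assumes every requested field is stored. The page size of 10 hits and the 20 fields are hypothetical values picked for illustration.

```java
// Toy model of the seek argument above, not Lucene code.
// Assumptions: one seek per stored field per hit, one seek for the whole
// _source per hit, and no OS/file-system caching.
final class SeekModel {

    // Seeks needed when reading `fieldsPerHit` separately stored fields for every hit.
    static long storedFieldSeeks(int hits, int fieldsPerHit) {
        return (long) hits * fieldsPerHit;
    }

    // Seeks needed when reading just the _source for every hit.
    static long sourceSeeks(int hits) {
        return hits;
    }

    public static void main(String[] args) {
        int hits = 10;   // hypothetical page size
        int fields = 20; // hypothetical number of stored fields requested
        System.out.println("stored fields: " + storedFieldSeeks(hits, fields) + " seeks");
        System.out.println("_source:       " + sourceSeeks(hits) + " seeks");
    }
}
```

With these numbers the model gives 200 seeks for the stored-field route against 10 for the _source route; real numbers will be lower once caching kicks in, but the ratio is the point.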
In this case, with 20 or so fields, I'm in two minds whether to get the
_source and return it more or less as is to the browser, or to parse
the _source into a smaller dataset, giving the browser less to parse
and reducing network time.
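The second option (trimming the _source before sending it to the browser) might look roughly like the Java sketch below. It assumes the hit's _source is already available as a Map<String, Object>; the helper trimSource and the sample field names and values are hypothetical, made up for illustration. It only handles top-level fields.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: keep only the top-level _source fields the browser actually needs.
// trimSource and the sample data in main are hypothetical.
final class SourceTrimmer {

    static Map<String, Object> trimSource(Map<String, Object> source, List<String> wanted) {
        Map<String, Object> trimmed = new LinkedHashMap<>();
        for (String field : wanted) {
            // Copy the field only if the document actually has it.
            if (source.containsKey(field)) {
                trimmed.put(field, source.get(field));
            }
        }
        return trimmed;
    }

    public static void main(String[] args) {
        // Made-up _source for one hit.
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("title", "Some title");
        source.put("body", "A long body the browser does not need");
        source.put("last_modified", "2012-01-01");

        Map<String, Object> trimmed = trimSource(source, List.of("title", "last_modified"));
        System.out.println(trimmed); // {title=Some title, last_modified=2012-01-01}
    }
}
```

Whether this wins depends on how much the trimming saves on the wire compared with the extra work of walking every hit on your own server.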
From the other thread: the fact that something accounts for 30% of the
CPU time during execution doesn't tell you whether it's slow or not :).
30% of 20ms is only 6ms, for example ;).
As Clinton said, most of the time it makes sense to just get the
_source, with the exception of very large single fields. Assuming you
have no stored fields, asking for specific fields causes the _source to
be loaded and parsed, and the requested fields to be extracted from it.
On the other hand, if you just ask for the _source, it is loaded and
returned as is, all the way back.
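This also explains why unstored fields still come back from SearchHit.field(). A toy Java model of the fetch path described above follows; it is not Elasticsearch code. Doc and fetchFields are hypothetical, and the rule "a stored field is preferred, otherwise fall back to the _source" is an assumption of the model. In the real engine the _source is kept as JSON and has to be parsed for the fallback step; here it is already a Map to keep the sketch short.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (not Elasticsearch code): requested fields are read from stored
// fields when present, otherwise extracted from the _source.
final class FieldFetchModel {

    // Hypothetical stand-in for an indexed document.
    static final class Doc {
        final Map<String, Object> storedFields; // fields mapped with "stored": true
        final Map<String, Object> source;       // the _source, assumed already parsed

        Doc(Map<String, Object> storedFields, Map<String, Object> source) {
            this.storedFields = storedFields;
            this.source = source;
        }
    }

    static Map<String, Object> fetchFields(Doc doc, List<String> requested) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (String name : requested) {
            if (doc.storedFields.containsKey(name)) {
                // Stored field: read it directly.
                result.put(name, doc.storedFields.get(name));
            } else if (doc.source.containsKey(name)) {
                // Not stored: fall back to extracting it from the _source.
                result.put(name, doc.source.get(name));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("title", "Some title");
        source.put("last_modified", "2012-01-01");
        // No stored fields at all, as in the original question.
        Doc doc = new Doc(new LinkedHashMap<>(), source);
        System.out.println(fetchFields(doc, List.of("title", "last_modified")));
    }
}
```

Both fields are returned even though neither is stored, which matches what the original poster was seeing.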