I struggled to come up with a subject line for this one, so please
bear with me.
Historically, when searching, we have returned all result IDs from our
Lucene-based search infrastructure. Yes, NOT a good idea: we have copped
some interesting performance characteristics and had to do some heavy
internal Lucene customization/forking to make it work, but let me explain.
If we have a feature to perform a search, we generally only need to display:
- Result 1-X based on a page size
- Perhaps hold a few pages of results, and then show a "More" button or
something (hopefully no-one is silly enough to want result #1,000,000...)
That's fine, the usual use case. But when a user wishes to export these
results to some external mechanism (classic: Excel), we do need to
materialize the whole result set. Now, the _scan API is there, but it
doesn't support sorting, and we need to maintain the sort order for the
export to Excel. The scrolling mechanism allows us to do that, but I'm
wondering about the consistency level between each scroll iteration.
The right way to do this would be to materialize the whole result set only
when you need to do the Excel export. HOWEVER, this introduces a delay
between the original search and the one that is run for the export, so the
two can return different results if there's a high update rate.
What I would like to be able to do is compute a hash of the 'top results'
based on their IDs at search time, and then compute the hash again when
exporting. If the hashes differ, you know there is a difference between
the original results and the export. I'd like to be able to show our
customers a little note: "Warning: These results may be different from when
you first searched".
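To make the idea concrete, here is a rough client-side sketch of the
comparison in Python. It is only an illustration of the hashing step, not
anything Elasticsearch provides: the function names (result_fingerprint,
export_matches_search) are made up, the IDs are assumed to be strings, and
the export is assumed to yield IDs in the same sort order as the original
search.

```python
import hashlib
from itertools import islice
from typing import Iterable


def result_fingerprint(ids: Iterable[str]) -> str:
    """Order-sensitive digest of a sequence of result IDs (hypothetical helper)."""
    digest = hashlib.sha1()
    for doc_id in ids:
        encoded = doc_id.encode("utf-8")
        # Length-prefix each ID so ["ab", "c"] and ["a", "bc"] hash differently.
        digest.update(len(encoded).to_bytes(4, "big"))
        digest.update(encoded)
    return digest.hexdigest()


def export_matches_search(original_fingerprint: str,
                          export_ids: Iterable[str],
                          top_n: int) -> bool:
    """Compare the first top_n exported IDs against the fingerprint
    taken when the user first searched (hypothetical helper)."""
    return result_fingerprint(islice(export_ids, top_n)) == original_fingerprint


# At search time: fingerprint the IDs of the pages actually shown.
shown = ["doc-7", "doc-3", "doc-9"]
fingerprint = result_fingerprint(shown)

# At export time: the sorted export stream has changed in position 2.
exported = ["doc-7", "doc-4", "doc-3", "doc-9"]
if not export_matches_search(fingerprint, exported, top_n=len(shown)):
    print("Warning: These results may be different from when you first searched")
```

Note that this only detects changes in which IDs appear in the top N and
in what order. It would not notice a document whose content changed while
its position stayed the same, unless something like a version number were
mixed into the hash as well.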
Perhaps I need to convince the Product Owners to just suck this one up, and
we'll assume that for any export the results may have differed. I guess if
we can determine that they haven't, the customer can have more confidence.
In our case this can be important for their decision making (won't bore you
with the details).
I was thinking that maybe the MVEL script stuff could be used to compute
some sort of hash that is returned with the results? Is that practical? Is
it crazy? Is there another, smarter way of handling this?