We have a field called signature.full which is unanalyzed; every
document has a single string which represents one possible term.
We are displaying a page that shows a table of unique signatures,
ordered by frequency. Unfortunately, we have found no good way to
see how many unique signatures there are other than doing a facet
for an absurdly high range.
No, there isn't another way to do it. Sadly, distinct is something that is
hard to do in a distributed environment for large result sets. We can easily
count the distinct values per shard, but returning a correct number across
shards means we need to send all the (distinct) values back to the
"coordinator" and compute the distinct count there.
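To illustrate why the per-shard counts cannot simply be added up, here is a minimal Python sketch. It is not Elasticsearch code: the shard contents, the signature values, and the helper names (shard_distinct, coordinator_distinct_count) are all made up for the example. It only shows that the same value can live on several shards, so an exact count needs the distinct values themselves merged in one place.

```python
from typing import Iterable, List, Set


def shard_distinct(values: Iterable[str]) -> Set[str]:
    """Hypothetical per-shard step: collect the distinct values held locally."""
    return set(values)


def coordinator_distinct_count(shard_sets: List[Set[str]]) -> int:
    """Hypothetical coordinator step: union every shard's set, then count."""
    merged: Set[str] = set()
    for shard_set in shard_sets:
        merged |= shard_set
    return len(merged)


# Made-up data: three shards, each holding some signature.full values.
shards = [
    ["sigA", "sigB", "sigA"],
    ["sigB", "sigC"],
    ["sigA", "sigC", "sigD"],
]

per_shard = [shard_distinct(values) for values in shards]

# Adding per-shard counts double-counts values present on several shards.
naive_sum = sum(len(s) for s in per_shard)

# The exact answer requires shipping every distinct value to one place.
exact = coordinator_distinct_count(per_shard)

print("sum of per-shard counts:", naive_sum)
print("exact distinct count:", exact)
```

With millions of distinct signatures, the sets sent to the coordinator grow just as large, which is the cost described above.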
If we want to implement a reduce function on the large result set,
do you have any suggestions for a clean and elegant solution
(something similar to what MongoDB does), or does ES have this feature
on the roadmap?