I have an index containing around 1 million HTML documents. The index
mapping has the following fields:
file: holds the actual HTML document
url: holds the source URL of the HTML document
meta-data: some metadata associated with the document
I want to do the following:
Query the index for a result set of size 100.
Prune the result set so that it contains at most 2 results from any
single domain.
For the pruning part, I was wondering whether I can use a script filter to
do this. Is that possible with a script filter? If so, how? Is there any
other way to do what I want?
My first thought would be a custom facet/collector, but that would require
a fair amount of code. Perhaps Igor's facet script could help? I have never
used it.
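If doing the pruning on the client side is acceptable, it can be done after the query returns. Below is a minimal sketch in Python, not something from Elasticsearch itself. It assumes each hit is a dict with a "url" key holding an absolute URL (scheme included) and that the hits are already in rank order. The function name prune_by_domain and the sample hits are made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def prune_by_domain(hits, max_per_domain=2):
    """Keep at most max_per_domain hits per domain, preserving rank order.

    Assumes every hit has a "url" value that includes a scheme
    (e.g. "http://"); without one, urlparse returns an empty netloc.
    """
    seen = defaultdict(int)  # domain -> number of hits kept so far
    pruned = []
    for hit in hits:
        domain = urlparse(hit["url"]).netloc.lower()
        if seen[domain] < max_per_domain:
            seen[domain] += 1
            pruned.append(hit)
    return pruned


# Hypothetical hits, as they might look after extracting the url field
# from a search response.
hits = [
    {"url": "http://example.com/a"},
    {"url": "http://example.com/b"},
    {"url": "http://example.com/c"},
    {"url": "http://example.org/x"},
]
result = prune_by_domain(hits)
```

Note that pruning this way can leave fewer than 100 results, so one option is to request more hits than needed and cut the pruned list down to 100 afterwards.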