At the moment we are developing with only 6-8 records, which I accept is hardly anything, but we needed to build up and understand the scoring system incrementally as we add, remove, and edit records.
Going forward, we expect roughly 147,000 documents.
Thanks for your reply, Mark, much appreciated. Would I be on the right track in thinking there might be a performance impact with dfs_query_then_fetch? I guess it's a trade-off we would have to consider, but for now I think it's easier for us to use a single shard.
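To make the scoring side of that trade-off concrete, here is a toy sketch. It assumes Lucene's BM25 IDF formula, ln(1 + (N - n + 0.5) / (n + 0.5)), and a hypothetical 8-document corpus split across two shards. With the default query_then_fetch, each shard computes IDF from its own local document counts. With dfs_query_then_fetch (or a single shard), the statistics cover the whole index. The corpus and the helper names (bm25_idf, stats) are made up for illustration; this is not Elasticsearch code.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """Lucene-style BM25 IDF: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def stats(docs, term):
    """Return (doc_count, doc_freq) for a term within a list of documents."""
    return len(docs), sum(1 for d in docs if term in d)


# Hypothetical toy corpus: 8 documents (each a set of terms) on two shards.
shard_a = [{"apache", "lucene"}, {"lucene"}, {"lucene", "search"}, {"hadoop"}]
shard_b = [{"hadoop"}, {"hdfs"}, {"lucene"}, {"search"}]

term = "lucene"

# query_then_fetch (default): each shard scores with its local statistics.
local_a = bm25_idf(*stats(shard_a, term))
local_b = bm25_idf(*stats(shard_b, term))

# dfs_query_then_fetch, or a single shard: statistics over all documents.
global_idf = bm25_idf(*stats(shard_a + shard_b, term))

print(f"shard A idf={local_a:.3f}  shard B idf={local_b:.3f}  global idf={global_idf:.3f}")
```

In this toy case, a "lucene" match on shard B gets a much higher IDF than an equally good match on shard A, purely because of how the documents happened to be distributed. Gathering global statistics up front removes that skew at the cost of an extra round-trip to every shard. With a larger, evenly distributed index, per-shard statistics tend to converge, so the effect is most visible on very small test sets like the 6-8 records mentioned above.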