Appropriate sample size and certainty


(Allie Yang Yang) #1

Hi, I noticed that the default sample size and certainty are 2000 and 3.
Is there a rule of thumb for setting these numbers?

Also, the help text says: "Terms are identified from samples of the most relevant documents. Bigger is not necessarily better - can be slower and less relevant". Does that mean the number of records alone doesn't dictate whether there will be a link, and that the link is calculated in some other way? Am I right?

(Mark Harwood) #2

It does depend on your query and data but is not normally critical.
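
For anyone calling the Graph explore API directly rather than going through the Kibana UI, here is a rough sketch of where the two settings seem to live in the request body. It assumes that "sample size" corresponds to `controls.sample_size` (applied per shard) and that "certainty" corresponds to `min_doc_count` on a vertex. The helper name `build_explore_request` and the field names `message` and `host.keyword` are made up for illustration, so swap in your own.

```python
import json


def build_explore_request(query_text, query_field, vertex_field,
                          sample_size=2000, certainty=3):
    """Build a Graph explore request body as a plain dict.

    Assumption: Kibana's "certainty" setting maps to min_doc_count,
    i.e. the minimum number of sampled documents that must contain
    a term before it is introduced as a vertex.
    """
    return {
        # Seed query: only the best-matching documents are sampled.
        "query": {"match": {query_field: query_text}},
        "controls": {
            "use_significance": True,
            # Number of top-matching documents sampled (per shard).
            "sample_size": sample_size,
        },
        "vertices": [
            {
                "field": vertex_field,
                "size": 5,
                # "Certainty": evidence threshold for a term.
                "min_doc_count": certainty,
            }
        ],
        "connections": {"vertices": [{"field": vertex_field}]},
    }


# Hypothetical field names, for illustration only.
body = build_explore_request("error", "message", "host.keyword")
print(json.dumps(body, indent=2))
```

The resulting dict could be sent as the JSON body of a `POST <index>/_graph/explore` request. Raising `sample_size` pulls in lower-ranked documents, which is presumably why the help text warns that bigger can be slower and less relevant.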
I have a video that visualizes the effects of sampling on signal quality here:
