Text classification with ES

In some further experimental runs, I tried different values of the min_doc_freq hyper-parameter of the MLT query and noticed that accuracy improved. On a much smaller dataset, which I cannot make public, accuracy on the training dataset improved from 60% to 92% when min_doc_freq was lowered from its default value of 5 to 1. The test set accuracy was 72%. I think this is already commendable, given that it comes for free and is very fast to set up. Great work guys!
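For anyone who wants to try the same setting, here is a minimal sketch of what such a request could look like. It only builds the search body for a more_like_this query with min_doc_freq set to 1, then picks a label from the returned hits by a score-weighted vote. The field names "text" and "label", the helper names, and the voting scheme are assumptions made for illustration, not necessarily the setup used in the runs above.

```python
from collections import Counter


def build_mlt_query(text, fields=("text",), min_doc_freq=1, size=10):
    """Build a search body for a more_like_this query.

    The field name "text" is a hypothetical placeholder; use the
    field that holds the document text in your own index.
    """
    return {
        "size": size,
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": text,
                # Default is 5; with 1, terms that occur in only a
                # single indexed document are still used for matching.
                "min_doc_freq": min_doc_freq,
            }
        },
    }


def predict_label(hits, label_field="label"):
    """Pick a label by summing the scores of the hits per label.

    `hits` is the list found under response["hits"]["hits"].
    The "label" field is a hypothetical placeholder.
    """
    votes = Counter()
    for hit in hits:
        votes[hit["_source"][label_field]] += hit["_score"]
    return votes.most_common(1)[0][0] if votes else None
```

The body returned by build_mlt_query can be sent to the _search endpoint of the index holding the labelled training documents, and the resulting hits passed to predict_label. On a small index many terms appear in fewer than 5 documents, so the default min_doc_freq drops them from the query, which may be why lowering it helps.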

Still, any insights or thoughts?