I'm looking for tips on how to recreate something like Google's Ngram viewer
(https://books.google.com/ngrams) with elasticsearch. I have a text corpus
of < 500 MB for which this kind of tool would be very valuable.
I've had some success with the shingle token filter and the date histogram
aggregation, but the results are not ideal: I'd like to get a histogram of
word/phrase frequencies, not a histogram of how many documents the
word/phrase occurs in.
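For reference, here is a minimal sketch of the setup described above, written as Python dicts that can be sent as JSON with any HTTP client (`PUT /<index>` for the first, `POST /<index>/_search` for the second). It assumes Elasticsearch 7.x-style request bodies (typeless mappings, `calendar_interval`; older versions use `interval`) and hypothetical field names `text` and `date`. The bucket `doc_count` values it returns count matching documents, which is exactly the limitation described.

```python
# Hypothetical field names; adjust to the real mapping.
TEXT_FIELD = "text"  # full text, analyzed into words plus word n-grams (shingles)
DATE_FIELD = "date"  # publication date of each document


def shingle_index_body(min_size=2, max_size=3):
    """Index settings/mappings that index shingles alongside single words."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "ngram_shingles": {
                        "type": "shingle",
                        "min_shingle_size": min_size,
                        "max_shingle_size": max_size,
                        "output_unigrams": True,  # keep single words too
                    }
                },
                "analyzer": {
                    "shingle_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "ngram_shingles"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                # Storing term vectors lets the _termvectors API read them back directly.
                TEXT_FIELD: {"type": "text", "analyzer": "shingle_analyzer", "term_vector": "yes"},
                DATE_FIELD: {"type": "date"},
            }
        },
    }


def doc_count_histogram_query(phrase, interval="year"):
    """Search body: documents containing `phrase`, bucketed by date (counts documents, not occurrences)."""
    # Shingles are lowercased and joined with a single space (the default token_separator),
    # so an exact term query on the normalized phrase matches the indexed shingle.
    token = " ".join(phrase.lower().split())
    return {
        "size": 0,
        "query": {"term": {TEXT_FIELD: token}},
        "aggs": {
            "per_period": {
                "date_histogram": {"field": DATE_FIELD, "calendar_interval": interval}
            }
        },
    }
```

A `term` query is used rather than `match_phrase` because the phrase is already a single indexed token in the shingle field.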
It looks like what I need is some combination of shingles, term vectors
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-termvectors.html)
and the date histogram aggregation, but I'm not sure how to proceed. I could
improve my current approach by breaking the corpus into smaller pieces, i.e.
making my documents paragraphs instead of chapters. But what I really want
is a "shingle frequency date histogram".
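One possible client-side workaround, sketched under assumptions: fetch term vectors for the matching documents (for example in batches via the `_mtermvectors` API), read the per-document `term_freq` of the shingle, and sum those counts into date buckets yourself. The helper names, the `text` field, and the assumption that each document's date is available separately as an ISO-8601 string (e.g. from the search hits' `_source`) are all hypothetical; this is not a built-in Elasticsearch aggregation.

```python
from collections import Counter


def mtermvectors_body(doc_ids, field="text"):
    """Body for POST /<index>/_mtermvectors: term vectors of one field for many docs."""
    return {
        "ids": list(doc_ids),
        "parameters": {
            "fields": [field],
            "term_statistics": False,   # per-doc term_freq is all we need
            "field_statistics": False,
            "positions": False,
            "offsets": False,
        },
    }


def shingle_frequency_histogram(docs, phrase, field="text"):
    """Sum per-document occurrences of `phrase` into yearly buckets.

    `docs` is an iterable of (date_string, term_vectors_doc) pairs, where each
    term_vectors_doc has the shape of one _termvectors response (or one entry of
    an _mtermvectors "docs" array):
        {"term_vectors": {field: {"terms": {term: {"term_freq": n, ...}}}}}
    """
    # Normalize the phrase the same way the shingle analyzer does (lowercase, single spaces).
    token = " ".join(phrase.lower().split())
    counts = Counter()
    for date_str, tv in docs:
        year = date_str[:4]  # assumes ISO-8601 dates such as "1851-10-18"
        terms = tv.get("term_vectors", {}).get(field, {}).get("terms", {})
        freq = terms.get(token, {}).get("term_freq", 0)
        if freq:
            counts[year] += freq
    # Return buckets in chronological order.
    return dict(sorted(counts.items()))
```

Summing `term_freq` instead of counting documents is what turns the document-count histogram into an occurrence histogram; the trade-off is that the summing happens in the client, so every matching document's term vectors must be fetched. For a corpus under 500 MB that may be acceptable.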
Is this something that can be accomplished with elasticsearch?
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4b37f0a1-4611-4260-85fb-36b4d67c6076%40googlegroups.com.