Recreating Google's Ngram Viewer with elasticsearch


I'm looking for tips on recreating something like Google's Ngram Viewer with elasticsearch. I have a text corpus
of under 500 MB for which this kind of tool would be very valuable.

I've had some success with the shingle token filter and
the date histogram aggregation,
but the results are not ideal: I'd like a histogram of word/phrase
frequencies, not a histogram of how many documents each word/phrase occurs in.
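For reference, the current approach can be sketched as the two request bodies below, written as Python dicts. This is a minimal sketch, not a tested configuration: the names `shingle_filter`, `shingle_analyzer`, `text` and `date` are hypothetical, the mapping uses the typeless syntax of Elasticsearch 7.x and later (older releases nest `properties` under a type name), and parameter spellings vary between versions.

```python
# Hypothetical index settings: a custom analyzer that lowercases words and
# emits 2- and 3-word shingles alongside the single words.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                "shingle_filter": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 3,
                    "output_unigrams": True,  # keep single words as well
                }
            },
            "analyzer": {
                "shingle_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "shingle_filter"],
                }
            },
        }
    },
    # Typeless mapping (7.x+); "text" and "date" are hypothetical field names.
    "mappings": {
        "properties": {
            "text": {"type": "text", "analyzer": "shingle_analyzer"},
            "date": {"type": "date"},
        }
    },
}


def doc_count_histogram_query(phrase: str) -> dict:
    """Search body: per-year count of documents containing `phrase`."""
    # Shingles are indexed as lowercased words joined by a single space
    # (the filter's default separator), so normalize the phrase the same way
    # and match it with a term query.
    term = " ".join(phrase.lower().split())
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"term": {"text": term}},
        "aggs": {
            "per_year": {
                # Older releases spell this "interval": "year".
                "date_histogram": {"field": "date", "calendar_interval": "year"}
            }
        },
    }
```

Each `per_year` bucket's `doc_count` is the number of matching documents, so a chapter that uses the phrase ten times still counts once; that is exactly the limitation described above.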

It looks like what I need is some combination of shingles, term
vectors, and the
date histogram aggregation, but I'm not sure how to proceed. I can improve
my current approach by breaking the corpus into smaller pieces, i.e. making
my documents paragraphs instead of chapters, but what I really want is a
"shingle frequency date histogram".
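What a "shingle frequency date histogram" would compute can be sketched client-side in plain Python, without any Elasticsearch API: count every occurrence of the phrase per year and divide by the number of n-grams of the same length in that year, roughly the normalization the Ngram Viewer describes. The function names and the `(year, text)` input shape are hypothetical; in practice the texts would first have to be read from the source files or pulled out of the index.

```python
import re
from collections import Counter
from typing import Iterable


def shingles(text: str, min_size: int = 1, max_size: int = 3) -> list[str]:
    """Lowercased word n-grams, roughly what a shingle filter would emit."""
    words = re.findall(r"\w+", text.lower())
    out = []
    for n in range(min_size, max_size + 1):
        for i in range(len(words) - n + 1):
            out.append(" ".join(words[i:i + n]))
    return out


def shingle_frequency_histogram(
    docs: Iterable[tuple[int, str]], phrase: str
) -> dict[int, float]:
    """Per-year share of `phrase` among all n-grams of the same length.

    Unlike a document-count histogram, every occurrence counts, so a
    document that repeats the phrase contributes more than one hit.
    """
    target = " ".join(re.findall(r"\w+", phrase.lower()))
    n = len(target.split())
    hits: Counter = Counter()
    totals: Counter = Counter()
    for year, text in docs:
        grams = shingles(text, n, n)  # only n-grams of the phrase's length
        totals[year] += len(grams)
        hits[year] += sum(1 for g in grams if g == target)
    # Skip years with no n-grams of this length to avoid dividing by zero.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}
```

With `docs = [(1990, "New York is big. I love New York."), (1990, "Boston"), (2000, "new york new york")]`, a document-count histogram for "new york" would show one hit per year, whereas this function returns 2/7 for 1990 and 2/3 for 2000.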

Is this something that can be accomplished with elasticsearch?
