WordCloud in Elasticsearch

I'm looking to create a wordcloud in Jupyter (IPython Notebook) using either
Python or JavaScript. I have a collection of Presidential speeches from
millercenter.org loaded into ES. I'm able to execute a termvector
query which returns the following fields:

term
term_freq
ttf
doc_freq

Is termvector the appropriate query for a wordcloud? If so, which
numerical value should I use?

Thanks for your help.

Jeff

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e8adafbf-b7fa-492d-ad20-b1c5b0fc0941%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hi Jeff

IMHO a wordcloud visualization is simple to build on top of facets, so if
you have an aggregation that counts how many documents contain each
term, that is probably the simplest way to construct it.
If you want to use term vectors instead, it's important to understand
exactly what you want to describe.

What do you want to visualize? What do you expect to emerge from the data?
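
A minimal Python sketch of both routes, assuming the response shapes that Elasticsearch returns for a terms aggregation and for a term vectors request. The field name "text", the aggregation name "cloud" and the sample responses are made up for illustration, and no cluster is contacted. In a term vector, term_freq is the count within that one document, ttf is the total count across all documents, and doc_freq is the number of documents containing the term (the last two are only returned when term statistics are requested). Depending on the Elasticsearch version and mapping, a terms aggregation on an analyzed text field may need extra mapping configuration.

```python
# Hypothetical names: adjust FIELD to the analyzed text field in your mapping.
FIELD = "text"
AGG_NAME = "cloud"

# Search body for a terms aggregation: one bucket per term, counted by documents.
TERMS_AGG_BODY = {
    "size": 0,  # no hits needed, only the aggregation
    "aggs": {AGG_NAME: {"terms": {"field": FIELD, "size": 100}}},
}


def weights_from_terms_agg(response, agg_name=AGG_NAME):
    """Return {term: number of documents containing it} from a search response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


def weights_from_termvector(response, field=FIELD, stat="term_freq"):
    """Return {term: stat} for one document's term vector.

    stat is one of "term_freq", "ttf" or "doc_freq".
    """
    terms = response["term_vectors"][field]["terms"]
    return {term: stats[stat] for term, stats in terms.items()}


# Made-up sample responses, trimmed to the keys used above.
sample_agg = {
    "aggregations": {
        "cloud": {
            "buckets": [
                {"key": "the", "doc_count": 12},
                {"key": "union", "doc_count": 7},
            ]
        }
    }
}
sample_tv = {
    "term_vectors": {
        "text": {
            "terms": {
                "union": {"term_freq": 3, "ttf": 40, "doc_freq": 7},
                "liberty": {"term_freq": 1, "ttf": 9, "doc_freq": 4},
            }
        }
    }
}

print(weights_from_terms_agg(sample_agg))
print(weights_from_termvector(sample_tv))
print(weights_from_termvector(sample_tv, stat="doc_freq"))
```

Either dict can then be handed to whatever draws the cloud; the third-party wordcloud package, for instance, accepts such a term-to-weight mapping in WordCloud.generate_from_frequencies.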

Hi Alfredo, my goal is to use the features in ES to create a wordcloud as
easily as possible. The termvector or significant terms query seems to be the
most useful.

A visualization of the 'significant' words is all I'm after.

> A visualization of the 'significant' words is all I'm after.

The main question then is "significant compared to what?".

Straight popularity counts (e.g. a terms agg) will just tell you that the term
"the" is very popular.
To use significant_terms you need to provide a foreground set and a
background set to compare for differences.
Examples therefore might be:
What idioms do presidents use?: all presidential speeches vs. normal
English (e.g. a sample of English Wikipedia content)
What is different about Obama as a president?: Obama speeches vs. all
other presidential speeches
What is Obama talking about now?: Obama speeches from 2015 vs. all prior
Obama speeches
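
The comparisons above could be expressed roughly as follows. This is a sketch under stated assumptions, not a query tested against the speeches index: the field names "text" and "president" and the aggregation name "cloud" are hypothetical. The query selects the foreground set; by default the background is the whole index (which includes the foreground documents), and a background_filter can narrow it.

```python
def significant_terms_body(foreground_query, field="text", size=50,
                           background_filter=None):
    """Build a search body whose query defines the foreground set."""
    agg = {"field": field, "size": size}
    if background_filter is not None:
        # Without this, the whole index serves as the background set.
        agg["background_filter"] = background_filter
    return {
        "size": 0,  # only the aggregation is of interest
        "query": foreground_query,
        "aggs": {"cloud": {"significant_terms": agg}},
    }


def weights_from_significant_terms(response, agg_name="cloud"):
    """Return {term: significance score} from a search response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["score"] for bucket in buckets}


# "What is different about Obama as a president?"
# Foreground: speeches matching the (hypothetical) president field.
body = significant_terms_body({"match": {"president": "Obama"}})

# Made-up sample response, trimmed to the keys used above.
sample_response = {
    "aggregations": {
        "cloud": {
            "buckets": [
                {"key": "healthcare", "doc_count": 30, "score": 0.8, "bg_count": 35},
                {"key": "internet", "doc_count": 22, "score": 0.5, "bg_count": 40},
            ]
        }
    }
}

print(body)
print(weights_from_significant_terms(sample_response))
```

Using the score as the word weight, instead of a raw count, is what keeps words like "the" out of the cloud.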
