WordCloud in Elasticsearch

Jeff_Fogarty · April 23, 2015, 1:08pm

I'm looking to create a wordcloud in Jupyter (IPython Notebook) using either
Python or JavaScript. I have a collection of Presidential speeches from
millercenter.org loaded into ES. I'm able to execute a termvector
query, which returns the following:

term
term_freq
ttf
doc_freq

Is termvector the appropriate query for a wordcloud? If so, which
numerical value should I use?

Thanks for your help.

Jeff

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e8adafbf-b7fa-492d-ad20-b1c5b0fc0941%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

seralf · April 23, 2015, 3:26pm

Hi Jeff

IMHO a wordcloud visualization is simple to construct over facets, so if
you have aggregations that count how many documents you have for every
term, this is probably the simplest way to construct it.
If you want to use the term vectors, it's important to understand what
exactly you want to describe.

What do you want to visualize? What do you expect to emerge from the data?
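
The aggregation-based approach described above could look roughly like the sketch below. It is not from the thread: the field name `speech_text`, the aggregation name `top_terms`, and the sample response are assumptions made up for illustration. Only the general shape of a terms aggregation request and its `buckets` (each with a `key` and a `doc_count`) is standard. Depending on the Elasticsearch version and mapping, aggregating on an analyzed text field may need extra mapping settings.

```python
def build_terms_agg_body(field, size=100):
    """Build a search body that asks only for a terms aggregation.

    The top-level "size": 0 suppresses the search hits; the inner "size"
    limits how many term buckets come back.
    """
    return {
        "size": 0,
        "aggs": {"top_terms": {"terms": {"field": field, "size": size}}},
    }


def buckets_to_weights(response, agg_name="top_terms"):
    """Map each term to the number of documents that contain it."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


def scale_font_sizes(weights, min_size=10, max_size=60):
    """Linearly rescale weights into a font-size range for drawing."""
    if not weights:
        return {}
    lowest, highest = min(weights.values()), max(weights.values())
    span = (highest - lowest) or 1  # avoid dividing by zero when all weights match
    return {
        term: min_size + (weight - lowest) * (max_size - min_size) / span
        for term, weight in weights.items()
    }


# "speech_text" is a hypothetical field name.
body = build_terms_agg_body("speech_text", size=3)

# Invented response, shaped like a terms aggregation result.
sample_response = {
    "aggregations": {
        "top_terms": {
            "buckets": [
                {"key": "america", "doc_count": 40},
                {"key": "freedom", "doc_count": 25},
                {"key": "congress", "doc_count": 10},
            ]
        }
    }
}

weights = buckets_to_weights(sample_response)
font_sizes = scale_font_sizes(weights)
```

The `body` dict would be sent as the search request, and the resulting `weights` dict is the kind of term-to-frequency mapping that word cloud libraries typically accept, for example `WordCloud().generate_from_frequencies(weights)` from the third-party `wordcloud` package, if it is installed. Without a stopword filter, very common words will dominate such a cloud.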

Jeff_Fogarty · April 24, 2015, 1:06pm

Hi Alfredo, my goal is to use the features in ES to create a wordcloud as
easily as possible. The termvector or significant terms queries seem to be
the most useful.

A visualization of the 'significant' words is all I'm after.

Mark_Harwood · April 24, 2015, 1:26pm

Jeff Fogarty wrote: "A visualization of the 'significant' words is all I'm after."

The main question then is "significant compared to what?"

Straight popularity counts (e.g. a terms agg) will just tell you that the
term "the" is very popular.
To use significant_terms you need to provide a foreground set and a
background set to compare for differences.
Examples might therefore be:
What idioms do presidents use?: All presidential speeches vs. normal
English (e.g. a sample of English Wikipedia content)
What is different about Obama as a president?: Obama speeches vs. all
other presidential speeches
What is Obama talking about now?: Obama speeches from 2015 vs. all prior
Obama speeches
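
As a rough illustration of the second example (Obama speeches vs. all other presidential speeches), a significant_terms request might be put together as in the sketch below. The field names `president` and `speech_text`, the aggregation name `significant_words`, and the sample response are hypothetical and not from the thread. The query selects the foreground set; when no `background_filter` is given, the background is the whole index.

```python
def build_significant_terms_body(foreground_query, field, size=50,
                                 background_filter=None):
    """Build a search body with a significant_terms aggregation.

    foreground_query selects the foreground set. background_filter is
    optional and narrows the background set; if omitted, the whole index
    is used as the background.
    """
    sig_terms = {"field": field, "size": size}
    if background_filter is not None:
        sig_terms["background_filter"] = background_filter
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": foreground_query,
        "aggs": {"significant_words": {"significant_terms": sig_terms}},
    }


def buckets_to_scores(response, agg_name="significant_words"):
    """Map each significant term to its significance score."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["score"] for bucket in buckets}


# Foreground: speeches by one president (hypothetical "president" field).
body = build_significant_terms_body(
    {"match": {"president": "Obama"}},
    field="speech_text",  # hypothetical field holding the speech text
)

# Invented response, shaped like a significant_terms result.
sample_response = {
    "aggregations": {
        "significant_words": {
            "buckets": [
                {"key": "healthcare", "doc_count": 30, "score": 0.8, "bg_count": 45},
                {"key": "affordable", "doc_count": 12, "score": 0.5, "bg_count": 15},
            ]
        }
    }
}

scores = buckets_to_scores(sample_response)
```

Using `score` rather than `doc_count` as the word weight sizes each word by how unusual it is in the foreground set relative to the background, which is what keeps terms like "the" out of the cloud.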
