Terms / Documents Matrix

Jimmy_Krehl · March 8, 2012, 8:07pm

I've been looking for a way to extract n-gram frequencies from
Elasticsearch as though it were a large table of n-grams by documents. I
found this thread from about a year ago:

http://elasticsearch-users.115913.n3.nabble.com/Pseudo-map-reduce-for-searchresults-td2683300.html

"3. The above, 1 and 2, talk about having map reduce implemented on the
"search" aspect. One thing that I would love to also tackle is the "terms"
aspect of a search engine. Being able to run (streaming) map reduce jobs on
terms, especially ones with term vector information, can provide a strong
infrastructure for implementing algos like clustering and the like.

So, yes, it has crossed my mind :), and it is on the roadmap."

I'm wondering what the status of this is today. Is something similar
supported in a different way? I could begin work on a plugin or I could
help with a module in development.

Thanks,
Jim Krehl
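
For readers unfamiliar with the idea, a "table of n-grams by documents" can be sketched in a few lines of plain Python. This only illustrates the data structure being asked about and is not something Elasticsearch provides; the function names `ngrams` and `ngram_document_matrix`, the naive whitespace tokenization, and the sample documents are all assumptions made for the example.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the n-grams in a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_document_matrix(docs, n=2):
    """Map each n-gram to a {doc_id: count} dict (a sparse n-gram x document table).

    Assumes lowercase whitespace tokenization, not an Elasticsearch analyzer.
    """
    matrix = {}
    for doc_id, text in docs.items():
        counts = Counter(ngrams(text.lower().split(), n))
        for gram, count in counts.items():
            matrix.setdefault(gram, {})[doc_id] = count
    return matrix

# Hypothetical sample documents, keyed by a made-up document id.
docs = {
    "doc1": "the quick brown fox",
    "doc2": "the quick red fox and the quick dog",
}
matrix = ngram_document_matrix(docs, n=2)
```

Each row of `matrix` is one n-gram and each inner key is a document, so a missing key simply means the n-gram does not occur in that document.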

kimchy · March 9, 2012, 6:58pm

Nothing has happened on that front, though I still toy with the idea.

Jimmy_Krehl · March 9, 2012, 8:18pm

Is a search plugin the route to go? I'm pretty new to ES and I'm not sure if
there's a framework for those. I'm hoping to be able to leverage the
search infrastructure in ES to distribute the collation of n-grams.
Googling has led me to believe that people link ES's indices to HDFS
and use Mahout to extract TF/IDF data. I'd prefer using ES entirely,
however.

Thanks!
jimmyk
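
The "distribute the collation" idea above can be sketched as a map step that counts n-grams per partition and a reduce step that sums the partial counts. This is a plain-Python illustration under stated assumptions, not an Elasticsearch plugin or API: the partitions merely stand in for shards, and `count_ngrams`, `merge_counts`, and the sample texts are hypothetical names and data.

```python
from collections import Counter
from functools import reduce

def count_ngrams(texts, n=2):
    """'Map' step: count n-grams within one partition of documents."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()  # assumed naive tokenization
        counts.update(" ".join(tokens[i:i + n])
                      for i in range(len(tokens) - n + 1))
    return counts

def merge_counts(partials):
    """'Reduce' step: sum the per-partition counters into one."""
    return reduce(lambda a, b: a + b, partials, Counter())

# Hypothetical partitions standing in for shards.
partitions = [
    ["the quick brown fox", "the quick dog"],
    ["a quick brown cat"],
]
totals = merge_counts(count_ngrams(p) for p in partitions)
```

Because the partial counts are merged by simple addition, each partition can be counted independently and in any order before the final merge.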

kimchy · March 10, 2012, 6:45pm

It should be possible with a plugin, and it might not be that difficult if you have a very specific use case.
