Insights for an ES Newbie


(zupeanut) #1

Hi,

I would like to try to cluster my documents. From what I understand of Lucene, that means fully replacing the term vector. From my reading, I have determined that this vector is hidden in ES. Is there a way to augment my results using some or all of the information I do have available?

Available Information:

  • Global term importance (i.e., each word and its importance relative to the corpus)
  • Document vectors (i.e., each word and its importance relative to the document)

Many thanks in advance!

Regards,
Andrew


(Shay Banon) #2

In general you can do it, but it's quite low-level Lucene (and elasticsearch) work. You can write your own analyzer that attaches your own "importance" as a payload to each term the analyzer generates, and then use those payloads in a custom query that you implement.
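To make the payload idea concrete, here is a minimal sketch in plain Python. It does not use Lucene or elasticsearch APIs. The `term|weight` input convention and the names `analyze`, `build_index`, and `payload_query` are all assumptions made for illustration. The "analyzer" stores a per-term importance next to each term, and the "custom query" scores documents by summing the stored importances of the matching terms.

```python
from collections import defaultdict


def analyze(text):
    """Hypothetical analyzer: turn 'term|weight' tokens into (term, payload) pairs.

    A token without an explicit weight gets a default payload of 1.0.
    """
    tokens = []
    for raw in text.split():
        term, _, weight = raw.partition("|")
        tokens.append((term.lower(), float(weight) if weight else 1.0))
    return tokens


def build_index(docs):
    """Build an inverted index: term -> {doc_id: [payload, ...]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, payload in analyze(text):
            index[term].setdefault(doc_id, []).append(payload)
    return index


def payload_query(index, terms):
    """Hypothetical custom query: score = sum of payloads of matching terms."""
    scores = defaultdict(float)
    for term in terms:
        for doc_id, payloads in index.get(term.lower(), {}).items():
            scores[doc_id] += sum(payloads)
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Illustrative documents with made-up importance values.
docs = {
    "a": "search|0.9 engine|0.4 cluster|0.1",
    "b": "search|0.2 cluster|0.8",
}
index = build_index(docs)
ranked = payload_query(index, ["cluster"])
```

In this toy data set, document "b" ranks above "a" for the query "cluster" because its stored importance for that term is higher. A real implementation would do the same thing inside a Lucene analyzer and a query that reads the payloads at scoring time.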

I have been thinking hard about how to enable this simply in elasticsearch, but have not (yet) come up with a nice API-centric solution.
(In reply to zupeanut's message of Tuesday, February 22, 2011 at 11:46 PM.)

