I would like to try to cluster my documents. From what I understand of Lucene, that means fully replacing the term vector. From my reading, I have determined that this vector is hidden in ES. Is there a way to augment my results using some or all of the information I do have available?
Available Information:
Global term importance (i.e. each word and its importance relative to the corpus)
Document vectors (i.e. each word and its importance relative to the document)
In general you can do it, but it's quite low-level Lucene (and elasticsearch) work. You can write your own analyzer that attaches your own "importance" as a payload to each term the analyzer generates, and then use those payloads in a custom query that you implement.
I have been thinking hard about how to enable this simply in elasticsearch, but have not (yet) come up with a nice API-centric solution.
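To make the payload idea above concrete, here is a minimal sketch in plain Python. It does not use the Lucene or elasticsearch APIs; the `term|weight` input format and the names `analyze`, `build_index`, and `payload_query` are assumptions made up for illustration. It only shows the shape of the approach: the "analyzer" stores an importance value next to each term, and the "custom query" scores documents from those stored values instead of the built-in term statistics.

```python
from collections import defaultdict


def analyze(text, delimiter="|", default_weight=1.0):
    """Hypothetical analyzer: split 'term|weight' tokens into (term, payload) pairs."""
    pairs = []
    for token in text.split():
        term, sep, raw = token.partition(delimiter)
        # Tokens without a delimiter fall back to the default weight.
        weight = float(raw) if sep else default_weight
        pairs.append((term.lower(), weight))
    return pairs


def build_index(docs):
    """Build an inverted index mapping term -> {doc_id: payload}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, weight in analyze(text):
            # If a term repeats within one document, keep its largest payload.
            index[term][doc_id] = max(weight, index[term].get(doc_id, 0.0))
    return index


def payload_query(index, terms):
    """Hypothetical custom query: a document's score is the sum of matching payloads."""
    scores = defaultdict(float)
    for term in terms:
        for doc_id, weight in index.get(term.lower(), {}).items():
            scores[doc_id] += weight
    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Example input: each document carries its own per-term importance values.
docs = {
    "doc1": "lucene|0.9 clustering|0.4 search|0.1",
    "doc2": "clustering|0.8 vectors|0.7",
}
index = build_index(docs)
ranking = payload_query(index, ["clustering", "lucene"])
print(ranking)
```

In a real setup the per-document weights would come from your document vectors, and the global term importance could be multiplied in at query time inside the scoring loop.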
(In reply to zupeanut's message of Tuesday, February 22, 2011 at 11:46 PM, which is the question at the top of this thread.)