I am currently using an Elasticsearch plugin called "termlist"
(https://github.com/jprante/elasticsearch-index-termlist)
that returns the terms of an index. It breaks, however, when there are too
many terms and the output grows beyond roughly 30-40 megabytes. I need a
solution that works for larger amounts of output data.
I am thinking about writing a custom plugin that returns hashes of the
terms instead of the terms themselves, to reduce the volume of output data.
So I have a couple of questions:
I presume that Elasticsearch might already use hashes of terms internally
in the index; if so, would it be possible to retrieve those?
If that is not possible, what other options do I have to get around the
30-40 MB barrier?
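For what it's worth, the hashing idea can be sketched on the client side once terms have been fetched (a sketch only: `term_digest` is a hypothetical helper, and an 8-byte truncated SHA-1 is just one possible choice of hash):

```python
import hashlib

def term_digest(term: str, length: int = 8) -> bytes:
    """Hypothetical helper: a short, fixed-size digest of a term.

    Truncating a SHA-1 digest to 8 bytes keeps the collision risk
    negligible for millions of terms while capping every entry at a
    constant size, however long the original term is."""
    return hashlib.sha1(term.encode("utf-8")).digest()[:length]

terms = ["elasticsearch", "internationalization", "lucene"]
for term in terms:
    # Every digest is exactly 8 bytes, regardless of term length.
    print(term, term_digest(term).hex())
```

Note that this only helps if the consumer can work with digests alone (e.g. for set comparison); the original terms cannot be recovered from the hashes.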
The termlist plugin supports filtering with the 'term' parameter and
pagination with the 'size' parameter. So you can fetch smaller term lists,
e.g. for terms starting with 'a', 'b', 'c', and so on, and you can limit
the number of entries returned with, say, size=1000 (or 10000, etc.). The
'term' filter should be sufficient for most cases.
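For illustration, the prefix-plus-size pagination described above could be driven like this (a sketch; the `_termlist` endpoint path and the index name `myindex` are assumptions — check the plugin's README for the exact URL in your plugin version):

```python
from string import ascii_lowercase

def termlist_urls(base="http://localhost:9200/myindex/_termlist", size=1000):
    # One request per leading letter, each capped at `size` entries,
    # so no single response has to carry the whole term list.
    return [f"{base}?term={letter}&size={size}" for letter in ascii_lowercase]

for url in termlist_urls()[:3]:
    print(url)  # first request: ...?term=a&size=1000
```

Each URL can then be fetched independently and the partial lists merged, keeping every individual response well under the 30-40 MB limit.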
On Monday, December 15, 2014 12:32:00 AM UTC+2, Jörg Prante wrote:
There are no hashes for terms.
Jörg