Creating a custom plugin to return hashes of the terms or the terms of an Elasticsearch index

I am currently using an Elasticsearch plugin called "termlist"
(https://github.com/jprante/elasticsearch-index-termlist)
that returns the terms of an index. However, it breaks when there are too
many terms and the output grows beyond about 30-40 megabytes. I need my
custom plugin to handle larger amounts of output data.

I am thinking about creating a custom plugin to return hashes of terms
instead of the actual terms to reduce the output data volume.

So I have a couple of questions:

  1. I presume that Elasticsearch might already use hashes of terms
    internally in the index, so would it be possible to get those?

  2. If the above is not possible, what other options do I have to circumvent
    the 30-40 MB barrier?

Thank you in advance.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f9f8e5c0-7f01-4769-a584-223586cec3be%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

The termlist plugin supports filtering with the 'term' parameter and
pagination with the 'size' parameter. So you can get smaller term lists,
for example for terms starting with 'a', 'b', 'c', ..., and you can limit
the number of entries returned with, say, size=1000 (or 10000, etc.). The
'term' filter should be sufficient for most cases.
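
As a rough illustration of the prefix-by-prefix approach, here is a minimal Python sketch that only builds the request URLs, one per starting letter. The 'term' and 'size' parameters are the ones named above; the `_termlist` endpoint path, the host, the index name `myindex`, and the helper name `termlist_urls` are assumptions made for the example, so check them against the plugin's README before relying on them.

```python
import string
from urllib.parse import urlencode


def termlist_urls(base_url, index, prefixes=string.ascii_lowercase, size=1000):
    """Yield one termlist request URL per prefix.

    Assumption: the plugin is reachable at <base_url>/<index>/_termlist.
    The 'term' (prefix filter) and 'size' (entry limit) parameters are
    taken from the reply above.
    """
    for prefix in prefixes:
        query = urlencode({"term": prefix, "size": size})
        yield f"{base_url}/{index}/_termlist?{query}"


# Hypothetical host and index name, for illustration only.
urls = list(termlist_urls("http://localhost:9200", "myindex"))
for url in urls[:3]:
    print(url)
```

Each URL can then be fetched separately, so that no single response has to carry the whole term list. Terms that start with digits or non-ASCII characters would need additional prefixes.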

There are no hashes for terms.
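
Since the index offers no term hashes, any hashing would have to be computed by the custom plugin itself. The sketch below only illustrates the size trade-off, in Python rather than plugin code. The 8-byte BLAKE2b digest and the helper name `term_digest` are assumptions chosen for the example, not anything Elasticsearch or the termlist plugin provides.

```python
import hashlib


def term_digest(term: str, digest_size: int = 8) -> bytes:
    """Return a fixed-size digest of a term (hypothetical 8-byte default)."""
    return hashlib.blake2b(term.encode("utf-8"), digest_size=digest_size).digest()


terms = ["a", "elasticsearch", "internationalization"]

# Compare the UTF-8 size of each term with the size of its digest.
sizes = [(len(t.encode("utf-8")), len(term_digest(t))) for t in terms]
for term, (raw, hashed) in zip(terms, sizes):
    print(f"{term!r}: {raw} bytes raw, {hashed} bytes hashed")
```

A fixed-size digest only saves space for terms longer than the digest, it cannot be turned back into the term, and short digests can collide, so the saving depends on the term length distribution of the index.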

Jörg

(In reply to Rosen Nikolov's message of Fri, Dec 12, 2014 at 2:45 PM.)


Thank you Jörg,

Very helpful.

(In reply to Jörg Prante's message of Monday, December 15, 2014 12:32:00 AM UTC+2.)