Adding a new field data type to field-data-cache service seems to be impossible? (0.90.5)


(Tikitu de Jager) #1

Hi folks,

I've inherited some 0.20 code which essentially adds a new field data type
(analysed strings stored only as hash-values for the low memory signature;
that's enough to do some statistical analysis on things like term
frequencies). (The code is the experimental part of Boaz Leskes's
elasticfacets library, http://github.com/bleskes/elasticfacets -- we're
probably the only folk using it in production, and we'd rather move to 0.90
now instead of waiting for the 1.0 release, which I presume he will support
when it arrives.)

I'm concerned that this may not be possible at all in 0.90. I'm looking at
index.fielddata.IndexFieldDataService and I don't see a way to get an entry
into the cache there unless it's one of the built-in types. (Once the value
is in, one can get it back out using getForField() with a custom subclass
of FieldDataType for the type argument, but as far as I can see
getForField() is also the only way to put values into the cache, and it
works with a fixed list of builders.)
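
To make the limitation concrete, here is a minimal sketch of the pattern
described above. Every name in it (FixedBuilderCache, Builder, the type
strings) is hypothetical; it is not the Elasticsearch source, only an
illustration of a cache whose single entry point consults a fixed,
unmodifiable set of builders.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a field data service; NOT the Elasticsearch code.
final class FixedBuilderCache {

    /** Builds the in-memory representation for one field. */
    interface Builder {
        Object build(String fieldName);
    }

    // The builder set is fixed when the class loads and cannot be extended.
    private static final Map<String, Builder> BUILDERS;
    static {
        Map<String, Builder> m = new HashMap<>();
        m.put("string", fieldName -> "string field data for " + fieldName);
        m.put("long", fieldName -> "long field data for " + fieldName);
        BUILDERS = Collections.unmodifiableMap(m);
    }

    private final Map<String, Object> cache = new HashMap<>();

    // The only entry point: it both populates the cache and reads from it.
    Object getForField(String fieldName, String type) {
        Object cached = cache.get(fieldName);
        if (cached != null) {
            return cached; // already loaded, returned whatever type is passed
        }
        Builder builder = BUILDERS.get(type);
        if (builder == null) {
            // A custom type has no builder, so it can never be loaded.
            throw new IllegalArgumentException(
                "No field data builder for type [" + type + "]");
        }
        Object built = builder.build(fieldName);
        cache.put(fieldName, built);
        return built;
    }
}
```

In this sketch a lookup with an unknown type fails before anything reaches
the cache, which is the behaviour I think I'm seeing.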

Can anyone confirm that there is no way to get custom field data types into
the cache (in 0.90), or point me in the right direction if there is?

Cheers,
Tikitu

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Boaz Leskes) #2

Hi Tikitu,

You are looking in the right place: there is no way to inject custom
fielddata types short of writing your own facet that doesn't rely on the
standard IndexFieldDataService to load its data.

There are some thoughts on introducing approximation methods for
statistical computations (like HyperLogLog, which is different from what
you need), but these are all at a very early stage and will come after 1.0.

That said, v0.90 includes some huge memory savers, one of which is storing
strings in UTF-8 in memory. This should save ~50% of the memory signature
for those values. Another feature is the ability to tweak which values
are loaded into the field data cache. See:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.html#field-data-filtering
I would investigate whether these save enough for you to no longer need
the hashed-strings field data cache offered by the elasticfacets plugin.
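
As a rough illustration of where the ~50% comes from, here is a small
sketch that compares the character payload of a term held as UTF-16 chars
(two bytes per char) with the same term encoded as UTF-8. The class and
method names are made up for this example, the UTF-16 baseline is an
assumption about the older representation, and object overhead is ignored.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for comparing the raw character payload of a term.
final class StringFootprint {

    // Payload when the term is held as UTF-16 chars: two bytes per char.
    static int utf16Bytes(String term) {
        return term.length() * 2;
    }

    // Payload when the same term is encoded as UTF-8.
    static int utf8Bytes(String term) {
        return term.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

For an ASCII term such as "elasticsearch" this gives 26 bytes versus 13.
Terms with many non-ASCII characters save less, since those characters
take two or more bytes in UTF-8.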

Cheers,
Boaz


(Tikitu de Jager) #3

Thanks Boaz for the quick reply. It seems like we don't have much choice
(not wanting to reimplement the caching layer ourselves...).

It would be fantastic if a plugin were able to register a field data type
(with Builder) instead of restricting things to the private-final-Immutable
built-ins. In our case, for instance, the .hashCode() of each term is all
we need: no matter what memory improvements come with 0.90, replacing
strings with ints has got to give (yet) more gain.
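
For anyone wondering what that looks like, here is a minimal sketch of
counting term frequencies by hash alone. The class name and methods are
hypothetical and this is not the elasticfacets code; it also uses a boxed
HashMap for brevity, where a memory-conscious version would use primitive
int arrays.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: term frequencies keyed by String.hashCode() only.
final class HashedTermCounts {

    // Maps a term's int hash to the number of times it was seen.
    private final Map<Integer, Integer> counts = new HashMap<>();

    // Record one occurrence; the term itself is never stored.
    void add(String term) {
        counts.merge(term.hashCode(), 1, Integer::sum);
    }

    // Frequency of the term's hash. Terms that collide share a count.
    int frequency(String term) {
        return counts.getOrDefault(term.hashCode(), 0);
    }

    // Number of distinct hashes seen so far.
    int distinctHashes() {
        return counts.size();
    }
}
```

The price is that colliding terms are merged ("Aa" and "BB" have the same
hashCode(), for example), which is acceptable for statistics but means the
original strings can't be recovered.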

Thanks,
Tikitu

