Dynamic mapping to scalars and objects

Hey,

I'm trying to store the results of a user-defined function in
Elasticsearch. Because of this, the data isn't always consistent, and I
might try to store both an object and a scalar under the same key. For
example:

curl 'http://localhost:9200/text_index/test_type' -X POST --data '{"result":1}'

curl 'http://localhost:9200/text_index/test_type' -X POST --data '{"result":{"arbitrary_user_object_key":1}}'

I've tried using dynamic templates to allow for a multi_field mapping, but if I don't add an object type to the mapping, I still get a MapperParsingException for the example above. If I do add objects to the mapping, I get:

MapperParsingException[failed to parse]; nested: StackOverflowError;

This can be reproduced with this gist: https://gist.github.com/maxlang/6274253

So, is there a good way to store arbitrary key-value data like this without knowing the keys beforehand? Or do I need to either manually update the mapping before storing the results or somehow normalize the function output?

-Max

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi Max,
the multi_field type is meant to do different things with the same value in
a document, so that you don't have to submit that value multiple times.
All of those different things always kick in at the same time. The most
common usage is a string field that needs to be analyzed in different
ways, for example non-tokenized to support faceting or sorting on it, but
tokenized as well in order to run full-text queries against it. That means
the Lucene index will have two different fields, and Elasticsearch knows
they are two variations of the same field.
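
As a rough illustration of that common usage, the sketch below builds such a mapping as a Python dict that could be serialized and sent as a mapping definition. It assumes the legacy multi_field syntax of the 0.90-era releases discussed in this thread; the field name "title" and the sub-field name "untouched" are hypothetical and do not come from the gist.

```python
import json

# Legacy (0.90-era) multi_field mapping: one source value, two Lucene fields.
# "title" and "untouched" are hypothetical names used only for illustration.
mapping = {
    "test_type": {
        "properties": {
            "title": {
                "type": "multi_field",
                "fields": {
                    # Default sub-field: analyzed, for full-text queries.
                    "title": {"type": "string", "index": "analyzed"},
                    # Extra sub-field: not analyzed, for facets and sorting.
                    "untouched": {"type": "string", "index": "not_analyzed"},
                },
            }
        }
    }
}

print(json.dumps(mapping, indent=2))
```

Both sub-fields are populated from the same submitted value on every document, which is why this mechanism cannot pick one type or another per document.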

The way you use multi_field in your dynamic template looks more like
you would like to declare different types for the same field, or even have
only one of those options kick in depending on what you have in the
document. Unfortunately that is not going to work. The problem here is that
Elasticsearch wants to make sense of your data and do what's best depending
on the data type, which is hard when it's not known what you are going to
index. After you send the first document with a field, Elasticsearch
automatically updates the mapping (if none is present for that field),
assigning it a data type driven by the JSON data type in your document. All
the other documents that you index afterwards will need to contain pretty
much the same type for that field, otherwise you'll get back an
error.
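
The behaviour described above can be pictured with a small toy model. This is only a sketch of the idea, not Elasticsearch code: it assumes just two coarse kinds of value (object and scalar), and the names MappingConflict, kind_of and index_document are made up for the example.

```python
class MappingConflict(Exception):
    """Raised when a value does not fit the kind already recorded for a field."""


def kind_of(value):
    # Collapse JSON values into two coarse kinds: objects versus everything else.
    return "object" if isinstance(value, dict) else "scalar"


def index_document(mapping, doc):
    """Record the kind of each new field; reject values of a different kind."""
    for field, value in doc.items():
        # The first document containing a field decides its kind.
        seen = mapping.setdefault(field, kind_of(value))
        if seen != kind_of(value):
            raise MappingConflict(
                f"field {field!r} is mapped as {seen}, got {kind_of(value)}"
            )


mapping = {}
index_document(mapping, {"result": 1})  # first document fixes the kind
try:
    index_document(mapping, {"result": {"arbitrary_user_object_key": 1}})
except MappingConflict as err:
    print(err)
```

The two documents are the ones from the original curl requests: the first fixes "result" as a scalar, so the second is rejected.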

My suggestion would be to use different fields, one per type, and handle
the complexity of having different types with the same name in the
application layer.
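
One way this could look in the application layer is sketched below. It is a minimal example under assumptions: the helper name split_by_type and the "_object" / "_scalar" suffixes are hypothetical, and only the object-versus-scalar distinction from the original question is handled.

```python
def split_by_type(doc, key="result"):
    """Return a copy of doc with `key` moved to a type-specific field."""
    out = {k: v for k, v in doc.items() if k != key}
    if key in doc:
        value = doc[key]
        # Objects and scalars go to differently named fields, so each
        # field only ever sees one kind of value.
        suffix = "object" if isinstance(value, dict) else "scalar"
        out[f"{key}_{suffix}"] = value
    return out


print(split_by_type({"result": 1}))
print(split_by_type({"result": {"arbitrary_user_object_key": 1}}))
```

Scalars of different JSON types (for instance strings and numbers) would still share one field here; the same idea could be extended with more suffixes if that turns out to matter.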

On Tuesday, August 20, 2013 12:19:44 AM UTC+2, Max Lang wrote:

Thanks Luca,

I guess I am trying to make the mapping a little too flexible. I suppose
I'll do something in the application layer like flattening the field names
or enforcing certain constraints on the returned object.
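
Flattening the field names could be done along these lines. This is only a sketch: the function name flatten, the "result" prefix and the "_" separator are assumptions, and it does not guard against two different nested paths producing the same flattened name.

```python
def flatten(value, prefix="result", sep="_"):
    """Turn a nested object into a flat dict of prefixed scalar fields."""
    if not isinstance(value, dict):
        # A scalar is stored under the prefix itself.
        return {prefix: value}
    flat = {}
    for key, inner in value.items():
        # Each nested key extends the field name, e.g. result_a_b.
        flat.update(flatten(inner, f"{prefix}{sep}{key}", sep))
    return flat


print(flatten(1))
print(flatten({"arbitrary_user_object_key": 1}))
```

With this, a scalar result lands in "result" and an object result lands in fields such as "result_arbitrary_user_object_key", so the same field name never receives both an object and a scalar.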

I'm using multi_field mainly because ES can be a bit fragile when the
mapping is based on the first value it sees, for example when the first
value is a long but the field is actually a double. By using multi_fields
regardless of the first seen type, I can attempt numerical facets whenever
any numerical values exist, not just when the first recorded value is a
number. A pretty common thing I've had to deal with is values that are
usually numbers but are the string "NA" when no number is present.
Multi_fields allow me to capture both the original value (as a string) for
display/terms facets and the numerical value for statistical/histogram
facets, regardless of what the first value is. I was trying to extend the
same principle to objects, but again, based on what you mentioned, I think
it's just a bit much.
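
The "NA" case can also be handled before indexing rather than through multi_field. The sketch below is an application-layer alternative, not the mapping from the gist; the function name split_value and the field names "raw" and "num" are hypothetical.

```python
def split_value(value):
    """Return the display string and, when it parses, the numeric value."""
    fields = {"raw": str(value)}
    try:
        fields["num"] = float(value)
    except (TypeError, ValueError):
        # Non-numeric input such as "NA": keep only the string form.
        pass
    return fields


print(split_value(3))
print(split_value("NA"))
```

Because "num" is only present when the value parses as a number, that field always holds a double regardless of which document arrives first. Note that float() also accepts strings such as "nan" and "inf", which may need extra filtering.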

-Max

On Tue, Aug 20, 2013 at 1:36 AM, Luca Cavanna cavannaluca@gmail.com wrote:
