Noob question about object-string handling

Hi,

We're in the midst of adding more ES to our infrastructure, but we've
frequently run into object-string issues with the data we have in our
system.

I understand that the first document indexed sets the mapping for a field,
and that later documents have to conform to it:

A:
{
"something": {"is": "objectified"}
}

vs.

B:
{
"something": "is a string"
}

If I index B then A, both stick with a string mapping, whereas if I index
A then B, B fails.

The historical data we want to index is riddled with JSON docs that follow
the A->B transition, because of the way our input form data gets versioned
over time. We do have ways of partitioning these versions to mitigate this,
but we were wondering whether ES has a built-in way to tolerate the A->B
scenario.

Are there tricks or workarounds, say multi_field or a dynamic template,
that would allow some tolerance of our admittedly ugly, er, schemaless
data?

Thanks,

Dan

--

I would suggest renaming the field "something" to "something_o" if you
detect that it contains an object, and to "something_s" if you detect that
it contains a string, before sending the document to Elasticsearch. Each
field then keeps a single consistent mapping, which will make your life
easier later when you start running queries against it.
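
Something along these lines might work as the renaming step. It is only a
rough sketch in Python, assuming your documents are already parsed into
dicts and that the conflicting fields sit at the top level. The function
name suffix_by_type and its fields argument are made up for illustration
and are not part of any Elasticsearch client.

```python
def suffix_by_type(doc, fields):
    """Return a copy of doc with each listed top-level field renamed by type.

    Hypothetical helper: objects get an "_o" suffix, strings get "_s".
    Values of any other type, and fields not listed, keep their name.
    """
    out = {}
    for key, value in doc.items():
        if key in fields and isinstance(value, dict):
            out[key + "_o"] = value
        elif key in fields and isinstance(value, str):
            out[key + "_s"] = value
        else:
            out[key] = value
    return out


# The two documents from the question.
doc_a = {"something": {"is": "objectified"}}
doc_b = {"something": "is a string"}

print(suffix_by_type(doc_a, {"something"}))  # {'something_o': {'is': 'objectified'}}
print(suffix_by_type(doc_b, {"something"}))  # {'something_s': 'is a string'}
```

You would run each document through this before indexing it, so A and B
end up in two separate fields and the order they arrive in no longer
matters. Nested fields would need a recursive version.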

On Thursday, November 8, 2012 1:54:18 PM UTC-5, dmyung wrote:

--