Noob question about object-string handling

Daniel_Myung · November 8, 2012, 6:54pm

Hi,

We're in the midst of adding more ES to our infrastructure, but we've
frequently run into object-string issues with the data we have in our
system.

I understand that the first document indexed sets the precedent for how
a field is mapped:

A:
{
"something": {"is": "objectified"}
}

vs.

B:
{
"something": "is a string"
}

If I index B then A, both stick with a string mapping, whereas if I index
A then B, B fails.

The historical data we want to index is riddled with JSON docs that fit
the A->B transition, because of the way our input form data gets versioned
over time. We do have ways of partitioning these versions to mitigate this,
but we were wondering if there is a built-in way for ES to tolerate
the A->B scenario.

Are there tricks or workarounds, say with multi_field or a dynamic
template, that would allow some tolerance of our admittedly ugly, er,
schemaless data?

Thanks,

Dan

--

Igor_Motov · November 9, 2012, 8:31pm

I would suggest renaming the field "something" to "something_o" if you
detect that it contains an object, and to "something_s" if you detect that
it contains a string, before sending the document to Elasticsearch. It will
make your life easier later, when you start running queries against this
field.
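
As a rough illustration of that renaming step, here is a minimal Python
sketch. The helper name tag_field_by_type is made up for this example, and
it only rewrites the document in memory; sending the result to Elasticsearch
is left to whatever client you already use.

```python
from typing import Any


def tag_field_by_type(doc: dict, field: str) -> dict:
    """Return a copy of doc with `field` renamed by the type of its value.

    Hypothetical helper: objects go to "<field>_o", strings to "<field>_s".
    Values of any other type are left under the original name.
    """
    if field not in doc:
        return dict(doc)
    value: Any = doc[field]
    # Copy every other key unchanged, then re-add the value under its new name.
    out = {k: v for k, v in doc.items() if k != field}
    if isinstance(value, dict):
        out[field + "_o"] = value
    elif isinstance(value, str):
        out[field + "_s"] = value
    else:
        out[field] = value
    return out


doc_a = {"something": {"is": "objectified"}}
doc_b = {"something": "is a string"}
print(tag_field_by_type(doc_a, "something"))
print(tag_field_by_type(doc_b, "something"))
```

With this in place the A and B documents never share a field name, so the
order in which they are indexed should no longer matter for the mapping.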

