Field with different types on same index

prabello · September 23, 2022, 6:33pm

Currently, we index data in Elasticsearch separately for each business that is our customer (B2B). This leaves us with too many indices, some with very small shards (around 500 MB).

Due to the nature of the data, we can have a lot of type collisions on the fields. For example, Company_A has the following:

{
    "id": "8f22efb2-6a2a-4cb7-9d0b-01ca0d6cff2e",
    "notes" : {
        "created_at": "2022/01/01",
        "value": "This is some random note"
    } ,
    "first_name": "Some name",
    "integration": "CRM_A"
}

While Company_B will have:

{
    "id": 1,
    "notes":  "This is a random note text",
    "name": "Some random name",
    "integration": "CRM_A"
}

What we were thinking is to put them in a single index per "integration". In that case, though, the two documents would have different types for the field called notes: one is an object, the other is text. The same thing happens with a lot of other fields, so we have many cases of the same field name with different types.

Is there any way to solve this with Elasticsearch?
If there is, we would finally be able to have shards holding 20-30 GB of data, as recommended, instead of having tons of small shards that force us to provision much more memory to keep all the mappings and everything else in place.

One thing we considered was to add the type to the name of each field and strip it again before returning the response, but that might get really confusing.
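For reference, the type-in-the-name idea could look roughly like the sketch below. It is only an illustration: the `__<type>` suffix scheme and both function names are made up for this example, and it handles top-level fields only (nested objects could still collide internally).

```python
# Hypothetical suffixes; any naming scheme that keeps the types apart would do.
SUFFIXES = {dict: "obj", str: "str", bool: "bool", int: "num", float: "num", list: "arr"}


def add_type_suffixes(doc):
    """Rename each top-level field to '<name>__<type>' before indexing."""
    out = {}
    for key, value in doc.items():
        suffix = SUFFIXES.get(type(value), "other")
        out[f"{key}__{suffix}"] = value
    return out


def strip_type_suffixes(doc):
    """Undo add_type_suffixes before returning a document to the caller."""
    return {key.rsplit("__", 1)[0]: value for key, value in doc.items()}


company_a = {"id": "8f22efb2", "notes": {"value": "This is some random note"}}
company_b = {"id": 1, "notes": "This is a random note text"}

print(add_type_suffixes(company_a))  # keys: id__str, notes__obj
print(add_type_suffixes(company_b))  # keys: id__num, notes__str
print(strip_type_suffixes(add_type_suffixes(company_b)) == company_b)
```

Queries would then have to target the suffixed names, which is where the confusion mentioned above would come from.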

stephenb · September 23, 2022, 9:54pm

Short answer: those two schemas cannot live in a single index the way you have them.

As you know, in one schema notes is an object and in the other notes is just a concrete field.

The way to solve this is to choose one of the schemas as the standard and use an ingest pipeline to convert the fields to it.

In this case, it's probably easier to move the concrete value into the notes.value field.

prabello · September 23, 2022, 10:41pm

My biggest issue is that this happens with lots of different fields (some of the documents have more than 30 collisions :/)

stephenb · September 23, 2022, 10:45pm

Hmmmm that is unfortunate... no real simple answer there...

Unless you put each set under a top-level field... then you won't have collisions...
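A minimal sketch of that top-level-field idea, applied on the client side before indexing. The `company` field, the `company_a` / `company_b` keys and the choice of `integration` as the only shared field are assumptions made for the illustration, not something from the thread.

```python
# Fields assumed to have the same type for every customer (hypothetical choice).
SHARED_FIELDS = ("integration",)


def namespace_document(company_key, doc):
    """Move customer-specific fields under one top-level key per company."""
    shared = {k: doc[k] for k in SHARED_FIELDS if k in doc}
    specific = {k: v for k, v in doc.items() if k not in SHARED_FIELDS}
    # 'notes' becomes e.g. 'company_a.notes' and 'company_b.notes',
    # so the object and text variants no longer share a field path.
    return {**shared, "company": company_key, company_key: specific}


doc_a = {"id": "8f22efb2", "notes": {"value": "This is some random note"}, "integration": "CRM_A"}
doc_b = {"id": 1, "notes": "This is a random note text", "integration": "CRM_A"}

print(namespace_document("company_a", doc_a))
print(namespace_document("company_b", doc_b))
```

One trade-off to keep in mind: every customer then contributes its own set of fields to the mapping, so the total field count grows with the number of customers and can run into `index.mapping.total_fields.limit` (1000 by default).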

stephenb · September 23, 2022, 11:21pm

Just in case you are interested, here is a sample ingest pipeline / simulate request with various combinations:

## Ingest Pipeline Concrete / Object
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "rename": {
          "if": "ctx?.notes instanceof String", 
          "field": "notes",
          "target_field": "notes.value",
          "ignore_failure": false
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "id": "8f22efb2-6a2a-4cb7-9d0b-01ca0d6cff2e",
        "notes": {
          "created_at": "2022/01/01",
          "value": "This is some random note"
        },
        "first_name": "Some name",
        "integration": "CRM_A"
      }
    },
    {
      "_source": {
        "id": 1,
        "notes": "This is a random note text",
        "name": "Some random name",
        "integration": "CRM_A"
      }
    },
    {
      "_source": {
        "id": 1,
        "name": "Some random name",
        "integration": "CRM_A"
      }
    }
  ]
}

Results

{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_version": "-3",
        "_source": {
          "integration": "CRM_A",
          "notes": {
            "created_at": "2022/01/01",
            "value": "This is some random note"
          },
          "id": "8f22efb2-6a2a-4cb7-9d0b-01ca0d6cff2e",
          "first_name": "Some name"
        },
        "_ingest": {
          "timestamp": "2022-09-23T23:23:13.461958276Z"
        }
      }
    },
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_version": "-3",
        "_source": {
          "name": "Some random name",
          "integration": "CRM_A",
          "id": 1,
          "notes": {
            "value": "This is a random note text"
          }
        },
        "_ingest": {
          "timestamp": "2022-09-23T23:23:13.461985402Z"
        }
      }
    },
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_version": "-3",
        "_source": {
          "name": "Some random name",
          "integration": "CRM_A",
          "id": 1
        },
        "_ingest": {
          "timestamp": "2022-09-23T23:23:13.461991898Z"
        }
      }
    }
  ]
}
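
With 30+ colliding fields, the processor list could be generated rather than written by hand. The sketch below assumes every collision has the same shape as notes (a plain string in one schema, an object with a value sub-field in the other) and that the field names are simple top-level identifiers. The field list and the function name are hypothetical; other kinds of collision (number vs. string, for example) would need their own handling.

```python
import json


def build_rename_processors(fields):
    """Build one rename processor per field, mirroring the 'notes' processor."""
    return [
        {
            "rename": {
                # Only fire when the incoming value is a plain string.
                "if": f"ctx?.{field} instanceof String",
                "field": field,
                "target_field": f"{field}.value",
                "ignore_failure": False,
            }
        }
        for field in fields
    ]


# Hypothetical list of colliding field names.
pipeline = {"processors": build_rename_processors(["notes", "address"])}
print(json.dumps(pipeline, indent=2))
```

The printed JSON can be pasted in as the "pipeline" section of the _simulate request to check it against sample documents first.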

prabello · September 23, 2022, 11:41pm

Wow, thanks a lot stephenb! Gonna take a stab at it

