Mapping data, tokenizer

Jaroslav_Zeman · January 10, 2021, 7:42am

Hi

I am a beginner with Elasticsearch, and I have a few questions about how best to store data in Elasticsearch for maximum search speed.

In the relational database, I have the following three entities (tables), linked by one-to-many relationships:
car brand 1:n car model 1:n car type

example:

Brand          Model                        Type
MERCEDES-BENZ  E-CLASS COUPE (C123)         280 CE
MERCEDES-BENZ  E-CLASS COUPE (C123)         300 C Turbo-D (123.153)
MERCEDES-BENZ  E-CLASS COUPE (C123)         280 C
MERCEDES-BENZ  E-CLASS KOMBI ESTATE (S123)  200 T (123.280)
MERCEDES-BENZ  E-CLASS KOMBI ESTATE (S123)  200 T
MERCEDES-BENZ  E-CLASS COUPE (C124)         230 CE (124.043)
MERCEDES-BENZ  E-CLASS COUPE (C124)         300 CE
MERCEDES-BENZ  E-CLASS COUPE (C124)         300 CE-24 (124.051)

The last entity (table), product, has an n:n relationship to car type.

What is the best way to store this data in Elasticsearch? The main task is to find products by car.

I have the following variants:

Four separate indices (car brand, car model, car type, product + car type), but in this case I don't know how to link the indices and run a simple query across all of them.
A single product index with three separate fields (brand, model, type), where the values in each field are separated by commas.
A single product index with a single nested field that has this structure:

    {
      "car": [
        {
          "brand": "Mercedes",
          "model": [
            {
              "name": "E-CLASS COUPE (C123)",
              "cartype": [
                "280 CE",
                "300 C Turbo-D (123.153)"
              ]
            },
            {
              "name": "E-CLASS KOMBI ESTATE (S123)",
              "cartype": [
                "200 T (123.280)",
                "200 T"
              ]
            }
          ]
        },
        {
          "brand": "BMW" …. Next cars
        }
      ]
    }

but I also have no idea how to write a query against this structure.

My next question: how should I set up the tokenizer so that the product is found when the input string is “Mercedes coupe 300 turbo”?

Thank you for any answer

Jaroslav

Christian_Dahlqvist · January 10, 2021, 8:52am

Elasticsearch does not support joins between indices, so the first variant (four separate indices) is not possible.

The best way to work with Elasticsearch is often to completely flatten and denormalize the data model and avoid trying to replicate relational concepts using nested documents or parent-child relationships.

You could create car documents for all permutations looking something like this:

{
  "brand": "Mercedes",
  "model": {
    "name": "E-CLASS COUPE (C123)",
    "cartype":  "280 CE"
  }
}
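
A minimal sketch of that flattening step, assuming the brand/model/type hierarchy is available in memory in the shape shown in the question. The function name `flatten_cars` is hypothetical, and the code only builds the documents; indexing them is left out.

```python
# Input in the hierarchical shape from the question (brand -> model -> cartype).
cars = [
    {
        "brand": "Mercedes",
        "model": [
            {
                "name": "E-CLASS COUPE (C123)",
                "cartype": ["280 CE", "300 C Turbo-D (123.153)"],
            },
            {
                "name": "E-CLASS KOMBI ESTATE (S123)",
                "cartype": ["200 T (123.280)", "200 T"],
            },
        ],
    },
]


def flatten_cars(cars):
    """Yield one flat car document per brand/model/type combination."""
    for car in cars:
        for model in car["model"]:
            for cartype in model["cartype"]:
                yield {
                    "brand": car["brand"],
                    "model": {"name": model["name"], "cartype": cartype},
                }


car_docs = list(flatten_cars(cars))
for doc in car_docs:
    print(doc)
```

Each resulting document describes exactly one car, so a search never has to reassemble the hierarchy at query time.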

You have not shown what information product contains, but you could either add this as an array for each car document (if not too large) or simply further denormalize and create one car document per product and add the product data to this.
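
The two options could be sketched roughly as follows. Since the product fields were not shown at this point in the thread, the `products` list and its fields (`id`, `name`, `brand`), as well as both function names, are purely illustrative assumptions.

```python
# One flat car document, as in the example above.
car = {
    "brand": "Mercedes",
    "model": {"name": "E-CLASS COUPE (C123)", "cartype": "280 CE"},
}

# Hypothetical products that fit this car (fields are assumptions).
products = [
    {"id": 1, "name": "Wiper", "brand": "Bosch"},
    {"id": 2, "name": "Wiper", "brand": "Eltis"},
]


def car_with_product_array(car, products):
    """Option 1: one car document that carries all its products as an array."""
    return {**car, "products": list(products)}


def car_per_product(car, products):
    """Option 2: fully denormalized, one document per (car, product) pair."""
    return [{**car, "product": product} for product in products]


single_doc = car_with_product_array(car, products)
pair_docs = car_per_product(car, products)
print(len(single_doc["products"]), len(pair_docs))
```

Option 1 keeps the document count low but makes each document larger; option 2 multiplies the document count but keeps every document small and self-contained.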

Try to store the data so that the document structure aligns with what you will be searching for.

This approach increases storage somewhat, though perhaps not as much as you would initially think, and it leads to dramatically simpler and faster queries. If data is updated, multiple documents need to be updated, but this is generally quite fast as long as updates are not very frequent.

Once the document structure has been determined we can look at mappings to support the queries you need to run.
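
For the free-text input from the question ("Mercedes coupe 300 turbo"), one possible direction, offered only as a sketch, is to copy all car fields into a single analyzed text field with `copy_to` and run a `match` query with `"operator": "and"` against it. The field name `car_text` is an assumption, the mapping is just the `properties` fragment for the flat car document, and `rough_tokens` is only a crude stand-in for real analysis (lowercase, split on non-alphanumerics), not Elasticsearch's actual tokenizer.

```python
import json
import re

# Assumed mapping fragment: every car field is analyzed text and is also
# copied into one combined field, "car_text" (hypothetical name).
mapping = {
    "properties": {
        "brand": {"type": "text", "copy_to": "car_text"},
        "model": {
            "properties": {
                "name": {"type": "text", "copy_to": "car_text"},
                "cartype": {"type": "text", "copy_to": "car_text"},
            }
        },
        "car_text": {"type": "text"},
    }
}


def build_query(user_input):
    """Build a match query that requires every input term in car_text."""
    return {
        "query": {
            "match": {"car_text": {"query": user_input, "operator": "and"}}
        }
    }


def rough_tokens(text):
    """Crude approximation of analysis: lowercase, split on non-alphanumerics."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


# A flat car document built from the table in the question.
doc = {
    "brand": "MERCEDES-BENZ",
    "model": {"name": "E-CLASS COUPE (C123)", "cartype": "300 C Turbo-D (123.153)"},
}
combined = " ".join([doc["brand"], doc["model"]["name"], doc["model"]["cartype"]])

# Local check: are all query terms among the document's terms?
matches = rough_tokens("Mercedes coupe 300 turbo") <= rough_tokens(combined)

print(json.dumps(build_query("Mercedes coupe 300 turbo"), indent=2))
print(matches)
```

The local check only illustrates why lowercasing and splitting on punctuation lets "Mercedes" match "MERCEDES-BENZ" and "turbo" match "Turbo-D"; the real behaviour depends on the analyzer configured for the field.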

Jaroslav_Zeman · January 10, 2021, 11:06am

OK, I understand that I should denormalize all the data.

Here is an example of a simple mapping for products:

{
  "product": {
    "properties": {
      "id": {
        "type": "integer"
      },
      "name": {
        "type": "text"
      },
      "brand": {
        "type": "keyword"
      },
      "warranty": {
        "type": "keyword"
      },
      "cars" : {
        "type": "nested"
      } 
    }
  }
}

Here are two example products:

{
  "id": 1,
  "name": "Wipper",
  "brand": "Bosch",
  "warranty": 2,
  "cars": [
    {
      "brand": "ALFA ROMEO",
      "model": [
        {
          "name": "147 (937_)",
          "type": [
            "1.6 16V T.SPARK",
            "1.6 16V",
            "1.9 JDT"
          ]
        }
      ]
    },
    {
      "brand": "AUDI",
      "model": [
        {
          "name": "TT (8J3)",
          "type": [
            "1.8 TFSI",
            "1.6 16V"
          ]
        }
      ]
    }
  ]
}
{
  "id": 2,
  "name": "Wipper",
  "brand": "Eltis",
  "warranty": 2,
  "cars": [
    {
      "brand": "Avia",
      "model": [
        {
          "name": "D-Line",
          "cartype": [
            "D100",
            "D120"
          ]
        }
      ]
    },
    {
      "brand": "DAF",
      "model": [
        {
          "name": "95",
          "cartype": [
            "FAC 95.400",
            "VAC 95.350"
          ]
        }
      ]
    }
  ]
}

Note: one product can have up to 1000 vehicles, 100 models and 10 brands (one brand can have several models). I could also structure the car data differently.

Example:

{
  "id": 2,
  "name": "Wipper",
  "brand": "Eltis",
  "warranty": 2,
  "cars": [
    {"brand": "Avia", "model": "D-Line", "type": "D100"}
    ...
    ...
  ]
}

But I can't judge which is better. I have 500 thousand vehicles associated with 10 thousand products.
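
To make the comparison more concrete, here is a rough sketch that converts the hierarchical `cars` structure into the flat per-vehicle form and builds a `nested` query against it. The function names are hypothetical, and the query assumes `cars` is mapped as `nested` with `brand`, `model` and `type` as searchable sub-fields. Because the example products above use both `type` and `cartype` for the type list, the sketch accepts either key.

```python
# Product 2 from the examples above, in the hierarchical form.
product = {
    "id": 2,
    "name": "Wipper",
    "brand": "Eltis",
    "warranty": 2,
    "cars": [
        {"brand": "Avia",
         "model": [{"name": "D-Line", "cartype": ["D100", "D120"]}]},
        {"brand": "DAF",
         "model": [{"name": "95", "cartype": ["FAC 95.400", "VAC 95.350"]}]},
    ],
}


def flatten_product_cars(product):
    """Return one {"brand", "model", "type"} entry per vehicle."""
    flat = []
    for car in product["cars"]:
        for model in car["model"]:
            # The thread's examples use "type" in one product and "cartype"
            # in the other, so both keys are accepted here.
            types = model.get("type") or model.get("cartype") or []
            for car_type in types:
                flat.append(
                    {"brand": car["brand"], "model": model["name"], "type": car_type}
                )
    return flat


def nested_car_query(brand, model, car_type):
    """Nested query: all three conditions must hold for the same cars entry."""
    return {
        "query": {
            "nested": {
                "path": "cars",
                "query": {
                    "bool": {
                        "must": [
                            {"match": {"cars.brand": brand}},
                            {"match": {"cars.model": model}},
                            {"match": {"cars.type": car_type}},
                        ]
                    }
                },
            }
        }
    }


flat_product = {**product, "cars": flatten_product_cars(product)}
query = nested_car_query("Avia", "D-Line", "D100")
print(len(flat_product["cars"]))
```

With the flat form, each vehicle is its own nested object, so brand, model and type are always matched together. In the hierarchical form only `cars` is nested, so the model names and types under one brand are ordinary object arrays and can match across different models.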

Jaroslav

system · February 7, 2021, 11:06am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
