ES6.0: multiple types, indexes, relationships puzzle

The removal of mapping types is giving me some indexing woes.

In my system I've got Places, Events, News and Vouchers. Events, News and Vouchers are always related to a specific Place.

In a pre-6.0 world, I had all 4 as types in a single index.

In a 6.0+ world, as far as I can tell I have 2 options:

  1. Create 4 separate indexes, 1 for each type.
  2. Stuff them all in a single index and type.

The challenge is, of course, in the relationships. As an example, say I want to query for News in a date range related to Places in a specific area. With the first option I don't see how I can do that in a single query. With the second option I guess I'm supposed to use the new join field.

The second option, however, assumes that I can somehow make all 4 types fit into a single mapping. Even if I can, isn't the point of allowing only 1 type per index precisely to avoid the drawbacks of such a setup? In other words, it sounds like exactly The Wrong Approach (tm). It's not that I'm afraid to twist the system a bit to achieve my goals, but I'm afraid we might well bump into unexpected performance or other issues down the line.

What would be The Right Approach (tm) to use ElasticSearch 6.0 in this use case?

Thanks,
Erwin

The challenge is, of course, in the relationships.

I agree. This is a challenge with a tool which is not a relational database.

To find the "best" way, IMHO, to do what you want efficiently at search time, you need to ask yourself some questions:

  • What are my users going to search for? Places? Events? Vouchers? News?
  • What properties are needed for users to search such objects?

Let's say you want to be able to search for Events and News, and that you want to search them by country, by shop name or by the location of the place. Then index documents like this:

PUT event/doc/1
{
  "text": "This is my event 1",
  "place": {
    "country": "France",
    "shop": "My lovely shop",
    "location": {
      "lat": 49.0422777,
      "lon": 2.0290053
    } 
  }
}
PUT news/doc/1
{
  "text": "This is my fantastic news",
  "place": {
    "country": "France",
    "shop": "My lovely shop",
    "location": {
      "lat": 49.0422777,
      "lon": 2.0290053
    } 
  }
}
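
With this denormalized model, the example from the original question (News in a date range, related to Places in a specific area) becomes a single search against the news index. The following is only a sketch in Python that builds the request bodies. The `published` date field and the helper names are assumptions that do not appear in this thread, and `place.location` has to be mapped explicitly as `geo_point`, because dynamic mapping would index `lat` and `lon` as two plain float fields.

```python
import json


def news_index_mapping():
    """Body for PUT news (6.x style, single mapping type `doc`)."""
    return {
        "mappings": {
            "doc": {
                "properties": {
                    "text": {"type": "text"},
                    # Assumed field: the sample documents carry no date.
                    "published": {"type": "date"},
                    "place": {
                        "properties": {
                            "country": {"type": "keyword"},
                            "shop": {"type": "text"},
                            # Explicit geo_point so geo queries work.
                            "location": {"type": "geo_point"},
                        }
                    },
                }
            }
        }
    }


def news_near(lat, lon, distance, start, end):
    """Body for GET news/_search: news dated in [start, end] near a point."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"range": {"published": {"gte": start, "lte": end}}},
                    {
                        "geo_distance": {
                            "distance": distance,
                            "place.location": {"lat": lat, "lon": lon},
                        }
                    },
                ]
            }
        }
    }


if __name__ == "__main__":
    body = news_near(49.0422777, 2.0290053, "10km", "2017-11-01", "2017-11-30")
    print(json.dumps(body, indent=2))
```

Both clauses sit in `filter` context, so they are not scored and can be cached.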

Would such a model work for you?

Hi David,

Thanks for your suggestion — it would work, though changes to Places would lead to a large amount of reindexing, and in general it involves a tad more housekeeping. All this said, this certainly seems like a viable way forward. Thanks!

Cheers,
Erwin

I have basically asked the same question, but no one has given any answer yet.

My situation is a lot more complex. ES does not provide any easy way to convert multiple types with parent/child relationships to 6.0.

ES should have kept the existing way of handling parent/child relationships and hidden the internal details from the user.

Even if they provide a conversion method, I have to redo most of my project. No enterprise software would dare to make this kind of drastic change that requires the customer to redo almost all of the work.


Don't know what your project is, iti, but I am right there with you. It might be worse than that, though. I just saw a video presentation on YouTube by Martijn van Groningen (https://youtu.be/YCkkOyZ-zkM) where he said that if you add a new document to an existing block of nested documents, you must reindex the whole block. Since I get new data added to my docs every week, and in some cases every day, I'd be reindexing more than indexing!

David, my friend! I speak French too, but very badly! :sweat_smile:
I just saw your talk with the guy from Couchbase on YouTube, btw, which proves it really is a small world.

Pleasantries aside, I am a little confused by your answer here. Your location data are floats, or doubles, or geospatial - whatever you call them - types, whereas the rest is a text string. How can you do that in 6.0 if there is only one type per index? Isn't that the whole point of this breaking change in 6.0?

Nested docs have always been that way.

Two different types of types (pardon the pun): that’s one mapping type per index, not one field type, Malik.

Since the project I work for has premium support, I am asking Elastic Co to answer this question. Hopefully it can be resolved.

Just got the official word from ES support: they cannot do it. They have no easy way of converting 1 index with multiple types and nested docs to 1 index with 1 type.

If anyone is interested, please take a look at the file I sent to ES support, asking them to convert it to a 6.0-style index with 1 type.

It is a full script with mapping, data and queries. It has 1 index with 1 parent type and 2 child types. I just want to achieve the same result in 6.0 as in 5.6. (Keep in mind my project has 71 child types, not just 2.)

Thanks in advance.

This was quite the puzzle :sweat_smile:, but here's a solution. To use parent/child in version 6, you will need to map all your document types as a single type. Within that type you add a field of type join which defines all the parent child relationships.

In the case of @iti you could define a single document type doc with a field you could call join (or whatever you want to call it) of type join which you would map like this, defining all the different parent/child relationships:

"join": {
  "type": "join",
  "relations": {
    "csar": ["es_000_reg","es_070_ori"]
  }
}
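
With many child types (71 were mentioned earlier in this thread), the `relations` entry could be generated instead of written by hand. A minimal sketch in Python; the function name `join_field_mapping` is made up for illustration, and the output is only the mapping fragment for the join field.

```python
def join_field_mapping(relations):
    """Build a `join` field mapping from {parent_name: [child_names]}."""
    cleaned = {}
    for parent, children in relations.items():
        names = sorted(set(children))  # drop duplicates, stable order
        if parent in names:
            raise ValueError("%r cannot be its own child" % parent)
        cleaned[parent] = names
    return {"type": "join", "relations": cleaned}


if __name__ == "__main__":
    print(join_field_mapping({"csar": ["es_000_reg", "es_070_ori"]}))
```

Only one join field is allowed per index, so all parent/child relations of the old types have to go into this single `relations` object.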

Apart from this field, you also map all the existing fields that you had in your original parent type as well as all your children. Let's say you create a new index cic4essupport_new. The mapping would be this:

PUT cic4essupport_new
{
  "settings": {
    "index.mapping.single_type": true
  },
  "mappings": {
    "doc": {
      "properties": {
        "join": {
          "type": "join",
          "relations": {
            "csar": [
              "es_000_reg",
              "es_070_ori"
            ]
          }
        },
        "@timestamp": {
          "type": "date"
        },
        "@version": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "fcn": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "first_name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "last_name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "active_flag": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "address_type": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "gender": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "geo_code": {
          "type": "geo_point"
        },
        "hair_color": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "race": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "weight": {
          "type": "long"
        },
        "registrant": {
          "type": "nested",
          "properties": {
            "flag1": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "reg_status": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "reg_type": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        },
        "ori": {
          "type": "nested",
          "properties": {
            "agency_name": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "ori_owner": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      }
    }
  }
}

In that new index, you would index parents like this:

PUT cic4essupport_new/doc/csar_1234567890210
{
  "join": {
    "name": "csar"
  },
  "fcn": "1234567890210",
  "first_name": "FNAME1",
  "last_name": "LNAME1",
  "gender": "MALE",
  "geo_code": "38.38755,-122.212282",
  "race": "BLACK",
  "address_type": "RESIDENCE",
  "height": 509,
  "weight": 165,
  "eye_color": "BROWN",
  "hair_color": "BLACK",
  "@version": "1",
  "@timestamp": "2017-11-29T00:51:28.536Z"
}

And children like this:

PUT cic4essupport_new/doc/es_070_ori_1234567890210?routing=csar_1234567890210
{
  "join": {
    "name": "es_070_ori",
    "parent": "csar_1234567890210"
  },
  "fcn": "1234567890210",
  "@version": "1",
  "ori": [
    {
      "ori_owner": "CA0040001",
      "agency_name": "ED SO"
    },
    {
      "ori_owner": "CA0550020",
      "agency_name": "YY SO"
    },
    {
      "ori_owner": "CA0445300",
      "agency_name": "SCC PD"
    }
  ],
  "@timestamp": "2017-11-29T00:52:18.850Z"
}

Note that because parents and children are of the same document type now, you cannot give them the same _id. What I've done here is prepend the original _type name to the _id.
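
The id and routing convention used in the two requests above can be summarised as a small function. This is only an illustrative Python sketch; `to_single_type` and its return shape are hypothetical and not part of any Elasticsearch client.

```python
def to_single_type(old_type, old_id, source, parent_id=None, parent_type="csar"):
    """Convert an old multi-type document to the single-type layout.

    Returns a dict with the new _id, the routing value to use on the
    index request (None for parents) and the new _source.
    """
    new_source = dict(source)  # do not mutate the caller's document
    new_id = "%s_%s" % (old_type, old_id)  # prefix keeps ids unique
    if parent_id is None:
        new_source["join"] = {"name": old_type}
        return {"_id": new_id, "routing": None, "_source": new_source}
    new_parent_id = "%s_%s" % (parent_type, parent_id)
    new_source["join"] = {"name": old_type, "parent": new_parent_id}
    # Children are routed on the parent's new id so that they are
    # stored on the same shard as the parent.
    return {"_id": new_id, "routing": new_parent_id, "_source": new_source}


if __name__ == "__main__":
    child = to_single_type(
        "es_070_ori", "1234567890210", {"fcn": "1234567890210"},
        parent_id="1234567890210",
    )
    print(child["_id"], child["routing"])
```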

Your queries would stay the same, except that you remove the type from the URI. So, you'd hit GET /cic4essupport_new/_search instead.
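
The original queries were not posted in this thread, so the following is an assumed shape rather than the actual request: a Python sketch that builds a `has_child` body in which `type` is the relation name from the join field, with a `nested` query on the child's nested field and `inner_hits` enabled.

```python
def parents_with_child(relation, nested_path, field, value):
    """Body for GET /cic4essupport_new/_search (no type in the URI)."""
    return {
        "query": {
            "has_child": {
                # Relation name from the join mapping, e.g. "es_070_ori".
                "type": relation,
                "query": {
                    "nested": {
                        "path": nested_path,
                        "query": {
                            "match": {"%s.%s" % (nested_path, field): value}
                        },
                    }
                },
                # Return the matching children alongside each parent.
                "inner_hits": {},
            }
        }
    }


if __name__ == "__main__":
    print(parents_with_child("es_070_ori", "ori", "agency_name", "ED SO"))
```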


Instead of doing that manually, you can use the _reindex API to reindex all existing documents from cic4essupport into cic4essupport_new using a script to modify the documents according to the new structure:

POST _reindex
{
  "source": {
    "index": "cic4essupport"
  },
  "dest": {
    "index": "cic4essupport_new"
  },
  "script": {
    "source": """
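    // Old parent documents have no _parent; old children do.
    if (ctx._parent == null)
    {
      // Parent: tag it with the parent relation name from the join mapping.
      ctx._source.join = new HashMap();
      ctx._source.join.name = 'csar';
      ctx._type = 'doc';

      // Prefix the id with the old type name to keep ids unique.
      ctx._id = "csar_" + ctx._id;
    }
    else
    {
      // Child: the old type name becomes the relation name.
      ctx._source.join = new HashMap();
      ctx._source.join.name = ctx._type;
      ctx._source.join.parent = "csar_" + ctx._parent;

      // Route on the new parent id so the child lands on the parent's shard.
      ctx._routing = "csar_" + ctx._parent;
      ctx._parent = null;

      ctx._id = ctx._type + "_" + ctx._id;
      ctx._type = 'doc';
    }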
    """,
    "lang": "painless"
  }
}

You could do this on version 5.6 (which supports both the old and new way of doing parent/child) before you migrate to 6.0. You should also be able to do it using remote reindexing to get the old docs from your existing 5.x cluster into a new 6.x cluster.
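
For the remote variant, the `source` section of the `_reindex` body gets a `remote` block that points at the old cluster. A minimal Python sketch that only builds the body; the host name in the example is a placeholder, and the old cluster's address must also be listed under `reindex.remote.whitelist` in `elasticsearch.yml` on the new cluster.

```python
def remote_reindex_body(remote_host, source_index, dest_index, script_source):
    """Body for POST _reindex, run on the destination (new) cluster."""
    return {
        "source": {
            # Scheme, host and port of the old cluster.
            "remote": {"host": remote_host},
            "index": source_index,
        },
        "dest": {"index": dest_index},
        "script": {"lang": "painless", "source": script_source},
    }


if __name__ == "__main__":
    body = remote_reindex_body(
        "http://old-cluster.example:9200",  # placeholder host
        "cic4essupport",
        "cic4essupport_new",
        "ctx._type = 'doc';",  # stand-in for the full script
    )
    print(body)
```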

@abdon Thanks for the answer.

Unfortunately, this is not the answer I am looking for. The reasons:

  1. How do you check a child's mapping now? I used to be able to check an individual child type's mapping; with your approach I get the mapping of the whole index. Or is there a way to do that?

  2. You have not shown the query results. With your approach, the results are significantly different from the multiple-type ones. That means I have to rewrite the whole project.

I knew there were 2 ways to convert parent/child types (multiple indexes with 1 type each, and your example), as mentioned in my other, earlier post.
The closer match would be to convert each type to its own index with 1 type. That way, I could still query each child's mapping. But I have no idea how to create relationships between indexes.

If worst comes to worst, I will just have to convert to something like what you have suggested and rewrite the whole project.

Thanks again.

There is no way to do joins between two separate indexes in Elasticsearch. One type per index and then doing joins between those indexes is not possible and I don't see it happening in the foreseeable future. That's just not how Elasticsearch works.

The only way that your use case is going to work is by normalizing to a single document type using the join field as I have described in my earlier post. To answer your first question: yes, unfortunately it's not easy to check individual child mappings like this. That's up to your application now.
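
One way an application could keep per-child mappings visible is to hold the field definitions per old type on its own side and merge them into the single `doc` mapping when the index is created. The Python sketch below is only an assumption about how that might look; `merge_type_mappings` and `PER_TYPE` are hypothetical names. It also rejects a field that is mapped differently by two old types, since a single mapping can hold only one definition per field name.

```python
def merge_type_mappings(per_type):
    """Merge {old_type: {field: mapping}} into one `properties` dict.

    Returns (merged_properties, owners), where owners maps each field
    name to the old type names that use it.
    """
    merged, owners = {}, {}
    for type_name, properties in per_type.items():
        for field, mapping in properties.items():
            if field in merged and merged[field] != mapping:
                raise ValueError(
                    "field %r is mapped differently in %s and %s"
                    % (field, owners[field][0], type_name)
                )
            merged[field] = mapping
            owners.setdefault(field, []).append(type_name)
    return merged, owners


# Hypothetical per-type definitions, abbreviated from the mapping above.
PER_TYPE = {
    "csar": {"fcn": {"type": "text"}, "weight": {"type": "long"}},
    "es_070_ori": {"fcn": {"type": "text"}, "ori": {"type": "nested"}},
}

if __name__ == "__main__":
    merged, owners = merge_type_mappings(PER_TYPE)
    print(sorted(merged))   # field names of the single `doc` type
    print(owners["fcn"])    # old types that share the `fcn` field
```

Looking up `PER_TYPE["es_070_ori"]` then plays the role that the per-type mapping request used to play.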

Answering question 2: I'm surprised that you find the query results to be significantly different. In fact, I think these are exactly the same, except for the ordering of the fields, which is never guaranteed in JSON.

I'm pasting the results below. Please let me know which differences you see, and I'll try to help you resolve those.

Query 1 response:

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 2,
    "hits": [
      {
        "_index": "cic4essupport_new",
        "_type": "doc",
        "_id": "csar_1234567890210",
        "_score": 2,
        "_source": {
          "gender": "MALE",
          "race": "BLACK",
          "address_type": "RESIDENCE",
          "last_name": "LNAME1",
          "weight": 165,
          "hair_color": "BLACK",
          "fcn": "1234567890210",
          "eye_color": "BROWN",
          "@timestamp": "2017-11-29T00:51:28.536Z",
          "@version": "1",
          "geo_code": "38.38755,-122.212282",
          "join": {
            "name": "csar"
          },
          "first_name": "FNAME1",
          "height": 509
        },
        "inner_hits": {
          "es_070_ori": {
            "hits": {
              "total": 1,
              "max_score": 0.13353139,
              "hits": [
                {
                  "_type": "doc",
                  "_id": "es_070_ori_1234567890210",
                  "_score": 0.13353139,
                  "_routing": "csar_1234567890210",
                  "_source": {
                    "fcn": "1234567890210",
                    "@timestamp": "2017-11-29T00:52:18.850Z",
                    "ori": [
                      {
                        "agency_name": "ED SO",
                        "ori_owner": "CA0040001"
                      },
                      {
                        "agency_name": "YY SO",
                        "ori_owner": "CA0550020"
                      },
                      {
                        "agency_name": "SCC PD",
                        "ori_owner": "CA0445300"
                      }
                    ],
                    "@version": "1",
                    "join": {
                      "parent": "csar_1234567890210",
                      "name": "es_070_ori"
                    }
                  }
                }
              ]
            }
          },
          "es_000_reg": {
            "hits": {
              "total": 1,
              "max_score": 0.13353139,
              "hits": [
                {
                  "_type": "doc",
                  "_id": "es_000_reg_1234567890210",
                  "_score": 0.13353139,
                  "_routing": "csar_1234567890210",
                  "_source": {
                    "fcn": "1234567890210",
                    "@timestamp": "2017-11-29T01:09:36.244Z",
                    "@version": "1",
                    "registrant": [
                      {
                        "reg_status": "Z_OTHERS",
                        "flag1": "N",
                        "reg_type": "TERM"
                      }
                    ],
                    "join": {
                      "parent": "csar_1234567890210",
                      "name": "es_000_reg"
                    }
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}

Query 2 response:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "ori_child": {
      "doc_count": 2,
      "child_nested": {
        "doc_count": 5,
        "ori_ori_owner.keyword": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "CA0040001 - ED SO",
              "doc_count": 1
            },
            {
              "key": "CA0152300 - PPD",
              "doc_count": 1
            },
            {
              "key": "CA0330R01 - PCC SO",
              "doc_count": 1
            },
            {
              "key": "CA0445300 - SCC PD",
              "doc_count": 1
            },
            {
              "key": "CA0550020 - YY SO",
              "doc_count": 1
            }
          ]
        }
      }
    },
    "registrant_child": {
      "doc_count": 2,
      "child_nested": {
        "doc_count": 2,
        "registrant_flag1.keyword": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "N",
              "doc_count": 1
            },
            {
              "key": "Y",
              "doc_count": 1
            }
          ]
        },
        "registrant_reg_type.keyword": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "SSS",
              "doc_count": 1
            },
            {
              "key": "TERM",
              "doc_count": 1
            }
          ]
        },
        "registrant_reg_status.keyword": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "OOS",
              "doc_count": 1
            },
            {
              "key": "Z_OTHERS",
              "doc_count": 1
            }
          ]
        }
      }
    }
  }
}
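
The aggregation request behind this response was not posted in the thread. As a rough guess at its shape, the Python sketch below nests a `children` aggregation (whose `type` is again the relation name), a `nested` aggregation and a `terms` aggregation. The names `ori_child` and `child_nested` mirror the response above; the combined bucket keys such as "CA0040001 - ED SO" suggest a script, which this sketch does not attempt to reproduce.

```python
def child_terms_agg(relation, nested_path, field, size=10):
    """Body for GET /cic4essupport_new/_search with aggregations only."""
    return {
        "size": 0,  # no hits, aggregations only
        "aggs": {
            "%s_child" % nested_path: {
                # Switch from parent documents to children of this relation.
                "children": {"type": relation},
                "aggs": {
                    "child_nested": {
                        # Step into the nested objects of each child.
                        "nested": {"path": nested_path},
                        "aggs": {
                            "by_value": {
                                "terms": {
                                    "field": "%s.%s" % (nested_path, field),
                                    "size": size,
                                }
                            }
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(child_terms_agg("es_070_ori", "ori", "ori_owner.keyword"))
```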

@abdon Thanks

My bad. The 1 index, 1 type results are actually similar to the 1 index, multiple types ones. I was surprised that my original query works for 1 index, 1 type without much modification (just replace the index and type on the GET, that is all).
The has_child query still refers to a type, but this time the type is the relation name defined in the join field.

As for the mapping part, I can live with it. It is just harder to maintain, but the mapping does not change often.

Thanks
