Abolishing types - a disaster from my point of view

So, I've just read https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch and related articles. While this may seem like a logical change to you from the inside, from my app's point of view, it looks like a complete rewrite. I'm just astonished and daunted by the work that would be required, with apparently no additional value to my app.

I have what is essentially a conventional database application which uses Elasticsearch as the database in order to take advantage of the excellent search facilities. I have many diverse types: it's probably the antithesis of the cases you are thinking about. I have some types which have a lot in common, for example four different kinds of person (called new, current, old, contact) which share many fields, and communications (pendingemails, sentemails, letters). Then I have many others which have nothing in common.

Generally I don't use ES _id's to link documents together: most of the time I use certain fields (like 'membershipnumber') which are my own identifiers, and sometimes a combination of fields. I rarely look up individual records by _id, more usually by a search which can have only one result.

If I understand your articles correctly, I should redistribute the more distinct types among separate indexes, and merge the ones with more commonality. The former case is a big code change, but largely mechanical, I guess; testing it, though, would be a huge burden. It would be helpful to have a way to group indexes in this case. I store several database instances as separate indexes which I'd want to keep more distinct than the new indexes which make up a single database. I don't believe I can have the same server behave as several separate clusters, can I? So as it stands I'm left with using, say, name prefixes for indexes to distinguish separate "databases".
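For what it's worth, prefix-named indexes can at least be addressed as a group with wildcards, so a whole "database" stays queryable in one request. A minimal sketch, with index names invented for illustration:

```
PUT dba-members
PUT dba-letters

GET dba-*/_search
{
  "query": { "match_all": {} }
}
```

An alias per "database" (pointing at all of its indexes) would work similarly, and hides the naming convention from the queries themselves.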

With the common ones, I'd still have to be able to tell the "types" apart (with my own "type" field, presumably) and search selectively on one or more of them. That means most searches would have to become boolean searches on top of the existing search, whereas currently I can just put a comma-separated list of the types I want, and most often just the one type. Furthermore, with strict mapping I can currently ensure that the distinct fields from, say, "old" can't accidentally be present in, say, "new". If I can't distinguish them any more, this valuable feature you introduced in v5 is lost to me (unless I use separate indexes for these too, but that makes cross-type searching even harder).

Allowing existing indexes to continue for one more version doesn't really help much, as reloading to re-index, or moving to the new version, then becomes impossible. Or even moving things between one server and another. Or installing a new instance of the app.

The scale of change involved in this grossly non-upward-compatible move is so large that it will drive me to consider whether I should replace ES altogether with a more stable database back end. It would probably be no harder to put in an entirely different database back end that uses JSON objects, as that's essentially what you're intending. If you're going to do this to me now, why would I not fear the same upheaval another version down the line, and again after that? I can devote my resources to chasing ES changes, or to improving my app. Which would you choose?

If it had been this way from the start, I'd have coded it that way in the first place, and all would be well. But pulling the rug from under my feet in this way with a mature app is just awful.

The thing is, despite your protestations about sparseness and how fields are stored and indexed, it works really well for me. Maybe if I was storing tens of millions of records I'd start to suffer, but I'm only dealing with tens or hundreds of thousands.

One thing that I have long thought would be a help is a class hierarchy for types; and if objects that share a lot in common are to be made more common, and others less so, this would seem even more appropriate. Currently my mappings have the same fields repeated multiple times across similar types. It would be helpful (though probably not at the expense of a rupture like you're intending) if I could list the common fields once and then, as I get more specific, inherit the common ones and add the new type-specific fields, thereby explicitly recognising the commonality.
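To illustrate the repetition (field names invented for the example): today each type's mapping must restate every shared field in full, with nothing marking them as the same field:

```
PUT people
{
  "mappings": {
    "new": {
      "properties": {
        "membershipnumber": { "type": "keyword" },
        "joined":           { "type": "date" }
      }
    },
    "old": {
      "properties": {
        "membershipnumber": { "type": "keyword" },
        "left":             { "type": "date" }
      }
    }
  }
}
```

Here "membershipnumber" has to be spelled out identically in both types, and any change to it must be made everywhere by hand.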

I am just flabbergasted. Or I have completely misunderstood what you're intending!

I completely agree. This is a change that totally disregards use cases where people DO have distinct types for very good reasons. ElasticSearch DOES NOT EQUAL Logstash. Totally unacceptable from my point of view. I store master data as well as event data in ElasticSearch. Don't tell me to create an index for each different type I use. I'm already managing a lot of event indexes (split by year or month depending on the number of events coming in), don't make me manage master data indexes also. Even for events I have a number of different types: Number events, String events, DateTime events, GeoPoint events, ...

This really must be reconsidered or a lot of people will be very disappointed!

Best regards,
Frank Montyne

I think you can somehow keep the same behavior that you had before with only one type.

Let's say you have this:

PUT index/type1/1
{
  "common_field": "whatever",
  "foo": "whatever"
}
PUT index/type2/1
{
  "common_field": "whatever",
  "bar": "whatever"
}
PUT index/type3/1
{
  "common_field": "whatever",
  "baz": "whatever"
}

You can probably change the model to something like this (it can also apply to @frank-montyne's use case, I think):

PUT index/11
{
  "common_field": "whatever",
  "type1": {
    "foo": "whatever"
  }
}
PUT index/21
{
  "common_field": "whatever",
  "type2": {
    "bar": "whatever"
  }
}
PUT index/31
{
  "common_field": "whatever",
  "type3": {
    "baz": "whatever"
  }
}

Then everything will be in the same index as it was previously.
Of course, the best approach would be:

PUT index1/1
{
  "common_field": "whatever",
  "foo": "whatever"
}
PUT index2/1
{
  "common_field": "whatever",
  "bar": "whatever"
}
PUT index3/1
{
  "common_field": "whatever",
  "baz": "whatever"
}

Sure it will require re-writing things.

Sure it will require re-writing things.

That's the rub: a lot of work for no appreciable gain. I can see what needs doing, it's just so unnecessary to force this on us.

Your middle example would also need something to say which "type" it is, so I can locate objects of just that type or a subset of types, which is essentially what I said originally. That requires all such searches to become boolean searches (on "type" plus whatever the search was originally for), and loses the strict mappings other than for completely irrelevant fields (like typos in field names).

@frankshad As you say, you can continue with the same model you have today, with two changes:

  1. Add your own type field
  2. Filter on that type field instead of specifying types in the URL

For the second part, you'd convert something like:

GET foo/type_1,type_2/_search
{
   "query": {
      "match": {
         "foo":  "bar"
      }
   }
}

to:

GET foo/_search
{
   "query": {
      "bool": {
         "must": {
            "match": {
               "foo": "bar"
            }
         },
         "filter": {
            "terms": {
               "type": ["type_1", "type_2"]
            }
         }
      }
   }
}

Or you could even use pre-defined aliases, e.g.:

PUT foo/_alias/foo-type_1
{
  "filter": {
    "term": {
      "type": "type_1"
    }
  }
}

PUT foo/_alias/foo-type_2
{
  "filter": {
    "term": {
      "type": "type_2"
    }
  }
}

GET foo-type_1,foo-type_2/_search
{
   "query": {
      "match": {
         "foo":  "bar"
      }
   }
}

I agree that the strict-mappings-per-type is a loss of functionality, but you can apply strict mappings to the type1, type2 etc objects suggested by @dadoonet.
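A minimal sketch of that suggestion, assuming a single-type index in the new style and invented field names: `dynamic: strict` can be set both at the top level and again on each type's sub-object, so an unexpected field is rejected at either level:

```
PUT foo
{
  "mappings": {
    "doc": {
      "dynamic": "strict",
      "properties": {
        "common_field": { "type": "keyword" },
        "type":         { "type": "keyword" },
        "type1": {
          "dynamic": "strict",
          "properties": {
            "foo": { "type": "keyword" }
          }
        }
      }
    }
  }
}
```

With this mapping, indexing a document containing `type1.unknown_field` (or any stray top-level field) is rejected, which preserves at least part of the per-type strictness.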

That's the rub: a lot of work for no appreciable gain. I can see what needs doing, it's just so unnecessary to force this on us.

I understand your annoyance. We don't take these changes lightly. But we also have to think about improving the situation for future users of Elasticsearch. We can't just leave things in their broken state forever otherwise Elasticsearch would become one big unmaintainable ball of mud.


Additionally - once the notion of types becomes something the application controls then you could also introduce things like multi-valued type fields on your docs that would allow you to express a form of multiple inheritance you can query on.
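For example (a sketch, with names invented), a document could carry several application-level types at once and then be matched by a filter on any one of them:

```
PUT foo/doc/1
{
  "type": ["person", "contact"],
  "common_field": "whatever"
}

GET foo/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": { "type": "person" }
      }
    }
  }
}
```

The same document would also match a filter on "contact", which is not expressible with the current one-type-per-document model.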

Really? That means that for master data I now have to keep track of the index the type resides in also? What about the overhead of introducing an index per type? Definitely not negligible!

That also means that for each event type (String, Number, DateTime, GeoPoint, ...) I now not only have indexes per month but also per type. What a nightmare...

Exactly. But I have hundreds of such instances, each of which would have to be changed and thoroughly tested. If I was starting now, this wouldn't be an issue, I'd just do what's needed. But changing existing code reliably is just a complete nightmare.

Strict on the subordinate parts means (a) changing absolutely every use of the objects everywhere in my code (as x.a becomes x.type1.a), and being sure I've got every single one of them, which just isn't practical; and (b) it doesn't solve the problem, as the most likely fault is that one of the fields that should be in only one of the "types" gets put into the others.
