Implementing a "set" type in Elasticsearch

As a test, I have a very basic mapping that consists of an id field and a "colors" text field. I want the colors field to hold only one of each color. If a color already exists in the colors field, I don't want it added again.

To experiment, I've made two Python functions to add and remove multiple colors:

# update_doc() is defined elsewhere (not shown).

def remove_multiple_items(id):
    # Remove every occurrence of each listed color from the colors array.
    doc = {
        "script": {
            "lang": "painless",
            "source": "params.colors.forEach(c -> {ctx._source.colors.removeIf(e -> e.equals(c))})",
            "params": {"colors": ["orange", "green", "black", "white"]},
        }
    }
    update_doc(id=id, data=doc)

def add_multiple_items(id):
    # Append each listed color to the colors array, with no duplicate check.
    doc = {
        "script": {
            "lang": "painless",
            "source": "params.colors.forEach(c -> {ctx._source.colors.add(c)})",
            "params": {"colors": ["orange", "green", "black", "white"]},
        }
    }
    update_doc(id=id, data=doc)

The issue with the "add_multiple_items" function is that it adds a color even if that color already exists in the field. Here are my questions:

  1. Is it possible in Elasticsearch to specify a fieldtype as a "set" where the field will only contain one of each element?

  2. If a set type is not implemented, what is the best method to treat a field as a set? Would I need to put logic in the painless script to check the existing values, put those values into a list and then as I add each new color, check the list to see if that color already exists?

  3. Is there a "debug" command for ctx that will print out whatever I give it to print? For example, params.colors.forEach( c -> {ctx.print(c)}) ... Ideally there would be a page somewhere that listed all of the ctx commands that are available. I'm basically looking for some type of println, logging, etc. feature if it exists.

  4. I'd like to see an example of a painless script that will take all elements in an array and reduce them to a set. For example, if ctx._source.colors has ["yellow","orange","yellow","black"] it would be nice to see an example of a script that will reduce ctx._source.colors to ["yellow","orange","black"]

  1. Nope. Interesting idea though, might be worth raising a feature request for?
  2. That or do it in your code, external to Elasticsearch.
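For the in-script route, one possible sketch is to have the Painless script check `contains` before each `add`. The helper name `build_add_unique_script` is made up for illustration, the script has not been verified against any particular Elasticsearch version, and it assumes `ctx._source.colors` already exists as an array on the document.

```python
def build_add_unique_script(colors):
    """Build an update body whose Painless script skips colors already present.

    Assumption: ctx._source.colors is an existing array on the target document.
    """
    source = (
        "for (def c : params.colors) {"
        " if (!ctx._source.colors.contains(c)) {"
        " ctx._source.colors.add(c);"
        " }"
        " }"
    )
    return {
        "script": {
            "lang": "painless",
            "source": source,
            "params": {"colors": list(colors)},
        }
    }


# A repeated value in params is also skipped, because the first add makes
# the later contains() check succeed.
body = build_add_unique_script(["orange", "green", "orange"])
```

The resulting body could then be passed to the same `update_doc(id=id, data=body)` helper used in the question.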

(I can't help on the last two, sorry.)
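For the fourth question, one approach that may work is to rebuild the array from a distinct stream. This is only a sketch: it relies on Painless exposing Java's `stream()`, `distinct()` and `Collectors.toList()`, it has not been run against a specific Elasticsearch version, and `build_dedupe_script` is a hypothetical helper name.

```python
def build_dedupe_script():
    """Build an update body that collapses ctx._source.colors to unique values.

    Assumption: ctx._source.colors is an array; distinct() on an ordered
    stream keeps the first occurrence of each value.
    """
    source = (
        "ctx._source.colors = "
        "ctx._source.colors.stream().distinct().collect(Collectors.toList())"
    )
    return {"script": {"lang": "painless", "source": source}}


body = build_dedupe_script()
```

With the example from the question, the intent is that ["yellow","orange","yellow","black"] ends up as ["yellow","orange","black"], since the first occurrence of each value is the one kept.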

Thanks for your reply Mark! I really appreciate it. You are always super helpful and I agree with you that implementing a set type might be a great addition to Elasticsearch.

I'll keep experimenting with the code to come up with a way to implement a set type via the client / app side.
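A minimal sketch of the client-side approach: merge the stored values with the new ones and drop duplicates before writing the field back. `merge_as_set` is a hypothetical helper name, and how the current values are fetched and written back is left out here.

```python
def merge_as_set(existing, new_items):
    """Return existing + new_items with duplicates removed, order preserved."""
    # dict.fromkeys keeps only the first occurrence of each key,
    # and dicts preserve insertion order.
    return list(dict.fromkeys([*existing, *new_items]))


current = ["yellow", "orange", "yellow", "black"]
merged = merge_as_set(current, ["orange", "green"])
print(merged)  # ['yellow', 'orange', 'black', 'green']
```

One caveat with doing this in the client is that it is a read-modify-write cycle, so two writers updating the same document at the same time could overwrite each other's changes unless the write is guarded in some way.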

PS: Another idea that I had was to implement a "pre-insert" script attribute for specific fields when creating a mapping. This would allow the user to create all types of different custom fields based on code. For example, when creating a new mapping, the "colors" field might have an attribute to run a script prior to inserting data. Something like:

{
  "properties": {
    "id" : {
      "type": "integer"
    },
    "colors": {
      "type": "text",
      "insert_script": "foo"
    }
  }
}

Then the user could write a painless script, upload it to the server, and name it "foo". Each time an insert takes place, the script would run before the value is inserted. Kind of like mapping templates, but for scripts. Maybe something like this already exists? In this example, the script could handle validation when a new document is inserted. There could also be an "update_script" setting that runs a script during update operations; it could look at the existing field values and only allow new distinct values.
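Something in this direction may already be possible with ingest pipelines, which can run a script processor on a document before it is indexed. The sketch below builds such a pipeline definition as a Python dict. The helper name is hypothetical, the script is untested, and it only operates on the document being indexed, so it would dedupe an incoming colors array rather than compare against values already stored.

```python
def build_dedupe_pipeline():
    """Build an ingest pipeline body with one Painless script processor.

    Assumption: inside an ingest script, document fields are reached
    directly through ctx (ctx.colors), not through ctx._source.
    """
    source = (
        "if (ctx.colors instanceof List) {"
        " ctx.colors = ctx.colors.stream().distinct().collect(Collectors.toList());"
        " }"
    )
    return {
        "description": "Drop duplicate values from the colors array before indexing",
        "processors": [{"script": {"lang": "painless", "source": source}}],
    }


pipeline = build_dedupe_pipeline()
```

The dict would still need to be registered as a pipeline on the cluster and referenced when indexing; those steps are not shown here.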

Thanks again!


Hey Mark -- where is the best place to make feature requests? The Github repo?

Yep :)