Need suggestions on creation of an index for specific use-case


(apanimesh061) #1

I have been using Elasticsearch for quite a while. In my use case I have document sets, each identified by an ID. Each set contains documents that are also identified by IDs. All sets follow a default mapping. The only difference is that I want to be able to add custom analyzers to the mapping of one particular document set, while all other sets stay as they are. I thought of using a parent-child relationship between sets and documents and came up with the following, but I don't think it solves my problem, because every set would need its own mapping even when that mapping is identical to the default:

PUT /test_index/
{
  "settings": {
    "index.store.type": "default",
    "index": {
        "number_of_shards": 5,
        "number_of_replicas": 1,
        "refresh_interval": "60s"
    },
    "analysis": {
        "filter": {
            "porter_stemmer_en_EN": {
                "type": "stemmer",
                "name": "porter"
            },
            "default_stop_name_en_EN": {
                "type": "stop",
                "stopwords": "_english_"
            },
            "snowball_stop_words_en_EN": {
                "type": "stop",
                "stopwords_path": "snowball.stop"
            },
            "smart_stop_words_en_EN": {
                "type": "stop",
                "stopwords_path": "smart.stop"
            },
            "shingle_filter_en_EN": {
                "type": "shingle",
                "min_shingle_size": "2",
                "max_shingle_size": "2",
                "output_unigrams": true
            }
        }
    }
  }
}

Here are the default mappings that I have:

PUT /test_index/document_set/_mapping
{
  "document_set": {
    "properties": {
      "docset_id": {
        "type": "string"
      }
    }
  }
}

PUT /test_index/document/_mapping
{
  "document": {
    "dynamic": "strict",
    "_parent": {
      "type": "document_set"
    },
    "properties": {
      "doc_id": {
        "type": "string"
      },
      "text": {
        "type": "multi_field",
        "fields": {
          "text": {
            "type": "string",
            "store": true,
            "index": "analyzed"
          },
          "pdf": {
            "type": "attachment",
            "store": true,
            "index": "analyzed"
          }
        }
      }
    }
  }
}

Now suppose I create a custom analyzer:

"analyzer": {
  "shingle_type": {
    "type": "custom",
    "filter": [
      "porter_stemmer_en_EN",
      "smart_stop_words_en_EN",
      "shingle_filter_en_EN",
      "lowercase"
    ],
    "tokenizer": "whitespace"
  }
}
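
For reference, registering this analyzer on the existing test_index might look roughly like the sketch below. It assumes an Elasticsearch 1.x-style API (matching the requests above) and that the three custom filters it references are already defined in the index settings. Analysis settings can only be changed while the index is closed, hence the close and open calls.

```json
POST /test_index/_close

PUT /test_index/_settings
{
  "analysis": {
    "analyzer": {
      "shingle_type": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": [
          "porter_stemmer_en_EN",
          "smart_stop_words_en_EN",
          "shingle_filter_en_EN",
          "lowercase"
        ]
      }
    }
  }
}

POST /test_index/_open
```

Once the index is open again, a request such as GET /test_index/_analyze?analyzer=shingle_type&text=some+sample+text should show the tokens the analyzer produces.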

But I want to apply it only to a specific document_set, so that only that set's documents are analyzed with it. How can I do that?
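
To make the limitation concrete, here is a minimal sketch of the usual way an analyzer is attached, namely through a field in a type mapping. The field name text_shingled is hypothetical and not part of the mappings above, and the sketch assumes shingle_type has already been registered in the index settings.

```json
PUT /test_index/document/_mapping
{
  "document": {
    "properties": {
      "text_shingled": {
        "type": "string",
        "analyzer": "shingle_type"
      }
    }
  }
}
```

Because the analyzer is bound to the field for the whole document type, it would apply to the documents of every set that populate that field, not just to the documents of one set.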

Any suggestions are welcome.

