Best practice for making a new index in Elasticsearch?

stevebissett · March 2, 2016, 12:34pm

I am new to Elasticsearch. I am using it to search variables within market research surveys. A typical variable looks like this:

survey_id: 100
name: gender
label: Gender
value_labels: {0 => Male, 1 => Female}

I have about 400 surveys with around 200000 variables.

I never need to search for variables for more than one survey at a time.

Should I make a new index for each survey, or should I have a field called survey_id that I filter on each time I search?

What is the best practice here?

My current search is as follows:

GET /search_variables/variables/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "search_text": {
            "query": "BMW",
            "operator": "and"
          }
        }
      },
      "must_not": {
        "match": {
          "search_text": {
            "query": "",
            "operator": "or"
          }
        }
      }
    }
  }
}

I have also been using a term filter with the above:

      "filter": {
        "term": {
          "survey_id": 12
        }
      }

ddorian43 · March 2, 2016, 1:35pm

Filtering on survey_id is the right approach. You can also route on survey_id (you need to pass the same routing value when you index the documents) so that the query is executed on a single shard.
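
A minimal sketch of the two suggestions above, in Python with only the standard library. It builds the request path and body for a search that filters on survey_id and passes the same value as the routing parameter, plus the matching index request. The index and type names come from the question; the helper names build_search_request and build_index_request are hypothetical, and the sketch assumes an Elasticsearch version whose bool query accepts a filter clause.

```python
import json
from urllib.parse import urlencode

INDEX = "search_variables"  # index name taken from the question
DOC_TYPE = "variables"      # type name taken from the question


def build_search_request(survey_id, text):
    """Return (path, body) for a search restricted to a single survey."""
    body = {
        "query": {
            "bool": {
                # Full-text part, as in the original query.
                "must": {
                    "match": {"search_text": {"query": text, "operator": "and"}}
                },
                # Restrict the search to one survey.
                "filter": {"term": {"survey_id": survey_id}},
            }
        }
    }
    # Same routing value as at index time, so only one shard is queried.
    query_string = urlencode({"routing": survey_id})
    path = "/{}/{}/_search?{}".format(INDEX, DOC_TYPE, query_string)
    return path, body


def build_index_request(doc_id, variable):
    """Return (path, body) for indexing one variable, routed by its survey_id."""
    query_string = urlencode({"routing": variable["survey_id"]})
    path = "/{}/{}/{}?{}".format(INDEX, DOC_TYPE, doc_id, query_string)
    return path, variable


if __name__ == "__main__":
    path, body = build_search_request(12, "BMW")
    print("GET", path)
    print(json.dumps(body, indent=2))
```

If routing is used, the routing value has to be supplied consistently on both sides; a document indexed with one routing value will not be found by a search routed with another.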

warkolm · March 3, 2016, 6:45am

I wouldn't bother routing like that, to be honest; it just adds complexity where it isn't really needed.

The 200000 variables will be your biggest problem: you may run into a mapping explosion with all those fields. The only good way around that is to denormalise, or maybe to put similar surveys into the same index (and thus have more than one index).
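
One possible reading of the denormalising advice, sketched in Python under assumptions. If value_labels is indexed as an object keyed by answer code, each distinct code can end up as its own field in the mapping, whereas a list of code/label pairs keeps the set of fields fixed. The function name flatten_value_labels and the output field names code and label are hypothetical and do not come from this thread.

```python
def flatten_value_labels(variable):
    """Return a copy of a variable document with value_labels as a list of pairs.

    Input:  {"value_labels": {0: "Male", 1: "Female"}, ...}
    Output: {"value_labels": [{"code": 0, "label": "Male"}, ...], ...}
    """
    doc = dict(variable)  # shallow copy, the input is left untouched
    labels = variable.get("value_labels", {})
    doc["value_labels"] = [
        {"code": code, "label": label} for code, label in sorted(labels.items())
    ]
    return doc


if __name__ == "__main__":
    # Example document taken from the question.
    gender = {
        "survey_id": 100,
        "name": "gender",
        "label": "Gender",
        "value_labels": {0: "Male", 1: "Female"},
    }
    print(flatten_value_labels(gender))
```

With this shape the mapping only needs value_labels.code and value_labels.label, however many distinct codes the surveys use.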