General question on query optimization

Ian Campbell

I have five general questions about optimization strategy.

I have a case where I need to search across multiple fields (roughly 16 to 32). The fields fall into three kinds:

  1. Numeric arrays. Every search term for the field must match a value in the array. Should I store the values as keywords or as integers?
  2. Numeric ranges, each preceded by an attribute ID. Here it makes sense to me to use the attribute ID as the field name, e.g. "a1:100", where a1 is attribute #1. The search ranges must overlap the stored ranges. Is this attribute/value-range layout the best way to do this?
  3. Text... nothing special here.

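A minimal sketch of how items 1 and 2 might be expressed, assuming the numeric array is mapped as `keyword` (exact matching only, no range queries on it) and each attribute ID becomes its own `integer_range` field. The field names `codes`, `a1`, `a2`, `body` and the helper `build_query` are hypothetical. It builds the request bodies as plain dicts: one `term` filter per array value so all must match, and a `range` filter with `relation: INTERSECTS` so stored ranges that overlap the search range match.

```python
# Hypothetical mapping (field names are illustrative, not from the post):
# "codes" is the numeric array from item 1, mapped as keyword for exact
# matching; "a1", "a2" are per-attribute integer_range fields from item 2.
MAPPING = {
    "mappings": {
        "properties": {
            "codes": {"type": "keyword"},
            "a1": {"type": "integer_range"},
            "a2": {"type": "integer_range"},
            "body": {"type": "text"},
        }
    }
}


def build_query(codes, ranges, text=None):
    """Every code must be present; every attribute range must overlap."""
    # One term filter per code, so a document must contain all of them.
    filters = [{"term": {"codes": str(c)}} for c in codes]
    for attr, (lo, hi) in ranges.items():
        # On a range field, relation INTERSECTS matches stored ranges
        # that overlap [lo, hi].
        filters.append(
            {"range": {attr: {"gte": lo, "lte": hi, "relation": "INTERSECTS"}}}
        )
    bool_query = {"filter": filters}
    if text:
        # Only the free-text clause contributes to scoring.
        bool_query["must"] = [{"match": {"body": text}}]
    return {"query": {"bool": bool_query}}


# Example document: a1 stores the range 100-200 (hypothetical values).
doc = {"codes": ["3", "7"], "a1": {"gte": 100, "lte": 200}, "body": "red widget"}
query = build_query([3, 7], {"a1": (150, 250)}, text="red")
```

Putting the exact-match and overlap conditions in `filter` rather than `must` keeps them out of scoring, which is usually what you want for yes/no constraints.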
My fourth question: I could compute a hash of the attribute IDs and store it. A document with a matching attribute hash would then almost certainly contain all the required attributes, leaving only that assertion and the values to be verified.

The hash would obviously have very high cardinality. My question: will this help ES locate matching documents more quickly? In other words, are fields with rare values helpful in queries where the values of other fields may be very common?

Remember, in item 2 I'm using the attribute ID as a field name, so the hash might not help. I'd still like to know whether value rarity is useful.
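The hashing idea in the fourth question could be sketched as below, assuming a hypothetical `keyword` field `attr_sig` that holds the hash and attribute IDs that contain no commas. The IDs are sorted and de-duplicated before hashing so the signature does not depend on order. Note that hashing the whole set only matches documents whose attribute set *equals* the query's; it cannot express "contains at least these attributes". The `exists` filters are one way to do the remaining verification and guard against a hash collision.

```python
import hashlib


def attribute_signature(attr_ids):
    """Order-independent SHA-1 hex digest of a set of attribute IDs."""
    # Assumes IDs never contain a comma, so the joined string is unambiguous.
    canonical = ",".join(sorted(set(attr_ids)))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def signature_query(attr_ids):
    """Filter on the stored hash, then confirm each attribute field exists."""
    ids = sorted(set(attr_ids))
    # "attr_sig" is a hypothetical keyword field holding the signature.
    filters = [{"term": {"attr_sig": attribute_signature(ids)}}]
    # exists filters guard against the (unlikely) hash collision.
    filters += [{"exists": {"field": a}} for a in ids]
    return {"query": {"bool": {"filter": filters}}}


# The indexed document would store this value in attr_sig.
doc_sig = attribute_signature(["a3", "a1"])
query = signature_query(["a1", "a3"])
```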

Finally, several of the fields will have identical values across potentially millions of documents. Is it better to store these common fields in a separate index and do a manual "join", i.e., look up the "parent" ID where the common fields match and use that ID in the "child"?
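The manual "join" described above amounts to two searches. A rough sketch follows, assuming the child documents carry a hypothetical `parent_id` keyword field and that the common fields are exact-match fields; the helper names are illustrative. Step 1 searches the parent index for matching common fields and collects the hit IDs; step 2 restricts the child search with a `terms` filter on those IDs. In practice a search returns only 10 hits by default, so step 1 would need a larger `size` or paging when many parents match, and a `terms` query is capped by `index.max_terms_count`.

```python
def parent_lookup_query(common_fields):
    """Step 1: find 'parent' docs whose shared fields match; IDs only."""
    filters = [{"term": {k: v}} for k, v in sorted(common_fields.items())]
    # _source disabled: only the hit IDs are needed.
    return {"query": {"bool": {"filter": filters}}, "_source": False}


def parent_ids(response):
    """Pull document IDs out of a search response's hits.hits array."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


def child_query(ids, child_filters):
    """Step 2: restrict 'child' docs to those referencing a matching parent."""
    # "parent_id" is a hypothetical keyword field on the child documents.
    return {
        "query": {
            "bool": {"filter": [{"terms": {"parent_id": ids}}, *child_filters]}
        }
    }


# Simulated step-1 response (same shape as an Elasticsearch search response).
fake_response = {"hits": {"hits": [{"_id": "p1"}, {"_id": "p2"}]}}
q = child_query(parent_ids(fake_response), [{"term": {"codes": "3"}}])
```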

Thanks much!