ES Performance/Design Practice


(Karthik Ramachandran) #1

I'm trying to answer the questions below so that I understand ES better from a performance standpoint.

The documents that I index into ES may follow the structure below, with multiple attributes:
{
AttX: <<Single-value attribute that can hold data of any type, viz. String, Number, Datetime>>
AttY:[<>]
AttZ: [{<<Array of nested content, which may also contain an array of strings as a value>>}]
AttM: <<Single string value that is like a paragraph, whereas AttX is a single value>>
}
AttZ example:
[
{ AttA: myvalue1, AttB: [strVal, strVal] },
{ AttA: myvalue2, AttB: [strVal, strVal] }
]

X/Y/Z/M are given as examples; the JSON will contain multiple attributes of each kind.

Given the document structure explained above, and given that ES uses Lucene indexing by default, I have the questions below.

Indexing

  1. Is Lucene indexing applied to all attributes? Inverted indexing on all attributes could be costly. I use AttX/Y only for filtering, not for searching. AttM is used for searching in combination with X/Y. Is there an option to mark attributes as filter-only?

Indexing and Delta

  1. Assume data arrives hourly and produces a lot of changes, viz. updates to existing items, which may impact the indexes. I thought of defining types within an index and dropping a type using DELETE http://localhost:9200/index/type, so that no update happens and the data is re-indexed instead. But it looks like that won't drop the actual indexed content in the type. Are there any options for managing delta updates effectively? Sorry, but reading item 3 should provide more clarity on this question.

  2. Aliasing to manage changes: aliasing is at the index level, so how many indexes can an alias have? Instead of types, multiple indexes with 1 shard each could be used. But having multiple indexes for the same content type doesn't look nice, assuming it requires 300+ indexes to be created at 1 index per day.
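For the aliasing idea, here is a minimal sketch of how a full reload could be handled: load the new data into a fresh index, then repoint the alias in a single request to the _aliases endpoint. The alias name "items" and the index names "items_v1" and "items_v2" are hypothetical, and the snippet only builds the request body; sending it (for example with POST /_aliases) is left out.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build the body for POST /_aliases that repoints an alias.

    Both actions travel in one request, so searches that go through
    the alias switch from the old index to the new one in one step.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: "items" is the alias that searches use;
# "items_v1" holds the previous load and "items_v2" the fresh one.
body = alias_swap_actions("items", "items_v1", "items_v2")
print(json.dumps(body, indent=2))
```

Once the alias has moved, the old index can be deleted as a whole, which is much cheaper than deleting its documents one by one.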

NOTE: Coming from a SQL background, I'm mapping my needs to UPSERTS vs TRUNCATE/LOAD vs QUERY.
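For the UPSERT side of that comparison, a rough sketch of a _bulk request body that uses update actions with doc_as_upsert, so that existing documents are updated and missing ones are created. The index name "items", the type name "item", and the document ids are hypothetical, and the "_type" key assumes an ES version that still has mapping types. The snippet only builds the newline-delimited payload and does not send it.

```python
import json


def bulk_upsert_body(index, doc_type, docs):
    """Build a newline-delimited _bulk payload of upsert actions.

    docs maps a document id to the fields to write. Each document
    becomes two lines: the action metadata, then the partial document
    flagged with doc_as_upsert so it is inserted when it is missing.
    """
    lines = []
    for doc_id, fields in docs.items():
        action = {"update": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps({"doc": fields, "doc_as_upsert": True}))
    # The _bulk endpoint expects the payload to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical hourly delta with two changed items.
payload = bulk_upsert_body(
    "items",
    "item",
    {"1": {"AttX": "myvalue1"}, "2": {"AttX": "myvalue2"}},
)
print(payload)
```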

Thanks for your time and clarification.


(Karthik Ramachandran) #2

Part 1 - Got the answer for 1: https://www.elastic.co/guide/en/elasticsearch/guide/current/mapping-intro.html
Setting "index": "not_analyzed" on a field in the mapping helps.
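A minimal sketch of what such a mapping might look like for the attributes in the question, assuming an ES version where string fields accept "index": "not_analyzed" (from 5.0 onward the string type was split into text and keyword). The type name "item" and the choice of the nested type for AttZ are assumptions. Note that not_analyzed fields are still indexed as exact values, which is what filtering needs; they are just not tokenized.

```python
import json


def filter_only(field_type="string"):
    """Field definition that is indexed as a single exact value."""
    return {"type": field_type, "index": "not_analyzed"}


mapping = {
    "mappings": {
        "item": {  # hypothetical type name
            "properties": {
                "AttX": filter_only(),
                "AttY": filter_only(),
                # AttM is left analyzed (the default) for full-text search.
                "AttM": {"type": "string"},
                # Assumption: AttZ is mapped as nested so that each
                # object in the array can be filtered on its own.
                "AttZ": {
                    "type": "nested",
                    "properties": {
                        "AttA": filter_only(),
                        "AttB": filter_only(),
                    },
                },
            }
        }
    }
}

print(json.dumps(mapping, indent=2))
```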

