Hi.
I'm currently integrating Elasticsearch into our Java product, a CMS / portal
solution. Until now, we have used a relational database and Hibernate to
serve the index.
The main entity in our model is "content", and the data model is split into
three index types: metadata, custom data and extracted binary data. This is
done to keep updates fast when metadata changes for a large number of content
items, i.e. so that the metadata can be updated on its own.
The index is used both for text search and for datasources, that is, queries
that fetch content for the web portal. A web page in the portal typically
consists of several datasources, and will fire several queries like:
{
  "from" : 0,
  "size" : 1,
  "query" : {
    "term" : {
      "contenttype" : "banner"
    }
  },
  "filter" : {
    "bool" : {
      "must" : {
        "terms" : {
          "contentlocations.menuitemkey_numeric" : [ "1954" ]
        }
      }
    }
  },
  "sort" : [ {
    "_score" : {
    }
  } ]
}
{
  "from" : 0,
  "size" : 3,
  "query" : {
    "term" : {
      "contenttype" : "vignette"
    }
  },
  "filter" : {
    "bool" : {
      "must" : {
        "terms" : {
          "contentlocations.menuitemkey_numeric" : [ "1954" ]
        }
      }
    }
  },
  "sort" : [ {
    "_score" : {
    }
  } ]
}
{
  "from" : 0,
  "size" : 10,
  "query" : {
    "term" : {
      "contenttype" : "casestudy"
    }
  },
  "filter" : {
    "bool" : {
      "must" : {
        "terms" : {
          "contentlocations.menuitemkey_numeric" : [ "1954" ]
        }
      }
    }
  },
  "sort" : [ {
    "_score" : {
    }
  } ]
}
{
  "from" : 0,
  "size" : 3,
  "query" : {
    "bool" : {
      "must" : [ {
        "range" : {
          "data_end-date" : {
            "from" : "2011-12-13t00:00:00.000+01:00",
            "to" : null,
            "include_lower" : true,
            "include_upper" : true
          }
        }
      }, {
        "term" : {
          "contenttype" : "event"
        }
      } ]
    }
  },
  "filter" : {
    "bool" : {
      "must" : {
        "terms" : {
          "contentlocations.menuitemkey_numeric" : [ "1939" ]
        }
      }
    }
  },
  "sort" : [ {
    "orderby_data_start-date" : {
      "order" : "asc"
    }
  }, {
    "orderby_title" : {
      "order" : "asc"
    }
  } ]
}
{
  "from" : 0,
  "size" : 10,
  "query" : {
    "match_all" : {
    }
  },
  "filter" : {
    "bool" : {
      "must" : {
        "terms" : {
          "key_numeric" : [ "103636", "103623", "103630", "103975",
                            "103974", "105431", "105430", "105429", "105428", "105427" ]
        }
      }
    }
  },
  "sort" : [ {
    "_score" : {
    }
  } ]
}
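To make the pattern explicit: the banner, vignette and casestudy queries above
differ only in content type, result size and menu item key. A minimal sketch of
that shared shape in plain Java follows. The class and method names are
hypothetical and chosen for illustration only, this is not our actual
integration code, and it assumes the values need no JSON escaping.

```java
public class DatasourceQuery {

    // Hypothetical helper: builds the JSON body shared by the
    // banner / vignette / casestudy datasource queries.
    // Assumption: contentType and menuItemKey contain no characters
    // that would need JSON escaping.
    public static String build(String contentType, String menuItemKey, int size) {
        return "{\n"
            + "  \"from\" : 0,\n"
            + "  \"size\" : " + size + ",\n"
            + "  \"query\" : { \"term\" : { \"contenttype\" : \"" + contentType + "\" } },\n"
            + "  \"filter\" : { \"bool\" : { \"must\" : { \"terms\" : {\n"
            + "    \"contentlocations.menuitemkey_numeric\" : [ \"" + menuItemKey + "\" ]\n"
            + "  } } } },\n"
            + "  \"sort\" : [ { \"_score\" : { } } ]\n"
            + "}";
    }

    public static void main(String[] args) {
        // The three datasources from the example page.
        System.out.println(build("banner", "1954", 1));
        System.out.println(build("vignette", "1954", 3));
        System.out.println(build("casestudy", "1954", 10));
    }
}
```

So a single page render repeats nearly the same filter on
contentlocations.menuitemkey_numeric several times, with only the term query
and the size changing.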
What I can see so far is that typical text queries get a small performance
boost from Elasticsearch compared to the old DB/Hibernate approach,
while portal queries like the ones above look likely to be slow compared to
the cached Hibernate queries.
I have not done any tuning whatsoever on the Elasticsearch setup. I have just
integrated the engine and created a local client with no settings other than
the defaults. The index is also created with default values, and the
queries are built without any specific performance tuning.
For a typical production environment, the number of stored content items will
be from 50,000 to maybe a couple of hundred thousand, and a busy website
may have around a million page requests within the 8 busy hours of a day.
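As a rough back-of-the-envelope figure, a million page requests over 8 hours
is about 35 page requests per second on average. The sketch below just does
that arithmetic. The value of 5 queries per page is an assumption taken from
the example page above, not a measured number, and the class name is
hypothetical.

```java
public class LoadEstimate {

    // Average requests per second when `requests` are spread evenly over `hours`.
    public static double perSecond(long requests, int hours) {
        return requests / (hours * 3600.0);
    }

    public static void main(String[] args) {
        long pageRequests = 1_000_000; // figure from this post
        int busyHours = 8;             // figure from this post
        int queriesPerPage = 5;        // assumption: one query per datasource, five datasources

        double pagesPerSecond = perSecond(pageRequests, busyHours);
        double queriesPerSecond = pagesPerSecond * queriesPerPage;

        System.out.println("pages per second:   " + pagesPerSecond);
        System.out.println("queries per second: " + queriesPerSecond);
    }
}
```

This is an average only. Peaks within the busy hours will be higher.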
What I would like to know is where to start to get the best possible
performance gains: caching? Data structure? Queries? Node configuration?
What are the typical main areas to watch for bottlenecks and easy gains?
Best regards
Runar Myklebust