Regarding upgrading Elasticsearch server from 0.18.3

Hi,

We are using a 2-node Elasticsearch cluster on JDK 6 (1.6.0_26)
with a 12 GB heap size (4 indexes * 5 shards per index and over 60 GB of
index data per node).

We are observing performance problems with some of our Elasticsearch
queries. These queries take on the order of minutes to execute per shard
in Elasticsearch.

Writes into the Elasticsearch cluster are pretty low (probably 10 writes/second
at most).

Read load is higher than write load.

We profiled Elasticsearch and observed that there are very many char[],
String, Term, and TermInfo objects on the heap during high load. We were
wondering whether we can upgrade to a version of Elasticsearch that has a
better memory consumption strategy and would not cause any problems with our
existing data and cluster.

The version of Elasticsearch we are currently on is 0.18.3, and our
questions are:

  1. Should we upgrade across a major version change? Will it cause problems?
  2. I see that version 0.18.5 has Lucene index updates that improve
    memory consumption; should we use that?
  3. How do we upgrade a running production cluster without downtime
    (rolling upgrade)?
  4. Should we update the JDK version too and tweak the heap settings as well?

Please let us know at your earliest convenience.


On 13.02.13 23:31, girish khadke wrote:

the questions that we have are:

  1. Should we upgrade across a major version change? Will it cause problems?
    It is always recommended to use the most recent version of Elasticsearch
    because of bug fixes and performance improvements.
  2. I see that version 0.18.5 has Lucene index updates that improve
    memory consumption; should we use that?
    No. 0.18.5 contains an outdated Lucene 3.5; there is no reason to use
    such an old Lucene version.
  3. How do we upgrade a running production cluster without downtime
    (rolling upgrade)?
    In your case, I doubt you can. You should upgrade Java, Lucene, and ES,
    all three, to the next major versions. Rolling upgrades are for situations
    where you change minor JVM versions, update to a minor ES version, or
    change cluster/index configs within the same version.
  4. Should we update the JDK version too and tweak the heap settings as well?
    Yes. I recommend the latest Java 7. Java 6 is no longer supported by
    Oracle: Oracle Java SE Support Roadmap

Note, heap settings are not the only settings you should take care of.
There are a lot of filter/cache tunables. Without knowing your queries,
it is hard to tell more.
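
One concrete illustration of such a tunable, as a sketch only: many filter
builders in the Java API expose an explicit cache flag, so a filter that
repeats across many requests can be cached while a near-unique range filter
is left uncached. The field names and values below are placeholders, and the
exact builder methods may differ between 0.x releases.

import org.elasticsearch.index.query.FilterBuilders;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

public class FilterCacheSketch {

    // Sketch only: per-filter cache control inside a filtered query.
    public static QueryBuilder buildFilteredQuery() {
        return QueryBuilders.filteredQuery(
                QueryBuilders.termQuery("account", "some-account-hash"),
                FilterBuilders.boolFilter()
                        // A filter repeated across many requests is a reasonable cache candidate.
                        .must(FilterBuilders.termFilter("operation", "login").cache(true))
                        // A near-unique range (e.g. a millisecond timestamp) is a poor one,
                        // so it is explicitly left uncached.
                        .must(FilterBuilders.rangeFilter("timeStamp")
                                .from("2012-02-15")
                                .to("2013-02-15")
                                .cache(false)));
    }
}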

Best regards,

Jörg


Our question is whether we should upgrade the cluster directly from 0.18.3 to
the latest stable version, 0.20.4, along with everything else such as the JDK.

We are trying to figure out the cause of frequent brownouts in our production
Elasticsearch server environment. During analysis we found that we have a
problem with GC hell under load: the JVM fails to free memory using the CMS
collector, causing longer GC pauses under load.

We have tried different heap sizes, from our initial heap size of 2 GB to 3 GB
and now 8 GB given to Elasticsearch, but we suspect that we could have
brownouts like this again in the future.

We have never done any optimizations at the JVM level on our Elasticsearch
server clusters.

Are there any good links on this? Any good advice?


If you select smaller heap sizes, you can watch the heap behavior more
quickly, because GCs happen earlier and more often.

Be aware that Java 6 was not developed with heaps larger than ~8GB in
mind, so there is a subtle barrier. I understand you have a 12GB heap
running. For Java 6, this is a challenge. Java 7 is designed for dealing
with larger heap sizes more easily.

Note, some older Java 6 JVMs have regressions, which are fixed in later
versions. If Java 7 is not an option, switch to the latest Java 6 JVM.

But just by changing the JVM you can't solve every case in which the
capacity of a cluster is exhausted. Each cluster has a certain limit,
and if your cluster resources are exhausted, you have to grow
your cluster.

Besides JVM tweaking: before changing parameters in places you are not
sure about, make sure you understand the reason for the situation.
You can also analyze the memory consumption by using diagnostic messages
in your client to track down the issue: is it a facet/filter/cache
allocation problem? Is it caused by badly written queries? Or by mere
query load? Without these facts, you can't expect a definitive answer.
Maybe you can tune queries in your app, or maybe you can configure
caching correctly. Maybe you can mend the situation by just adding
more nodes, which is very easy in Elasticsearch.
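
A minimal, hypothetical sketch of such client-side diagnostics (the class and
method names are made up; the builder and response calls mirror the ones used
in the query code elsewhere in this thread):

import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;

public final class SearchDiagnostics {

    // Sketch only: wrap each search call with simple wall-clock timing so slow queries
    // can be logged together with their JSON and correlated with GC activity on the nodes.
    public static SearchResponse timedSearch(SearchRequestBuilder request, long timeoutMillis) {
        long start = System.currentTimeMillis();
        SearchResponse response = request.execute().actionGet(timeoutMillis);
        long elapsedMillis = System.currentTimeMillis() - start;
        System.out.println("search took " + elapsedMillis + " ms, returned "
                + response.getHits().getHits().length + " hits on this page");
        return response;
    }
}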

Jörg


  3. How do we upgrade a running production cluster without downtime
    (rolling upgrade)?

I wrote up an explanation of how we did a major upgrade without downtime
here:

Upgrading a running elasticsearch cluster · GitHub

clint


We are running the following query, which is a range query, and we apply a
sort by timestamp:

EndUserTxnReportSearchCriteria criteria = new EndUserTxnReportSearchCriteria();
criteria.setJurhash(getReqAccount().getJurHash());
for (String operation : operationList)
    criteria.addOperation(operation);
criteria.setSortDir("DESC");
criteria.setSortBy("Date");
Date endDate = Calendar.getInstance(getUserTimeZone()).getTime();
criteria.setEndDate(endDate);
Calendar startDateCal = Calendar.getInstance();
startDateCal.add(Calendar.YEAR, -1);
Date startDate = startDateCal.getTime();
criteria.setStartDate(startDate);

QueryBuilder jh = QueryBuilders.termQuery(
        ElasticSearchTransactionTypeUtil.Fields.account.toString(),
        criteria.getJurhash());
BoolQueryBuilder boolQb = QueryBuilders.boolQuery().must(jh);

// Add operation if present
if (criteria.getOperations() != null && !criteria.getOperations().isEmpty()) {
    for (String operation : criteria.getOperations())
        boolQb.should(QueryBuilders.termQuery(
                ElasticSearchTransactionTypeUtil.Fields.operation.toString(), operation));
    boolQb.minimumNumberShouldMatch(1);
}

// Add userID to query
if (criteria.getExtUserId() != null && !criteria.getExtUserId().equals("")) {
    // TODO: For demo, comment out wildcard search
    // QueryBuilder uid = QueryBuilders.wildcardQuery(
    //         ElasticSearchTransactionTypeUtil.Fields.user.toString(),
    //         "*" + criteria.getExtUserId().toLowerCase() + "*");
    QueryBuilder uid = QueryBuilders.wildcardQuery(
            ElasticSearchTransactionTypeUtil.Fields.user.toString(),
            "*" + criteria.getExtUserId().toLowerCase() + "*");
    boolQb = boolQb.must(uid);
}

// Build query: bool query wrapped in a timestamp range filter
QueryBuilder qb = QueryBuilders.filteredQuery(boolQb,
        FilterBuilders.rangeFilter(ElasticSearchTransactionTypeUtil.Fields.timeStamp.toString())
                .from(criteria.getStartDate())
                .to(criteria.getEndDate())
                .includeLower(true)
                .includeUpper(false));

// Get client
TransportClient client = getClient();
if (client != null) {
    try {
        if (log.isDebugEnabled())
            log.debug("Query:" + new String(qb.buildAsBytes(XContentType.JSON)));

        // Get response
        SearchRequestBuilder requestBuilder = client.prepareSearch()
                .setOperationThreading(SearchOperationThreading.THREAD_PER_SHARD)
                .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
                .setQuery(qb.buildAsBytes())
                .setFrom(firstResult - 1)
                .setSize(pageSize)
                .setIndices(esIndices)
                .addSort(getSortBy(criteria.getSortBy()), getSortDir(criteria.getSortDir()))
                .setExplain(false);

        SearchResponse response = requestBuilder.execute().actionGet(getTimeoutinmillis());

        if (log.isDebugEnabled())
            log.debug("SearchResponse:" + response.toString());
        List<EventLog> eventLogList = new ArrayList<EventLog>();

        try {
            SearchHit[] hits = (response == null || response.getHits() == null)
                    ? null : response.getHits().getHits();
            if (hits != null && hits.length > 0) {
                EventLog eventLog;
                for (int i = 0; i < hits.length; i++) {
                    Map<String, Object> map = hits[i].sourceAsMap();
                    eventLog = ElasticSearchTransactionTypeUtil.convertFromESDataToEventLog(map);
                    if (log.isDebugEnabled())
                        log.debug("Adding event log " + eventLog);
                    eventLogList.add(eventLog);
                }
                return eventLogList;
            }
        } catch (Exception e) {
            log.error("Error while parsing the results from elastic search.");
            throw new RuntimeException(e);
        }

        return eventLogList;
    } finally {
        // The client is never closed here because it is a singleton.
        // if (client != null) client.close();
    }
}
return null;

Timestamp is a very high-cardinality field (almost unique, and we probably
have a huge number of such unique terms in our data). I think sorting by
timestamp is what is causing problems with these queries.
When we search over the last 2 years' worth of data, the search just fails
to give us back results within the 20s timeout. We face this issue
intermittently and we are trying to debug why it is happening.

Currently we use Elasticsearch 0.18.3.

Is there a better way of writing the above query to get data for reporting?
Is there some functionality like limit() in the Elasticsearch search API?

It looks like we also need to move to a newer version of Lucene to improve
memory usage (Lucene 3.6+).

Thanks and regards,
Girish Khadke


Yes, I am quite familiar with that kind of requirement. Sorting values
on an inverted index is heavy: it generates tabular data but must use
inversely indexed documents. With low cardinality and a reasonable heap
size, the challenge often goes unnoticed. A too-high cardinality
of the sort field swamps the heap. And it is even more challenging in
situations where you just need the top-ranked documents, because the largest
part of the sorting computation is wasted; it will not be used for
delivering results. I see you are fetching documents pagewise.

There are some options:

  • reducing timestamp cardinality by creating buckets: maybe it is
    possible to sort by month, week, day, hour, or minute rather than by such
    fine resolution as seconds or milliseconds (a minimal sketch of this
    appears below)

  • avoid sorting at all: boost the documents at indexing time, according
    to their age, and use relevance scoring

  • use time-based rolling indices to distribute the timestamps across
    many indices

  • precompute document order: put your documents in an index with static
    page counters, so you can retrieve them page by page (if you have a static
    paging function)

  • brute force: bring up more hardware (RAM) and increase the heap, and
    continue to sort (even this strategy will cause delays when you exceed a
    certain limit, around some dozens of GB, because loading values
    into the heap for sorting will take noticeable time even when ES is
    mlockall()'d)

You can't get around the issue just by updating to the latest
Elasticsearch or the latest JVM.
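
A minimal sketch of the first option, assuming a hypothetical low-cardinality
field such as "timestampDay" (e.g. formatted as yyyy-MM-dd) is written
alongside the full timestamp at index time; the builder calls mirror the query
code posted earlier in this thread, and the class and field names are made up:

import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.search.sort.SortOrder;

public final class BucketedSortSketch {

    // Sketch only: sort on a pre-bucketed day field instead of the millisecond timestamp,
    // so far fewer distinct terms have to be loaded into the heap for sorting.
    public static SearchRequestBuilder buildReportSearch(Client client, QueryBuilder query,
                                                         String[] indices, int from, int pageSize) {
        return client.prepareSearch()
                .setIndices(indices)
                .setQuery(query)
                .setFrom(from)
                .setSize(pageSize)
                // "timestampDay" is the hypothetical coarse field; hits within the same day
                // remain in index order unless a tiebreaker is added (which would reintroduce
                // the fine-grained field data).
                .addSort("timestampDay", SortOrder.DESC);
    }
}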

Jörg


We are also wondering whether larger GC pauses could be the issue.
Could GC tuning alone solve the problem?

Regards,
Girish
