Elasticsearch slow query response time using terms facet

We are seeing unusually slow query response times when using terms facets.
With facets, the query takes around 15 to 20 seconds to return a response.
Without facets, the query takes around 2 seconds (which is still high latency).
Earlier, with 11M documents on 3 data nodes and 2 non-data nodes, latency was on the order of 100 to 200 ms. We're not sure what's going on here.

A couple of observations:
a) The shards are not evenly distributed among the 10 data nodes.
b) The master can be either a data node or a non-data node.
c) I expected adding more nodes to make searches faster, but query performance has actually gotten worse. Why?
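Observation (a) is easier to reason about with numbers. A minimal sketch, assuming the per-node shard counts have already been collected by hand (for example from the cluster state); the function `shard_imbalance` and its input format are hypothetical, not an Elasticsearch API:

```python
from typing import Dict


def shard_imbalance(shards_per_node: Dict[str, int]) -> Dict[str, float]:
    """Summarize how evenly shard copies are spread over data nodes.

    shards_per_node: hypothetical input, node name -> number of shard
    copies (primaries and replicas) allocated to that node.
    """
    if not shards_per_node:
        raise ValueError("need at least one data node")
    counts = list(shards_per_node.values())
    total = sum(counts)
    # Share each node would hold if the copies were spread perfectly evenly.
    ideal = total / len(counts)
    return {
        "total": total,
        "ideal_per_node": ideal,
        "max": max(counts),
        "min": min(counts),
        # Gap between the busiest and the least busy node.
        "spread": max(counts) - min(counts),
    }
```

A large `spread` relative to `ideal_per_node` means a few nodes serve noticeably more shard copies per query than the others.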

Specs:
Elasticsearch v0.90.2
25 million text documents indexed
10 data nodes (running on CentOS 6)
2 non-data nodes (running on CentOS 6)
Running on a Java 7 JVM

Query issued:
GET _search
{
  "facets": {
    "histo": {
      "date_histogram": {
        "field": "closeDt",
        "interval": "1.5h"
      }
    },
    "channel": {
      "terms": {
        "field": "channel"
      }
    },
    "siteId": {
      "terms": {
        "field": "siteId"
      }
    }
  },
  "query": {
    "filtered": {
      "filter": [
        {
          "range": {
            "closeDt": {
              "from": 1325404800000,
              "to": 1385107200000
            }
          }
        }
      ],
      "query": {
        "query_string": {
          "query": "ebay now",
          "default_operator": "AND",
          "fields": [
            "emails.emailBody",
            "srId",
            "chatTextArray.text"
          ]
        }
      }
    }
  },
  "highlight": {
    "pre_tags": [
      ""
    ],
    "post_tags": [
      ""
    ],
    "fields": {
      "srId": {
        "number_of_fragments": 0
      },
      "emails.emailBody": {
        "number_of_fragments": 0
      },
      "chatTextArray.text": {
        "number_of_fragments": 0
      }
    }
  },
  "timeout": 30000
}
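The `date_histogram` facet above splits the filtered `closeDt` range into 1.5-hour buckets. A back-of-the-envelope sketch (plain arithmetic, not an Elasticsearch API; `max_buckets` is a hypothetical helper) of roughly how many histogram entries that range can produce:

```python
# Range filter bounds from the query above, in epoch milliseconds.
FROM_MS = 1325404800000
TO_MS = 1385107200000

# The facet interval "1.5h" expressed in milliseconds.
INTERVAL_MS = int(1.5 * 60 * 60 * 1000)


def max_buckets(from_ms: int, to_ms: int, interval_ms: int) -> int:
    """Approximate upper bound on histogram entries: one per interval in the
    range, using ceiling division so a partial last interval still counts."""
    return -(-(to_ms - from_ms) // interval_ms)


print(max_buckets(FROM_MS, TO_MS, INTERVAL_MS))  # 11056
```

So the histogram alone can return on the order of 11,000 entries; the exact count depends on how bucket boundaries are rounded and on how many buckets are empty.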

Query response (with facets):
{
  "took": 24426,
  "timed_out": false,
  "_shards": {
    "total": 16,
    "successful": 10,
    "failed": 6,
    "failures": [
      {
        "index": "_river",
        "shard": 0,
        "status": 400,

Without facets, the query response time is about 2 seconds (still high latency):
{
  "took": 2625,
  "timed_out": false,
  "_shards": {
    "total": 16,
    "successful": 16,
    "failed": 0
  },
  "hits": {
    "total": 3942856,
    "max_score": 0.30295128,
    "hits": [
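To compare the two cases fairly, it helps to send exactly the same body once with and once without the `facets` section. A minimal sketch, assuming the request body is kept as a Python dict; `without_facets` is a hypothetical helper, and the timings would come from the `took` field of each response:

```python
import copy
import json
from typing import Any, Dict


def without_facets(body: Dict[str, Any]) -> Dict[str, Any]:
    """Return a deep copy of a search body with the "facets" section removed,
    leaving the query, highlight and timeout settings untouched."""
    stripped = copy.deepcopy(body)
    stripped.pop("facets", None)
    return stripped


# Abbreviated version of the faceted request body shown above.
body = {
    "facets": {
        "channel": {"terms": {"field": "channel"}},
        "siteId": {"terms": {"field": "siteId"}},
    },
    "query": {"query_string": {"query": "ebay now", "default_operator": "AND"}},
    "timeout": 30000,
}

print(json.dumps(without_facets(body), sort_keys=True))
```

With the numbers above, the faceted run took 24,426 ms versus 2,625 ms, roughly 9.3 times slower. Note, however, that the faceted response reports 6 of 16 shards failed (the first visible failure is on the `_river` index) while the non-faceted one succeeded on all 16, so the two timings are not exactly like for like.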

Cluster health:
{
cluster_name: elasticsearch_dc1
status: yellow
timed_out: false
number_of_nodes: 12
number_of_data_nodes: 10
active_primary_shards: 16
active_shards: 31
relocating_shards: 0
initializing_shards: 1
unassigned_shards: 0
}
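The health numbers are consistent with one replica per primary (an assumption, since `number_of_replicas` is not shown): 16 primaries times 2 copies gives 32, matching 31 active plus 1 initializing, so the yellow status is most likely that single replica still initializing. A small sketch of that arithmetic; `shard_copies` is a hypothetical helper and `replicas` is the assumed setting:

```python
# Values copied from the cluster health output above.
health = {
    "number_of_data_nodes": 10,
    "active_primary_shards": 16,
    "active_shards": 31,
    "relocating_shards": 0,
    "initializing_shards": 1,
    "unassigned_shards": 0,
}


def shard_copies(h: dict, replicas: int) -> dict:
    """Check cluster-health shard counts against an assumed replica count."""
    # Every shard copy is either active, initializing or unassigned.
    seen = h["active_shards"] + h["initializing_shards"] + h["unassigned_shards"]
    # Each primary should have `replicas` additional copies.
    expected = h["active_primary_shards"] * (1 + replicas)
    return {
        "seen": seen,
        "expected": expected,
        "consistent": seen == expected,
        # Average copies per data node if allocation were perfectly even.
        "per_data_node": seen / h["number_of_data_nodes"],
    }


print(shard_copies(health, replicas=1))
# {'seen': 32, 'expected': 32, 'consistent': True, 'per_data_node': 3.2}
```

With 32 copies over 10 data nodes, an even split is not possible (some nodes must hold 4 copies, others 3), which is worth keeping in mind when judging observation (a).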

We tried using _optimize to merge each shard down to a single segment, and it helped quite a bit.
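For reference, the merge described above corresponds to an `_optimize` call with `max_num_segments=1`. A minimal sketch that only builds the URL; the host is hypothetical, `email2` is the index name seen in the responses below, and the request itself would be sent as a POST (for example with `urllib.request.Request(url, method="POST")`):

```python
from urllib.parse import urlencode


def optimize_url(base: str, index: str, max_num_segments: int = 1) -> str:
    """Build the URL for an _optimize call that merges each shard of `index`
    down to at most `max_num_segments` segments."""
    query = urlencode({"max_num_segments": max_num_segments})
    return f"{base.rstrip('/')}/{index}/_optimize?{query}"


# Hypothetical host; "email2" is taken from the search responses.
print(optimize_url("http://localhost:9200", "email2"))
# http://localhost:9200/email2/_optimize?max_num_segments=1
```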

Still, our query response time is over 1 second.

Total number of documents indexed: 25,127,713
Query: ebay
Number of hits: 18,527,606
Time taken: 5.54 seconds (wow!)
Number of indexes searched: 2
There is one shard per index, and each shard is about 20 GB.

Query issued:
GET _search
{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "ebay",
          "default_operator": "OR",
          "fields": [
            "emails.emailBody",
            "chatTextArray.text"
          ]
        }
      }
    }
  }
}

Response:
{
  "took": 2989,
  "timed_out": false,
  "_shards": {
    "total": 10,
    "successful": 10,
    "failed": 0
  },
  "hits": {
    "total": 18527794,
    "max_score": 0.094853,
    "hits": [
      {
        "_index": "email2",
        "_type": "json",
        "_id": "1288731070",
        "_score": 0.094853,
        "_source": { ..}
      }
    ]
  }
}

Default query:
GET _search
{
  "query": {
    "filtered": {
      "filter": [
        {
          "range": {
            "closeDt": {
              "from": 1325404800000,
              "to": 1386057600000
            }
          }
        }
      ]
    }
  }
}
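The `closeDt` bounds are epoch milliseconds, which are hard to read directly. A small sketch (standard library only; `ms_to_utc` is a hypothetical helper) that converts them to UTC timestamps:

```python
from datetime import datetime, timezone


def ms_to_utc(ms: int) -> datetime:
    """Convert an epoch-milliseconds value (as used in the closeDt range) to UTC."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


for bound in (1325404800000, 1386057600000):
    print(bound, ms_to_utc(bound).isoformat())
# 1325404800000 2012-01-01T08:00:00+00:00
# 1386057600000 2013-12-03T08:00:00+00:00
```

Both bounds fall at 08:00 UTC, presumably midnight US Pacific time, so this default query covers almost two years of documents and matches about 22 million hits.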

Response:
{"took":1815,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":22035134,"max_score":1.0,"hits":[{"_index":"email2","_type":"json","_id":"1273442109","_score":1.0, "_source" : {..}