ElasticSearch 0.19.2 heap space shortage, becoming unresponsive and not recovering or releasing memory

sujoysett · April 27, 2012, 10:15am

Hi,

We have been using Elasticsearch 0.19.2 to store and analyze data
from social media blogs and forums. The data volume goes up to
500,000 documents per index, and an index of that size takes up to
3 GB per node (all shards). We always keep the number of replicas
one less than the total number of nodes, so that a copy of every
shard resides on every node at all times. We generally use 10 shards
for indexes of this size.
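The replica rule above determines how much data each node carries. A small sketch of that arithmetic, using the shard, replica and node counts from the two configurations listed further down (the function names are illustrative and not part of any Elasticsearch API):

```python
def shard_copies_per_node(num_shards: int, num_replicas: int, num_nodes: int) -> float:
    """Average number of shard copies (primaries + replicas) hosted by each node.

    Assumes copies are spread evenly, which is what the allocator aims for.
    """
    total_copies = num_shards * (num_replicas + 1)
    return total_copies / num_nodes


def follows_replica_rule(num_replicas: int, num_nodes: int) -> bool:
    # The rule described above: replicas = nodes - 1. Because a node never holds
    # two copies of the same shard, this puts one copy of every shard on every node.
    return num_replicas == num_nodes - 1


configurations = {
    "configuration 1": (10, 1, 2),  # shards, replicas, nodes
    "configuration 2": (10, 5, 6),
}
for name, (shards, replicas, nodes) in configurations.items():
    print(name, shard_copies_per_node(shards, replicas, nodes),
          follows_replica_rule(replicas, nodes))
```

Both configurations give every node all 10 shards, i.e. roughly the full 3 GB per index. In configuration 2 this also means each physical server holds three full copies of the index, one per node.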

We run various queries on this data for advanced visualization,
mainly facets for trend charts and keyword clouds. Here are some
examples of the queries we execute:
{
  "query" : {
    "match_all" : { }
  },
  "size" : 0,
  "facets" : {
    "tag" : {
      "terms" : {
        "field" : "nouns",
        "size" : 100
      },
      "_cache" : false
    }
  }
}

{
  "query" : {
    "match_all" : { }
  },
  "size" : 0,
  "facets" : {
    "tag" : {
      "terms" : {
        "field" : "phrases",
        "size" : 100
      },
      "_cache" : false
    }
  }
}
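The two queries differ only in the faceted field. A minimal Python sketch that generates both request bodies, e.g. for scripting repeatable tests against the cluster; the helper name `terms_facet_query` is hypothetical, and the `"_cache": false` entry is simply kept exactly as in the original queries:

```python
import json


def terms_facet_query(field: str, size: int = 100, facet_name: str = "tag") -> dict:
    """Build the match_all + terms facet body used in the examples above.

    "size": 0 asks for no hits, only the facet counts.
    """
    return {
        "query": {"match_all": {}},
        "size": 0,
        "facets": {
            facet_name: {
                "terms": {"field": field, "size": size},
                "_cache": False,  # serialized as JSON false, as in the original
            }
        },
    }


for field in ("nouns", "phrases"):
    # Each printed body would be sent as the body of a _search request.
    print(json.dumps(terms_facet_query(field), indent=2))
```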

While executing such queries we often run out of heap space, and the
nodes become unresponsive. Our main concern is that the nodes do not
return to a normal state even after dumping the heap to an hprof file.
The java.exe process still consumes the maximum allocated memory, as
shown in Task Manager, and the nodes remain unresponsive until we
manually kill and restart them.

ES Configuration 1:
ElasticSearch Version 0.19.2
2 Nodes, one on each physical server
Max heap size 6GB per node.
10 shards, 1 replica.

ES Configuration 2:
ElasticSearch Version 0.19.2
6 Nodes, three on each physical server
Max heap size 2GB per node.
10 shards, 5 replicas.

Server Configuration:
Windows 7 64 bit
64 bit JVM
8 GB physical memory
Dual-core processor
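Putting the two configurations side by side against the 8 GB servers is simple arithmetic on the numbers above. A sketch (the class and function names are illustrative) of how much physical memory each server has left once the Elasticsearch heaps are reserved:

```python
from dataclasses import dataclass


@dataclass
class EsConfig:
    name: str
    nodes_per_server: int
    heap_gb_per_node: float


PHYSICAL_GB_PER_SERVER = 8  # from the server configuration above


def memory_left_outside_heaps(cfg: EsConfig, physical_gb: float = PHYSICAL_GB_PER_SERVER) -> float:
    """Physical memory on one server not reserved for Elasticsearch heaps.

    Whatever is left has to cover the OS (Windows 7 here), JVM memory
    outside the heap, and the file-system cache used by Lucene.
    """
    return physical_gb - cfg.nodes_per_server * cfg.heap_gb_per_node


configs = [
    EsConfig("configuration 1", nodes_per_server=1, heap_gb_per_node=6),
    EsConfig("configuration 2", nodes_per_server=3, heap_gb_per_node=2),
]
for cfg in configs:
    print(cfg.name, memory_left_outside_heaps(cfg), "GB left outside the heaps")
```

Both layouts reserve 6 GB of heap per server and leave about 2 GB for everything else; the difference is that in configuration 2 each node has only a third of that heap while still holding every shard.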

With both configurations, Elasticsearch was unable to respond to the
facet queries above, and it was also unable to recover after a query
failed due to heap space shortage.

We are facing this issue in our production environment. Could you
please suggest a better configuration, or a different approach if
required?

The mapping we use for the data is as follows (keyword1 is a
customized keyword analyzer; similarly, standard1 is a customized
standard analyzer):

{
"properties": {
"adjectives": {
"type": "string",
"analyzer": "stop2"
},
"alertStatus": {
"type": "string",
"analyzer": "keyword1"
},
"assignedByUserId": {
"type": "integer",
"index": "analyzed"
},
"assignedByUserName": {
"type": "string",
"analyzer": "keyword1"
},
"assignedToDepartmentId": {
"type": "integer",
"index": "analyzed"
},
"assignedToDepartmentName": {
"type": "string",
"analyzer": "keyword1"
},
"assignedToUserId": {
"type": "integer",
"index": "analyzed"
},
"assignedToUserName": {
"type": "string",
"analyzer": "keyword1"
},
"authorJsonMetadata": {
"properties": {
"favourites": {
"type": "string"
},
"followers": {
"type": "string"
},
"following": {
"type": "string"
},
"likes": {
"type": "string"
},
"listed": {
"type": "string"
},
"subscribers": {
"type": "string"
},
"subscription": {
"type": "string"
},
"uploads": {
"type": "string"
},
"views": {
"type": "string"
}
}
},
"authorKloutDetails": {
"dynamic": "true",
"properties": {
"amplificationScore": {
"type": "string"
},
"authorKloutDetailsFound": {
"type": "string"
},
"description": {
"type": "string"
},
"influencees": {
"dynamic": "true",
"properties": {
"kscore": {
"type": "string"
},
"twitter_screen_name": {
"type": "string"
}
}
},
"influencers": {
"dynamic": "true",
"properties": {
"kscore": {
"type": "string"
},
"twitter_screen_name": {
"type": "string"
}
}
},
"kloutClass": {
"type": "string"
},
"kloutClassDescription": {
"type": "string"
},
"kloutScore": {
"type": "string"
},
"kloutScoreDescription": {
"type": "string"
},
"kloutTopic": {
"type": "string"
},
"slope": {
"type": "string"
},
"trueReach": {
"type": "string"
},
"twitterId": {
"type": "string"
},
"twitterScreenName": {
"type": "string"
}
}
},
"author_media": {
"type": "string",
"analyzer": "keyword1"
},
"brandTerms": {
"type": "string",
"analyzer": "keyword1"
},
"calculatedSentimentId": {
"type": "integer",
"index": "analyzed"
},
"calculatedSentimentName": {
"type": "string",
"analyzer": "keyword1"
},
"categories": {
"properties": {
"category": {
"type": "string",
"analyzer": "keyword1"
},
"categoryWords": {
"type": "string",
"analyzer": "keyword1"
},
"score": {
"type": "double"
}
}
},
"commentCount": {
"type": "integer",
"index": "analyzed"
},
"contentAuthorId": {
"type": "integer",
"index": "analyzed"
},
"contentAuthorName": {
"type": "string",
"analyzer": "keyword1"
},
"contentId": {
"type": "integer",
"index": "analyzed"
},
"contentJsonMetadata": {
"properties": {
"comment Count": {
"type": "string"
},
"dislikes": {
"type": "string"
},
"favourites": {
"type": "string"
},
"likes": {
"type": "string"
},
"retweet Count": {
"type": "string"
},
"views": {
"type": "string"
}
}
},
"contentPublishedTime": {
"type": "date",
"index": "analyzed",
"format": "dateOptionalTime"
},
"contentTextFull": {
"type": "string",
"analyzer": "standard1"
},
"contentTextFullHighlighted": {
"type": "string",
"analyzer": "standard1"
},
"contentTextSnippetHighlighted": {
"type": "string",
"analyzer": "standard1"
},
"contentType": {
"type": "string",
"analyzer": "keyword1"
},
"contentUrlId": {
"type": "integer",
"index": "analyzed"
},
"contentUrlPath": {
"type": "string",
"analyzer": "keyword1"
},
"contentUrlPublishedTime": {
"type": "date",
"index": "analyzed",
"format": "dateOptionalTime"
},
"ctmId": {
"type": "long"
},
"domainName": {
"type": "string",
"analyzer": "keyword1"
},
"domainUrl": {
"type": "string",
"analyzer": "keyword1"
},
"domain_media": {
"type": "string",
"analyzer": "keyword1"
},
"findings": {
"type": "string",
"analyzer": "keyword1"
},
"geographyId": {
"type": "integer",
"index": "analyzed"
},
"geographyName": {
"type": "string",
"analyzer": "keyword1"
},
"kloutScore": {
"type": "object"
},
"languageId": {
"type": "integer",
"index": "analyzed"
},
"languageName": {
"type": "string",
"analyzer": "keyword1"
},
"listListeningObjectiveName": {
"type": "string",
"analyzer": "keyword1"
},
"mediaSourceIconPath": {
"type": "string",
"analyzer": "keyword1"
},
"mediaSourceId": {
"type": "integer",
"index": "analyzed"
},
"mediaSourceName": {
"type": "string",
"analyzer": "keyword1"
},
"mediaSourceTypeId": {
"type": "integer",
"index": "analyzed"
},
"mediaSourceTypeName": {
"type": "string",
"analyzer": "keyword1"
},
"notesCount": {
"type": "integer",
"index": "analyzed"
},
"nouns": {
"type": "string",
"analyzer": "stop2"
},
"opinionWords": {
"type": "string",
"analyzer": "keyword1"
},
"phrases": {
"type": "string",
"analyzer": "keyword1"
},
"profileId": {
"type": "integer",
"index": "analyzed"
},
"profileName": {
"type": "string",
"analyzer": "keyword1"
},
"topicId": {
"type": "integer",
"index": "analyzed"
},
"topicName": {
"type": "string",
"analyzer": "keyword1"
},
"userSentimentId": {
"type": "integer",
"index": "analyzed"
},
"userSentimentName": {
"type": "string",
"analyzer": "keyword1"
},
"verbs": {
"type": "string",
"analyzer": "stop2"
}
}
}
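For the facet problem, what matters is how each faceted field is tokenized. Assuming keyword1 behaves like the built-in keyword analyzer (the whole value becomes a single term), every distinct entry in `phrases` becomes its own term, whereas stop2 on `nouns` presumably splits values into words. A hypothetical helper that walks a mapping and groups string fields by analyzer, shown on a small excerpt of the mapping above:

```python
from collections import defaultdict


def fields_by_analyzer(properties: dict, prefix: str = "") -> dict:
    """Walk a mapping's "properties" and group string field paths by analyzer.

    Object fields (those with their own "properties") are walked
    recursively; string fields without an explicit analyzer are grouped
    under "(default)". Non-string fields are skipped.
    """
    groups = defaultdict(list)
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:
            for analyzer, paths in fields_by_analyzer(spec["properties"], path + ".").items():
                groups[analyzer].extend(paths)
        elif spec.get("type") == "string":
            groups[spec.get("analyzer", "(default)")].append(path)
    return dict(groups)


# A small excerpt of the mapping above.
mapping_excerpt = {
    "nouns": {"type": "string", "analyzer": "stop2"},
    "verbs": {"type": "string", "analyzer": "stop2"},
    "phrases": {"type": "string", "analyzer": "keyword1"},
    "topicName": {"type": "string", "analyzer": "keyword1"},
    "topicId": {"type": "integer", "index": "analyzed"},
    "categories": {"properties": {
        "category": {"type": "string", "analyzer": "keyword1"},
        "score": {"type": "double"},
    }},
}

print(fields_by_analyzer(mapping_excerpt))
```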

Here is a sample document showing the structure of the data:

{
"contentType": "comment",
"topicId": 9,
"mediaSourceId": 3,
"contentId": 34834,
"ctmId": 73322,
"contentTextFull": "The low numbers nationally published by Corelogic were a result of banks holding off foreclosures until settlement. \nAs Bloomberg and RealtyTrac stated. this will result in more foreclosure pain in the short term as some of the foreclosures that should have happened last year instead happen this year which will likely result in higher foreclosure numbers in 2012 than 2011.\nThe estimates from Realtytrac and Zillow are hovering around 1 million completed foreclosures, or REOs, in 2012, a 25 percent increase from 2011. \nThe positive is that the data suggests that short sales net the banks more money so they should be expected to increase\nThe bottom line is that in the longer term the bank settlement will help to more quickly clear the so-called shadow inventory, which will in turn help the housing market finally bottom out once and for all. \nMy buddy who bought in Santa Luz in 2006 is asked every month by his bank when he makes his payment on his $1.2mm underwater home, do you plan on staying in the house? . Per Corelogic, there are still large numbers still underwater in SD\n- 3800 underwater in 92127\n- 2700 underwater in 92130\nThe good news is we only have one last market to get hit, and expect the high end. The $1mm to $2mm has to get hit next.\nhttp://www.mercurynews.com/business/ci_19899224\nUnfortunately, we can not avoid the headwinds.",
"contentTextFullHighlighted": null,
"contentTextSnippetHighlighted": "The low numbers nationally published by Corelogic were a result of banks holding off foreclosures until settlement. \nAs Bloomberg and RealtyTrac stated. this will result in more foreclosure pain in the short term as some of the foreclosures that should have happened last year instead happen...",
"contentJsonMetadata": null,
"commentCount": 117,
"contentUrlId": 13535,
"contentUrlPath": "http://www.bubbleinfo.com/2012/02/09/mortgage-settlement-renegade/",
"domainUrl": "http://www.bubbleinfo.com",
"domainName": null,
"contentAuthorId": 15614,
"contentAuthorName": "Hankster",
"authorJsonMetadata": null,
"authorKloutDetails": null,
"mediaSourceName": "Board Reader Blog",
"mediaSourceIconPath": "BoardReaderBlog.gif",
"mediaSourceTypeId": 1,
"mediaSourceTypeName": "Blog",
"geographyId": 0,
"geographyName": "Unknown",
"languageId": 1,
"languageName": "English",
"topicName": "Bank of America",
"profileId": 3,
"profileName": "USAA_Competition1",
"contentPublishedTime": 1328798840000,
"contentUrlPublishedTime": 1329336423000,
"calculatedSentimentId": 4,
"calculatedSentimentName": "POS",
"userSentimentId": 0,
"userSentimentName": null,
"listListeningObjectiveName": [
"Untagged LO"
],
"alertStatus": "assigned",
"assignedToUserId": 2,
"assignedToUserName": null,
"assignedByUserId": 1,
"assignedByUserName": null,
"assignedToDepartmentId": 0,
"assignedToDepartmentName": null,
"notesCount": 0,
"nouns": [
"bank",
"banks",
"Bloomberg",
"buddy",
"Corelogic",
"data",
"estimates",
"foreclosure",
"foreclosures",
"headwinds",
"home",
"house",
"housing",
"increase",
"inventory",
"line",
"Luz",
"market",
"mm",
"money",
"month",
"net",
"news",
"numbers",
"pain",
"payment",
"percent",
"Realtytrac",
"RealtyTrac",
"REOs",
"result",
"sales",
"Santa",
"SD",
"settlement",
"shadow",
"term",
"turn",
"year",
"Zillow"
],
"verbs": [
"asked",
"avoid",
"bought",
"completed",
"expect",
"expected",
"get",
"happen",
"happened",
"help",
"hit",
"holding",
"hovering",
"increase",
"makes",
"plan",
"published",
"result",
"stated",
"staying",
"suggests"
],
"adjectives": [
"bottom",
"clear",
"finally",
"good",
"high",
"higher",
"instead",
"large",
"last",
"likely",
"longer",
"low",
"nationally",
"next",
"not",
"positive",
"quickly",
"short",
"so-called",
"underwater",
"Unfortunately"
],
"phrases": [
"2012 than 2011",
"25 percent",
"25 percent increase",
"2700 underwater in 92130",
"3800 underwater in 92127",
"92130 The good news",
"asked every month",
"avoid the headwinds",
"bank settlement",
"banks holding off foreclosures",
"banks more money",
"Bloomberg and RealtyTrac",
"bottom line",
"bought in Santa",
"bought in Santa Luz",
"clear the so-called shadow",
"completed foreclosures",
"estimates from Realtytrac",
"foreclosure numbers",
"foreclosure numbers in 2012",
"foreclosure pain",
"foreclosures until settlement",
"good news",
"happen this year",
"happen this year --",
"happened last year",
"help the housing",
"help the housing market",
"higher foreclosure",
"higher foreclosure numbers",
"holding off foreclosures",
"housing market",
"increase from 2011",
"increase The bottom line",
"instead happen this year",
"large numbers",
"last market",
"last year",
"longer term",
"longer term the bank",
"low numbers",
"Luz in 2006",
"makes his payment",
"million completed foreclosures",
"mm underwater home",
"month by his bank",
"nationally published by Corelogic",
"net the banks",
"not avoid the headwinds",
"numbers in 2012",
"percent increase",
"percent increase from 2011",
"published by Corelogic",
"Realtytrac and Zillow",
"result in higher foreclosure",
"result in more foreclosure",
"result of banks",
"sales net",
"sales net the banks",
"Santa Luz",
"Santa Luz in 2006",
"shadow inventory",
"short sales",
"short sales net",
"short term",
"so-called shadow",
"so-called shadow inventory",
"staying in the house",
"suggests that short sales",
"term the bank",
"term the bank settlement",
"turn help the housing",
"underwater home",
"underwater in 92127",
"underwater in 92130",
"underwater in SD",
"year --"
],
"author_media": "15614~~~Hankster~~~1~~~Blog",
"domain_media": "http://www.bubbleinfo.com~~~null~~~1~~~Blog",
"categories": [
{
"category": "post closing",
"categoryWords": [
"foreclosure",
"foreclosure"
],
"score": "2.0"
},
{
"category": "pre buy research",
"categoryWords": [
"term",
"term"
],
"score": "2.0"
}
],
"opinionWords": [
"positive",
"good news",
"expect",
"unfortunately"
],
"brandTerms": [],
"findings": []
}
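The sample document also shows how many values a single document contributes to the faceted fields: 40 nouns, 21 verbs, 21 adjectives and 77 phrases. Scaling those counts to 500,000 documents, under the assumption that this document is typical, gives a sense of how many values a terms facet over `nouns` or `phrases` has to handle. The helper names are illustrative:

```python
def values_per_field(doc: dict) -> dict:
    """Count values in each list-of-strings field of a document.

    Lists of objects (such as "categories") and scalar fields are skipped.
    """
    return {k: len(v) for k, v in doc.items()
            if isinstance(v, list) and all(isinstance(x, str) for x in v)}


def extrapolate(counts: dict, num_docs: int) -> dict:
    """Total values per field if every document looked like the sample (an assumption)."""
    return {field: n * num_docs for field, n in counts.items()}


# Counted from the sample document above.
sample_counts = {"nouns": 40, "verbs": 21, "adjectives": 21, "phrases": 77}
print(extrapolate(sample_counts, 500_000))

excerpt = {"nouns": ["bank", "banks"], "categories": [{"category": "post closing"}],
           "brandTerms": [], "topicId": 9}
print(values_per_field(excerpt))
```

If the sample is representative, `phrases` alone holds tens of millions of values per index, and since it is indexed with keyword1, a large share of them are likely distinct terms.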

Rafal_Kuc_3 · April 27, 2012, 10:22am

Hello,

Did you look at the size of the field data cache after sending the
example query?

Regards,
Rafał
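Rafał's question points at the field data cache, which is where a terms facet loads the values of the faceted field. As a rough, back-of-envelope model only (it is not how 0.19.2 actually lays out field data, and every input except the value count is a made-up placeholder), memory grows with the number of unique terms plus the number of (document, value) pairs:

```python
def estimate_field_data_bytes(unique_terms: int, avg_term_bytes: int,
                              total_values: int, bytes_per_value: int = 4) -> int:
    """Crude estimate: a term dictionary plus one ordinal per (document, value) pair.

    All inputs are assumptions supplied by the caller; nothing here is read
    from Elasticsearch.
    """
    return unique_terms * avg_term_bytes + total_values * bytes_per_value


# 38,500,000 = 77 phrases x 500,000 documents (from the sample document).
# The unique-term count and the 80 bytes per term are hypothetical placeholders.
estimate = estimate_field_data_bytes(unique_terms=10_000_000, avg_term_bytes=80,
                                     total_values=38_500_000)
print(f"{estimate / 2**30:.2f} GiB for the phrases field of one index")
```

Substituting measured values for the placeholders makes it easier to compare the result against the 2 GB and 6 GB heaps described in the original post.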

sujoysett · April 27, 2012, 10:42am

Hi,

Can you please explain how to check the field data cache? Do I have to
set anything explicitly to monitor it?
I often use the mobz-elasticsearch-head-24935c4 plugin to monitor cluster
state and health, but I didn't find anything like index.cache.field.max_size
in the cluster_state details.

Thanks and Regards,
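One way to check the field data cache is the node stats API. The sketch below assumes the 0.19-era endpoint `/_cluster/nodes/stats` and the key path `nodes.<id>.indices.cache.field_size_in_bytes`; both are assumptions to verify against the actual response of your cluster. The fetch helper needs a running node, so the parsing is demonstrated on a hand-written, hypothetical response fragment:

```python
import json
import urllib.request


def fetch_node_stats(base_url: str = "http://localhost:9200") -> dict:
    # Assumed 0.19-era endpoint; adjust if your version exposes stats elsewhere.
    with urllib.request.urlopen(base_url + "/_cluster/nodes/stats") as resp:
        return json.load(resp)


def field_cache_bytes_per_node(stats: dict) -> dict:
    """Map node name -> field data cache size in bytes (assumed key path).

    Nodes whose stats lack the key map to None.
    """
    result = {}
    for node_id, node in stats.get("nodes", {}).items():
        cache = node.get("indices", {}).get("cache", {})
        result[node.get("name", node_id)] = cache.get("field_size_in_bytes")
    return result


# Hypothetical, hand-written response fragment for illustration only.
example = {
    "nodes": {
        "abc123": {"name": "node-1", "indices": {"cache": {"field_size_in_bytes": 1048576}}},
        "def456": {"name": "node-2", "indices": {"cache": {}}},
    }
}
print(field_cache_bytes_per_node(example))
```

Reading these numbers before and after running one of the facet queries would show how much heap the faceted field takes on each node.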

On Friday, April 27, 2012 3:52:04 PM UTC+5:30, Rafał Kuć wrote:

Hello,

Did you look at the size of the field data cache after sending the
example query?

Regards,
Rafał

On Friday, 27 April 2012 12:15:38 UTC+2, Sujoy Sett wrote:
Hi,

We have been using Elasticsearch 0.19.2 to store and analyze data
from social media blogs and forums. The data volume goes up to
500,000 documents per index, and an index of that size takes up to
3 GB per node (all shards). We always keep the number of replicas
one less than the total number of nodes, so that a copy of every
shard resides on every node at all times. We generally use 10 shards
for indexes of this size.

We run various queries on this data for advanced visualization,
mainly facets for trend charts and keyword clouds. Here are some
examples of the queries we execute:
{
"query" : {
"match_all" : { }
},
"size" : 0,
"facets" : {
"tag" : {
"terms" : {
"field" : "nouns",
"size" : 100
},
"_cache":false
}
}
}

{
"query" : {
"match_all" : { }
},
"size" : 0,
"facets" : {
"tag" : {
"terms" : {
"field" : "phrases",
"size" : 100
},
"_cache":false
}
}
}

While executing such queries we often run out of heap space, and the
nodes become unresponsive. Our main concern is that the nodes do not
return to a normal state even after dumping the heap to an hprof file.
The java.exe process still consumes the maximum allocated memory, as
shown in Task Manager, and the nodes remain unresponsive until we
manually kill and restart them.

ES Configuration 1:
Elasticsearch Version 0.19.2
2 Nodes, one on each physical server
Max heap size 6GB per node.
10 shards, 1 replica.

ES Configuration 2:
Elasticsearch Version 0.19.2
6 Nodes, three on each physical server
Max heap size 2GB per node.
10 shards, 5 replicas.

Server Configuration:
Windows 7 64 bit
64 bit JVM
8 GB physical memory
Dual-core processor

With both configurations, Elasticsearch was unable to respond to the
facet queries above, and it was also unable to recover after a query
failed due to heap space shortage.

We are facing this issue in our production environment. Could you
please suggest a better configuration, or a different approach if
required?

The mapping we use for the data is as follows (keyword1 is a
customized keyword analyzer; similarly, standard1 is a customized
standard analyzer):

{
  "properties": {
    "adjectives": {
      "type": "string",
      "analyzer": "stop2"
    },
    "alertStatus": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "assignedByUserId": {
      "type": "integer",
      "index": "analyzed"
    },
    "assignedByUserName": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "assignedToDepartmentId": {
      "type": "integer",
      "index": "analyzed"
    },
    "assignedToDepartmentName": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "assignedToUserId": {
      "type": "integer",
      "index": "analyzed"
    },
    "assignedToUserName": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "authorJsonMetadata": {
      "properties": {
        "favourites": {
          "type": "string"
        },
        "followers": {
          "type": "string"
        },
        "following": {
          "type": "string"
        },
        "likes": {
          "type": "string"
        },
        "listed": {
          "type": "string"
        },
        "subscribers": {
          "type": "string"
        },
        "subscription": {
          "type": "string"
        },
        "uploads": {
          "type": "string"
        },
        "views": {
          "type": "string"
        }
      }
    },
    "authorKloutDetails": {
      "dynamic": "true",
      "properties": {
        "amplificationScore": {
          "type": "string"
        },
        "authorKloutDetailsFound": {
          "type": "string"
        },
        "description": {
          "type": "string"
        },
        "influencees": {
          "dynamic": "true",
          "properties": {
            "kscore": {
              "type": "string"
            },
            "twitter_screen_name": {
              "type": "string"
            }
          }
        },
        "influencers": {
          "dynamic": "true",
          "properties": {
            "kscore": {
              "type": "string"
            },
            "twitter_screen_name": {
              "type": "string"
            }
          }
        },
        "kloutClass": {
          "type": "string"
        },
        "kloutClassDescription": {
          "type": "string"
        },
        "kloutScore": {
          "type": "string"
        },
        "kloutScoreDescription": {
          "type": "string"
        },
        "kloutTopic": {
          "type": "string"
        },
        "slope": {
          "type": "string"
        },
        "trueReach": {
          "type": "string"
        },
        "twitterId": {
          "type": "string"
        },
        "twitterScreenName": {
          "type": "string"
        }
      }
    },
    "author_media": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "brandTerms": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "calculatedSentimentId": {
      "type": "integer",
      "index": "analyzed"
    },
    "calculatedSentimentName": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "categories": {
      "properties": {
        "category": {
          "type": "string",
          "analyzer": "keyword1"
        },
        "categoryWords": {
          "type": "string",
          "analyzer": "keyword1"
        },
        "score": {
          "type": "double"
        }
      }
    },
    "commentCount": {
      "type": "integer",
      "index": "analyzed"
    },
    "contentAuthorId": {
      "type": "integer",
      "index": "analyzed"
    },
    "contentAuthorName": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "contentId": {
      "type": "integer",
      "index": "analyzed"
    },
    "contentJsonMetadata": {
      "properties": {
        "comment Count": {
          "type": "string"
        },
        "dislikes": {
          "type": "string"
        },
        "favourites": {
          "type": "string"
        },
        "likes": {
          "type": "string"
        },
        "retweet Count": {
          "type": "string"
        },
        "views": {
          "type": "string"
        }
      }
    },
    "contentPublishedTime": {
      "type": "date",
      "index": "analyzed",
      "format": "dateOptionalTime"
    },
    "contentTextFull": {
      "type": "string",
      "analyzer": "standard1"
    },
    "contentTextFullHighlighted": {
      "type": "string",
      "analyzer": "standard1"
    },
    "contentTextSnippetHighlighted": {
      "type": "string",
      "analyzer": "standard1"
    },
    "contentType": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "contentUrlId": {
      "type": "integer",
      "index": "analyzed"
    },
    "contentUrlPath": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "contentUrlPublishedTime": {
      "type": "date",
      "index": "analyzed",
      "format": "dateOptionalTime"
    },
    "ctmId": {
      "type": "long"
    },
    "domainName": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "domainUrl": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "domain_media": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "findings": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "geographyId": {
      "type": "integer",
      "index": "analyzed"
    },
    "geographyName": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "kloutScore": {
      "type": "object"
    },
    "languageId": {
      "type": "integer",
      "index": "analyzed"
    },
    "languageName": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "listListeningObjectiveName": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "mediaSourceIconPath": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "mediaSourceId": {
      "type": "integer",
      "index": "analyzed"
    },
    "mediaSourceName": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "mediaSourceTypeId": {
      "type": "integer",
      "index": "analyzed"
    },
    "mediaSourceTypeName": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "notesCount": {
      "type": "integer",
      "index": "analyzed"
    },
    "nouns": {
      "type": "string",
      "analyzer": "stop2"
    },
    "opinionWords": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "phrases": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "profileId": {
      "type": "integer",
      "index": "analyzed"
    },
    "profileName": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "topicId": {
      "type": "integer",
      "index": "analyzed"
    },
    "topicName": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "userSentimentId": {
      "type": "integer",
      "index": "analyzed"
    },
    "userSentimentName": {
      "type": "string",
      "analyzer": "keyword1"
    },
    "verbs": {
      "type": "string",
      "analyzer": "stop2"
    }
  }
}

A sample of the structure of the data is as follows:

{
"contentType": "comment",
"topicId": 9,
"mediaSourceId": 3,
"contentId": 34834,
"ctmId": 73322,
"contentTextFull": "The low numbers nationally published by Corelogic were a result of banks holding off foreclosures until settlement. \nAs Bloomberg and RealtyTrac stated. this will result in more foreclosure pain in the short term as some of the foreclosures that should have happened last year instead happen this year which will likely result in higher foreclosure numbers in 2012 than 2011.\nThe estimates from Realtytrac and Zillow are hovering around 1 million completed foreclosures, or REOs, in 2012, a 25 percent increase from 2011. \nThe positive is that the data suggests that short sales net the banks more money so they should be expected to increase\nThe bottom line is that in the longer term the bank settlement will help to more quickly clear the so-called shadow inventory, which will in turn help the housing market finally bottom out once and for all. \nMy buddy who bought in Santa Luz in 2006 is asked every month by his bank when he makes his payment on his $1.2mm underwater home, do you plan on staying in the house? . Per Corelogic, there are still large numbers still underwater in SD\n- 3800 underwater in 92127\n- 2700 underwater in 92130\nThe good news is we only have one last market to get hit, and expect the high end. The $1mm to $2mm has to get hit next.\nhttp://www.mercurynews.com/business/ci_19899224\nUnfortunately, we can not avoid the headwinds.",
"contentTextFullHighlighted": null,
"contentTextSnippetHighlighted": "The low numbers nationally published by Corelogic were a result of banks holding off foreclosures until settlement. \nAs Bloomberg and RealtyTrac stated. this will result in more foreclosure pain in the short term as some of the foreclosures that should have happened last year instead happen...",
"contentJsonMetadata": null,
"commentCount": 117,
"contentUrlId": 13535,
"contentUrlPath": "http://www.bubbleinfo.com/2012/02/09/mortgage-settlement-renegade/",
"domainUrl": "http://www.bubbleinfo.com", 
"domainName": null, 
"contentAuthorId": 15614, 
"contentAuthorName": "Hankster", 
"authorJsonMetadata": null, 
"authorKloutDetails": null, 
"mediaSourceName": "Board Reader Blog", 
"mediaSourceIconPath": "BoardReaderBlog.gif", 
"mediaSourceTypeId": 1, 
"mediaSourceTypeName": "Blog", 
"geographyId": 0, 
"geographyName": "Unknown", 
"languageId": 1, 
"languageName": "English", 
"topicName": "Bank of America", 
"profileId": 3, 
"profileName": "USAA_Competition1", 
"contentPublishedTime": 1328798840000, 
"contentUrlPublishedTime": 1329336423000, 
"calculatedSentimentId": 4, 
"calculatedSentimentName": "POS", 
"userSentimentId": 0, 
"userSentimentName": null, 
"listListeningObjectiveName": [ 
    "Untagged LO" 
], 
"alertStatus": "assigned", 
"assignedToUserId": 2, 
"assignedToUserName": null, 
"assignedByUserId": 1, 
"assignedByUserName": null, 
"assignedToDepartmentId": 0, 
"assignedToDepartmentName": null, 
"notesCount": 0, 
"nouns": [ 
    "bank", 
    "banks", 
    "Bloomberg", 
    "buddy", 
    "Corelogic", 
    "data", 
    "estimates", 
    "foreclosure", 
    "foreclosures", 
    "headwinds", 
    "home", 
    "house", 
    "housing", 
    "increase", 
    "inventory", 
    "line", 
    "Luz", 
    "market", 
    "mm", 
    "money", 
    "month", 
    "net", 
    "news", 
    "numbers", 
    "pain", 
    "payment", 
    "percent", 
    "Realtytrac", 
    "RealtyTrac", 
    "REOs", 
    "result", 
    "sales", 
    "Santa", 
    "SD", 
    "settlement", 
    "shadow", 
    "term", 
    "turn", 
    "year", 
    "Zillow" 
], 
"verbs": [ 
    "asked", 
    "avoid", 
    "bought", 
    "completed", 
    "expect", 
    "expected", 
    "get", 
    "happen", 
    "happened", 
    "help", 
    "hit", 
    "holding", 
    "hovering", 
    "increase", 
    "makes", 
    "plan", 
    "published", 
    "result", 
    "stated", 
    "staying", 
    "suggests" 
], 
"adjectives": [ 
    "bottom", 
    "clear", 
    "finally", 
    "good", 
    "high", 
    "higher", 
    "instead", 
    "large", 
    "last", 
    "likely", 
    "longer", 
    "low", 
    "nationally", 
    "next", 
    "not", 
    "positive", 
    "quickly", 
    "short", 
    "so-called", 
    "underwater", 
    "Unfortunately" 
], 
"phrases": [ 
    "2012 than 2011", 
    "25 percent", 
    "25 percent increase", 
    "2700 underwater in 92130", 
    "3800 underwater in 92127", 
    "92130 The good news", 
    "asked every month", 
    "avoid the headwinds", 
    "bank settlement", 
    "banks holding off foreclosures", 
    "banks more money", 
    "Bloomberg and RealtyTrac", 
    "bottom line", 
    "bought in Santa", 
    "bought in Santa Luz", 
    "clear the so-called shadow", 
    "completed foreclosures", 
    "estimates from Realtytrac", 
    "foreclosure numbers", 
    "foreclosure numbers in 2012", 
    "foreclosure pain", 
    "foreclosures until settlement", 
    "good news", 
    "happen this year", 
    "happen this year --", 
    "happened last year", 
    "help the housing", 
    "help the housing market", 
    "higher foreclosure", 
    "higher foreclosure numbers", 
    "holding off foreclosures", 
    "housing market", 
    "increase from 2011", 
    "increase The bottom line", 
    "instead happen this year", 
    "large numbers", 
    "last market", 
    "last year", 
    "longer term", 
    "longer term the bank", 
    "low numbers", 
    "Luz in 2006", 
    "makes his payment", 
    "million completed foreclosures", 
    "mm underwater home", 
    "month by his bank", 
    "nationally published by Corelogic", 
    "net the banks", 
    "not avoid the headwinds", 
    "numbers in 2012", 
    "percent increase", 
    "percent increase from 2011", 
    "published by Corelogic", 
    "Realtytrac and Zillow", 
    "result in higher foreclosure", 
    "result in more foreclosure", 
    "result of banks", 
    "sales net", 
    "sales net the banks", 
    "Santa Luz", 
    "Santa Luz in 2006", 
    "shadow inventory", 
    "short sales", 
    "short sales net", 
    "short term", 
    "so-called shadow", 
    "so-called shadow inventory", 
    "staying in the house", 
    "suggests that short sales", 
    "term the bank", 
    "term the bank settlement", 
    "turn help the housing", 
    "underwater home", 
    "underwater in 92127", 
    "underwater in 92130", 
    "underwater in SD", 
    "year --" 
], 
"author_media": "15614~~~Hankster~~~1~~~Blog", 
"domain_media": "http://www.bubbleinfo.com~~~null~~~1~~~Blog", 
"categories": [ 
    { 
        "category": "post closing", 
        "categoryWords": [ 
            "foreclosure", 
            "foreclosure" 
        ], 
        "score": "2.0" 
    }, 
    { 
        "category": "pre buy research", 
        "categoryWords": [ 
            "term", 
            "term" 
        ], 
        "score": "2.0" 
    } 
], 
"opinionWords": [ 
    "positive", 
    "good news", 
    "expect", 
    "unfortunately" 
], 
"brandTerms": [], 
"findings": [] 
}
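The sample document carries 40 values in nouns and 77 in phrases. Because phrases uses keyword1, a customized keyword analyzer, each whole phrase is likely indexed as a single term. The Python sketch below gives a back-of-envelope estimate of the heap a terms facet on such a field might need. It rests on an assumption we have not verified against the 0.19.2 source: field data for a multi-valued string field stores one int ordinal array of length maxDoc per value position, so the cost grows with the largest number of values in any document, and the unique term strings are added on top. The 48-byte per-term overhead is also an assumed figure for a 64-bit JVM, and the function names are ours.

```python
# Back-of-envelope heap estimate for a terms facet on a multi-valued string field.
# ASSUMPTION: ordinals are stored as one int[maxDoc] array per value position,
# so the cost scales with the maximum (not average) number of values per document.
# ASSUMPTION: ~48 bytes of object overhead per unique term on a 64-bit JVM.

BYTES_PER_ORDINAL = 4        # one Java int per (value position, document)
ASSUMED_TERM_OVERHEAD = 48   # assumed String + char[] headers and reference


def ordinals_bytes(max_doc: int, max_values_per_doc: int) -> int:
    """Heap used by the per-document ordinal arrays."""
    return max_doc * max_values_per_doc * BYTES_PER_ORDINAL


def terms_bytes(unique_terms: int, avg_term_chars: int) -> int:
    """Heap used by the unique term strings (Java chars are 2 bytes each)."""
    return unique_terms * (ASSUMED_TERM_OVERHEAD + 2 * avg_term_chars)


def field_data_bytes(max_doc: int, max_values_per_doc: int,
                     unique_terms: int, avg_term_chars: int) -> int:
    """Total estimate for one field of one index on a node hosting all its shards."""
    return (ordinals_bytes(max_doc, max_values_per_doc)
            + terms_bytes(unique_terms, avg_term_chars))


# 500,000 documents per index (from the original post) and 77 values, the
# number of phrases in the sample document; the real maximum is likely higher.
docs_per_index = 500_000
values_in_sample = 77
mib = ordinals_bytes(docs_per_index, values_in_sample) / 2**20
print(f"phrases ordinals alone: ~{mib:.0f} MiB")
```

With these inputs, the ordinals alone come to roughly 147 MiB before any term strings are counted. Under the same assumption, every faceted field and every index on the node adds its own share. The thread does not give the number of unique terms, so `unique_terms` has to come from your own data.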

jagdeep · April 27, 2012, 10:49am

My main concern is the recovery failure. A heap space error is expected
if we are trying to load too many documents into memory, but
Elasticsearch nodes should recover after this error. I suppose that
after this stage even flush, refresh or optimize will not work either.

Regards

On Apr 27, 3:42 pm, Sujoy Sett sujoys...@gmail.com wrote:

Hi,

Can you please explain how to check the field data cache? Do I have to
set anything explicitly to monitor it?
I often use the mobz-elasticsearch-head-24935c4 plugin to monitor cluster
state and health, but I didn't find anything like index.cache.field.max_size
in the cluster_state details.

Thanks and Regards,

On Friday, April 27, 2012 3:52:04 PM UTC+5:30, Rafał Kuć wrote:

Hello,

Did you look at the size of the field data cache after sending the
example query?

Regards,
Rafał

Rafal_Kuc_3 · April 27, 2012, 10:50am

Hello!

The nodes statistics API provides information about cache usage. For
example, run the following command:

curl 'localhost:9200/_cluster/nodes/stats?pretty=true'

In the output you should find statistics for both the filter cache and
the field data cache, something like the following:

"cache" : {
  "field_evictions" : 0,
  "field_size" : "0b",
  "field_size_in_bytes" : 0,
  "filter_count" : 1,
  "filter_evictions" : 0,
  "filter_size" : "32b",
  "filter_size_in_bytes" : 32
}

With it you should be able to see how much memory your field data cache
consumes.
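The same check can be scripted. Below is a minimal Python sketch, using only the standard library, that calls the endpoint from the curl command above and extracts the cache figures for each node. It assumes the response is shaped like the output shown in this thread (nodes, then node id, then indices, then cache) and that the node is reachable on localhost:9200. The endpoint path is the 0.19-era one and may differ in other versions. The function names are ours.

```python
import json
import urllib.request

# Endpoint from the curl command above (0.19-era path; assumed reachable locally).
STATS_URL = "http://localhost:9200/_cluster/nodes/stats"


def fetch_node_stats(url: str = STATS_URL) -> dict:
    """Fetch and decode the nodes stats JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def cache_by_node(stats: dict) -> dict:
    """Map node name -> (field cache bytes, field evictions, filter cache bytes).

    Missing sections default to 0 so a partial response does not raise.
    """
    result = {}
    for node_id, node in stats.get("nodes", {}).items():
        cache = node.get("indices", {}).get("cache", {})
        result[node.get("name", node_id)] = (
            cache.get("field_size_in_bytes", 0),
            cache.get("field_evictions", 0),
            cache.get("filter_size_in_bytes", 0),
        )
    return result
```

For example, calling `print(cache_by_node(fetch_node_stats()))` before and after running one of the facet queries shows how much the field data cache grew on each node and whether anything was evicted.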

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Elasticsearch

sujoysett · April 27, 2012, 11:14am

Hi,

We really appreciate your prompt response. We have tested this with our
indexes, and the observations follow. What do they imply? Please let us
know if we are doing anything wrong in the settings or elsewhere.

Initial State
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "503.1mb",
"size_in_bytes" : 527622079
},
"docs" : {
"count" : 74250,
"deleted" : 2705
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 0,
"query_time" : "0s",
"query_time_in_millis" : 0,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "0b",
"field_size_in_bytes" : 0,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 0,
"query_time" : "0s",
"query_time_in_millis" : 0,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "0b",
"field_size_in_bytes" : 0,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

Query sent:
{
  "query" : {
    "match_all" : { }
  },
  "size" : 0,
  "facets" : {
    "tag" : {
      "terms" : {
        "field" : "phrases",
        "size" : 100
      },
      "_cache" : false
    }
  }
}

After a single request
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "6.3gb",
"size_in_bytes" : 6787402724
},
"docs" : {
"count" : 876639,
"deleted" : 56407
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 2,
"query_time" : "21.8s",
"query_time_in_millis" : 21869,
"query_current" : 4,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "3.5gb",
"field_size_in_bytes" : 3834410088,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 4,
"query_time" : "21.8s",
"query_time_in_millis" : 21808,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "2.4gb",
"field_size_in_bytes" : 2653970178,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

After two requests
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 11,
"query_time" : "1.9m",
"query_time_in_millis" : 116142,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "4.9gb",
"field_size_in_bytes" : 5323063782,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 9,
"query_time" : "49.6s",
"query_time_in_millis" : 49662,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "4.2gb",
"field_size_in_bytes" : 4587853968,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

After three requests
ES went down with a heap space error and stopped responding.
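The field cache numbers above can be checked against the heap with a little arithmetic. This is a minimal Python sketch, assuming the run used Configuration 1 from the original post (two nodes with a 6GB max heap each, which matches the two nodes in the stats) and treating 6GB as 6 × 1024³ bytes. The projection for a third request is a naive linear extrapolation, not a measurement.

```python
GIB = 1024 ** 3
# Assumption: this test ran on Configuration 1 (two nodes, max heap 6GB each).
HEAP_BYTES = 6 * GIB

# field_size_in_bytes reported for es_node_102 in the stats dumps above.
FIELD_CACHE_102 = [3834410088, 5323063782]  # after request 1, after request 2


def heap_fraction(size_bytes, heap_bytes=HEAP_BYTES):
    """Share of the maximum heap taken by the field data cache alone."""
    return size_bytes / heap_bytes


# Naive linear extrapolation: assume request 3 adds as much as request 2 did.
growth = FIELD_CACHE_102[1] - FIELD_CACHE_102[0]
projected_third = FIELD_CACHE_102[1] + growth

for label, size in [("request 1", FIELD_CACHE_102[0]),
                    ("request 2", FIELD_CACHE_102[1]),
                    ("request 3 (projected)", projected_third)]:
    print(f"after {label}: {size / GIB:.2f} GiB = {heap_fraction(size):.0%} of heap")
```

With these figures the field data cache on es_node_102 goes from roughly 60% to 83% of a 6GB heap, and the projected third request lands at about 106%, which is consistent with the out-of-memory failure on the third request. Treat the projection as illustrative only: the node's store size and document count were still growing between snapshots (74,250 docs initially, 1,121,886 after two requests).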

Thanks and Regards,

On Friday, April 27, 2012 4:20:59 PM UTC+5:30, Rafał Kuć wrote:

Hello!

The nodes stats API provides information about cache usage. For example, run
the following command:

curl 'localhost:9200/_cluster/nodes/stats?pretty=true'

In the output you should find statistics for both the filter cache and the
field data cache, similar to the following:
"cache" : {
  "field_evictions" : 0,
  "field_size" : "0b",
  "field_size_in_bytes" : 0,
  "filter_count" : 1,
  "filter_evictions" : 0,
  "filter_size" : "32b",
  "filter_size_in_bytes" : 32
}
With it you should be able to see how much memory your field data cache
consumes.
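As a minimal sketch, assuming Python 3 and a node reachable at http://localhost:9200 (the host used in the curl command above), the field data cache figure can be pulled out of the same nodes stats response for each node. `fetch_node_stats`, `field_cache_by_node` and `format_report` are hypothetical helper names; the key path `nodes → <node id> → indices → cache → field_size_in_bytes` follows the stats output posted in this thread.

```python
import json
from urllib.request import urlopen


def fetch_node_stats(base_url="http://localhost:9200"):
    """GET the same nodes stats endpoint as the curl command above (assumed host/port)."""
    with urlopen(base_url + "/_cluster/nodes/stats") as resp:
        return json.load(resp)


def field_cache_by_node(stats):
    """Map each node's name to its field data cache size in bytes."""
    sizes = {}
    for node_id, node in stats["nodes"].items():
        cache = node.get("indices", {}).get("cache", {})
        # Use the node id when the response carries no "name" for the node.
        sizes[node.get("name", node_id)] = cache.get("field_size_in_bytes", 0)
    return sizes


def format_report(sizes):
    """One line per node, sizes shown in GiB (1024**3 bytes)."""
    return [f"{name}: {size / 1024 ** 3:.2f} GiB" for name, size in sorted(sizes.items())]
```

Usage: `for line in format_report(field_cache_by_node(fetch_node_stats())): print(line)`. Taking one reading before and one after sending a facet query shows roughly how much field data that query caused each node to load.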

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Elasticsearch

On Friday, April 27, 2012 12:42:35 UTC+2, Sujoy Sett wrote:
Hi,

Can you please explain how to check the field data cache? Do I have to set
anything explicitly to monitor it?
I often use the mobz-elasticsearch-head-24935c4 plugin to monitor cluster
state and health, but I didn't find anything like index.cache.field.max_size
in the cluster_state details.

Thanks and Regards,

On Friday, April 27, 2012 3:52:04 PM UTC+5:30, Rafał Kuć wrote:
Hello,

Did you look at the size of the field data cache after sending the
example query?

Regards,
Rafał


sujoysett · April 27, 2012, 11:19am

Also, the following message was printed in the ES console, along with lots
of stack traces:

java.lang.OutOfMemoryError: loading field [phrases] caused out of memory failure

Does that help narrow it down?

Thanks and regards,

On Friday, April 27, 2012 4:44:19 PM UTC+5:30, Sujoy Sett wrote:

After two requests
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 11,
"query_time" : "1.9m",
"query_time_in_millis" : 116142,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "4.9gb",
"field_size_in_bytes" : 5323063782,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 9,
"query_time" : "49.6s",
"query_time_in_millis" : 49662,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "4.2gb",
"field_size_in_bytes" : 4587853968,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

After three requests
ES down with heap space error.
No response.
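One way to read the numbers above is to express each node's field data cache as a share of its heap. This is a back-of-envelope sketch, not an Elasticsearch tool. It assumes these two nodes are Configuration 1 (two nodes, 6 GB max heap each). The byte counts are copied from the stats in this thread: the initial state, after one request, and after two requests.

```python
HEAP_BYTES = 6 * 1024 ** 3  # assumption: 6 GB max heap per node (Configuration 1)

# field_size_in_bytes per node: initial state, after one request, after two requests
FIELD_CACHE_BYTES = {
    "es_node_102": [0, 3834410088, 5323063782],
    "es_node_67": [0, 2653970178, 4587853968],
}

def heap_fraction(field_bytes, heap_bytes=HEAP_BYTES):
    """Share of the maximum heap taken up by the field data cache alone."""
    return field_bytes / heap_bytes

labels = ("initial", "after 1 request", "after 2 requests")
for node, sizes in FIELD_CACHE_BYTES.items():
    for label, size in zip(labels, sizes):
        print(f"{node} {label}: {heap_fraction(size):.0%} of heap")
```

Under that assumption, field data alone fills roughly 83% of es_node_102's heap after two requests. Meanwhile field_evictions stays at 0, so the unbounded cache leaves little room for a third request.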

Thanks and Regards,

On Friday, April 27, 2012 4:20:59 PM UTC+5:30, Rafał Kuć wrote:
Hello!

Node statistics provide information about cache usage. For example, run
the following command:

curl 'localhost:9200/_cluster/nodes/stats?pretty=true'

In the output you should find statistics for both the filter cache and the field
data cache, something like the following:
"cache" : {
      "field_evictions" : 0,
      "field_size" : "0b",
      "field_size_in_bytes" : 0,
      "filter_count" : 1,
      "filter_evictions" : 0,
      "filter_size" : "32b",
      "filter_size_in_bytes" : 32
    }
With it you should be able to see how much memory your field data cache
consumes.
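The same figure can be pulled out programmatically. This is a minimal sketch that assumes the response shape shown in the stats dumps in this thread, where the value sits at nodes.<node_id>.indices.cache.field_size_in_bytes. The helper name field_cache_by_node is hypothetical. The sample is a trimmed copy of one of the responses posted here, not a live call.

```python
import json

def field_cache_by_node(stats):
    """Map each node's name to its field data cache size in bytes.

    Expects parsed JSON shaped like the /_cluster/nodes/stats output in this thread.
    """
    result = {}
    for node in stats.get("nodes", {}).values():
        cache = node.get("indices", {}).get("cache", {})
        result[node["name"]] = cache.get("field_size_in_bytes", 0)
    return result

# Trimmed-down response using figures reported in this thread.
sample = json.loads("""
{
  "cluster_name": "elasticsearch_local_0_19",
  "nodes": {
    "zM7byv_qT7CbTNJprWCl5g": {
      "name": "es_node_102",
      "indices": {"cache": {"field_size_in_bytes": 3834410088}}
    },
    "qpvNNHpcQ3i1Bz8BWvq4oA": {
      "name": "es_node_67",
      "indices": {"cache": {"field_size_in_bytes": 2653970178}}
    }
  }
}
""")

for name, size in sorted(field_cache_by_node(sample).items()):
    print(f"{name}: {size / 1024 ** 3:.2f} GiB of field data")
```

To check a live cluster, you could save the output of the curl command above to a file and pass `json.load` of that file to `field_cache_by_node`.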

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
Elasticsearch

On Friday, April 27, 2012 12:42:35 PM UTC+2, Sujoy Sett wrote:
Hi,

Can you please explain how to check the field data cache? Do I have to
set anything explicitly to monitor it?
I often use the mobz-elasticsearch-head-24935c4 plugin to monitor
cluster state and health, but I didn't find anything
like index.cache.field.max_size in the cluster_state details.

Thanks and Regards,

On Friday, April 27, 2012 3:52:04 PM UTC+5:30, Rafał Kuć wrote:
Hello,

Did you look at the size of the field data cache after sending the
example query?

Regards,
Rafał

On Friday, April 27, 2012 12:15:38 PM UTC+2, Sujoy Sett wrote:
Hi,

We have been using Elasticsearch 0.19.2 for storing and analyzing data
from social media blogs and forums. The data volume goes up to
500,000 documents per index, and an index of that size takes up to
3 GB per node in Elasticsearch (all shards). We always keep the
number of replicas one less than the total number of nodes, so that
a copy of every shard resides on every node at all times. We
generally use 10 shards for indexes of this size.
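The replica rule above means every node ends up hosting a full copy of the index. Here is a small sketch of that arithmetic, using the shard and replica counts from the two configurations described below. The function name is hypothetical, and it assumes shards are spread evenly across nodes.

```python
def shard_copies_per_node(primaries, replicas, nodes):
    """Average shard copies (primaries + replicas) per node, assuming even allocation."""
    return primaries * (1 + replicas) / nodes

for label, nodes in (("Configuration 1", 2), ("Configuration 2", 6)):
    replicas = nodes - 1  # "replicas one less than the number of nodes"
    copies = shard_copies_per_node(10, replicas, nodes)
    print(f"{label}: {nodes} nodes, {replicas} replicas -> {copies:g} shard copies per node")
```

Both configurations come out at 10 shard copies per node, because Elasticsearch does not place two copies of the same shard on one node. Any copy of a shard may serve a search, so repeated facet queries can eventually load field data for all 10 shards on every node. In Configuration 2, that can happen three times over on each 8 GB machine.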

We run various queries on this data for advanced visualization,
mainly facets for trend charts and keyword clouds.
The following are examples of the queries we execute:
{
  "query" : {
    "match_all" : { }
  },
  "size" : 0,
  "facets" : {
    "tag" : {
      "terms" : {
        "field" : "nouns",
        "size" : 100
      },
      "_cache" : false
    }
  }
}

{
  "query" : {
    "match_all" : { }
  },
  "size" : 0,
  "facets" : {
    "tag" : {
      "terms" : {
        "field" : "phrases",
        "size" : 100
      },
      "_cache" : false
    }
  }
}
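For reference, here is a minimal Python sketch that builds the same request body and posts it with the standard library. The helper names and the default URL are assumptions for illustration. The body mirrors the queries above, including their `_cache: false` flag, and uses the 0.19-era facets syntax (later Elasticsearch versions replaced facets with aggregations). Judging by the stats elsewhere in this thread, `_cache: false` does not stop the field data cache from filling: field_size still grows after these requests.

```python
import json
import urllib.request

def terms_facet_request(field, size=100, facet_name="tag"):
    """Build a match_all search body with one terms facet, as in the queries above."""
    return {
        "query": {"match_all": {}},
        "size": 0,  # no hits returned, only facet counts
        "facets": {
            facet_name: {
                "terms": {"field": field, "size": size},
                "_cache": False,  # copied from the original queries
            }
        },
    }

def run_search(body, url="http://localhost:9200/_search"):
    """POST the body to a search endpoint and return the parsed JSON response.

    The URL is an assumption; point it at the index you actually query.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

print(json.dumps(terms_facet_request("phrases"), indent=2))
```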

While executing such queries we often run out of heap space,
and the nodes become unresponsive. Our main concern is that the nodes
do not return to a normal state even after dumping the heap to an hprof
file. The java.exe process still holds the maximum allocated memory, as
shown in Task Manager, and the nodes remain unresponsive until
we manually kill and restart them.

ES Configuration 1:
Elasticsearch Version 0.19.2
2 Nodes, one on each physical server
Max heap size 6GB per node.
10 shards, 1 replica.

ES Configuration 2:
Elasticsearch Version 0.19.2
6 Nodes, three on each physical server
Max heap size 2GB per node.
10 shards, 5 replicas.

Server Configuration:
Windows 7 64 bit
64 bit JVM
8 GB physical memory
Dual Core processor

With both of the configurations above, Elasticsearch was unable to
respond to the facet queries shown earlier, and it was also unable to
recover when a query failed due to heap space shortage.

We are facing this issue in our production environments and would
appreciate suggestions for a better configuration or a different
approach if required.

The mapping we use for the data is as follows
(keyword1 is a customized keyword analyzer; similarly, standard1 is a
customized standard analyzer):

{
"properties": {
"adjectives": {
"type": "string",
"analyzer": "stop2"
},
"alertStatus": {
"type": "string",
"analyzer": "keyword1"
},
"assignedByUserId": {
"type": "integer",
"index": "analyzed"
},
"assignedByUserName": {
"type": "string",
"analyzer": "keyword1"
},
"assignedToDepartmentId": {
"type": "integer",
"index": "analyzed"
},
"assignedToDepartmentName": {
"type": "string",
"analyzer": "keyword1"
},
"assignedToUserId": {
"type": "integer",
"index": "analyzed"
},
"assignedToUserName": {
"type": "string",
"analyzer": "keyword1"
},
"authorJsonMetadata": {
"properties": {
"favourites": {
"type": "string"
},
"followers": {
"type": "string"
},
"following": {
"type": "string"
},
"likes": {
"type": "string"
},
"listed": {
"type": "string"
},
"subscribers": {
"type": "string"
},
"subscription": {
"type": "string"
},
"uploads": {
"type": "string"
},
"views": {
"type": "string"
}
}
},
"authorKloutDetails": {
"dynamic": "true",
"properties": {
"amplificationScore": {
"type": "string"
},
"authorKloutDetailsFound": {
"type": "string"
},
"description": {
"type": "string"
},
"influencees": {
"dynamic": "true",
"properties": {
"kscore": {
"type": "string"
},
"twitter_screen_name": {
"type": "string"
}
}
},
"influencers": {
"dynamic": "true",
"properties": {
"kscore": {
"type": "string"
},
"twitter_screen_name": {
"type": "string"
}
}
},
"kloutClass": {
"type": "string"
},
"kloutClassDescription": {
"type": "string"
},
"kloutScore": {
"type": "string"
},
"kloutScoreDescription": {
"type": "string"
},
"kloutTopic": {
"type": "string"
},
"slope": {
"type": "string"
},
"trueReach": {
"type": "string"
},
"twitterId": {
"type": "string"
},
"twitterScreenName": {
"type": "string"
}
}
},
"author_media": {
"type": "string",
"analyzer": "keyword1"
},
"brandTerms": {
"type": "string",
"analyzer": "keyword1"
},
"calculatedSentimentId": {
"type": "integer",
"index": "analyzed"
},
"calculatedSentimentName": {
"type": "string",
"analyzer": "keyword1"
},
"categories": {
"properties": {
"category": {
"type": "string",
"analyzer": "keyword1"
},
"categoryWords": {
"type": "string",
"analyzer": "keyword1"
},
"score": {
"type": "double"
}
}
},
"commentCount": {
"type": "integer",
"index": "analyzed"
},
"contentAuthorId": {
"type": "integer",
"index": "analyzed"
},
"contentAuthorName": {
"type": "string",
"analyzer": "keyword1"
},
"contentId": {
"type": "integer",
"index": "analyzed"
},
"contentJsonMetadata": {
"properties": {
"comment Count": {
"type": "string"
},
"dislikes": {
"type": "string"
},
"favourites": {
"type": "string"
},
"likes": {
"type": "string"
},
"retweet Count": {
"type": "string"
},
"views": {
"type": "string"
}
}
},
"contentPublishedTime": {
"type": "date",
"index": "analyzed",
"format": "dateOptionalTime"
},
"contentTextFull": {
"type": "string",
"analyzer": "standard1"
},
"contentTextFullHighlighted": {
"type": "string",
"analyzer": "standard1"
},
"contentTextSnippetHighlighted": {
"type": "string",
"analyzer": "standard1"
},
"contentType": {
"type": "string",
"analyzer": "keyword1"
},
"contentUrlId": {
"type": "integer",
"index": "analyzed"
},
"contentUrlPath": {
"type": "string",
"analyzer": "keyword1"
},
"contentUrlPublishedTime": {
"type": "date",
"index": "analyzed",
"format": "dateOptionalTime"
},
"ctmId": {
"type": "long"
},
"domainName": {
"type": "string",
"analyzer": "keyword1"
},
"domainUrl": {
"type": "string",
"analyzer": "keyword1"
},
"domain_media": {
"type": "string",
"analyzer": "keyword1"
},
"findings": {
"type": "string",
"analyzer": "keyword1"
},
"geographyId": {
"type": "integer",
"index": "analyzed"
},
"geographyName": {
"type": "string",
"analyzer": "keyword1"
},
"kloutScore": {
"type": "object"
},
"languageId": {
"type": "integer",
"index": "analyzed"
},
"languageName": {
"type": "string",
"analyzer": "keyword1"
},
"listListeningObjectiveName": {
"type": "string",
"analyzer": "keyword1"
},
"mediaSourceIconPath": {
"type": "string",
"analyzer": "keyword1"
},
"mediaSourceId": {
"type": "integer",
"index": "analyzed"
},
"mediaSourceName": {
"type": "string",
"analyzer": "keyword1"
},
"mediaSourceTypeId": {
"type": "integer",
"index": "analyzed"
},
"mediaSourceTypeName": {
"type": "string",
"analyzer": "keyword1"
},
"notesCount": {
"type": "integer",
"index": "analyzed"
},
"nouns": {
"type": "string",
"analyzer": "stop2"
},
"opinionWords": {
"type": "string",
"analyzer": "keyword1"
},
"phrases": {
"type": "string",
"analyzer": "keyword1"
},
"profileId": {
"type": "integer",
"index": "analyzed"
},
"profileName": {
"type": "string",
"analyzer": "keyword1"
},
"topicId": {
"type": "integer",
"index": "analyzed"
},
"topicName": {
"type": "string",
"analyzer": "keyword1"
},
"userSentimentId": {
"type": "integer",
"index": "analyzed"
},
"userSentimentName": {
"type": "string",
"analyzer": "keyword1"
},
"verbs": {
"type": "string",
"analyzer": "stop2"
}
}
}

A sample of the structure of the data is as follows:

{
"contentType": "comment",
"topicId": 9,
"mediaSourceId": 3,
"contentId": 34834,
"ctmId": 73322,
"contentTextFull": "The low numbers nationally published by
Corelogic were a result of banks holding off foreclosures until
settlement. \nAs Bloomberg and RealtyTrac stated. this will result in
more foreclosure pain in the short term as some of the foreclosures
that should have happened last year instead happen this year which
will likely result in higher foreclosure numbers in 2012 than
2011.\nThe estimates from Realtytrac and Zillow are hovering around 1
million completed foreclosures, or REOs, in 2012, a 25 percent
increase from 2011. \nThe positive is that the data suggests that
short sales net the banks more money so they should be expected to
increase\nThe bottom line is that in the longer term the bank
settlement will help to more quickly clear the so-called shadow
inventory, which will in turn help the housing market finally bottom
out once and for all. \nMy buddy who bought in Santa Luz in 2006 is
asked every month by his bank when he makes his payment on his $1.2mm
underwater home, do you plan on staying in the house? . Per
Corelogic, there are still large numbers still underwater in SD\n-
3800 underwater in 92127\n- 2700 underwater in 92130\nThe good news is
we only have one last market to get hit, and expect the high end.
The $1mm to $2mm has to get hit next.\nhttp://www.mercurynews.com/business/ci_19899224\nUnfortunately,
we can not avoid the headwinds.",
"contentTextFullHighlighted": null,
"contentTextSnippetHighlighted": "The low numbers nationally
published by Corelogic were a result of banks holding off foreclosures
until settlement. \nAs Bloomberg and RealtyTrac stated. this will
result in more foreclosure pain in the short term as some of the
foreclosures that should have happened last year instead happen...",
"contentJsonMetadata": null,
"commentCount": 117,
"contentUrlId": 13535,
"contentUrlPath": "http://www.bubbleinfo.com/2012/02/09/mortgage-settlement-renegade/",
"domainUrl": "http://www.bubbleinfo.com", 
"domainName": null, 
"contentAuthorId": 15614, 
"contentAuthorName": "Hankster", 
"authorJsonMetadata": null, 
"authorKloutDetails": null, 
"mediaSourceName": "Board Reader Blog", 
"mediaSourceIconPath": "BoardReaderBlog.gif", 
"mediaSourceTypeId": 1, 
"mediaSourceTypeName": "Blog", 
"geographyId": 0, 
"geographyName": "Unknown", 
"languageId": 1, 
"languageName": "English", 
"topicName": "Bank of America", 
"profileId": 3, 
"profileName": "USAA_Competition1", 
"contentPublishedTime": 1328798840000, 
"contentUrlPublishedTime": 1329336423000, 
"calculatedSentimentId": 4, 
"calculatedSentimentName": "POS", 
"userSentimentId": 0, 
"userSentimentName": null, 
"listListeningObjectiveName": [ 
    "Untagged LO" 
], 
"alertStatus": "assigned", 
"assignedToUserId": 2, 
"assignedToUserName": null, 
"assignedByUserId": 1, 
"assignedByUserName": null, 
"assignedToDepartmentId": 0, 
"assignedToDepartmentName": null, 
"notesCount": 0, 
"nouns": [ 
    "bank", 
    "banks", 
    "Bloomberg", 
    "buddy", 
    "Corelogic", 
    "data", 
    "estimates", 
    "foreclosure", 
    "foreclosures", 
    "headwinds", 
    "home", 
    "house", 
    "housing", 
    "increase", 
    "inventory", 
    "line", 
    "Luz", 
    "market", 
    "mm", 
    "money", 
    "month", 
    "net", 
    "news", 
    "numbers", 
    "pain", 
    "payment", 
    "percent", 
    "Realtytrac", 
    "RealtyTrac", 
    "REOs", 
    "result", 
    "sales", 
    "Santa", 
    "SD", 
    "settlement", 
    "shadow", 
    "term", 
    "turn", 
    "year", 
    "Zillow" 
], 
"verbs": [ 
    "asked", 
    "avoid", 
    "bought", 
    "completed", 
    "expect", 
    "expected", 
    "get", 
    "happen", 
    "happened", 
    "help", 
    "hit", 
    "holding", 
    "hovering", 
    "increase", 
    "makes", 
    "plan", 
    "published", 
    "result", 
    "stated", 
    "staying", 
    "suggests" 
], 
"adjectives": [ 
    "bottom", 
    "clear", 
    "finally", 
    "good", 
    "high", 
    "higher", 
    "instead", 
    "large", 
    "last", 
    "likely", 
    "longer", 
    "low", 
    "nationally", 
    "next", 
    "not", 
    "positive", 
    "quickly", 
    "short", 
    "so-called", 
    "underwater", 
    "Unfortunately" 
], 
"phrases": [ 
    "2012 than 2011", 
    "25 percent", 
    "25 percent increase", 
    "2700 underwater in 92130", 
    "3800 underwater in 92127", 
    "92130 The good news", 
    "asked every month", 
    "avoid the headwinds", 
    "bank settlement", 
    "banks holding off foreclosures", 
    "banks more money", 
    "Bloomberg and RealtyTrac", 
    "bottom line", 
    "bought in Santa", 
    "bought in Santa Luz", 
    "clear the so-called shadow", 
    "completed foreclosures", 
    "estimates from Realtytrac", 
    "foreclosure numbers", 
    "foreclosure numbers in 2012", 
    "foreclosure pain", 
    "foreclosures until settlement", 
    "good news", 
    "happen this year", 
    "happen this year --", 
    "happened last year", 
    "help the housing", 
    "help the housing market", 
    "higher foreclosure", 
    "higher foreclosure numbers", 
    "holding off foreclosures", 
    "housing market", 
    "increase from 2011", 
    "increase The bottom line", 
    "instead happen this year", 
    "large numbers", 
    "last market", 
    "last year", 
    "longer term", 
    "longer term the bank", 
    "low numbers", 
    "Luz in 2006", 
    "makes his payment", 
    "million completed foreclosures", 
    "mm underwater home", 
    "month by his bank", 
    "nationally published by Corelogic", 
    "net the banks", 
    "not avoid the headwinds", 
    "numbers in 2012", 
    "percent increase", 
    "percent increase from 2011", 
    "published by Corelogic", 
    "Realtytrac and Zillow", 
    "result in higher foreclosure", 
    "result in more foreclosure", 
    "result of banks", 
    "sales net", 
    "sales net the banks", 
    "Santa Luz", 
    "Santa Luz in 2006", 
    "shadow inventory", 
    "short sales", 
    "short sales net", 
    "short term", 
    "so-called shadow", 
    "so-called shadow inventory", 
    "staying in the house", 
    "suggests that short sales", 
    "term the bank", 
    "term the bank settlement", 
    "turn help the housing", 
    "underwater home", 
    "underwater in 92127", 
    "underwater in 92130", 
    "underwater in SD", 
    "year --" 
], 
"author_media": "15614~~~Hankster~~~1~~~Blog", 
"domain_media": "http://www.bubbleinfo.com~~~null~~~1~~~Blog", 
"categories": [ 
    { 
        "category": "post closing", 
        "categoryWords": [ 
            "foreclosure", 
            "foreclosure" 
        ], 
        "score": "2.0" 
    }, 
    { 
        "category": "pre buy research", 
        "categoryWords": [ 
            "term", 
            "term" 
        ], 
        "score": "2.0" 
    } 
], 
"opinionWords": [ 
    "positive", 
    "good news", 
    "expect", 
    "unfortunately" 
], 
"brandTerms": [], 
"findings": [] 
}

Rafal_Kuc_3 · April 27, 2012, 11:32am

Hello!

Before you hit ES with the query, your field data cache was empty; afterwards it grew to 3.5gb and 2.4gb. By default, the field data cache is unbounded (in terms of entries). You may want to make one of the following changes to your Elasticsearch configuration:

Set the field data cache type to soft. This makes the cache use Java soft references, which lets the GC release memory held by the field data cache when more heap memory is needed. You can do that by adding the following line to the configuration:

index.cache.field.type: soft

Limit the field data cache size by setting its maximum number of entries. Remember that this maximum applies per segment, not per index. To set it, add the following line to the configuration:

index.cache.field.max_size: 10000

Treat the above value as an example; I can't predict which setting will work well for your deployment.
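A per-segment limit does not cap the index-wide total. The toy model below illustrates why; it is not Elasticsearch's implementation. Each segment gets its own bounded cache. LRU eviction and the idea of one entry per field are simplifying assumptions made only for this sketch, and the class name is hypothetical.

```python
from collections import OrderedDict

class SegmentFieldCache:
    """Toy LRU cache holding at most max_size entries for ONE segment."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.entries = OrderedDict()
        self.evictions = 0

    def get(self, field, loader):
        if field in self.entries:
            self.entries.move_to_end(field)  # mark as most recently used
            return self.entries[field]
        value = loader(field)
        self.entries[field] = value
        if len(self.entries) > self.max_size:
            self.entries.popitem(last=False)  # drop the least recently used entry
            self.evictions += 1
        return value

# One independent cache per segment, as a per-segment limit implies.
segments = [SegmentFieldCache(max_size=2) for _ in range(5)]
for cache in segments:
    for field in ("nouns", "phrases", "verbs"):
        cache.get(field, loader=lambda f: f"field data for {f}")

total_entries = sum(len(c.entries) for c in segments)
print(total_entries)  # 5 segments x 2 entries each
```

With max_size=2 and five segments, the cache still holds 10 entries overall. The effective ceiling therefore grows with the number of segments in each shard.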

--

Regards,

Rafał Kuć

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch

Also, the following message was printed:

java.lang.OutOfMemoryError: loading field [phrases] caused out of memory failure

along with lots of stack traces in the ES console.

Does that tell you anything?
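Here is one rough way to see why loading `phrases` is expensive. The field uses keyword1, so each phrase presumably becomes a single term. The sample document in this thread carries 77 phrases, against 40 nouns. The sketch below only counts document/value pairs, assuming the sample is typical. It does not model Elasticsearch's actual in-memory layout, and the helper name is hypothetical.

```python
def total_field_values(doc_count, values_per_doc):
    """Rough count of (document, value) pairs a terms facet has to load."""
    return doc_count * values_per_doc

DOCS_ON_NODE = 1121886  # docs.count reported for es_node_67 in this thread
PHRASES_PER_DOC = 77    # assumption: the sample document is typical
NOUNS_PER_DOC = 40      # same assumption, for comparison

for field, per_doc in (("phrases", PHRASES_PER_DOC), ("nouns", NOUNS_PER_DOC)):
    print(field, f"{total_field_values(DOCS_ON_NODE, per_doc):,}")
```

On a node holding 1,121,886 documents, that comes to roughly 86 million phrase values. The vocabulary is made of mostly distinct multi-word strings. This is consistent with the OutOfMemoryError naming [phrases].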

Thanks and regards,

On Friday, April 27, 2012 4:44:19 PM UTC+5:30, Sujoy Sett wrote:

Hi,

We really appreciate your prompt response. We have tested this on our indexes, and the observations are below. What do they imply? Please let us know if we are doing anything wrong in the settings or elsewhere.

Initial State

{

"cluster_name" : "elasticsearch_local_0_19",

"nodes" : {

"zM7byv_qT7CbTNJprWCl5g" : {


  "name" : "es_node_102",


  "transport_address" : "inet[/172.29.177.102:9300]",


  "hostname" : "01hw445748",


  "attributes" : {


    "tag" : "es_node_102"


  },


  "indices" : {


    "store" : {


      "size" : "503.1mb",


      "size_in_bytes" : 527622079


    },


    "docs" : {


      "count" : 74250,


      "deleted" : 2705


    },


    "indexing" : {


      "index_total" : 0,


      "index_time" : "0s",


      "index_time_in_millis" : 0,


      "index_current" : 0,


      "delete_total" : 0,


      "delete_time" : "0s",


      "delete_time_in_millis" : 0,


      "delete_current" : 0


    },


    "get" : {


      "total" : 0,


      "time" : "0s",


      "time_in_millis" : 0,


      "exists_total" : 0,


      "exists_time" : "0s",


      "exists_time_in_millis" : 0,


      "missing_total" : 0,


      "missing_time" : "0s",


      "missing_time_in_millis" : 0,


      "current" : 0


    },


    "search" : {


      "query_total" : 0,


      "query_time" : "0s",


      "query_time_in_millis" : 0,


      "query_current" : 0,


      "fetch_total" : 0,


      "fetch_time" : "0s",


      "fetch_time_in_millis" : 0,


      "fetch_current" : 0


    },


    "cache" : {


      "field_evictions" : 0,


      "field_size" : "0b",


      "field_size_in_bytes" : 0,


      "filter_count" : 0,


      "filter_evictions" : 0,


      "filter_size" : "0b",


      "filter_size_in_bytes" : 0


    },


    "merges" : {


      "current" : 0,


      "current_docs" : 0,


      "current_size" : "0b",


      "current_size_in_bytes" : 0,


      "total" : 0,


      "total_time" : "0s",


      "total_time_in_millis" : 0,


      "total_docs" : 0,


      "total_size" : "0b",


      "total_size_in_bytes" : 0


    },


    "refresh" : {


      "total" : 0,


      "total_time" : "0s",


      "total_time_in_millis" : 0


    },


    "flush" : {


      "total" : 0,


      "total_time" : "0s",


      "total_time_in_millis" : 0


    }


  }


},


"qpvNNHpcQ3i1Bz8BWvq4oA" : {


  "name" : "es_node_67",


  "transport_address" : "inet[/172.29.181.67:9300]",


  "hostname" : "01hw400248",


  "attributes" : {


    "tag" : "es_node_67"


  },


  "indices" : {


    "store" : {


      "size" : "8gb",


      "size_in_bytes" : 8615814550


    },


    "docs" : {


      "count" : 1121886,


      "deleted" : 65007


    },


    "indexing" : {


      "index_total" : 0,


      "index_time" : "0s",


      "index_time_in_millis" : 0,


      "index_current" : 0,


      "delete_total" : 0,


      "delete_time" : "0s",


      "delete_time_in_millis" : 0,


      "delete_current" : 0


    },


    "get" : {


      "total" : 0,


      "time" : "0s",


      "time_in_millis" : 0,


      "exists_total" : 0,


      "exists_time" : "0s",


      "exists_time_in_millis" : 0,


      "missing_total" : 0,


      "missing_time" : "0s",


      "missing_time_in_millis" : 0,


      "current" : 0


    },


    "search" : {


      "query_total" : 0,


      "query_time" : "0s",


      "query_time_in_millis" : 0,


      "query_current" : 0,


      "fetch_total" : 0,


      "fetch_time" : "0s",


      "fetch_time_in_millis" : 0,


      "fetch_current" : 0


    },


    "cache" : {


      "field_evictions" : 0,


      "field_size" : "0b",


      "field_size_in_bytes" : 0,


      "filter_count" : 0,


      "filter_evictions" : 0,


      "filter_size" : "0b",


      "filter_size_in_bytes" : 0


    },


    "merges" : {


      "current" : 0,


      "current_docs" : 0,


      "current_size" : "0b",


      "current_size_in_bytes" : 0,


      "total" : 0,


      "total_time" : "0s",


      "total_time_in_millis" : 0,


      "total_docs" : 0,


      "total_size" : "0b",


      "total_size_in_bytes" : 0


    },


    "refresh" : {


      "total" : 171,


      "total_time" : "0s",


      "total_time_in_millis" : 0


    },


    "flush" : {


      "total" : 0,


      "total_time" : "0s",


      "total_time_in_millis" : 0


    }


  }


}

}

After hitting query

{

"query" : {


    "match_all" : {  }


},


"size" : 0,


"facets" : {


    "tag" : {


        "terms" : {


            "field" : "phrases",


            "size" : 100


        },


        "_cache":false


    }


}

}

After single request

{

"cluster_name" : "elasticsearch_local_0_19",

"nodes" : {

"zM7byv_qT7CbTNJprWCl5g" : {


  "name" : "es_node_102",


  "transport_address" : "inet[/172.29.177.102:9300]",


  "hostname" : "01hw445748",


  "attributes" : {


    "tag" : "es_node_102"


  },


  "indices" : {


    "store" : {


      "size" : "6.3gb",


      "size_in_bytes" : 6787402724


    },


    "docs" : {


      "count" : 876639,


      "deleted" : 56407


    },


    "indexing" : {


      "index_total" : 0,


      "index_time" : "0s",


      "index_time_in_millis" : 0,


      "index_current" : 0,


      "delete_total" : 0,


      "delete_time" : "0s",


      "delete_time_in_millis" : 0,


      "delete_current" : 0


    },


    "get" : {


      "total" : 0,


      "time" : "0s",


      "time_in_millis" : 0,


      "exists_total" : 0,


      "exists_time" : "0s",


      "exists_time_in_millis" : 0,


      "missing_total" : 0,


      "missing_time" : "0s",


      "missing_time_in_millis" : 0,


      "current" : 0


    },


    "search" : {


      "query_total" : 2,


      "query_time" : "21.8s",


      "query_time_in_millis" : 21869,


      "query_current" : 4,


      "fetch_total" : 0,


      "fetch_time" : "0s",


      "fetch_time_in_millis" : 0,


      "fetch_current" : 0


    },


    "cache" : {


      "field_evictions" : 0,


      "field_size" : "3.5gb",


      "field_size_in_bytes" : 3834410088,


      "filter_count" : 0,


      "filter_evictions" : 0,


      "filter_size" : "0b",


      "filter_size_in_bytes" : 0


    },


    "merges" : {


      "current" : 0,


      "current_docs" : 0,


      "current_size" : "0b",


      "current_size_in_bytes" : 0,


      "total" : 0,


      "total_time" : "0s",


      "total_time_in_millis" : 0,


      "total_docs" : 0,


      "total_size" : "0b",


      "total_size_in_bytes" : 0


    },


    "refresh" : {


      "total" : 0,


      "total_time" : "0s",


      "total_time_in_millis" : 0


    },


    "flush" : {


      "total" : 0,


      "total_time" : "0s",


      "total_time_in_millis" : 0


    }


  }


},


"qpvNNHpcQ3i1Bz8BWvq4oA" : {


  "name" : "es_node_67",


  "transport_address" : "inet[/172.29.181.67:9300]",


  "hostname" : "01hw400248",


  "attributes" : {


    "tag" : "es_node_67"


  },


  "indices" : {


    "store" : {


      "size" : "8gb",


      "size_in_bytes" : 8615814550


    },


    "docs" : {


      "count" : 1121886,


      "deleted" : 65007


    },


    "indexing" : {


      "index_total" : 0,


      "index_time" : "0s",


      "index_time_in_millis" : 0,


      "index_current" : 0,


      "delete_total" : 0,


      "delete_time" : "0s",


      "delete_time_in_millis" : 0,


      "delete_current" : 0


    },


    "get" : {


      "total" : 0,


      "time" : "0s",


      "time_in_millis" : 0,


      "exists_total" : 0,


      "exists_time" : "0s",


      "exists_time_in_millis" : 0,


      "missing_total" : 0,


      "missing_time" : "0s",


      "missing_time_in_millis" : 0,


      "current" : 0


    },


    "search" : {


      "query_total" : 4,


      "query_time" : "21.8s",


      "query_time_in_millis" : 21808,


      "query_current" : 0,


      "fetch_total" : 0,


      "fetch_time" : "0s",


      "fetch_time_in_millis" : 0,


      "fetch_current" : 0


    },


    "cache" : {


      "field_evictions" : 0,


      "field_size" : "2.4gb",


      "field_size_in_bytes" : 2653970178,


      "filter_count" : 0,


      "filter_evictions" : 0,


      "filter_size" : "0b",


      "filter_size_in_bytes" : 0


    },


    "merges" : {


      "current" : 0,


      "current_docs" : 0,


      "current_size" : "0b",


      "current_size_in_bytes" : 0,


      "total" : 0,


      "total_time" : "0s",


      "total_time_in_millis" : 0,


      "total_docs" : 0,


      "total_size" : "0b",


      "total_size_in_bytes" : 0


    },


    "refresh" : {


      "total" : 171,


      "total_time" : "0s",


      "total_time_in_millis" : 0


    },


    "flush" : {


      "total" : 0,


      "total_time" : "0s",


      "total_time_in_millis" : 0


    }


  }


}

}

After two requests

{

"cluster_name" : "elasticsearch_local_0_19",

"nodes" : {

"zM7byv_qT7CbTNJprWCl5g" : {


  "name" : "es_node_102",


  "transport_address" : "inet[/172.29.177.102:9300]",


  "hostname" : "01hw445748",


  "attributes" : {


    "tag" : "es_node_102"


  },


  "indices" : {


    "store" : {


      "size" : "8gb",


      "size_in_bytes" : 8615814550


    },


    "docs" : {


      "count" : 1121886,


      "deleted" : 65007


    },


    "indexing" : {


      "index_total" : 0,


      "index_time" : "0s",


      "index_time_in_millis" : 0,


      "index_current" : 0,


      "delete_total" : 0,


      "delete_time" : "0s",


      "delete_time_in_millis" : 0,


      "delete_current" : 0


    },


    "get" : {


      "total" : 0,


      "time" : "0s",


      "time_in_millis" : 0,


      "exists_total" : 0,


      "exists_time" : "0s",


      "exists_time_in_millis" : 0,


      "missing_total" : 0,


      "missing_time" : "0s",


      "missing_time_in_millis" : 0,


      "current" : 0


    },


    "search" : {


      "query_total" : 11,


      "query_time" : "1.9m",


      "query_time_in_millis" : 116142,


      "query_current" : 0,


      "fetch_total" : 0,


      "fetch_time" : "0s",


      "fetch_time_in_millis" : 0,


      "fetch_current" : 0


    },


    "cache" : {


      "field_evictions" : 0,


      "field_size" : "4.9gb",


      "field_size_in_bytes" : 5323063782,


      "filter_count" : 0,


      "filter_evictions" : 0,


      "filter_size" : "0b",


      "filter_size_in_bytes" : 0


    },


    "merges" : {


      "current" : 0,


      "current_docs" : 0,


      "current_size" : "0b",


      "current_size_in_bytes" : 0,


      "total" : 0,


      "total_time" : "0s",


      "total_time_in_millis" : 0,


      "total_docs" : 0,


      "total_size" : "0b",


      "total_size_in_bytes" : 0


    },


    "refresh" : {


      "total" : 0,


      "total_time" : "0s",


      "total_time_in_millis" : 0


    },


    "flush" : {


      "total" : 0,


      "total_time" : "0s",


      "total_time_in_millis" : 0


    }


  }


},


"qpvNNHpcQ3i1Bz8BWvq4oA" : {


  "name" : "es_node_67",


  "transport_address" : "inet[/172.29.181.67:9300]",


  "hostname" : "01hw400248",


  "attributes" : {


    "tag" : "es_node_67"


  },


  "indices" : {


    "store" : {


      "size" : "8gb",


      "size_in_bytes" : 8615814550


    },


    "docs" : {


      "count" : 1121886,


      "deleted" : 65007


    },


    "indexing" : {


      "index_total" : 0,


      "index_time" : "0s",


      "index_time_in_millis" : 0,


      "index_current" : 0,


      "delete_total" : 0,


      "delete_time" : "0s",


      "delete_time_in_millis" : 0,


      "delete_current" : 0


    },


    "get" : {


      "total" : 0,


      "time" : "0s",


      "time_in_millis" : 0,


      "exists_total" : 0,


      "exists_time" : "0s",


      "exists_time_in_millis" : 0,


      "missing_total" : 0,


      "missing_time" : "0s",


      "missing_time_in_millis" : 0,


      "current" : 0


    },


    "search" : {


      "query_total" : 9,


      "query_time" : "49.6s",


      "query_time_in_millis" : 49662,


      "query_current" : 0,


      "fetch_total" : 0,


      "fetch_time" : "0s",


      "fetch_time_in_millis" : 0,


      "fetch_current" : 0


    },


    "cache" : {


      "field_evictions" : 0,


      "field_size" : "4.2gb",


      "field_size_in_bytes" : 4587853968,


      "filter_count" : 0,


      "filter_evictions" : 0,


      "filter_size" : "0b",


      "filter_size_in_bytes" : 0


    },


    "merges" : {


      "current" : 0,


      "current_docs" : 0,


      "current_size" : "0b",


      "current_size_in_bytes" : 0,


      "total" : 0,


      "total_time" : "0s",


      "total_time_in_millis" : 0,


      "total_docs" : 0,


      "total_size" : "0b",


      "total_size_in_bytes" : 0


    },


    "refresh" : {


      "total" : 171,


      "total_time" : "0s",


      "total_time_in_millis" : 0


    },


    "flush" : {


      "total" : 0,


      "total_time" : "0s",


      "total_time_in_millis" : 0


    }


  }


}

}

After three requests:

ES went down with a heap space error and stopped responding.

Thanks and Regards,

On Friday, April 27, 2012 4:20:59 PM UTC+5:30, Rafał Kuć wrote:

Hello!

The nodes stats API provides information about cache usage. For example, run the following command:

curl 'localhost:9200/_cluster/nodes/stats?pretty=true'

In the output you should find statistics for both the filter cache and the field data cache, something like the following:

"cache" : {
  "field_evictions" : 0,
  "field_size" : "0b",
  "field_size_in_bytes" : 0,
  "filter_count" : 1,
  "filter_evictions" : 0,
  "filter_size" : "32b",
  "filter_size_in_bytes" : 32
}

With it you should be able to see how much memory your field data cache consumes.
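A minimal Python sketch of the same check, assuming the 0.19-era endpoint shown above (`/_cluster/nodes/stats`) and the response layout visible in the dumps in this thread (`nodes` → node id → `indices` → `cache`). `fetch_node_stats` and `field_cache_by_node` are hypothetical helper names, not part of any Elasticsearch client:

```python
import json
from urllib.request import urlopen


def fetch_node_stats(base_url="http://localhost:9200"):
    """Fetch node statistics from the 0.19-era nodes stats endpoint."""
    with urlopen(base_url + "/_cluster/nodes/stats") as resp:
        return json.load(resp)


def field_cache_by_node(stats):
    """Map each node's name to its field data cache size in bytes.

    `stats` is the parsed nodes stats response; nodes without a cache
    section are reported with a size of 0.
    """
    result = {}
    for node_id, node in stats.get("nodes", {}).items():
        cache = node.get("indices", {}).get("cache", {})
        # Fall back to the node id if no human-readable name is present.
        name = node.get("name", node_id)
        result[name] = cache.get("field_size_in_bytes", 0)
    return result
```

Calling `field_cache_by_node(fetch_node_stats())` before and after a facet request shows how much field data that one request loaded on each node.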

--

Regards,

Rafał Kuć

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

On Friday, April 27, 2012 12:42:35 PM UTC+2, Sujoy Sett wrote:

Hi,

Can you please explain how to check the field data cache? Do I have to set anything explicitly to monitor it?

I often use the mobz-elasticsearch-head-24935c4 plugin to monitor cluster state and health, but I didn't find anything like index.cache.field.max_size in the cluster_state details.

Thanks and Regards,

On Friday, April 27, 2012 3:52:04 PM UTC+5:30, Rafał Kuć wrote:

Hello,

Did you look at the size of the field data cache after sending the example query?

Regards,

Rafał

sujoysett · April 27, 2012, 12:00pm

Hi,

We ran ES with the following settings:

index.cache.field.type: soft
index.cache.field.max_size: 1000

The ES cache shows the following results on subsequent requests:

"cache" : {
"field_evictions" : 67,
"field_size" : "1.7gb",
"field_size_in_bytes" : 1853666588,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
}

We see that field_size comes down after hitting its peak.
We are running more tests and will update soon. Thanks for your help.
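To confirm that the soft cache is actually releasing memory, one option is to poll the `cache` section repeatedly and compare the samples. The sketch below is a hypothetical helper (not an Elasticsearch API); it assumes each sample is a `cache` dict like the one above, ordered oldest first, and that `field_evictions` is a cumulative counter since node start:

```python
def summarize_field_cache(samples):
    """Summarize successive "cache" sections polled from the nodes stats API.

    `samples` is a non-empty list of dicts shaped like the "cache" object
    in the stats output, ordered from oldest to newest.
    """
    if not samples:
        raise ValueError("need at least one sample")
    sizes = [s["field_size_in_bytes"] for s in samples]
    peak = max(sizes)
    return {
        "peak_bytes": peak,
        "latest_bytes": sizes[-1],
        # True when the cache has shrunk since its highest observed size.
        "shrunk_from_peak": sizes[-1] < peak,
        # Assumes field_evictions is cumulative, so the difference
        # counts evictions that happened between the two samples.
        "new_evictions": samples[-1]["field_evictions"] - samples[0]["field_evictions"],
    }
```

A `shrunk_from_peak` of True together with a growing eviction count is the pattern described above: the cache fills up, then GC clears soft references and the reported size drops.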

Regards,

On Friday, April 27, 2012 5:02:17 PM UTC+5:30, Rafał Kuć wrote:

Hello!

Before you sent the query, the field data cache was empty; afterwards it was much larger: 3.5gb and 2.4gb on the two nodes. By default, the field data cache is unlimited (in terms of entries). You may want to make one of the following changes to your Elasticsearch configuration:

Set the field data cache type to soft. This makes the cache use Java soft references, which lets the GC release memory used by the field data cache when more heap is needed. You can do that by adding the following line to the configuration:
index.cache.field.type: soft

Limit the field data cache size by setting its maximum number of entries. Remember that the maximum number of entries applies per segment, not per index. To set it, add the following line to the configuration:
index.cache.field.max_size: 10000

Treat the above value only as an example; I can't predict which setting will be good for your deployment.
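A quick way to see why the unbounded cache ran out of room is to compare `field_size_in_bytes` with the node's maximum heap. The sketch below assumes the nodes in these dumps run Configuration 1's 6GB heap (6 × 1024³ bytes) and that ES reports sizes such as "3.5gb" in binary units; `field_cache_heap_share` is a hypothetical helper. The field data cache is not the only thing on the heap, so the real headroom is smaller than this ratio suggests.

```python
GIB = 1024 ** 3  # ES size strings such as "3.5gb" use binary units


def field_cache_heap_share(field_size_in_bytes, max_heap_bytes):
    """Fraction of the JVM max heap taken by the field data cache."""
    if max_heap_bytes <= 0:
        raise ValueError("max_heap_bytes must be positive")
    return field_size_in_bytes / max_heap_bytes


heap = 6 * GIB  # assumption: Configuration 1 (6GB max heap per node)
# 3.5gb reported on one node after a single facet request
after_one = field_cache_heap_share(3834410088, heap)
# 4.9gb reported on a node after two requests
after_two = field_cache_heap_share(5323063782, heap)
```

With these numbers the field data cache is about 60% of the heap after one request and about 83% after two, which is consistent with the third request failing with a heap space error.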

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch

Also, the following message was printed in the ES console, along with lots of stack traces:

java.lang.OutOfMemoryError: loading field [phrases] caused out of memory failure

Does that help?

Thanks and regards,

On Friday, April 27, 2012 4:44:19 PM UTC+5:30, Sujoy Sett wrote:

Hi,

We really appreciate your prompt response. We have tested the same with our indexes; the observations are below. What do they imply? Please let us know if we are doing anything wrong in the settings or elsewhere.

Initial State:

{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "503.1mb",
"size_in_bytes" : 527622079
},
"docs" : {
"count" : 74250,
"deleted" : 2705
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 0,
"query_time" : "0s",
"query_time_in_millis" : 0,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "0b",
"field_size_in_bytes" : 0,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 0,
"query_time" : "0s",
"query_time_in_millis" : 0,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "0b",
"field_size_in_bytes" : 0,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

Query sent:

{
"query" : {
"match_all" : { }
},
"size" : 0,
"facets" : {
"tag" : {
"terms" : {
"field" : "phrases",
"size" : 100
},
"_cache":false
}
}
}
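For repeating this test from a script, here is a minimal Python sketch that builds the same match_all + terms facet body and POSTs it to `_search`. It assumes a node at `localhost:9200` and the 0.19-era facets API shown above (later Elasticsearch versions removed facets in favor of aggregations). `terms_facet_query` and `run_facet` are hypothetical helpers, and the `index` argument is a placeholder, since the thread does not name the indexes:

```python
import json
from urllib.request import Request, urlopen


def terms_facet_query(field, size=100):
    """Build the match_all + terms facet body used in this thread."""
    return {
        "query": {"match_all": {}},
        "size": 0,  # no hits, only the facet counts
        "facets": {
            "tag": {
                "terms": {"field": field, "size": size},
                "_cache": False,
            }
        },
    }


def run_facet(index, field, base_url="http://localhost:9200", size=100):
    """POST the facet query to /<index>/_search and return the parsed reply."""
    body = json.dumps(terms_facet_query(field, size)).encode("utf-8")
    # Supplying data makes urllib send a POST request.
    req = Request(f"{base_url}/{index}/_search", data=body,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)
```

Running the same helper against the "nouns" and "phrases" fields while polling node stats makes it easy to see which field's data pushes the heap over the limit.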

After a single request:

{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "6.3gb",
"size_in_bytes" : 6787402724
},
"docs" : {
"count" : 876639,
"deleted" : 56407
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 2,
"query_time" : "21.8s",
"query_time_in_millis" : 21869,
"query_current" : 4,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "3.5gb",
"field_size_in_bytes" : 3834410088,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 4,
"query_time" : "21.8s",
"query_time_in_millis" : 21808,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "2.4gb",
"field_size_in_bytes" : 2653970178,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

After two requests:
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 11,
"query_time" : "1.9m",
"query_time_in_millis" : 116142,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "4.9gb",
"field_size_in_bytes" : 5323063782,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 9,
"query_time" : "49.6s",
"query_time_in_millis" : 49662,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "4.2gb",
"field_size_in_bytes" : 4587853968,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

After three requests:
ES went down with a heap space error.
No response.
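A quick back-of-the-envelope check suggests why the third request fails. Assuming these two nodes run Configuration 1 (6 GB max heap per node; the GC log later in the thread reports a max of 5.9gb on es_node_67), the field data cache alone fills most of the heap after two requests. This sketch only divides the reported `field_size_in_bytes` by that assumed heap size; `heap_fraction` is an illustrative helper, not an Elasticsearch API.

```python
# Field data cache sizes from the "After two requests" node stats above.
FIELD_CACHE_BYTES = {
    "es_node_102": 5323063782,
    "es_node_67": 4587853968,
}

# Assumption: each node runs with the 6 GB max heap of Configuration 1.
MAX_HEAP_BYTES = 6 * 1024 ** 3


def heap_fraction(field_cache_bytes: int, max_heap_bytes: int = MAX_HEAP_BYTES) -> float:
    """Share of the maximum heap taken by the field data cache alone."""
    return field_cache_bytes / max_heap_bytes


if __name__ == "__main__":
    for node, size in FIELD_CACHE_BYTES.items():
        print(f"{node}: {heap_fraction(size):.1%} of max heap in field data cache")
```

That works out to roughly 83% of the heap on es_node_102 and 71% on es_node_67, before counting anything else the JVM needs, which leaves little room for the field data a further facet request loads.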

Thanks and Regards,

On Friday, April 27, 2012 4:20:59 PM UTC+5:30, Rafał Kuć wrote:
Hello!

Node statistics provide information about cache usage. For example, run
the following command:

curl 'localhost:9200/_cluster/nodes/stats?pretty=true'

In the output you should find the statistics for both filter and field
data cache, something like the following:
"cache" : {
      "field_evictions" : 0,
      "field_size" : "0b",
      "field_size_in_bytes" : 0,
      "filter_count" : 1,
      "filter_evictions" : 0,
      "filter_size" : "32b",
      "filter_size_in_bytes" : 32
    }
With it you should be able to see how much memory your field data cache
consumes.
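To pull the per-node figure out of that response programmatically, here is a minimal Python sketch. It assumes the 0.19-era response layout quoted in this thread (`nodes` -> node id -> `indices` -> `cache` -> `field_size_in_bytes`); other Elasticsearch versions arrange node stats differently. `SAMPLE` is a trimmed copy of the "After single request" figures reported in this thread, and `field_cache_by_node` is a name chosen for illustration.

```python
import json
from typing import Dict

# Trimmed copy of the "After single request" node stats reported in this thread.
SAMPLE = """
{
  "cluster_name": "elasticsearch_local_0_19",
  "nodes": {
    "zM7byv_qT7CbTNJprWCl5g": {
      "name": "es_node_102",
      "indices": {"cache": {"field_evictions": 0, "field_size_in_bytes": 3834410088}}
    },
    "qpvNNHpcQ3i1Bz8BWvq4oA": {
      "name": "es_node_67",
      "indices": {"cache": {"field_evictions": 0, "field_size_in_bytes": 2653970178}}
    }
  }
}
"""


def field_cache_by_node(stats_json: str) -> Dict[str, int]:
    """Map node name -> field data cache size in bytes.

    Assumes the 0.19-era layout:
    nodes -> <node id> -> indices -> cache -> field_size_in_bytes.
    """
    stats = json.loads(stats_json)
    return {
        node["name"]: node["indices"]["cache"]["field_size_in_bytes"]
        for node in stats["nodes"].values()
    }


if __name__ == "__main__":
    for name, size in field_cache_by_node(SAMPLE).items():
        print(f"{name}: {size / 1024 ** 3:.2f} GiB in field data cache")
```

Running it on the saved output of the curl command before and after a facet request shows how much field data that single request loaded on each node.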

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Elasticsearch

On Friday, April 27, 2012 12:42:35 UTC+2, Sujoy Sett
wrote:
Hi,

Can you please explain how to check the field data cache? Do I have to set
anything explicitly to monitor it?
I often use the mobz-elasticsearch-head-24935c4 plugin to monitor cluster
state and health, but I didn't find anything like index.cache.field.max_size
in the cluster_state details.

Thanks and Regards,

On Friday, April 27, 2012 3:52:04 PM UTC+5:30, Rafał Kuć wrote:
Hello,

Did you look at the size of the field data cache after sending the
example query?

Regards,
Rafał

On Friday, April 27, 2012 12:15:38 UTC+2, Sujoy Sett
wrote:
Hi,

We have been using Elasticsearch 0.19.2 to store and analyze data
from social media blogs and forums. Each index holds up to
500,000 documents, and this volume of data takes up to 3 GB
per index per node (all shards). We always keep the number of
replicas one less than the total number of nodes, so that a copy of
every shard resides on every node at all times. We generally use
10 shards for indexes of this size.

We run different queries on this data for advanced visualization,
mainly facets for showing trend charts or keyword clouds.
Following are some examples of the queries we execute:
{
"query" : {
"match_all" : { }
},
"size" : 0,
"facets" : {
"tag" : {
"terms" : {
"field" : "nouns",
"size" : 100
},
"_cache":false
}
}
}

{
"query" : {
"match_all" : { }
},
"size" : 0,
"facets" : {
"tag" : {
"terms" : {
"field" : "phrases",
"size" : 100
},
"_cache":false
}
}
}

While executing such queries we often run out of heap space,
and the nodes become unresponsive. Our main concern is that the nodes
do not recover to a normal state even after dumping the heap to an hprof
file. The java.exe process still holds the maximum allocated memory, as
shown in Task Manager, and the nodes remain unresponsive until
we manually kill and restart them.

ES Configuration 1:
Elasticsearch Version 0.19.2
2 Nodes, one on each physical server
Max heap size 6GB per node.
10 shards, 1 replica.

ES Configuration 2:
Elasticsearch Version 0.19.2
6 Nodes, three on each physical server
Max heap size 2GB per node.
10 shards, 5 replicas.

Server Configuration:
Windows 7 64 bit
64 bit JVM
8 GB physical memory
Dual Core processor
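Keeping replicas at one less than the node count means every node holds a full copy of each index. The arithmetic below is a sketch that assumes shard copies are spread evenly and uses the heap values listed above; it makes one consequence of Configuration 2 explicit: each physical server hosts three full copies of the data, each node that serves facet requests builds its own field data for the copies it searches, and the three nodes share the same 6 GB of heap that the single node of Configuration 1 had. `ClusterLayout` is a hypothetical helper, not part of any Elasticsearch API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterLayout:
    """Hypothetical helper describing one of the cluster layouts above."""
    nodes: int
    nodes_per_server: int
    heap_gb_per_node: int
    shards: int
    replicas: int

    def shard_copies_total(self) -> int:
        # Every primary shard plus its replicas.
        return self.shards * (self.replicas + 1)

    def shard_copies_per_node(self) -> float:
        # Assumes the copies are spread evenly over the nodes.
        return self.shard_copies_total() / self.nodes

    def full_copies_per_server(self) -> float:
        # How many complete copies of the index one physical server holds.
        return self.shard_copies_per_node() * self.nodes_per_server / self.shards

    def heap_gb_per_server(self) -> int:
        return self.heap_gb_per_node * self.nodes_per_server


CONFIG_1 = ClusterLayout(nodes=2, nodes_per_server=1, heap_gb_per_node=6, shards=10, replicas=1)
CONFIG_2 = ClusterLayout(nodes=6, nodes_per_server=3, heap_gb_per_node=2, shards=10, replicas=5)

if __name__ == "__main__":
    for name, cfg in [("Configuration 1", CONFIG_1), ("Configuration 2", CONFIG_2)]:
        print(f"{name}: {cfg.shard_copies_per_node():.0f} shard copies per node, "
              f"{cfg.full_copies_per_server():.0f} full copies per server, "
              f"{cfg.heap_gb_per_server()} GB heap per server")
```

Under these assumptions, moving from Configuration 1 to Configuration 2 triples the data each server may need field data for without adding any heap.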

With both configurations, Elasticsearch was unable to
respond to the facet queries above, and it was also unable to
recover when a query failed due to heap space shortage.

We are facing this issue in our production environments, and we ask
you to please suggest a better configuration or a different approach
if required.

The mapping we use for the data is as follows
(keyword1 is a customized keyword analyzer; similarly, standard1 is a
customized standard analyzer):

{
"properties": {
"adjectives": {
"type": "string",
"analyzer": "stop2"
},
"alertStatus": {
"type": "string",
"analyzer": "keyword1"
},
"assignedByUserId": {
"type": "integer",
"index": "analyzed"
},
"assignedByUserName": {
"type": "string",
"analyzer": "keyword1"
},
"assignedToDepartmentId": {
"type": "integer",
"index": "analyzed"
},
"assignedToDepartmentName": {
"type": "string",
"analyzer": "keyword1"
},
"assignedToUserId": {
"type": "integer",
"index": "analyzed"
},
"assignedToUserName": {
"type": "string",
"analyzer": "keyword1"
},
"authorJsonMetadata": {
"properties": {
"favourites": {
"type": "string"
},
"followers": {
"type": "string"
},
"following": {
"type": "string"
},
"likes": {
"type": "string"
},
"listed": {
"type": "string"
},
"subscribers": {
"type": "string"
},
"subscription": {
"type": "string"
},
"uploads": {
"type": "string"
},
"views": {
"type": "string"
}
}
},
"authorKloutDetails": {
"dynamic": "true",
"properties": {
"amplificationScore": {
"type": "string"
},
"authorKloutDetailsFound": {
"type": "string"
},
"description": {
"type": "string"
},
"influencees": {
"dynamic": "true",
"properties": {
"kscore": {
"type": "string"
},
"twitter_screen_name": {
"type": "string"
}
}
},
"influencers": {
"dynamic": "true",
"properties": {
"kscore": {
"type": "string"
},
"twitter_screen_name": {
"type": "string"
}
}
},
"kloutClass": {
"type": "string"
},
"kloutClassDescription": {
"type": "string"
},
"kloutScore": {
"type": "string"
},
"kloutScoreDescription": {
"type": "string"
},
"kloutTopic": {
"type": "string"
},
"slope": {
"type": "string"
},
"trueReach": {
"type": "string"
},
"twitterId": {
"type": "string"
},
"twitterScreenName": {
"type": "string"
}
}
},
"author_media": {
"type": "string",
"analyzer": "keyword1"
},
"brandTerms": {
"type": "string",
"analyzer": "keyword1"
},
"calculatedSentimentId": {
"type": "integer",
"index": "analyzed"
},
"calculatedSentimentName": {
"type": "string",
"analyzer": "keyword1"
},
"categories": {
"properties": {
"category": {
"type": "string",
"analyzer": "keyword1"
},
"categoryWords": {
"type": "string",
"analyzer": "keyword1"
},
"score": {
"type": "double"
}
}
},
"commentCount": {
"type": "integer",
"index": "analyzed"
},
"contentAuthorId": {
"type": "integer",
"index": "analyzed"
},
"contentAuthorName": {
"type": "string",
"analyzer": "keyword1"
},
"contentId": {
"type": "integer",
"index": "analyzed"
},
"contentJsonMetadata": {
"properties": {
"comment Count": {
"type": "string"
},
"dislikes": {
"type": "string"
},
"favourites": {
"type": "string"
},
"likes": {
"type": "string"
},
"retweet Count": {
"type": "string"
},
"views": {
"type": "string"
}
}
},
"contentPublishedTime": {
"type": "date",
"index": "analyzed",
"format": "dateOptionalTime"
},
"contentTextFull": {
"type": "string",
"analyzer": "standard1"
},
"contentTextFullHighlighted": {
"type": "string",
"analyzer": "standard1"
},
"contentTextSnippetHighlighted": {
"type": "string",
"analyzer": "standard1"
},
"contentType": {
"type": "string",
"analyzer": "keyword1"
},
"contentUrlId": {
"type": "integer",
"index": "analyzed"
},
"contentUrlPath": {
"type": "string",
"analyzer": "keyword1"
},
"contentUrlPublishedTime": {
"type": "date",
"index": "analyzed",
"format": "dateOptionalTime"
},
"ctmId": {
"type": "long"
},
"domainName": {
"type": "string",
"analyzer": "keyword1"
},
"domainUrl": {
"type": "string",
"analyzer": "keyword1"
},
"domain_media": {
"type": "string",
"analyzer": "keyword1"
},
"findings": {
"type": "string",
"analyzer": "keyword1"
},
"geographyId": {
"type": "integer",
"index": "analyzed"
},
"geographyName": {
"type": "string",
"analyzer": "keyword1"
},
"kloutScore": {
"type": "object"
},
"languageId": {
"type": "integer",
"index": "analyzed"
},
"languageName": {
"type": "string",
"analyzer": "keyword1"
},
"listListeningObjectiveName": {
"type": "string",
"analyzer": "keyword1"
},
"mediaSourceIconPath": {
"type": "string",
"analyzer": "keyword1"
},
"mediaSourceId": {
"type": "integer",
"index": "analyzed"
},
"mediaSourceName": {
"type": "string",
"analyzer": "keyword1"
},
"mediaSourceTypeId": {
"type": "integer",
"index": "analyzed"
},
"mediaSourceTypeName": {
"type": "string",
"analyzer": "keyword1"
},
"notesCount": {
"type": "integer",
"index": "analyzed"
},
"nouns": {
"type": "string",
"analyzer": "stop2"
},
"opinionWords": {
"type": "string",
"analyzer": "keyword1"
},
"phrases": {
"type": "string",
"analyzer": "keyword1"
},
"profileId": {
"type": "integer",
"index": "analyzed"
},
"profileName": {
"type": "string",
"analyzer": "keyword1"
},
"topicId": {
"type": "integer",
"index": "analyzed"
},
"topicName": {
"type": "string",
"analyzer": "keyword1"
},
"userSentimentId": {
"type": "integer",
"index": "analyzed"
},
"userSentimentName": {
"type": "string",
"analyzer": "keyword1"
},
"verbs": {
"type": "string",
"analyzer": "stop2"
}
}
}

A sample of the structure of the data is as follows:

{
"contentType": "comment",
"topicId": 9,
"mediaSourceId": 3,
"contentId": 34834,
"ctmId": 73322,
"contentTextFull": "The low numbers nationally published by
Corelogic were a result of banks holding off foreclosures until
settlement. \nAs Bloomberg and RealtyTrac stated. this will result in
more foreclosure pain in the short term as some of the foreclosures
that should have happened last year instead happen this year which
will likely result in higher foreclosure numbers in 2012 than
2011.\nThe estimates from Realtytrac and Zillow are hovering around 1
million completed foreclosures, or REOs, in 2012, a 25 percent
increase from 2011. \nThe positive is that the data suggests that
short sales net the banks more money so they should be expected to
increase\nThe bottom line is that in the longer term the bank
settlement will help to more quickly clear the so-called shadow
inventory, which will in turn help the housing market finally bottom
out once and for all. \nMy buddy who bought in Santa Luz in 2006 is
asked every month by his bank when he makes his payment on his $1.2mm
underwater home, do you plan on staying in the house? . Per
Corelogic, there are still large numbers still underwater in SD\n-
3800 underwater in 92127\n- 2700 underwater in 92130\nThe good news is
we only have one last market to get hit, and expect the high end.
The $1mm to $2mm has to get hit next.\nhttp://www.mercurynews.com/business/ci_19899224\nUnfortunately,
we can not avoid the headwinds.",
"contentTextFullHighlighted": null,
"contentTextSnippetHighlighted": "The low numbers nationally
published by Corelogic were a result of banks holding off foreclosures
until settlement. \nAs Bloomberg and RealtyTrac stated. this will
result in more foreclosure pain in the short term as some of the
foreclosures that should have happened last year instead happen...",
"contentJsonMetadata": null,
"commentCount": 117,
"contentUrlId": 13535,
"contentUrlPath": "http://www.bubbleinfo.com/2012/02/09/mortgage-settlement-renegade/",
"domainUrl": "http://www.bubbleinfo.com",
"domainName": null,
"contentAuthorId": 15614,
"contentAuthorName": "Hankster",
"authorJsonMetadata": null,
"authorKloutDetails": null,
"mediaSourceName": "Board Reader Blog",
"mediaSourceIconPath": "BoardReaderBlog.gif",
"mediaSourceTypeId": 1,
"mediaSourceTypeName": "Blog",
"geographyId": 0,
"geographyName": "Unknown",
"languageId": 1,
"languageName": "English",
"topicName": "Bank of America",
"profileId": 3,
"profileName": "USAA_Competition1",
"contentPublishedTime": 1328798840000,
"contentUrlPublishedTime": 1329336423000,
"calculatedSentimentId": 4,
"calculatedSentimentName": "POS",
"userSentimentId": 0,
"userSentimentName": null,
"listListeningObjectiveName": [
"Untagged LO"
],
"alertStatus": "assigned",
"assignedToUserId": 2,
"assignedToUserName": null,
"assignedByUserId": 1,
"assignedByUserName": null,
"assignedToDepartmentId": 0,
"assignedToDepartmentName": null,
"notesCount": 0,
"nouns": [
"bank",
"banks",
"Bloomberg",
"buddy",
"Corelogic",
"data",
"estimates",
"foreclosure",
"foreclosures",
"headwinds",
"home",
"house",
"housing",
"increase",
"inventory",
"line",
"Luz",
"market",
"mm",
"money",
"month",
"net",
"news",
"numbers",
"pain",
"payment",
"percent",
"Realtytrac",
"RealtyTrac",
"REOs",
"result",
"sales",
"Santa",
"SD",
"settlement",
"shadow",
"term",
"turn",
"year",
"Zillow"
],
"verbs": [
"asked",
"avoid",
"bought",
"completed",
"expect",
"expected",
"get",
"happen",
"happened",
"help",
"hit",
"holding",
"hovering",
"increase",
"makes",
"plan",
"published",
"result",
"stated",
"staying",
"suggests"
],
"adjectives": [
"bottom",
"clear",
"finally",
"good",
"high",
"higher",
"instead",
"large",
"last",
"likely",
"longer",
"low",
"nationally",
"next",
"not",
"positive",
"quickly",
"short",
"so-called",
"underwater",
"Unfortunately"
],
"phrases": [
"2012 than 2011",
"25 percent",
"25 percent increase",
"2700 underwater in 92130",
"3800 underwater in 92127",
"92130 The good news",
"asked every month",
"avoid the headwinds",
"bank settlement",
"banks holding off foreclosures",
"banks more money",
"Bloomberg and RealtyTrac",
"bottom line",
"bought in Santa",
"bought in Santa Luz",
"clear the so-called shadow",
"completed foreclosures",
"estimates from Realtytrac",
"foreclosure numbers",
"foreclosure numbers in 2012",
"foreclosure pain",
"foreclosures until settlement",
"good news",
"happen this year",
"happen this year --",
"happened last year",
"help the housing",
"help the housing market",
"higher foreclosure",
"higher foreclosure numbers",
"holding off foreclosures",
"housing market",
"increase from 2011",
"increase The bottom line",
"instead happen this year",
"large numbers",
"last market",
"last year",
"longer term",
"longer term the bank",
"low numbers",
"Luz in 2006",
"makes his payment",
"million completed foreclosures",
"mm underwater home",
"month by his bank",
"nationally published by Corelogic",
"net the banks",
"not avoid the headwinds",
"numbers in 2012",
"percent increase",
"percent increase from 2011",
"published by Corelogic",
"Realtytrac and Zillow",
"result in higher foreclosure",
"result in more foreclosure",
"result of banks",
"sales net",
"sales net the banks",
"Santa Luz",
"Santa Luz in 2006",
"shadow inventory",
"short sales",
"short sales net",
"short term",
"so-called shadow",
"so-called shadow inventory",
"staying in the house",
"suggests that short sales",
"term the bank",
"term the bank settlement",
"turn help the housing",
"underwater home",
"underwater in 92127",
"underwater in 92130",
"underwater in SD",
"year --"
],
"author_media": "15614~~~Hankster~~~1~~~Blog",
"domain_media": "http://www.bubbleinfo.com~~~null~~~1~~~Blog",
"categories": [
{
"category": "post closing",
"categoryWords": [
"foreclosure",
"foreclosure"
],
"score": "2.0"
},
{
"category": "pre buy research",
"categoryWords": [
"term",
"term"
],
"score": "2.0"
}
],
"opinionWords": [
"positive",
"good news",
"expect",
"unfortunately"
],
"brandTerms": ,
"findings":
}

sujoysett · April 27, 2012, 1:07pm

Hi,

The indexes are working fine now. We are running JMeter tests with
multiple users.
We see the following in the console:

[2012-04-27 18:28:27,181][WARN ][monitor.jvm ] [es_node_67]
[gc][ParNew][4142][305] duration [1.4s], collections [1]/[4.3s], total
[1.4s]/[21.8s],memory [5.7gb]->[5.7gb]/[5.9gb]
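The `memory [5.7gb]->[5.7gb]/[5.9gb]` part of this warning reads, in our interpretation of the 0.19 `monitor.jvm` format, as heap used before the collection, heap used after it, and the maximum heap. The Python sketch below extracts those three values; it assumes all three are reported in `gb`, as in this line, and `heap_after_gc` is an illustrative name. Here the collection freed nothing and the heap stays about 97% full, which is consistent with a field data cache the GC cannot reclaim.

```python
import re
from typing import Optional, Tuple

# The warning quoted above, joined onto one line.
GC_LINE = (
    "[2012-04-27 18:28:27,181][WARN ][monitor.jvm ] [es_node_67] "
    "[gc][ParNew][4142][305] duration [1.4s], collections [1]/[4.3s], "
    "total [1.4s]/[21.8s],memory [5.7gb]->[5.7gb]/[5.9gb]"
)

# memory [<used before>]->[<used after>]/[<max>]; only the "gb" unit is handled.
MEMORY_RE = re.compile(r"memory \[([\d.]+)gb\]->\[([\d.]+)gb\]/\[([\d.]+)gb\]")


def heap_after_gc(line: str) -> Optional[Tuple[float, float, float, float]]:
    """Return (before_gb, after_gb, max_gb, fraction_full_after) or None."""
    match = MEMORY_RE.search(line)
    if match is None:
        return None
    before, after, maximum = (float(g) for g in match.groups())
    return before, after, maximum, after / maximum


if __name__ == "__main__":
    result = heap_after_gc(GC_LINE)
    if result is not None:
        before, after, maximum, full = result
        print(f"freed {before - after:.1f} GB, heap {full:.0%} full after GC")
```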

Out of curiosity, what is ES doing internally? And could
you please explain the settings you suggested in more detail,
especially how segments and shards are related?

Thanks and Regards,

On Friday, April 27, 2012 5:30:46 PM UTC+5:30, Sujoy Sett wrote:

Hi,

We ran ES with these settings:

index.cache.field.type: soft
index.cache.field.max_size: 1000

The ES cache shows the following on subsequent requests:

"cache" : {
"field_evictions" : 67,
"field_size" : "1.7gb",
"field_size_in_bytes" : 1853666588,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
}

We see that field_size comes down after reaching a peak.
We are running more tests and will update soon. Thanks for your help.

Regards,
On Friday, April 27, 2012 5:02:17 PM UTC+5:30, Rafał Kuć wrote:
Hello!

Before you hit ES with the query, your field data cache was empty; after
that, it was much larger: 3.5gb and 2.4gb. By default, the field data
cache is unlimited (in terms of entries). You may want to make
one of the following changes to your Elasticsearch configuration:

Set the field data cache type to soft. This makes the cache use
Java soft references, which allows the GC to release memory used by the
field data cache when more heap memory is needed. You can do that by
adding the following line to the configuration:
index.cache.field.type: soft

Limit the field data cache size by setting its maximum number of entries.
Remember that the maximum number of entries is per segment, not
per index. To set it, add the following line to the configuration:
index.cache.field.max_size: 10000

Treat the above value as an example, I can't predict what setting will be
good for your deployment.
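Because this limit counts entries rather than bytes, it does not cap memory directly. The following Python sketch is a hypothetical model, not Elasticsearch's implementation: one segment's cache, keyed by field name, capped at `max_entries`, evicting the least recently used entry and counting evictions the way `field_evictions` reports them. In this model, with only a few faceted fields per segment a cap such as 1000 or 10000 is never reached, so a single very large field such as `phrases` can still exhaust the heap; the soft cache type is what lets the GC reclaim it.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class BoundedFieldCache:
    """Hypothetical model of one segment's field data cache.

    Entries are keyed by field name and capped by *count*; the size of
    each entry in bytes plays no part in eviction.
    """

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self.entries: "OrderedDict[Hashable, object]" = OrderedDict()
        self.evictions = 0

    def get_or_load(self, field: Hashable, loader: Callable[[Hashable], object]) -> object:
        if field in self.entries:
            # Cache hit: mark the entry as most recently used.
            self.entries.move_to_end(field)
            return self.entries[field]
        value = loader(field)  # stands in for loading all terms of `field` in this segment
        self.entries[field] = value
        if len(self.entries) > self.max_entries:
            # Evict the least recently used entry, however small or large it is.
            self.entries.popitem(last=False)
            self.evictions += 1
        return value


if __name__ == "__main__":
    cache = BoundedFieldCache(max_entries=2)
    for field in ["nouns", "phrases", "verbs"]:
        cache.get_or_load(field, lambda f: f"field data for {f}")
    print(list(cache.entries), cache.evictions)  # prints ['phrases', 'verbs'] 1
```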

--
Regards,
Rafał Kuć

Also, the following message was printed:
java.lang.OutOfMemoryError: loading field [phrases] caused out of memory failure
along with lots of stack traces in the ES console.

Does that help explain the problem?

Thanks and regards,

On Friday, April 27, 2012 4:44:19 PM UTC+5:30, Sujoy Sett wrote:
Hi,

We really appreciate your prompt response. We
have tested the same with our indexes; our observations follow. What
do they imply? Please tell us if we are doing anything wrong in the settings
or elsewhere.

Initial State:
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "503.1mb",
"size_in_bytes" : 527622079
},
"docs" : {
"count" : 74250,
"deleted" : 2705
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 0,
"query_time" : "0s",
"query_time_in_millis" : 0,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "0b",
"field_size_in_bytes" : 0,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 0,
"query_time" : "0s",
"query_time_in_millis" : 0,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "0b",
"field_size_in_bytes" : 0,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

*After hitting query
*{
"query" : {
"match_all" : { }
},
"size" : 0,
"facets" : {
"tag" : {
"terms" : {
"field" : "phrases",
"size" : 100
},
"_cache":false
}
}
}

*After single request
*{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "6.3gb",
"size_in_bytes" : 6787402724
},
"docs" : {
"count" : 876639,
"deleted" : 56407
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 2,
"query_time" : "21.8s",
"query_time_in_millis" : 21869,
"query_current" : 4,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "3.5gb",
"field_size_in_bytes" : 3834410088,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 4,
"query_time" : "21.8s",
"query_time_in_millis" : 21808,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "2.4gb",
"field_size_in_bytes" : 2653970178,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

*After two requests
*{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 11,
"query_time" : "1.9m",
"query_time_in_millis" : 116142,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "4.9gb",
"field_size_in_bytes" : 5323063782,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 9,
"query_time" : "49.6s",
"query_time_in_millis" : 49662,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "4.2gb",
"field_size_in_bytes" : 4587853968,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

After three requests:
ES went down with a heap space error.
No response.

Thanks and regards,

On Friday, April 27, 2012 4:20:59 PM UTC+5:30, Rafał Kuć wrote:
Hello!

Nodes statistics provide information about cache usage. For example run
the following command:

curl 'localhost:9200/_cluster/nodes/stats?pretty=true'

In the output you should find the statistics for both filter and field
data cache, something like the following:
    "cache" : {
      "field_evictions" : 0,
      "field_size" : "0b",
      "field_size_in_bytes" : 0,
      "filter_count" : 1,
      "filter_evictions" : 0,
      "filter_size" : "32b",
      "filter_size_in_bytes" : 32
    }
With it you should be able to see how much memory your field data cache
consumes.
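To track these numbers across several requests, the field cache figures can also be pulled out of the nodes stats response with a small script. What follows is a minimal Python sketch. It assumes the JSON layout shown in this thread (`nodes` → node id → `name` and `indices.cache`). The `field_cache_report` helper is hypothetical. You supply the heap size yourself; here it is the 6GB of "ES Configuration 1". `SAMPLE` is trimmed from the "After single request" output posted in this thread.

```python
import json

# Trimmed from the "After single request" nodes stats posted in this
# thread; only the fields read below are kept.
SAMPLE = """
{
  "cluster_name": "elasticsearch_local_0_19",
  "nodes": {
    "zM7byv_qT7CbTNJprWCl5g": {
      "name": "es_node_102",
      "indices": {"cache": {"field_evictions": 0, "field_size_in_bytes": 3834410088}}
    },
    "qpvNNHpcQ3i1Bz8BWvq4oA": {
      "name": "es_node_67",
      "indices": {"cache": {"field_evictions": 0, "field_size_in_bytes": 2653970178}}
    }
  }
}
"""


def field_cache_report(stats_json, heap_bytes):
    """Map node name -> (field cache bytes, field evictions, share of heap).

    heap_bytes is supplied by the caller (e.g. the -Xmx value); it is not
    read from the stats response.
    """
    stats = json.loads(stats_json)
    report = {}
    for node in stats["nodes"].values():
        cache = node["indices"]["cache"]
        size = cache["field_size_in_bytes"]
        report[node["name"]] = (size, cache["field_evictions"], size / heap_bytes)
    return report


if __name__ == "__main__":
    heap = 6 * 1024 ** 3  # "Max heap size 6GB per node" (ES Configuration 1)
    for name, (size, evictions, share) in sorted(field_cache_report(SAMPLE, heap).items()):
        print(f"{name}: field cache {size / 1024 ** 3:.2f} GiB, "
              f"{evictions} evictions, {share:.0%} of heap")
```

In practice you would save the output of the curl command above to a file (for example with `curl -s 'localhost:9200/_cluster/nodes/stats' > stats.json`) and pass that file's contents to the function instead of `SAMPLE`.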

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
Elasticsearch

On Friday, April 27, 2012 12:42:35 UTC+2, Sujoy Sett wrote:
Hi,

Can you please explain how to check the field data cache? Do I have to set
anything explicitly to monitor it?
I often use the mobz-elasticsearch-head-24935c4 plugin to monitor cluster
state and health, but I didn't find anything like index.cache.field.max_size
in the cluster_state details.

Thanks and Regards,

On Friday, April 27, 2012 3:52:04 PM UTC+5:30, Rafał Kuć wrote:
Hello,

Did you look at the size of the field data cache after sending the
example query?

Regards,
Rafał

jprante · April 27, 2012, 1:58pm

If you submit a facet query on "nouns" or "phrases", ES loads all unique
terms in the requested fields into memory. As the mapping shows, these are
analyzed fields. As a consequence, ES has to handle a vast number of terms,
in contrast to not_analyzed fields. It also depends on the application.
String terms use a lot of memory; integers would use less.
Because by default the memory used for loading the field cache is unlimited,
you will hit the ceiling and get an OOM if you do not carefully estimate how
many unique string terms you deal with in the faceted fields. You can then
raise the heap limit if you still have more memory available, or, as has
been suggested, you can set a reasonable cache limit to avoid OOM.
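The estimate Jörg recommends can be roughed out before running the facet. The following is a back-of-envelope Python sketch. Its memory model is a simplifying assumption, not something taken from the ES 0.19 source. It assumes each unique term is held as a Java String costing about 40 bytes of object overhead plus 2 bytes per UTF-16 character. It also assumes ordinals take one 4-byte int per document for every value slot, where the number of slots equals the largest number of values in any single document. The model treats the whole index as one segment. Because field data is loaded per segment, terms that appear in several segments are held more than once, so real usage can be higher. The function name and all inputs except the doc count are hypothetical.

```python
def estimate_string_field_bytes(unique_terms, avg_term_chars, max_doc,
                                max_values_per_doc, string_overhead=40):
    """Very rough heap estimate for loading one multi-valued string field.

    Returns (term_bytes, ordinal_bytes). Assumptions (illustrative only):
      * every unique term is a Java String: about string_overhead bytes of
        object overhead plus 2 bytes per UTF-16 character;
      * ordinals take one 4-byte int per document for each value slot, and
        the number of slots equals the largest number of values in any doc.
    """
    term_bytes = unique_terms * (string_overhead + 2 * avg_term_chars)
    ordinal_bytes = max_doc * max_values_per_doc * 4
    return term_bytes, ordinal_bytes


if __name__ == "__main__":
    docs = 1121886  # doc count reported for es_node_67 in this thread
    # Hypothetical inputs: 5 million unique phrases of ~20 characters,
    # and at most 100 phrases in a single document.
    terms, ordinals = estimate_string_field_bytes(5_000_000, 20, docs, 100)
    print(f"terms {terms / 1024 ** 3:.2f} GiB + ordinals {ordinals / 1024 ** 3:.2f} GiB")
```

Under this model, the term strings are the part that grows with the number of unique values. That is why a keyword-analyzed field full of distinct phrases is so much more expensive than a field of small integers.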

Jörg

On Friday, April 27, 2012 3:07:39 PM UTC+2, Sujoy Sett wrote:

Hi,

The indexes are working fine now. We are running JMeter tests with
multiple users.
We see the following in the command prompt:

[2012-04-27 18:28:27,181][WARN ][monitor.jvm ] [es_node_67]
[gc][ParNew][4142][305] duration [1.4s], collections [1]/[4.3s], total
[1.4s]/[21.8s],memory [5.7gb]->[5.7gb]/[5.9gb]
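The GC warning above can be read mechanically. Its `memory [before]->[after]/[max]` part shows the heap at 5.7gb of 5.9gb both before and after a 1.4s collection. Below is a minimal Python sketch of that check. It assumes only the format of this single log line; other versions or collectors may log differently. The helpers `parse_size` and `heap_after_gc` are hypothetical. Because the logged values are rounded to 0.1gb, "nothing reclaimed" only means that less than about 100mb was freed.

```python
import re

LINE = ("[2012-04-27 18:28:27,181][WARN ][monitor.jvm ] [es_node_67] "
        "[gc][ParNew][4142][305] duration [1.4s], collections [1]/[4.3s], total "
        "[1.4s]/[21.8s],memory [5.7gb]->[5.7gb]/[5.9gb]")

UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3, "tb": 1024 ** 4}


def parse_size(text):
    """Convert a human-readable size such as '5.7gb' into bytes."""
    m = re.fullmatch(r"([\d.]+)(b|kb|mb|gb|tb)", text)
    if m is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(m.group(1)) * UNITS[m.group(2)]


def heap_after_gc(line):
    """Return (before, after, max) heap bytes from a monitor.jvm GC warning."""
    m = re.search(r"memory \[([^\]]+)\]->\[([^\]]+)\]/\[([^\]]+)\]", line)
    if m is None:
        raise ValueError("no memory section in line")
    return tuple(parse_size(g) for g in m.groups())


if __name__ == "__main__":
    before, after, maximum = heap_after_gc(LINE)
    print(f"reclaimed ~{(before - after) / 1024 ** 2:.0f} MiB, "
          f"heap {after / maximum:.0%} full after GC")
```

Repeated lines where `after` stays close to `max` suggest that the live data (here the field cache) no longer fits in the heap, rather than that garbage is piling up.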

Just out of curiosity, what is ES doing internally? And could you please
explain the settings you suggested in more detail, especially how
segments and shards are related?

Thanks and Regards,

On Friday, April 27, 2012 5:30:46 PM UTC+5:30, Sujoy Sett wrote:
Hi,

We ran ES with the following settings:

index.cache.field.type: soft
index.cache.field.max_size: 1000

The ES cache shows the following results on subsequent requests:

"cache" : {
"field_evictions" : 67,
"field_size" : "1.7gb",
"field_size_in_bytes" : 1853666588,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
}

We see that field_size comes down after hitting its peak.
We are running more tests and will update soon. Thanks for your help.

Regards,
On Friday, April 27, 2012 5:02:17 PM UTC+5:30, Rafał Kuć wrote:
Hello!

Before you hit ES with the query, the field data cache was empty; after
that, the cache was much larger: 3.5gb and 2.4gb. By default the field
data cache is unlimited (in terms of entries). You may want to make one
of the following changes to your Elasticsearch configuration:

Set the field data cache type to soft. This makes the cache use Java soft
references, which lets the GC release memory used by the field data cache
when more heap memory is needed. You can do that by adding the following
line to the configuration:
index.cache.field.type: soft

Limit the field data cache size by setting its maximum number of
entries. Remember that the maximum number of entries is per segment,
not per index. To set that, add the following line to the
configuration:
index.cache.field.max_size: 10000

Treat the above value as an example; I can't predict what setting will
be good for your deployment.
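The remark that the limit is per segment also touches on the shards and segments question: each shard is a Lucene index made up of several segments. The following toy model in Python illustrates the point. It is a sketch, not Elasticsearch's implementation. It assumes that an "entry" is one field's data for one segment and that the least recently used entry is evicted first. The class name and the 10 shards × 5 segments layout are hypothetical. The model shows that the node-wide number of cached entries can reach segments × max_size, so the total bound grows with the number of segments.

```python
from collections import OrderedDict


class PerSegmentFieldCache:
    """Toy model of a field data cache limited to max_size entries *per segment*."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.segments = {}  # segment id -> OrderedDict(field name -> loaded data)
        self.evictions = 0

    def get(self, segment, field, loader):
        cache = self.segments.setdefault(segment, OrderedDict())
        if field in cache:
            cache.move_to_end(field)  # mark as most recently used
            return cache[field]
        data = loader(segment, field)
        cache[field] = data
        if len(cache) > self.max_size:
            cache.popitem(last=False)  # drop the least recently used field
            self.evictions += 1
        return data

    def total_entries(self):
        return sum(len(c) for c in self.segments.values())


if __name__ == "__main__":
    cache = PerSegmentFieldCache(max_size=2)
    # Hypothetical layout: 10 shards x 5 segments each on one node.
    for segment in range(10 * 5):
        for field in ("nouns", "phrases", "verbs"):
            cache.get(segment, field, lambda seg, f: f"{f} terms of segment {seg}")
    print(cache.total_entries(), cache.evictions)
```

With `max_size=2` and three faceted fields, each of the 50 segments ends up holding two fields, after one eviction per segment. If only a handful of fields are ever faceted, an entry-count limit like this one does not bound the bytes held per entry, which is where the soft setting comes in.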

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch

Also, the following message has been printed:

java.lang.OutOfMemoryError: loading field [phrases] caused out of memory failure

along with lots of stack traces in the ES prompt.

Is that of any help?

Thanks and regards,

On Friday, April 27, 2012 4:44:19 PM UTC+5:30, Sujoy Sett wrote:
Hi,

Thank you very much for your prompt response.
We have tested the same with our indexes; the observations follow.
What do they imply? Please let us know if we are doing anything wrong in
the settings or elsewhere.

Initial state:
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "503.1mb",
"size_in_bytes" : 527622079
},
"docs" : {
"count" : 74250,
"deleted" : 2705
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 0,
"query_time" : "0s",
"query_time_in_millis" : 0,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "0b",
"field_size_in_bytes" : 0,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 0,
"query_time" : "0s",
"query_time_in_millis" : 0,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "0b",
"field_size_in_bytes" : 0,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

Then we sent the query:
{
"query" : {
"match_all" : { }
},
"size" : 0,
"facets" : {
"tag" : {
"terms" : {
"field" : "phrases",
"size" : 100
},
"_cache":false
}
}
}

After a single request:
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "6.3gb",
"size_in_bytes" : 6787402724
},
"docs" : {
"count" : 876639,
"deleted" : 56407
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 2,
"query_time" : "21.8s",
"query_time_in_millis" : 21869,
"query_current" : 4,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "3.5gb",
"field_size_in_bytes" : 3834410088,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 4,
"query_time" : "21.8s",
"query_time_in_millis" : 21808,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "2.4gb",
"field_size_in_bytes" : 2653970178,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

After two requests:
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 11,
"query_time" : "1.9m",
"query_time_in_millis" : 116142,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "4.9gb",
"field_size_in_bytes" : 5323063782,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 9,
"query_time" : "49.6s",
"query_time_in_millis" : 49662,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "4.2gb",
"field_size_in_bytes" : 4587853968,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

After three requests:
ES went down with a heap space error.
No response.

Thanks and regards,

On Friday, April 27, 2012 12:15:38 UTC+2, Sujoy Sett wrote:
Hi,

We have been using Elasticsearch 0.19.2 for storing and analyzing data
from social media blogs and forums. The data volume goes up to
500000 documents per index, and this volume of data takes up to 3 GB
per index per node (all shards) in Elasticsearch. We always keep the
number of replicas one less than the total number of nodes, so that a
copy of every shard resides on every node at any instant. The number
of shards is generally 10 for indexes of that size.

We try different queries on this data for advanced visualization
purposes, mainly facets for showing trend charts or keyword clouds.
Following are some examples of the queries we execute:
{
"query" : {
"match_all" : { }
},
"size" : 0,
"facets" : {
"tag" : {
"terms" : {
"field" : "nouns",
"size" : 100
},
"_cache":false
}
}
}

{
"query" : {
"match_all" : { }
},
"size" : 0,
"facets" : {
"tag" : {
"terms" : {
"field" : "phrases",
"size" : 100
},
"_cache":false
}
}
}

While executing such queries we often run short of heap space,
and the nodes become unresponsive. Our main concern is that the nodes
do not recover to a normal state even after dumping the heap to an hprof
file. The node still consumes the maximum allocated memory, as shown for
the java.exe process in Task Manager, and the nodes remain unresponsive
until we manually kill and restart them.

ES Configuration 1:
Elasticsearch Version 0.19.2
2 Nodes, one on each physical server
Max heap size 6GB per node.
10 shards, 1 replica.

ES Configuration 2:
Elasticsearch Version 0.19.2
6 Nodes, three on each physical server
Max heap size 2GB per node.
10 shards, 5 replicas.

Server Configuration:
Windows 7 64 bit
64 bit JVM
8 GB physical memory
Dual Core processor

For both configurations mentioned above, Elasticsearch was unable to
respond to the facet queries shown above, and it was also unable to
recover when a query failed due to heap space shortage.

We are facing this issue in our production environments, and we ask
you to please suggest a better configuration or a different approach
if required.

The mapping we use for the data is as follows:
(keyword1 is a customized keyword analyzer, similarly standard1 is a
customized standard analyzer)

{
"properties": {
"adjectives": {
"type": "string",
"analyzer": "stop2"
},
"alertStatus": {
"type": "string",
"analyzer": "keyword1"
},
"assignedByUserId": {
"type": "integer",
"index": "analyzed"
},
"assignedByUserName": {
"type": "string",
"analyzer": "keyword1"
},
"assignedToDepartmentId": {
"type": "integer",
"index": "analyzed"
},
"assignedToDepartmentName": {
"type": "string",
"analyzer": "keyword1"
},
"assignedToUserId": {
"type": "integer",
"index": "analyzed"
},
"assignedToUserName": {
"type": "string",
"analyzer": "keyword1"
},
"authorJsonMetadata": {
"properties": {
"favourites": {
"type": "string"
},
"followers": {
"type": "string"
},
"following": {
"type": "string"
},
"likes": {
"type": "string"
},
"listed": {
"type": "string"
},
"subscribers": {
"type": "string"
},
"subscription": {
"type": "string"
},
"uploads": {
"type": "string"
},
"views": {
"type": "string"
}
}
},
"authorKloutDetails": {
"dynamic": "true",
"properties": {
"amplificationScore": {
"type": "string"
},
"authorKloutDetailsFound": {
"type": "string"
},
"description": {
"type": "string"
},
"influencees": {
"dynamic": "true",
"properties": {
"kscore": {
"type": "string"
},
"twitter_screen_name": {
"type": "string"
}
}
},
"influencers": {
"dynamic": "true",
"properties": {
"kscore": {
"type": "string"
},
"twitter_screen_name": {
"type": "string"
}
}
},
"kloutClass": {
"type": "string"
},
"kloutClassDescription": {
"type": "string"
},
"kloutScore": {
"type": "string"
},
"kloutScoreDescription": {
"type": "string"
},
"kloutTopic": {
"type": "string"
},
"slope": {
"type": "string"
},
"trueReach": {
"type": "string"
},
"twitterId": {
"type": "string"
},
"twitterScreenName": {
"type": "string"
}
}
},
"author_media": {
"type": "string",
"analyzer": "keyword1"
},
"brandTerms": {
"type": "string",
"analyzer": "keyword1"
},
"calculatedSentimentId": {
"type": "integer",
"index": "analyzed"
},
"calculatedSentimentName": {
"type": "string",
"analyzer": "keyword1"
},
"categories": {
"properties": {
"category": {
"type": "string",
"analyzer": "keyword1"
},
"categoryWords": {
"type": "string",
"analyzer": "keyword1"
},
"score": {
"type": "double"
}
}
},
"commentCount": {
"type": "integer",
"index": "analyzed"
},
"contentAuthorId": {
"type": "integer",
"index": "analyzed"
},
"contentAuthorName": {
"type": "string",
"analyzer": "keyword1"
},
"contentId": {
"type": "integer",
"index": "analyzed"
},
"contentJsonMetadata": {
"properties": {
"comment Count": {
"type": "string"
},
"dislikes": {
"type": "string"
},
"favourites": {
"type": "string"
},
"likes": {
"type": "string"
},
"retweet Count": {
"type": "string"
},
"views": {
"type": "string"
}
}
},
"contentPublishedTime": {
"type": "date",
"index": "analyzed",
"format": "dateOptionalTime"
},
"contentTextFull": {
"type": "string",
"analyzer": "standard1"
},
"contentTextFullHighlighted": {
"type": "string",
"analyzer": "standard1"
},
"contentTextSnippetHighlighted": {
"type": "string",
"analyzer": "standard1"
},
"contentType": {
"type": "string",
"analyzer": "keyword1"
},
"contentUrlId": {
"type": "integer",
"index": "analyzed"
},
"contentUrlPath": {
"type": "string",
"analyzer": "keyword1"
},
"contentUrlPublishedTime": {
"type": "date",
"index": "analyzed",
"format": "dateOptionalTime"
},
"ctmId": {
"type": "long"
},
"domainName": {
"type": "string",
"analyzer": "keyword1"
},
"domainUrl": {
"type": "string",
"analyzer": "keyword1"
},
"domain_media": {
"type": "string",
"analyzer": "keyword1"
},
"findings": {
"type": "string",
"analyzer": "keyword1"
},
"geographyId": {
"type": "integer",
"index": "analyzed"
},
"geographyName": {
"type": "string",
"analyzer": "keyword1"
},
"kloutScore": {
"type": "object"
},
"languageId": {
"type": "integer",
"index": "analyzed"
},
"languageName": {
"type": "string",
"analyzer": "keyword1"
},
"listListeningObjectiveName": {
"type": "string",
"analyzer": "keyword1"
},
"mediaSourceIconPath": {
"type": "string",
"analyzer": "keyword1"
},
"mediaSourceId": {
"type": "integer",
"index": "analyzed"
},
"mediaSourceName": {
"type": "string",
"analyzer": "keyword1"
},
"mediaSourceTypeId": {
"type": "integer",
"index": "analyzed"
},
"mediaSourceTypeName": {
"type": "string",
"analyzer": "keyword1"
},
"notesCount": {
"type": "integer",
"index": "analyzed"
},
"nouns": {
"type": "string",
"analyzer": "stop2"
},
"opinionWords": {
"type": "string",
"analyzer": "keyword1"
},
"phrases": {
"type": "string",
"analyzer": "keyword1"
},
"profileId": {
"type": "integer",
"index": "analyzed"
},
"profileName": {
"type": "string",
"analyzer": "keyword1"
},
"topicId": {
"type": "integer",
"index": "analyzed"
},
"topicName": {
"type": "string",
"analyzer": "keyword1"
},
"userSentimentId": {
"type": "integer",
"index": "analyzed"
},
"userSentimentName": {
"type": "string",
"analyzer": "keyword1"
},
"verbs": {
"type": "string",
"analyzer": "stop2"
}
}
}

A sample of the structure of the data is as follows:

{
"contentType": "comment",
"topicId": 9,
"mediaSourceId": 3,
"contentId": 34834,
"ctmId": 73322,
"contentTextFull": "The low numbers nationally published by
Corelogic were a result of banks holding off foreclosures until
settlement. \nAs Bloomberg and RealtyTrac stated. this will result in
more foreclosure pain in the short term as some of the foreclosures
that should have happened last year instead happen this year which
will likely result in higher foreclosure numbers in 2012 than
2011.\nThe estimates from Realtytrac and Zillow are hovering around 1
million completed foreclosures, or REOs, in 2012, a 25 percent
increase from 2011. \nThe positive is that the data suggests that
short sales net the banks more money so they should be expected to
increase\nThe bottom line is that in the longer term the bank
settlement will help to more quickly clear the so-called shadow
inventory, which will in turn help the housing market finally bottom
out once and for all. \nMy buddy who bought in Santa Luz in 2006 is
asked every month by his bank when he makes his payment on his $1.2mm
underwater home, do you plan on staying in the house? . Per
Corelogic, there are still large numbers still underwater in SD\n-
3800 underwater in 92127\n- 2700 underwater in 92130\nThe good news is
we only have one last market to get hit, and expect the high end.
The $1mm to $2mm has to get hit next.\nhttp://www.mercurynews.com/business/ci_19899224\nUnfortunately,
we can not avoid the headwinds.",
"contentTextFullHighlighted": null,
"contentTextSnippetHighlighted": "The low numbers nationally
published by Corelogic were a result of banks holding off foreclosures
until settlement. \nAs Bloomberg and RealtyTrac stated. this will
result in more foreclosure pain in the short term as some of the
foreclosures that should have happened last year instead happen...",
"contentJsonMetadata": null,
"commentCount": 117,
"contentUrlId": 13535,
"contentUrlPath": "http://www.bubbleinfo.com/2012/02/09/mortgage-settlement-renegade/",
"domainUrl": "http://www.bubbleinfo.com",
"domainName": null,
"contentAuthorId": 15614,
"contentAuthorName": "Hankster",
"authorJsonMetadata": null,
"authorKloutDetails": null,
"mediaSourceName": "Board Reader Blog",
"mediaSourceIconPath": "BoardReaderBlog.gif",
"mediaSourceTypeId": 1,
"mediaSourceTypeName": "Blog",
"geographyId": 0,
"geographyName": "Unknown",
"languageId": 1,
"languageName": "English",
"topicName": "Bank of America",
"profileId": 3,
"profileName": "USAA_Competition1",
"contentPublishedTime": 1328798840000,
"contentUrlPublishedTime": 1329336423000,
"calculatedSentimentId": 4,
"calculatedSentimentName": "POS",
"userSentimentId": 0,
"userSentimentName": null,
"listListeningObjectiveName": [
"Untagged LO"
],
"alertStatus": "assigned",
"assignedToUserId": 2,
"assignedToUserName": null,
"assignedByUserId": 1,
"assignedByUserName": null,
"assignedToDepartmentId": 0,
"assignedToDepartmentName": null,
"notesCount": 0,
"nouns": [
"bank",
"banks",
"Bloomberg",
"buddy",
"Corelogic",
"data",
"estimates",
"foreclosure",
"foreclosures",
"headwinds",
"home",
"house",
"housing",
"increase",
"inventory",
"line",
"Luz",
"market",
"mm",
"money",
"month",
"net",
"news",
"numbers",
"pain",
"payment",
"percent",
"Realtytrac",
"RealtyTrac",
"REOs",
"result",
"sales",
"Santa",
"SD",
"settlement",
"shadow",
"term",
"turn",
"year",
"Zillow"
],
"verbs": [
"asked",
"avoid",
"bought",
"completed",
"expect",
"expected",
"get",
"happen",
"happened",
"help",
"hit",
"holding",
"hovering",
"increase",
"makes",
"plan",
"published",
"result",
"stated",
"staying",
"suggests"
],
"adjectives": [
"bottom",
"clear",
"finally",
"good",
"high",
"higher",
"instead",
"large",
"last",
"likely",
"longer",
"low",
"nationally",
"next",
"not",
"positive",
"quickly",
"short",
"so-called",
"underwater",
"Unfortunately"
],
"phrases": [
"2012 than 2011",
"25 percent",
"25 percent increase",
"2700 underwater in 92130",
"3800 underwater in 92127",
"92130 The good news",
"asked every month",
"avoid the headwinds",
"bank settlement",
"banks holding off foreclosures",
"banks more money",
"Bloomberg and RealtyTrac",
"bottom line",
"bought in Santa",
"bought in Santa Luz",
"clear the so-called shadow",
"completed foreclosures",
"estimates from Realtytrac",
"foreclosure numbers",
"foreclosure numbers in 2012",
"foreclosure pain",
"foreclosures until settlement",
"good news",
"happen this year",
"happen this year --",
"happened last year",
"help the housing",
"help the housing market",
"higher foreclosure",
"higher foreclosure numbers",
"holding off foreclosures",
"housing market",
"increase from 2011",
"increase The bottom line",
"instead happen this year",
"large numbers",
"last market",
"last year",
"longer term",
"longer term the bank",
"low numbers",
"Luz in 2006",
"makes his payment",
"million completed foreclosures",
"mm underwater home",
"month by his bank",
"nationally published by Corelogic",
"net the banks",
"not avoid the headwinds",
"numbers in 2012",
"percent increase",
"percent increase from 2011",
"published by Corelogic",
"Realtytrac and Zillow",
"result in higher foreclosure",
"result in more foreclosure",
"result of banks",
"sales net",
"sales net the banks",
"Santa Luz",
"Santa Luz in 2006",
"shadow inventory",
"short sales",
"short sales net",
"short term",
"so-called shadow",
"so-called shadow inventory",
"staying in the house",
"suggests that short sales",
"term the bank",
"term the bank settlement",
"turn help the housing",
"underwater home",
"underwater in 92127",
"underwater in 92130",
"underwater in SD",
"year --"
],
"author_media": "15614~~~Hankster~~~1~~~Blog",
"domain_media": "http://www.bubbleinfo.com~~~null~~~1~~~Blog",
"categories": [
{
"category": "post closing",
"categoryWords": [
"foreclosure",
"foreclosure"
],
"score": "2.0"
},
{
"category": "pre buy research",
"categoryWords": [
"term",
"term"
],
"score": "2.0"
}
],
"opinionWords": [
"positive",
"good news",
"expect",
"unfortunately"
],
"brandTerms": ,
"findings":
}

Rafal_Kuc_3 · April 27, 2012, 2:17pm

Hello!

In addition to what Jörg has written, I suggested using the soft cache
type. The soft field data cache uses Java soft references so that its
memory can be freed when the GC demands it.

You can read about soft references here: http://docs.oracle.com/javase/6/docs/api/java/lang/ref/SoftReference.html

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch

If you submit a facet query on "nouns" or "phrases", ES loads all unique terms of the requested field into memory. As the mapping shows, these are analyzed fields, so ES has to handle a vast number of terms compared to not_analyzed fields. It also depends on the application: string terms use a lot of memory, while integers would use less.
By default ES does not limit how much memory field cache loading may use, so you will hit the ceiling and get an OOM if you do not carefully estimate how many unique string terms the faceted fields contain. You can then raise the limit if you still have more heap memory available, or, as has been suggested, set a reasonable cache limit to avoid OOM.

Jörg
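Jörg's point can be checked before faceting by counting how many distinct values a faceted field holds. Below is a minimal Python sketch, assuming the documents are available as parsed JSON dicts like the sample document above. It treats every array element as one term, which matches a keyword-analyzed field such as "phrases"; an analyzed field like "nouns" may be split or normalized differently by its analyzer. The per-term byte figures are illustrative assumptions, not Elasticsearch numbers, and the estimate ignores any per-document bookkeeping the field data cache may also keep.

```python
def unique_terms(docs, field):
    """Collect the distinct values of a (possibly multi-valued) field."""
    terms = set()
    for doc in docs:
        values = doc.get(field) or []  # missing or null field -> no terms
        if isinstance(values, str):
            values = [values]  # single-valued field
        terms.update(values)
    return terms


def rough_term_bytes(terms, overhead_per_term=40, bytes_per_char=2):
    """Very rough heap estimate for holding the terms as Java strings.

    overhead_per_term and bytes_per_char are assumptions (a guessed fixed
    object overhead plus UTF-16 characters), not measured values.
    """
    return sum(overhead_per_term + bytes_per_char * len(t) for t in terms)
```

Running `unique_terms` over a representative batch of documents per index, and multiplying by the number of shards a node holds, gives a first idea of whether a field is safe to facet on with the available heap.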

On Friday, April 27, 2012 3:07:39 PM UTC+2, Sujoy Sett wrote:
Hi,

The indexes are working fine now. We are running JMeter tests with multiple users.
We see the following in the command prompt:

[2012-04-27 18:28:27,181][WARN ][monitor.jvm ] [es_node_67] [gc][ParNew][4142][305] duration [1.4s], collections [1]/[4.3s], total [1.4s]/[21.8s],memory [5.7gb]->[5.7gb]/[5.9gb]
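Part of the answer is visible in the log line itself. Below is a minimal Python sketch that extracts the heap figures from such a line. It assumes the `memory [A]->[B]/[C]` fields mean heap used before the collection, heap used after it, and maximum heap. Read that way, this ParNew collection freed nothing at the printed precision (5.7gb before and after, against 5.9gb).

```python
import re

_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}
_SIZE = r"([\d.]+)(b|kb|mb|gb)"
_MEMORY = re.compile(rf"memory \[{_SIZE}\]->\[{_SIZE}\]/\[{_SIZE}\]")


def parse_gc_memory(line):
    """Extract heap figures from a monitor.jvm GC warning line.

    Assumes the bracketed values are heap used before the collection,
    heap used after it, and the maximum heap size.
    """
    match = _MEMORY.search(line)
    if match is None:
        return None
    before, after, maximum = (
        float(match.group(i)) * _UNITS[match.group(i + 1)] for i in (1, 3, 5)
    )
    return {
        "freed_bytes": before - after,
        "used_fraction": after / maximum,
    }
```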

Just out of curiosity, what is ES doing internally? Could you also explain the settings you suggested in more detail,
especially how segments and shards are related?

Thanks and Regards,

On Friday, April 27, 2012 5:30:46 PM UTC+5:30, Sujoy Sett wrote:
Hi,

We ran ES with the following settings:

index.cache.field.type: soft
index.cache.field.max_size: 1000

The ES cache stats show the following results on subsequent requests:
"cache" : { "field_evictions" : 67, "field_size" : "1.7gb", "field_size_in_bytes" : 1853666588, "filter_count" : 0, "filter_evictions" : 0, "filter_size" : "0b", "filter_size_in_bytes" : 0 }

We see that field_size comes down after hitting its peak.
We are running more tests and will update soon. Thanks for your help.

Regards,
On Friday, April 27, 2012 5:02:17 PM UTC+5:30, Rafał Kuć wrote:
Hello!

Before you sent the query, the field data cache was empty; afterwards it was much larger: 3.5gb and 2.4gb. By default, the field data cache is unlimited (in terms of entries). You may want to make one of the following changes to your ElasticSearch configuration:

1. Set the field data cache type to soft. This makes the cache use Java soft references and thus lets the GC release memory used by the field data cache when more heap memory is needed. You can do that by adding the following line to the configuration:
index.cache.field.type: soft

2. Limit the field data cache size by setting its maximum number of entries. Remember that the maximum number of entries applies per segment, not per index. To set it, add the following line to the configuration:
index.cache.field.max_size: 10000

Treat the above value as an example; I can't predict what setting will be good for your deployment.

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
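To make the per-segment wording concrete: if every segment gets its own limit, the worst case for a node grows with its segment count. The Python sketch below is only an analogy. It assumes an entry-bounded LRU cache with an eviction counter, loosely mirroring `index.cache.field.max_size` and the `field_evictions` stat. It is not how Elasticsearch implements the field data cache, and the segment names are hypothetical.

```python
from collections import OrderedDict


class BoundedCache:
    """Entry-bounded LRU cache (an analogy only, not Elasticsearch code)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.evictions = 0  # loosely comparable to the field_evictions stat
        self._data = OrderedDict()

    def get_or_load(self, key, loader):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        value = loader(key)
        self._data[key] = value
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # drop the least recently used entry
            self.evictions += 1
        return value

    def __contains__(self, key):
        return key in self._data

    def __len__(self):
        return len(self._data)


def worst_case_entries(max_size, segments):
    """With one limit per segment, the node-wide bound scales with segments."""
    return max_size * segments


# One cache per segment: the limit is applied to each cache separately.
per_segment = {segment: BoundedCache(max_size=2) for segment in ("seg_0", "seg_1")}
```

For example, with the `index.cache.field.max_size: 1000` value tried in this thread and a hypothetical 50 segments on a node, `worst_case_entries(1000, 50)` allows up to 50,000 entries in total. This shows why a per-segment limit alone may not bound memory the way a per-index limit would.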

Also, the following message has been printed
java.lang.OutOfMemoryError: loading field [phrases] caused out of memory failure
along with lots of stack traces in the ES command prompt.

Does that tell you anything?

Thanks and regards,

On Friday, April 27, 2012 4:44:19 PM UTC+5:30, Sujoy Sett wrote:
Hi,

We really appreciate your prompt response, thank you. We have tested the same with our indexes, and the observations are below. What do they imply? Please let us know if we are doing anything wrong in the settings or elsewhere.

Initial State
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "503.1mb",
"size_in_bytes" : 527622079
},
"docs" : {
"count" : 74250,
"deleted" : 2705
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 0,
"query_time" : "0s",
"query_time_in_millis" : 0,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "0b",
"field_size_in_bytes" : 0,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 0,
"query_time" : "0s",
"query_time_in_millis" : 0,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "0b",
"field_size_in_bytes" : 0,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

After sending the query
{
"query" : {
"match_all" : { }
},
"size" : 0,
"facets" : {
"tag" : {
"terms" : {
"field" : "phrases",
"size" : 100
},
"_cache":false
}
}
}

After a single request
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "6.3gb",
"size_in_bytes" : 6787402724
},
"docs" : {
"count" : 876639,
"deleted" : 56407
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 2,
"query_time" : "21.8s",
"query_time_in_millis" : 21869,
"query_current" : 4,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "3.5gb",
"field_size_in_bytes" : 3834410088,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 4,
"query_time" : "21.8s",
"query_time_in_millis" : 21808,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "2.4gb",
"field_size_in_bytes" : 2653970178,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

After two requests
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 11,
"query_time" : "1.9m",
"query_time_in_millis" : 116142,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "4.9gb",
"field_size_in_bytes" : 5323063782,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 9,
"query_time" : "49.6s",
"query_time_in_millis" : 49662,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "4.2gb",
"field_size_in_bytes" : 4587853968,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

After three requests
ES went down with a heap space error and stopped responding.
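The snapshots above can be turned into a quick heap budget. Below is a minimal Python sketch, assuming these two nodes run Configuration 1 (6GB maximum heap each). This is consistent with the 5.9gb maximum in the GC log line quoted earlier in the thread. The byte counts are the `field_size_in_bytes` values reported above. Because `field_evictions` stayed at 0, nothing was released between requests. After two requests, the field data cache alone held roughly 83% and 71% of the heap on the two nodes, which leaves little room for the third request.

```python
HEAP_BYTES = 6 * 1024 ** 3  # "Max heap size 6GB per node" (Configuration 1)

# field_size_in_bytes from the stats above: initial, after one, after two requests
FIELD_CACHE_BYTES = {
    "es_node_102": [0, 3834410088, 5323063782],
    "es_node_67": [0, 2653970178, 4587853968],
}


def heap_share(field_bytes, heap_bytes=HEAP_BYTES):
    """Fraction of the maximum heap occupied by the field data cache."""
    return field_bytes / heap_bytes


for node, sizes in FIELD_CACHE_BYTES.items():
    shares = ", ".join(f"{heap_share(b):.0%}" for b in sizes)
    print(f"{node}: {shares}")
```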

Thanks and Regards,

On Friday, April 27, 2012 4:20:59 PM UTC+5:30, Rafał Kuć wrote:
Hello!

The nodes stats API provides information about cache usage. For example, run the following command:

curl 'localhost:9200/_cluster/nodes/stats?pretty=true'

In the output you should find the statistics for both filter and field data cache, something like the following:

"cache" : {
      "field_evictions" : 0,
      "field_size" : "0b",
      "field_size_in_bytes" : 0,
      "filter_count" : 1,
      "filter_evictions" : 0,
      "filter_size" : "32b",
      "filter_size_in_bytes" : 32
    }

With it you should be able to see how much memory your field data cache consumes.
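The same numbers can be collected from a script, for example to record them before and after each test query. Below is a minimal Python sketch, assuming the `_cluster/nodes/stats` endpoint and the `nodes` → `indices` → `cache` layout shown in this thread. Later Elasticsearch versions use a different endpoint and layout. The base URL is a placeholder.

```python
import json
import urllib.request


def fetch_node_stats(base_url="http://localhost:9200"):
    """GET the 0.19-era nodes stats endpoint used in the curl command above."""
    with urllib.request.urlopen(f"{base_url}/_cluster/nodes/stats") as response:
        return json.loads(response.read().decode("utf-8"))


def field_cache_by_node(stats):
    """Map node name -> (field_size_in_bytes, field_evictions)."""
    summary = {}
    for node in stats.get("nodes", {}).values():
        cache = node.get("indices", {}).get("cache", {})
        summary[node.get("name")] = (
            cache.get("field_size_in_bytes", 0),
            cache.get("field_evictions", 0),
        )
    return summary
```

Calling `field_cache_by_node(fetch_node_stats())` after each facet request shows whether the field data cache keeps growing and whether evictions ever happen.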

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

On Friday, April 27, 2012 12:42:35 PM UTC+2, Sujoy Sett wrote:
Hi,

Could you please explain how to check the field data cache? Do I have to set anything explicitly to monitor it?
I often use the mobz-elasticsearch-head-24935c4 plugin to monitor cluster state and health, but I didn't find anything like index.cache.field.max_size in the cluster_state details.

Thanks and Regards,

On Friday, April 27, 2012 3:52:04 PM UTC+5:30, Rafał Kuć wrote:
Hello,

Did you look at the size of the field data cache after sending the example query?

Regards,
Rafał

On Friday, April 27, 2012 12:15:38 PM UTC+2, Sujoy Sett wrote:
Hi,