Elasticsearch 0.19.2 heap space shortage, nodes becoming unresponsive and not recovering or releasing memory

Hi,

We have been using Elasticsearch 0.19.2 for storing and analyzing data
from social media blogs and forums. The data volume goes up to 500,000
documents per index, and the size of this data in the Elasticsearch index
reaches about 3 GB per index per node (all shards). We always keep the
number of replicas one less than the total number of nodes, so that a copy
of every shard resides on each node at any instant. The number of shards
is generally 10 for indexes of the size mentioned above.
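
For reference, we create such an index roughly as follows (a minimal
sketch; the index name social_media stands in for our real index names):

# create the index with 10 shards and 1 replica (Configuration 1)
curl -XPUT 'localhost:9200/social_media' -d '{
    "settings" : {
        "number_of_shards" : 10,
        "number_of_replicas" : 1
    }
}'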

We run different queries on this data for advanced visualization purposes,
mainly facets for showing trend charts or keyword clouds. The following are
some examples of the queries we execute:
{
    "query" : {
        "match_all" : { }
    },
    "size" : 0,
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "nouns",
                "size" : 100
            },
            "_cache" : false
        }
    }
}

{
    "query" : {
        "match_all" : { }
    },
    "size" : 0,
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "phrases",
                "size" : 100
            },
            "_cache" : false
        }
    }
}
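
We submit these as ordinary search requests over HTTP, for example (a
sketch; the index name social_media is illustrative):

# terms facet over the analyzed "nouns" field, top 100 terms
curl -XPOST 'localhost:9200/social_media/_search?pretty=true' -d '{
    "query" : { "match_all" : { } },
    "size" : 0,
    "facets" : {
        "tag" : {
            "terms" : { "field" : "nouns", "size" : 100 },
            "_cache" : false
        }
    }
}'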

While executing such queries we often encounter heap space shortage, and
the nodes become unresponsive. Our main concern is that the nodes do not
recover to a normal state even after dumping the heap to an .hprof file.
The node still consumes the maximum allocated memory (as shown for the
java.exe process in Task Manager), and the nodes remain unresponsive until
we manually kill and restart them.

ES Configuration 1:
Elasticsearch version 0.19.2
2 nodes, one on each physical server
Max heap size 6 GB per node
10 shards, 1 replica

ES Configuration 2:
Elasticsearch version 0.19.2
6 nodes, three on each physical server
Max heap size 2 GB per node
10 shards, 5 replicas

Server configuration:
Windows 7, 64-bit
64-bit JVM
8 GB physical memory
Dual-core processor
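
For clarity, the heap limits above are set through the startup script
variables; a sketch, assuming the stock Windows scripts of this version,
which honor ES_MIN_MEM and ES_MAX_MEM:

REM give each node a 6 GB heap (Configuration 1)
set ES_MIN_MEM=6g
set ES_MAX_MEM=6g
bin\elasticsearch.bat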

With both configurations mentioned above, Elasticsearch was unable to
respond to the facet queries, and it was also unable to recover when a
query failed due to heap space shortage.

We are facing this issue in our production environments and would
appreciate suggestions for a better configuration or a different approach
if required.

The mapping we use for the data is as follows (keyword1 is a customized
keyword analyzer; similarly, standard1 is a customized standard analyzer):

{
    "properties": {
        "adjectives": {
            "type": "string",
            "analyzer": "stop2"
        },
        "alertStatus": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "assignedByUserId": {
            "type": "integer",
            "index": "analyzed"
        },
        "assignedByUserName": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "assignedToDepartmentId": {
            "type": "integer",
            "index": "analyzed"
        },
        "assignedToDepartmentName": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "assignedToUserId": {
            "type": "integer",
            "index": "analyzed"
        },
        "assignedToUserName": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "authorJsonMetadata": {
            "properties": {
                "favourites": { "type": "string" },
                "followers": { "type": "string" },
                "following": { "type": "string" },
                "likes": { "type": "string" },
                "listed": { "type": "string" },
                "subscribers": { "type": "string" },
                "subscription": { "type": "string" },
                "uploads": { "type": "string" },
                "views": { "type": "string" }
            }
        },
        "authorKloutDetails": {
            "dynamic": "true",
            "properties": {
                "amplificationScore": { "type": "string" },
                "authorKloutDetailsFound": { "type": "string" },
                "description": { "type": "string" },
                "influencees": {
                    "dynamic": "true",
                    "properties": {
                        "kscore": { "type": "string" },
                        "twitter_screen_name": { "type": "string" }
                    }
                },
                "influencers": {
                    "dynamic": "true",
                    "properties": {
                        "kscore": { "type": "string" },
                        "twitter_screen_name": { "type": "string" }
                    }
                },
                "kloutClass": { "type": "string" },
                "kloutClassDescription": { "type": "string" },
                "kloutScore": { "type": "string" },
                "kloutScoreDescription": { "type": "string" },
                "kloutTopic": { "type": "string" },
                "slope": { "type": "string" },
                "trueReach": { "type": "string" },
                "twitterId": { "type": "string" },
                "twitterScreenName": { "type": "string" }
            }
        },
        "author_media": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "brandTerms": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "calculatedSentimentId": {
            "type": "integer",
            "index": "analyzed"
        },
        "calculatedSentimentName": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "categories": {
            "properties": {
                "category": {
                    "type": "string",
                    "analyzer": "keyword1"
                },
                "categoryWords": {
                    "type": "string",
                    "analyzer": "keyword1"
                },
                "score": { "type": "double" }
            }
        },
        "commentCount": {
            "type": "integer",
            "index": "analyzed"
        },
        "contentAuthorId": {
            "type": "integer",
            "index": "analyzed"
        },
        "contentAuthorName": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "contentId": {
            "type": "integer",
            "index": "analyzed"
        },
        "contentJsonMetadata": {
            "properties": {
                "comment Count": { "type": "string" },
                "dislikes": { "type": "string" },
                "favourites": { "type": "string" },
                "likes": { "type": "string" },
                "retweet Count": { "type": "string" },
                "views": { "type": "string" }
            }
        },
        "contentPublishedTime": {
            "type": "date",
            "index": "analyzed",
            "format": "dateOptionalTime"
        },
        "contentTextFull": {
            "type": "string",
            "analyzer": "standard1"
        },
        "contentTextFullHighlighted": {
            "type": "string",
            "analyzer": "standard1"
        },
        "contentTextSnippetHighlighted": {
            "type": "string",
            "analyzer": "standard1"
        },
        "contentType": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "contentUrlId": {
            "type": "integer",
            "index": "analyzed"
        },
        "contentUrlPath": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "contentUrlPublishedTime": {
            "type": "date",
            "index": "analyzed",
            "format": "dateOptionalTime"
        },
        "ctmId": { "type": "long" },
        "domainName": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "domainUrl": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "domain_media": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "findings": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "geographyId": {
            "type": "integer",
            "index": "analyzed"
        },
        "geographyName": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "kloutScore": { "type": "object" },
        "languageId": {
            "type": "integer",
            "index": "analyzed"
        },
        "languageName": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "listListeningObjectiveName": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "mediaSourceIconPath": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "mediaSourceId": {
            "type": "integer",
            "index": "analyzed"
        },
        "mediaSourceName": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "mediaSourceTypeId": {
            "type": "integer",
            "index": "analyzed"
        },
        "mediaSourceTypeName": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "notesCount": {
            "type": "integer",
            "index": "analyzed"
        },
        "nouns": {
            "type": "string",
            "analyzer": "stop2"
        },
        "opinionWords": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "phrases": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "profileId": {
            "type": "integer",
            "index": "analyzed"
        },
        "profileName": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "topicId": {
            "type": "integer",
            "index": "analyzed"
        },
        "topicName": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "userSentimentId": {
            "type": "integer",
            "index": "analyzed"
        },
        "userSentimentName": {
            "type": "string",
            "analyzer": "keyword1"
        },
        "verbs": {
            "type": "string",
            "analyzer": "stop2"
        }
    }
}
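
We apply this mapping with the put-mapping API; a minimal sketch for one
field only, where the index and type names social_media and content are
illustrative:

# register the stop2-analyzed string field "nouns" on type "content"
curl -XPUT 'localhost:9200/social_media/content/_mapping' -d '{
    "content" : {
        "properties" : {
            "nouns" : { "type" : "string", "analyzer" : "stop2" }
        }
    }
}'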

A sample of the structure of the data is as follows:

{
"contentType": "comment",
"topicId": 9,
"mediaSourceId": 3,
"contentId": 34834,
"ctmId": 73322,
"contentTextFull": "The low numbers nationally published by
Corelogic were a result of banks holding off foreclosures until
settlement. \nAs Bloomberg and RealtyTrac stated. this will result in
more foreclosure pain in the short term as some of the foreclosures
that should have happened last year instead happen this year which
will likely result in higher foreclosure numbers in 2012 than
2011.\nThe estimates from Realtytrac and Zillow are hovering around 1
million completed foreclosures, or REOs, in 2012, a 25 percent
increase from 2011. \nThe positive is that the data suggests that
short sales net the banks more money so they should be expected to
increase\nThe bottom line is that in the longer term the bank
settlement will help to more quickly clear the so-called shadow
inventory, which will in turn help the housing market finally bottom
out once and for all. \nMy buddy who bought in Santa Luz in 2006 is
asked every month by his bank when he makes his payment on his $1.2mm
underwater home, do you plan on staying in the house? . Per
Corelogic, there are still large numbers still underwater in SD\n-
3800 underwater in 92127\n- 2700 underwater in 92130\nThe good news is
we only have one last market to get hit, and expect the high end.
The $1mm to $2mm has to get hit next.\nhttp://www.mercurynews.com/business/ci_19899224\nUnfortunately,
we can not avoid the headwinds.",
"contentTextFullHighlighted": null,
"contentTextSnippetHighlighted": "The low numbers nationally
published by Corelogic were a result of banks holding off foreclosures
until settlement. \nAs Bloomberg and RealtyTrac stated. this will
result in more foreclosure pain in the short term as some of the
foreclosures that should have happened last year instead happen...",
"contentJsonMetadata": null,
"commentCount": 117,
"contentUrlId": 13535,
"contentUrlPath": "http://www.bubbleinfo.com/2012/02/09/mortgage-
settlement-renegade/",
"domainUrl": "http://www.bubbleinfo.com",
"domainName": null,
"contentAuthorId": 15614,
"contentAuthorName": "Hankster",
"authorJsonMetadata": null,
"authorKloutDetails": null,
"mediaSourceName": "Board Reader Blog",
"mediaSourceIconPath": "BoardReaderBlog.gif",
"mediaSourceTypeId": 1,
"mediaSourceTypeName": "Blog",
"geographyId": 0,
"geographyName": "Unknown",
"languageId": 1,
"languageName": "English",
"topicName": "Bank of America",
"profileId": 3,
"profileName": "USAA_Competition1",
"contentPublishedTime": 1328798840000,
"contentUrlPublishedTime": 1329336423000,
"calculatedSentimentId": 4,
"calculatedSentimentName": "POS",
"userSentimentId": 0,
"userSentimentName": null,
"listListeningObjectiveName": [
"Untagged LO"
],
"alertStatus": "assigned",
"assignedToUserId": 2,
"assignedToUserName": null,
"assignedByUserId": 1,
"assignedByUserName": null,
"assignedToDepartmentId": 0,
"assignedToDepartmentName": null,
"notesCount": 0,
"nouns": [
"bank",
"banks",
"Bloomberg",
"buddy",
"Corelogic",
"data",
"estimates",
"foreclosure",
"foreclosures",
"headwinds",
"home",
"house",
"housing",
"increase",
"inventory",
"line",
"Luz",
"market",
"mm",
"money",
"month",
"net",
"news",
"numbers",
"pain",
"payment",
"percent",
"Realtytrac",
"RealtyTrac",
"REOs",
"result",
"sales",
"Santa",
"SD",
"settlement",
"shadow",
"term",
"turn",
"year",
"Zillow"
],
"verbs": [
"asked",
"avoid",
"bought",
"completed",
"expect",
"expected",
"get",
"happen",
"happened",
"help",
"hit",
"holding",
"hovering",
"increase",
"makes",
"plan",
"published",
"result",
"stated",
"staying",
"suggests"
],
"adjectives": [
"bottom",
"clear",
"finally",
"good",
"high",
"higher",
"instead",
"large",
"last",
"likely",
"longer",
"low",
"nationally",
"next",
"not",
"positive",
"quickly",
"short",
"so-called",
"underwater",
"Unfortunately"
],
"phrases": [
"2012 than 2011",
"25 percent",
"25 percent increase",
"2700 underwater in 92130",
"3800 underwater in 92127",
"92130 The good news",
"asked every month",
"avoid the headwinds",
"bank settlement",
"banks holding off foreclosures",
"banks more money",
"Bloomberg and RealtyTrac",
"bottom line",
"bought in Santa",
"bought in Santa Luz",
"clear the so-called shadow",
"completed foreclosures",
"estimates from Realtytrac",
"foreclosure numbers",
"foreclosure numbers in 2012",
"foreclosure pain",
"foreclosures until settlement",
"good news",
"happen this year",
"happen this year --",
"happened last year",
"help the housing",
"help the housing market",
"higher foreclosure",
"higher foreclosure numbers",
"holding off foreclosures",
"housing market",
"increase from 2011",
"increase The bottom line",
"instead happen this year",
"large numbers",
"last market",
"last year",
"longer term",
"longer term the bank",
"low numbers",
"Luz in 2006",
"makes his payment",
"million completed foreclosures",
"mm underwater home",
"month by his bank",
"nationally published by Corelogic",
"net the banks",
"not avoid the headwinds",
"numbers in 2012",
"percent increase",
"percent increase from 2011",
"published by Corelogic",
"Realtytrac and Zillow",
"result in higher foreclosure",
"result in more foreclosure",
"result of banks",
"sales net",
"sales net the banks",
"Santa Luz",
"Santa Luz in 2006",
"shadow inventory",
"short sales",
"short sales net",
"short term",
"so-called shadow",
"so-called shadow inventory",
"staying in the house",
"suggests that short sales",
"term the bank",
"term the bank settlement",
"turn help the housing",
"underwater home",
"underwater in 92127",
"underwater in 92130",
"underwater in SD",
"year --"
],
"author_media": "15614~Hankster~1~Blog",
"domain_media": "http://www.bubbleinfo.com
~null~1~Blog",
"categories": [
{
"category": "post closing",
"categoryWords": [
"foreclosure",
"foreclosure"
],
"score": "2.0"
},
{
"category": "pre buy research",
"categoryWords": [
"term",
"term"
],
"score": "2.0"
}
],
"opinionWords": [
"positive",
"good news",
"expect",
"unfortunately"
],
"brandTerms": [],
"findings": []
}

Hello,

Did you look at the size of the field data cache after sending the
example query?

Regards,
Rafał

Hi,

Can you please explain how to check the field data cache? Do I have to set
anything explicitly to monitor it?
I often use the mobz-elasticsearch-head-24935c4 plugin to monitor cluster
state and health, but I didn't find anything like index.cache.field.max_size
in the cluster_state details.

Thanks and Regards,

My main concern is the recovery failure. A heap space error is expected if
we try to load too many documents into memory, but the Elasticsearch nodes
should recover after such an error. I suppose that after this stage even
flush, refresh, or optimize will not work.
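
The calls I have in mind are the standard endpoints; a sketch, with
social_media standing in for the index name:

curl -XPOST 'localhost:9200/social_media/_flush'     # write the transaction log out
curl -XPOST 'localhost:9200/social_media/_refresh'   # make recent writes searchable
curl -XPOST 'localhost:9200/social_media/_optimize'  # merge index segments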

Regards

Hello!

Node statistics provide information about cache usage. For example, run the
following command:

curl 'localhost:9200/_cluster/nodes/stats?pretty=true'

In the output you should find the statistics for both filter and field data
cache, something like the following:

"cache" : {
      "field_evictions" : 0,
      "field_size" : "0b",
      "field_size_in_bytes" : 0,
      "filter_count" : 1,
      "filter_evictions" : 0,
      "filter_size" : "32b",
      "filter_size_in_bytes" : 32
    }

With it you should be able to see how much memory your field data cache
consumes.
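
As a side note, if the field data cache turns out to be the problem:
settings such as index.cache.field.max_size, which you mentioned, live in
elasticsearch.yml rather than in the cluster state output. A sketch, with
illustrative values:

index.cache.field.type: soft        # let the JVM reclaim cache entries under memory pressure
index.cache.field.max_size: 50000   # cap on cache entries (illustrative value)
index.cache.field.expire: 10m       # expire unused entries (illustrative value)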

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Elasticsearch

W dniu piątek, 27 kwietnia 2012 12:42:35 UTC+2 użytkownik Sujoy Sett
napisał:

Hi,

Can u please explain how to check the field data cache ? Do I have to set
anything to monitor explicitly?
I often use the mobz-elasticsearch-head-24935c4 plugin to monitor cluster
state and health, I didn't find anything like index.cache.field.max_size
there in the cluster_state details.

Thanks and Regards,

On Friday, April 27, 2012 3:52:04 PM UTC+5:30, Rafał Kuć wrote:

Hello,

Did you look at the size of the field data cache after sending the
example query ?

Regards,
Rafał

W dniu piątek, 27 kwietnia 2012 12:15:38 UTC+2 użytkownik Sujoy Sett
napisał:

Hi,

We have been using elasticsearch 0.19.2 for storing and analyzing data
from social media blogs and forums. The data volume is going up to
500000 documents per index, and size of this volume of data in
Elasticsearch index is going up to 3 GB per index per node (all
shards). We always maintain the number of replicas 1 less than the
total number of nodes to ensure that a copy of all shards should
reside on every node at any instant. The number of shards are
generally 10 for the size of indexes we mentioned above.

We try different queries on these data for advanced visualization
purpose, and mainly facets for showing trend charts or keyword clouds.
Following are some example of the query we execute:
{
"query" : {
"match_all" : { }
},
"size" : 0,
"facets" : {
"tag" : {
"terms" : {
"field" : "nouns",
"size" : 100
},
"_cache":false
}
}
}

{
"query" : {
"match_all" : { }
},
"size" : 0,
"facets" : {
"tag" : {
"terms" : {
"field" : "phrases",
"size" : 100
},
"_cache":false
}
}
}

While executing such queries we often encounter heap space shortage,
and the nodes becomes unresponsive. Our main concern is that the nodes
do not recover to normal state even after dumping the heap to a hprof
file. The node still consumes the maximum allocated memory as shown in
task manager java.exe process, and the nodes remain unresponsive until
we manually kill and restart them.

ES Configuration 1:
Elasticsearch Version 0.19.2
2 Nodes, one on each physical server
Max heap size 6GB per node.
10 shards, 1 replica.

ES Configuration 2:
Elasticsearch Version 0.19.2
6 Nodes, three on each physical server
Max heap size 2GB per node.
10 shards, 5 replica.

Server Configuration:
Windows 7 64 bit
64 bit JVM
8 GB pysical memory
Dual Core processor

For both the configuration mentioned above Elasticsearch was unable to
respond to the facet queries mentioned above, it was also unable to
recover when a query failed due to heap space shortage.

We are facing this issue in our production environments, and request
you to please suggest a better configuration or a different approach
if required.

The mapping of the data is we use is as follows:
(keyword1 is a customized keyword analyzer, similarly standard1 is a
customized standard analyzer)

{
"properties": {
"adjectives": {
"type": "string",
"analyzer": "stop2"
},
"alertStatus": {
"type": "string",
"analyzer": "keyword1"
},
"assignedByUserId": {
"type": "integer",
"index": "analyzed"
},
"assignedByUserName": {
"type": "string",
"analyzer": "keyword1"
},
"assignedToDepartmentId": {
"type": "integer",
"index": "analyzed"
},
"assignedToDepartmentName": {
"type": "string",
"analyzer": "keyword1"
},
"assignedToUserId": {
"type": "integer",
"index": "analyzed"
},
"assignedToUserName": {
"type": "string",
"analyzer": "keyword1"
},
"authorJsonMetadata": {
"properties": {
"favourites": {
"type": "string"
},
"followers": {
"type": "string"
},
"following": {
"type": "string"
},
"likes": {
"type": "string"
},
"listed": {
"type": "string"
},
"subscribers": {
"type": "string"
},
"subscription": {
"type": "string"
},
"uploads": {
"type": "string"
},
"views": {
"type": "string"
}
}
},
"authorKloutDetails": {
"dynamic": "true",
"properties": {
"amplificationScore": {
"type": "string"
},
"authorKloutDetailsFound": {
"type": "string"
},
"description": {
"type": "string"
},
"influencees": {
"dynamic": "true",
"properties": {
"kscore": {
"type": "string"
},
"twitter_screen_name": {
"type": "string"
}
}
},
"influencers": {
"dynamic": "true",
"properties": {
"kscore": {
"type": "string"
},
"twitter_screen_name": {
"type": "string"
}
}
},
"kloutClass": {
"type": "string"
},
"kloutClassDescription": {
"type": "string"
},
"kloutScore": {
"type": "string"
},
"kloutScoreDescription": {
"type": "string"
},
"kloutTopic": {
"type": "string"
},
"slope": {
"type": "string"
},
"trueReach": {
"type": "string"
},
"twitterId": {
"type": "string"
},
"twitterScreenName": {
"type": "string"
}
}
},
"author_media": {
"type": "string",
"analyzer": "keyword1"
},
"brandTerms": {
"type": "string",
"analyzer": "keyword1"
},
"calculatedSentimentId": {
"type": "integer",
"index": "analyzed"
},
"calculatedSentimentName": {
"type": "string",
"analyzer": "keyword1"
},
"categories": {
"properties": {
"category": {
"type": "string",
"analyzer": "keyword1"
},
"categoryWords": {
"type": "string",
"analyzer": "keyword1"
},
"score": {
"type": "double"
}
}
},
"commentCount": {
"type": "integer",
"index": "analyzed"
},
"contentAuthorId": {
"type": "integer",
"index": "analyzed"
},
"contentAuthorName": {
"type": "string",
"analyzer": "keyword1"
},
"contentId": {
"type": "integer",
"index": "analyzed"
},
"contentJsonMetadata": {
"properties": {
"comment Count": {
"type": "string"
},
"dislikes": {
"type": "string"
},
"favourites": {
"type": "string"
},
"likes": {
"type": "string"
},
"retweet Count": {
"type": "string"
},
"views": {
"type": "string"
}
}
},
"contentPublishedTime": {
"type": "date",
"index": "analyzed",
"format": "dateOptionalTime"
},
"contentTextFull": {
"type": "string",
"analyzer": "standard1"
},
"contentTextFullHighlighted": {
"type": "string",
"analyzer": "standard1"
},
"contentTextSnippetHighlighted": {
"type": "string",
"analyzer": "standard1"
},
"contentType": {
"type": "string",
"analyzer": "keyword1"
},
"contentUrlId": {
"type": "integer",
"index": "analyzed"
},
"contentUrlPath": {
"type": "string",
"analyzer": "keyword1"
},
"contentUrlPublishedTime": {
"type": "date",
"index": "analyzed",
"format": "dateOptionalTime"
},
"ctmId": {
"type": "long"
},
"domainName": {
"type": "string",
"analyzer": "keyword1"
},
"domainUrl": {
"type": "string",
"analyzer": "keyword1"
},
"domain_media": {
"type": "string",
"analyzer": "keyword1"
},
"findings": {
"type": "string",
"analyzer": "keyword1"
},
"geographyId": {
"type": "integer",
"index": "analyzed"
},
"geographyName": {
"type": "string",
"analyzer": "keyword1"
},
"kloutScore": {
"type": "object"
},
"languageId": {
"type": "integer",
"index": "analyzed"
},
"languageName": {
"type": "string",
"analyzer": "keyword1"
},
"listListeningObjectiveName": {
"type": "string",
"analyzer": "keyword1"
},
"mediaSourceIconPath": {
"type": "string",
"analyzer": "keyword1"
},
"mediaSourceId": {
"type": "integer",
"index": "analyzed"
},
"mediaSourceName": {
"type": "string",
"analyzer": "keyword1"
},
"mediaSourceTypeId": {
"type": "integer",
"index": "analyzed"
},
"mediaSourceTypeName": {
"type": "string",
"analyzer": "keyword1"
},
"notesCount": {
"type": "integer",
"index": "analyzed"
},
"nouns": {
"type": "string",
"analyzer": "stop2"
},
"opinionWords": {
"type": "string",
"analyzer": "keyword1"
},
"phrases": {
"type": "string",
"analyzer": "keyword1"
},
"profileId": {
"type": "integer",
"index": "analyzed"
},
"profileName": {
"type": "string",
"analyzer": "keyword1"
},
"topicId": {
"type": "integer",
"index": "analyzed"
},
"topicName": {
"type": "string",
"analyzer": "keyword1"
},
"userSentimentId": {
"type": "integer",
"index": "analyzed"
},
"userSentimentName": {
"type": "string",
"analyzer": "keyword1"
},
"verbs": {
"type": "string",
"analyzer": "stop2"
}
}
}

A sample of the structure of the data is as follows:

{
"contentType": "comment",
"topicId": 9,
"mediaSourceId": 3,
"contentId": 34834,
"ctmId": 73322,
"contentTextFull": "The low numbers nationally published by
Corelogic were a result of banks holding off foreclosures until
settlement. \nAs Bloomberg and RealtyTrac stated. this will result in
more foreclosure pain in the short term as some of the foreclosures
that should have happened last year instead happen this year which
will likely result in higher foreclosure numbers in 2012 than
2011.\nThe estimates from Realtytrac and Zillow are hovering around 1
million completed foreclosures, or REOs, in 2012, a 25 percent
increase from 2011. \nThe positive is that the data suggests that
short sales net the banks more money so they should be expected to
increase\nThe bottom line is that in the longer term the bank
settlement will help to more quickly clear the so-called shadow
inventory, which will in turn help the housing market finally bottom
out once and for all. \nMy buddy who bought in Santa Luz in 2006 is
asked every month by his bank when he makes his payment on his $1.2mm
underwater home, do you plan on staying in the house? . Per
Corelogic, there are still large numbers still underwater in SD\n-
3800 underwater in 92127\n- 2700 underwater in 92130\nThe good news is
we only have one last market to get hit, and expect the high end.
The $1mm to $2mm has to get hit next.\nhttp://www.mercurynews.com/business/ci_19899224\nUnfortunately,
we can not avoid the headwinds.",
"contentTextFullHighlighted": null,
"contentTextSnippetHighlighted": "The low numbers nationally
published by Corelogic were a result of banks holding off foreclosures
until settlement. \nAs Bloomberg and RealtyTrac stated. this will
result in more foreclosure pain in the short term as some of the
foreclosures that should have happened last year instead happen...",
"contentJsonMetadata": null,
"commentCount": 117,
"contentUrlId": 13535,
"contentUrlPath": "http://www.bubbleinfo.com/2012/02/09/mortgage-
settlement-renegade/http://www.bubbleinfo.com/2012/02/09/mortgage-settlement-renegade/",

"domainUrl": "http://www.bubbleinfo.com", 
"domainName": null, 
"contentAuthorId": 15614, 
"contentAuthorName": "Hankster", 
"authorJsonMetadata": null, 
"authorKloutDetails": null, 
"mediaSourceName": "Board Reader Blog", 
"mediaSourceIconPath": "BoardReaderBlog.gif", 
"mediaSourceTypeId": 1, 
"mediaSourceTypeName": "Blog", 
"geographyId": 0, 
"geographyName": "Unknown", 
"languageId": 1, 
"languageName": "English", 
"topicName": "Bank of America", 
"profileId": 3, 
"profileName": "USAA_Competition1", 
"contentPublishedTime": 1328798840000, 
"contentUrlPublishedTime": 1329336423000, 
"calculatedSentimentId": 4, 
"calculatedSentimentName": "POS", 
"userSentimentId": 0, 
"userSentimentName": null, 
"listListeningObjectiveName": [ 
    "Untagged LO" 
], 
"alertStatus": "assigned", 
"assignedToUserId": 2, 
"assignedToUserName": null, 
"assignedByUserId": 1, 
"assignedByUserName": null, 
"assignedToDepartmentId": 0, 
"assignedToDepartmentName": null, 
"notesCount": 0, 
"nouns": [ 
    "bank", 
    "banks", 
    "Bloomberg", 
    "buddy", 
    "Corelogic", 
    "data", 
    "estimates", 
    "foreclosure", 
    "foreclosures", 
    "headwinds", 
    "home", 
    "house", 
    "housing", 
    "increase", 
    "inventory", 
    "line", 
    "Luz", 
    "market", 
    "mm", 
    "money", 
    "month", 
    "net", 
    "news", 
    "numbers", 
    "pain", 
    "payment", 
    "percent", 
    "Realtytrac", 
    "RealtyTrac", 
    "REOs", 
    "result", 
    "sales", 
    "Santa", 
    "SD", 
    "settlement", 
    "shadow", 
    "term", 
    "turn", 
    "year", 
    "Zillow" 
], 
"verbs": [ 
    "asked", 
    "avoid", 
    "bought", 
    "completed", 
    "expect", 
    "expected", 
    "get", 
    "happen", 
    "happened", 
    "help", 
    "hit", 
    "holding", 
    "hovering", 
    "increase", 
    "makes", 
    "plan", 
    "published", 
    "result", 
    "stated", 
    "staying", 
    "suggests" 
], 
"adjectives": [ 
    "bottom", 
    "clear", 
    "finally", 
    "good", 
    "high", 
    "higher", 
    "instead", 
    "large", 
    "last", 
    "likely", 
    "longer", 
    "low", 
    "nationally", 
    "next", 
    "not", 
    "positive", 
    "quickly", 
    "short", 
    "so-called", 
    "underwater", 
    "Unfortunately" 
], 
"phrases": [ 
    "2012 than 2011", 
    "25 percent", 
    "25 percent increase", 
    "2700 underwater in 92130", 
    "3800 underwater in 92127", 
    "92130 The good news", 
    "asked every month", 
    "avoid the headwinds", 
    "bank settlement", 
    "banks holding off foreclosures", 
    "banks more money", 
    "Bloomberg and RealtyTrac", 
    "bottom line", 
    "bought in Santa", 
    "bought in Santa Luz", 
    "clear the so-called shadow", 
    "completed foreclosures", 
    "estimates from Realtytrac", 
    "foreclosure numbers", 
    "foreclosure numbers in 2012", 
    "foreclosure pain", 
    "foreclosures until settlement", 
    "good news", 
    "happen this year", 
    "happen this year --", 
    "happened last year", 
    "help the housing", 
    "help the housing market", 
    "higher foreclosure", 
    "higher foreclosure numbers", 
    "holding off foreclosures", 
    "housing market", 
    "increase from 2011", 
    "increase The bottom line", 
    "instead happen this year", 
    "large numbers", 
    "last market", 
    "last year", 
    "longer term", 
    "longer term the bank", 
    "low numbers", 
    "Luz in 2006", 
    "makes his payment", 
    "million completed foreclosures", 
    "mm underwater home", 
    "month by his bank", 
    "nationally published by Corelogic", 
    "net the banks", 
    "not avoid the headwinds", 
    "numbers in 2012", 
    "percent increase", 
    "percent increase from 2011", 
    "published by Corelogic", 
    "Realtytrac and Zillow", 
    "result in higher foreclosure", 
    "result in more foreclosure", 
    "result of banks", 
    "sales net", 
    "sales net the banks", 
    "Santa Luz", 
    "Santa Luz in 2006", 
    "shadow inventory", 
    "short sales", 
    "short sales net", 
    "short term", 
    "so-called shadow", 
    "so-called shadow inventory", 
    "staying in the house", 
    "suggests that short sales", 
    "term the bank", 
    "term the bank settlement", 
    "turn help the housing", 
    "underwater home", 
    "underwater in 92127", 
    "underwater in 92130", 
    "underwater in SD", 
    "year --" 
], 
"author_media": "15614~~~Hankster~~~1~~~Blog", 
"domain_media": "http://www.bubbleinfo.com~~~null~~~1~~~Blog", 
"categories": [ 
    { 
        "category": "post closing", 
        "categoryWords": [ 
            "foreclosure", 
            "foreclosure" 
        ], 
        "score": "2.0" 
    }, 
    { 
        "category": "pre buy research", 
        "categoryWords": [ 
            "term", 
            "term" 
        ], 
        "score": "2.0" 
    } 
], 
"opinionWords": [ 
    "positive", 
    "good news", 
    "expect", 
    "unfortunately" 
], 
"brandTerms": [], 
"findings": [] 

}

Hi,

We really appreciate your prompt response, thank you. We have tested the same
with our indexes, and the observations follow. What do they imply? Please
suggest if we are doing anything wrong in the settings or elsewhere.

Initial State
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "503.1mb",
"size_in_bytes" : 527622079
},
"docs" : {
"count" : 74250,
"deleted" : 2705
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 0,
"query_time" : "0s",
"query_time_in_millis" : 0,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "0b",
"field_size_in_bytes" : 0,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 0,
"query_time" : "0s",
"query_time_in_millis" : 0,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "0b",
"field_size_in_bytes" : 0,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

After hitting the following query:
{
"query" : {
"match_all" : { }
},
"size" : 0,
"facets" : {
"tag" : {
"terms" : {
"field" : "phrases",
"size" : 100
},
"_cache":false
}
}
}

After a single request
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "6.3gb",
"size_in_bytes" : 6787402724
},
"docs" : {
"count" : 876639,
"deleted" : 56407
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 2,
"query_time" : "21.8s",
"query_time_in_millis" : 21869,
"query_current" : 4,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "3.5gb",
"field_size_in_bytes" : 3834410088,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 4,
"query_time" : "21.8s",
"query_time_in_millis" : 21808,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "2.4gb",
"field_size_in_bytes" : 2653970178,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

After two requests
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 11,
"query_time" : "1.9m",
"query_time_in_millis" : 116142,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "4.9gb",
"field_size_in_bytes" : 5323063782,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 9,
"query_time" : "49.6s",
"query_time_in_millis" : 49662,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "4.2gb",
"field_size_in_bytes" : 4587853968,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

After three requests:
ES went down with a heap space error. No response.

Thanks and Regards,

On Friday, April 27, 2012 4:20:59 PM UTC+5:30, Rafał Kuć wrote:

Hello!

Node statistics provide information about cache usage. For example, run
the following command:

curl 'localhost:9200/_cluster/nodes/stats?pretty=true'

In the output you should find the statistics for both filter and field
data cache, something like the following:

"cache" : {
      "field_evictions" : 0,
      "field_size" : "0b",
      "field_size_in_bytes" : 0,
      "filter_count" : 1,
      "filter_evictions" : 0,
      "filter_size" : "32b",
      "filter_size_in_bytes" : 32
    }

With it you should be able to see how much memory your field data cache
consumes.
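
If you want to watch it while the facet query is running, a simple loop like
the one below (just a sketch - the grep context size is arbitrary) prints the
cache section of the stats every few seconds:

while true; do
  curl -s 'localhost:9200/_cluster/nodes/stats?pretty=true' | grep -A 7 '"cache"'
  sleep 5
done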

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Elasticsearch

On Friday, April 27, 2012 at 12:42:35 UTC+2, Sujoy Sett wrote:

Hi,

Can you please explain how to check the field data cache? Do I have to set
anything to monitor it explicitly?
I often use the mobz-elasticsearch-head-24935c4 plugin to monitor cluster
state and health, but I didn't find anything like index.cache.field.max_size
there in the cluster_state details.

Thanks and Regards,

On Friday, April 27, 2012 3:52:04 PM UTC+5:30, Rafał Kuć wrote:

Hello,

Did you look at the size of the field data cache after sending the
example query?

Regards,
Rafał

Also, the following message has been printed at the ES prompt, along with
lots of stack traces:

java.lang.OutOfMemoryError: loading field [phrases] caused out of memory failure

Does that give any clue?

Thanks and regards,

Hello!

Before hitting ES with the query you had an empty field data cache, and after that your cache was way higher - 3.5gb and 2.4gb. The default setting is that the field data cache is unlimited (in terms of entries). You may want to make one of the following changes to your ElasticSearch configuration:

  1. Set the field data cache type to soft. This will cause the cache to use Java soft references, and thus will enable the GC to release memory used by the field data cache when more heap memory is needed. You can do that by adding the following line to the configuration:

index.cache.field.type: soft

  2. Limit the field data cache size by setting its maximum number of entries. You have to remember that the maximum number of entries is per segment, not per index. To set that, add the following line to the configuration:

index.cache.field.max_size: 10000

Treat the above value as an example; I can't predict what setting will be good for your deployment.
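For reference, a minimal sketch of how the two options would look combined in config/elasticsearch.yml - the values are the same illustrative ones as above, not recommendations, and each node has to be restarted for the change to take effect:

# config/elasticsearch.yml (ES 0.19.x) - illustrative values only
index.cache.field.type: soft          # entries become evictable under GC pressure
index.cache.field.max_size: 10000     # maximum entries per segment, not per index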

--

Regards,

Rafał Kuć

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch

Also, the following message has been printed:

java.lang.OutOfMemoryError: loading field [phrases] caused out of memory failure

along with lots of stack traces in the ES console.

Any help from that?

Thanks and regards,


Hi,

We ran ES with the following settings:

index.cache.field.type: soft
index.cache.field.max_size: 1000

And the ES cache is showing the following results on subsequent requests:

"cache" : {
"field_evictions" : 67,
"field_size" : "1.7gb",
"field_size_in_bytes" : 1853666588,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
}

We see that field_size is coming down after hitting the peak.
We are running more tests and will update soon. Thanks for your help.
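In case it is useful, a rough way to keep an eye on this while the tests run (just a sketch, assuming a Unix-like shell with curl and grep available; adjust host and port for your nodes):

while true; do
  curl -s 'localhost:9200/_cluster/nodes/stats?pretty=true' \
    | grep -E '"field_(size|evictions)"'
  sleep 10
done

A climbing field_evictions count alongside a roughly flat field_size suggests the max_size limit is doing its job.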

Regards,
On Friday, April 27, 2012 5:02:17 PM UTC+5:30, Rafał Kuć wrote:

Hello!

Before hitting ES with the query you had an empty field data cache and after
that your cache was way higher - 3.5gb and 2.4gb. The default setting is that
the field data cache is unlimited (in terms of entries). You may want to make
one of the following changes to your Elasticsearch configuration:

  1. Set the field data cache type to soft. This will cause the cache to use
    Java soft references, and thus will enable the GC to release memory used by
    the field data cache when more heap memory is needed. You can do that by
    adding the following line to the configuration:
    index.cache.field.type: soft

  2. Limit field data cache size, by setting its maximum number of entries.
    You have to remember that the maximum number of entries is per segment, not
    per index. To set that, add the following line to the configuration:
    index.cache.field.max_size: 10000

Treat the above value as an example; I can't predict what setting will be
good for your deployment.

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch

Also, the following message has been printed:
java.lang.OutOfMemoryError: loading field [phrases] caused out of memory
failure
along with lots of stack traces in the ES console.

Any help from that?

Thanks and regards,

On Friday, April 27, 2012 4:44:19 PM UTC+5:30, Sujoy Sett wrote:
Hi,

We really appreciate and are thankful for your prompt response. We have
tested the same with our indexes. Following are the observations. What do
they imply? Please suggest if we are doing anything wrong in the settings
or elsewhere.

Initial State
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "503.1mb",
"size_in_bytes" : 527622079
},
"docs" : {
"count" : 74250,
"deleted" : 2705
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 0,
"query_time" : "0s",
"query_time_in_millis" : 0,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "0b",
"field_size_in_bytes" : 0,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 0,
"query_time" : "0s",
"query_time_in_millis" : 0,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "0b",
"field_size_in_bytes" : 0,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

After hitting query
{
"query" : {
"match_all" : { }
},
"size" : 0,
"facets" : {
"tag" : {
"terms" : {
"field" : "phrases",
"size" : 100
},
"_cache":false
}
}
}

After single request
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "6.3gb",
"size_in_bytes" : 6787402724
},
"docs" : {
"count" : 876639,
"deleted" : 56407
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 2,
"query_time" : "21.8s",
"query_time_in_millis" : 21869,
"query_current" : 4,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "3.5gb",
"field_size_in_bytes" : 3834410088,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 4,
"query_time" : "21.8s",
"query_time_in_millis" : 21808,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "2.4gb",
"field_size_in_bytes" : 2653970178,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

After two requests
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 11,
"query_time" : "1.9m",
"query_time_in_millis" : 116142,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "4.9gb",
"field_size_in_bytes" : 5323063782,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 9,
"query_time" : "49.6s",
"query_time_in_millis" : 49662,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "4.2gb",
"field_size_in_bytes" : 4587853968,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

After three requests
ES down with heap space error.
No response.

Thanks and Regards,

On Friday, April 27, 2012 4:20:59 PM UTC+5:30, Rafał Kuć wrote:
Hello!

Node statistics provide information about cache usage. For example, run
the following command:

curl 'localhost:9200/_cluster/nodes/stats?pretty=true'

In the output you should find the statistics for both filter and field
data cache, something like the following:

"cache" : {
      "field_evictions" : 0,
      "field_size" : "0b",
      "field_size_in_bytes" : 0,
      "filter_count" : 1,
      "filter_evictions" : 0,
      "filter_size" : "32b",
      "filter_size_in_bytes" : 32
    }

With it you should be able to see how much memory your field data cache
consumes.
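(If you only want the cache section, one option - a sketch, assuming a
Unix-like shell - is to filter the pretty-printed output:

curl -s 'localhost:9200/_cluster/nodes/stats?pretty=true' | grep -A 7 '"cache"'

grep -A 7 prints each matched "cache" line plus the seven stat lines that
follow it, one block per node.)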

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Elasticsearch

On Friday, 27 April 2012 at 12:42:35 UTC+2, Sujoy Sett wrote:
Hi,

Can you please explain how to check the field data cache? Do I have to set
anything explicitly to monitor it?
I often use the mobz-elasticsearch-head-24935c4 plugin to monitor cluster
state and health, but I didn't find anything like index.cache.field.max_size
there in the cluster_state details.

Thanks and Regards,

On Friday, April 27, 2012 3:52:04 PM UTC+5:30, Rafał Kuć wrote:
Hello,

Did you look at the size of the field data cache after sending the
example query?

Regards,
Rafał

On Friday, 27 April 2012 at 12:15:38 UTC+2, Sujoy Sett wrote:
Hi,

We have been using elasticsearch 0.19.2 for storing and analyzing data
from social media blogs and forums. The data volume is going up to
500000 documents per index, and size of this volume of data in
Elasticsearch index is going up to 3 GB per index per node (all
shards). We always maintain the number of replicas 1 less than the
total number of nodes to ensure that a copy of all shards should
reside on every node at any instant. The number of shards is
generally 10 for the size of indexes we mentioned above.

We try different queries on these data for advanced visualization
purpose, and mainly facets for showing trend charts or keyword clouds.
Following are some examples of the queries we execute:
{
"query" : {
"match_all" : { }
},
"size" : 0,
"facets" : {
"tag" : {
"terms" : {
"field" : "nouns",
"size" : 100
},
"_cache":false
}
}
}

{
"query" : {
"match_all" : { }
},
"size" : 0,
"facets" : {
"tag" : {
"terms" : {
"field" : "phrases",
"size" : 100
},
"_cache":false
}
}
}

While executing such queries we often encounter heap space shortage,
and the nodes become unresponsive. Our main concern is that the nodes
do not recover to normal state even after dumping the heap to a hprof
file. The node still consumes the maximum allocated memory as shown in
task manager java.exe process, and the nodes remain unresponsive until
we manually kill and restart them.

ES Configuration 1:
Elasticsearch Version 0.19.2
2 Nodes, one on each physical server
Max heap size 6GB per node.
10 shards, 1 replica.

ES Configuration 2:
Elasticsearch Version 0.19.2
6 Nodes, three on each physical server
Max heap size 2GB per node.
10 shards, 5 replica.

Server Configuration:
Windows 7 64 bit
64 bit JVM
8 GB physical memory
Dual Core processor

For both the configurations mentioned above, Elasticsearch was unable to
respond to the facet queries; it was also unable to
recover when a query failed due to heap space shortage.

We are facing this issue in our production environments, and request
that you suggest a better configuration or a different approach
if required.

The mapping of the data we use is as follows:
(keyword1 is a customized keyword analyzer, similarly standard1 is a
customized standard analyzer)

{
"properties": {
"adjectives": {
"type": "string",
"analyzer": "stop2"
},
"alertStatus": {
"type": "string",
"analyzer": "keyword1"
},
"assignedByUserId": {
"type": "integer",
"index": "analyzed"
},
"assignedByUserName": {
"type": "string",
"analyzer": "keyword1"
},
"assignedToDepartmentId": {
"type": "integer",
"index": "analyzed"
},
"assignedToDepartmentName": {
"type": "string",
"analyzer": "keyword1"
},
"assignedToUserId": {
"type": "integer",
"index": "analyzed"
},
"assignedToUserName": {
"type": "string",
"analyzer": "keyword1"
},
"authorJsonMetadata": {
"properties": {
"favourites": {
"type": "string"
},
"followers": {
"type": "string"
},
"following": {
"type": "string"
},
"likes": {
"type": "string"
},
"listed": {
"type": "string"
},
"subscribers": {
"type": "string"
},
"subscription": {
"type": "string"
},
"uploads": {
"type": "string"
},
"views": {
"type": "string"
}
}
},
"authorKloutDetails": {
"dynamic": "true",
"properties": {
"amplificationScore": {
"type": "string"
},
"authorKloutDetailsFound": {
"type": "string"
},
"description": {
"type": "string"
},
"influencees": {
"dynamic": "true",
"properties": {
"kscore": {
"type": "string"
},
"twitter_screen_name": {
"type": "string"
}
}
},
"influencers": {
"dynamic": "true",
"properties": {
"kscore": {
"type": "string"
},
"twitter_screen_name": {
"type": "string"
}
}
},
"kloutClass": {
"type": "string"
},
"kloutClassDescription": {
"type": "string"
},
"kloutScore": {
"type": "string"
},
"kloutScoreDescription": {
"type": "string"
},
"kloutTopic": {
"type": "string"
},
"slope": {
"type": "string"
},
"trueReach": {
"type": "string"
},
"twitterId": {
"type": "string"
},
"twitterScreenName": {
"type": "string"
}
}
},
"author_media": {
"type": "string",
"analyzer": "keyword1"
},
"brandTerms": {
"type": "string",
"analyzer": "keyword1"
},
"calculatedSentimentId": {
"type": "integer",
"index": "analyzed"
},
"calculatedSentimentName": {
"type": "string",
"analyzer": "keyword1"
},
"categories": {
"properties": {
"category": {
"type": "string",
"analyzer": "keyword1"
},
"categoryWords": {
"type": "string",
"analyzer": "keyword1"
},
"score": {
"type": "double"
}
}
},
"commentCount": {
"type": "integer",
"index": "analyzed"
},
"contentAuthorId": {
"type": "integer",
"index": "analyzed"
},
"contentAuthorName": {
"type": "string",
"analyzer": "keyword1"
},
"contentId": {
"type": "integer",
"index": "analyzed"
},
"contentJsonMetadata": {
"properties": {
"comment Count": {
"type": "string"
},
"dislikes": {
"type": "string"
},
"favourites": {
"type": "string"
},
"likes": {
"type": "string"
},
"retweet Count": {
"type": "string"
},
"views": {
"type": "string"
}
}
},
"contentPublishedTime": {
"type": "date",
"index": "analyzed",
"format": "dateOptionalTime"
},
"contentTextFull": {
"type": "string",
"analyzer": "standard1"
},
"contentTextFullHighlighted": {
"type": "string",
"analyzer": "standard1"
},
"contentTextSnippetHighlighted": {
"type": "string",
"analyzer": "standard1"
},
"contentType": {
"type": "string",
"analyzer": "keyword1"
},
"contentUrlId": {
"type": "integer",
"index": "analyzed"
},
"contentUrlPath": {
"type": "string",
"analyzer": "keyword1"
},
"contentUrlPublishedTime": {
"type": "date",
"index": "analyzed",
"format": "dateOptionalTime"
},
"ctmId": {
"type": "long"
},
"domainName": {
"type": "string",
"analyzer": "keyword1"
},
"domainUrl": {
"type": "string",
"analyzer": "keyword1"
},
"domain_media": {
"type": "string",
"analyzer": "keyword1"
},
"findings": {
"type": "string",
"analyzer": "keyword1"
},
"geographyId": {
"type": "integer",
"index": "analyzed"
},
"geographyName": {
"type": "string",
"analyzer": "keyword1"
},
"kloutScore": {
"type": "object"
},
"languageId": {
"type": "integer",
"index": "analyzed"
},
"languageName": {
"type": "string",
"analyzer": "keyword1"
},
"listListeningObjectiveName": {
"type": "string",
"analyzer": "keyword1"
},
"mediaSourceIconPath": {
"type": "string",
"analyzer": "keyword1"
},
"mediaSourceId": {
"type": "integer",
"index": "analyzed"
},
"mediaSourceName": {
"type": "string",
"analyzer": "keyword1"
},
"mediaSourceTypeId": {
"type": "integer",
"index": "analyzed"
},
"mediaSourceTypeName": {
"type": "string",
"analyzer": "keyword1"
},
"notesCount": {
"type": "integer",
"index": "analyzed"
},
"nouns": {
"type": "string",
"analyzer": "stop2"
},
"opinionWords": {
"type": "string",
"analyzer": "keyword1"
},
"phrases": {
"type": "string",
"analyzer": "keyword1"
},
"profileId": {
"type": "integer",
"index": "analyzed"
},
"profileName": {
"type": "string",
"analyzer": "keyword1"
},
"topicId": {
"type": "integer",
"index": "analyzed"
},
"topicName": {
"type": "string",
"analyzer": "keyword1"
},
"userSentimentId": {
"type": "integer",
"index": "analyzed"
},
"userSentimentName": {
"type": "string",
"analyzer": "keyword1"
},
"verbs": {
"type": "string",
"analyzer": "stop2"
}
}
}

A sample of the structure of the data is as follows:

{
"contentType": "comment",
"topicId": 9,
"mediaSourceId": 3,
"contentId": 34834,
"ctmId": 73322,
"contentTextFull": "The low numbers nationally published by
Corelogic were a result of banks holding off foreclosures until
settlement. \nAs Bloomberg and RealtyTrac stated. this will result in
more foreclosure pain in the short term as some of the foreclosures
that should have happened last year instead happen this year which
will likely result in higher foreclosure numbers in 2012 than
2011.\nThe estimates from Realtytrac and Zillow are hovering around 1
million completed foreclosures, or REOs, in 2012, a 25 percent
increase from 2011. \nThe positive is that the data suggests that
short sales net the banks more money so they should be expected to
increase\nThe bottom line is that in the longer term the bank
settlement will help to more quickly clear the so-called shadow
inventory, which will in turn help the housing market finally bottom
out once and for all. \nMy buddy who bought in Santa Luz in 2006 is
asked every month by his bank when he makes his payment on his $1.2mm
underwater home, do you plan on staying in the house? . Per
Corelogic, there are still large numbers still underwater in SD\n-
3800 underwater in 92127\n- 2700 underwater in 92130\nThe good news is
we only have one last market to get hit, and expect the high end.
The $1mm to $2mm has to get hit next.\nhttp://www.mercurynews.com/business/ci_19899224\nUnfortunately,
we can not avoid the headwinds.",
"contentTextFullHighlighted": null,
"contentTextSnippetHighlighted": "The low numbers nationally
published by Corelogic were a result of banks holding off foreclosures
until settlement. \nAs Bloomberg and RealtyTrac stated. this will
result in more foreclosure pain in the short term as some of the
foreclosures that should have happened last year instead happen...",
"contentJsonMetadata": null,
"commentCount": 117,
"contentUrlId": 13535,
"contentUrlPath": "http://www.bubbleinfo.com/http://www.bubbleinfo.com/2012/02/09/mortgage-settlement-renegade/
2012/02/09/mortgage- http://www.bubbleinfo.com/2012/02/09/mortgage-settlement-renegade/
settlement-renegade/http://www.bubbleinfo.com/2012/02/09/mortgage-settlement-renegade/
",
"domainUrl": "http://www.bubbleinfo.com",
"domainName": null,
"contentAuthorId": 15614,
"contentAuthorName": "Hankster",
"authorJsonMetadata": null,
"authorKloutDetails": null,
"mediaSourceName": "Board Reader Blog",
"mediaSourceIconPath": "BoardReaderBlog.gif",
"mediaSourceTypeId": 1,
"mediaSourceTypeName": "Blog",
"geographyId": 0,
"geographyName": "Unknown",
"languageId": 1,
"languageName": "English",
"topicName": "Bank of America",
"profileId": 3,
"profileName": "USAA_Competition1",
"contentPublishedTime": 1328798840000,
"contentUrlPublishedTime": 1329336423000,
"calculatedSentimentId": 4,
"calculatedSentimentName": "POS",
"userSentimentId": 0,
"userSentimentName": null,
"listListeningObjectiveName": [
"Untagged LO"
],
"alertStatus": "assigned",
"assignedToUserId": 2,
"assignedToUserName": null,
"assignedByUserId": 1,
"assignedByUserName": null,
"assignedToDepartmentId": 0,
"assignedToDepartmentName": null,
"notesCount": 0,
"nouns": [
"bank",
"banks",
"Bloomberg",
"buddy",
"Corelogic",
"data",
"estimates",
"foreclosure",
"foreclosures",
"headwinds",
"home",
"house",
"housing",
"increase",
"inventory",
"line",
"Luz",
"market",
"mm",
"money",
"month",
"net",
"news",
"numbers",
"pain",
"payment",
"percent",
"Realtytrac",
"RealtyTrac",
"REOs",
"result",
"sales",
"Santa",
"SD",
"settlement",
"shadow",
"term",
"turn",
"year",
"Zillow"
],
"verbs": [
"asked",
"avoid",
"bought",
"completed",
"expect",
"expected",
"get",
"happen",
"happened",
"help",
"hit",
"holding",
"hovering",
"increase",
"makes",
"plan",
"published",
"result",
"stated",
"staying",
"suggests"
],
"adjectives": [
"bottom",
"clear",
"finally",
"good",
"high",
"higher",
"instead",
"large",
"last",
"likely",
"longer",
"low",
"nationally",
"next",
"not",
"positive",
"quickly",
"short",
"so-called",
"underwater",
"Unfortunately"
],
"phrases": [
"2012 than 2011",
"25 percent",
"25 percent increase",
"2700 underwater in 92130",
"3800 underwater in 92127",
"92130 The good news",
"asked every month",
"avoid the headwinds",
"bank settlement",
"banks holding off foreclosures",
"banks more money",
"Bloomberg and RealtyTrac",
"bottom line",
"bought in Santa",
"bought in Santa Luz",
"clear the so-called shadow",
"completed foreclosures",
"estimates from Realtytrac",
"foreclosure numbers",
"foreclosure numbers in 2012",
"foreclosure pain",
"foreclosures until settlement",
"good news",
"happen this year",
"happen this year --",
"happened last year",
"help the housing",
"help the housing market",
"higher foreclosure",
"higher foreclosure numbers",
"holding off foreclosures",
"housing market",
"increase from 2011",
"increase The bottom line",
"instead happen this year",
"large numbers",
"last market",
"last year",
"longer term",
"longer term the bank",
"low numbers",
"Luz in 2006",
"makes his payment",
"million completed foreclosures",
"mm underwater home",
"month by his bank",
"nationally published by Corelogic",
"net the banks",
"not avoid the headwinds",
"numbers in 2012",
"percent increase",
"percent increase from 2011",
"published by Corelogic",
"Realtytrac and Zillow",
"result in higher foreclosure",
"result in more foreclosure",
"result of banks",
"sales net",
"sales net the banks",
"Santa Luz",
"Santa Luz in 2006",
"shadow inventory",
"short sales",
"short sales net",
"short term",
"so-called shadow",
"so-called shadow inventory",
"staying in the house",
"suggests that short sales",
"term the bank",
"term the bank settlement",
"turn help the housing",
"underwater home",
"underwater in 92127",
"underwater in 92130",
"underwater in SD",
"year --"
],
"author_media": "15614~Hankster~1~~~Blog",
"domain_media": "http://www.bubbleinfo.com~~~null~~~1~~~Blog",
"categories": [
{
"category": "post closing",
"categoryWords": [
"foreclosure",
"foreclosure"
],
"score": "2.0"
},
{
"category": "pre buy research",
"categoryWords": [
"term",
"term"
],
"score": "2.0"
}
],
"opinionWords": [
"positive",
"good news",
"expect",
"unfortunately"
],
"brandTerms": ,
"findings":
}

Hi,

The indexes are working fine now. We are running JMeter tests with
multiple users.
We see the following in the console:

[2012-04-27 18:28:27,181][WARN ][monitor.jvm ] [es_node_67]
[gc][ParNew][4142][305] duration [1.4s], collections [1]/[4.3s], total
[1.4s]/[21.8s],memory [5.7gb]->[5.7gb]/[5.9gb]

Just out of curiosity, what is ES doing internally? And could you please
explain the settings you suggested in more detail?
Especially, how are segments and shards related?

Thanks and Regards,

On Friday, April 27, 2012 5:30:46 PM UTC+5:30, Sujoy Sett wrote:

Hi,

We ran ES with the following settings:

index.cache.field.type: soft
index.cache.field.max_size: 1000

And the ES cache is showing the following results on subsequent requests:

"cache" : {
"field_evictions" : 67,
"field_size" : "1.7gb",
"field_size_in_bytes" : 1853666588,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
}

We see that field_size is coming down after hitting the peak.
We are running more tests, will update soon. Thanks for your help.

Regards,
On Friday, April 27, 2012 5:02:17 PM UTC+5:30, Rafał Kuć wrote:

Hello!

Before hitting ES with the query you had an empty field data cache, and
after that your cache was way higher - 3.5gb and 2.4gb. The default setting
is that the field data cache is unlimited (in terms of entries). You may
want to make one of the following changes to your Elasticsearch
configuration:

  1. Set the field data cache type to soft. This will cause the cache to use
    Java soft references and thus enable the GC to release memory used by the
    field data cache when more heap memory is needed. You can do that by
    adding the following line to the configuration:
    index.cache.field.type: soft

  2. Limit the field data cache size by setting its maximum number of entries.
    You have to remember that the maximum number of entries is per segment,
    not per index. To set that, add the following line to the configuration:
    index.cache.field.max_size: 10000

Treat the above value as an example, I can't predict what setting will be
good for your deployment.
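
For reference, here is a minimal sketch of how the two options could sit
together in elasticsearch.yml (the max_size value below is illustrative
only, not a recommendation for any particular deployment):

# Use soft references so the GC can reclaim field data under heap pressure
index.cache.field.type: soft
# Cap the number of field data cache entries (note: applied per segment)
index.cache.field.max_size: 10000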

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch

Also, the following message has been printed:

java.lang.OutOfMemoryError: loading field [phrases] caused out of memory
failure

along with lots of stack traces in the ES console.

Any help from that?

Thanks and regards,

On Friday, April 27, 2012 4:44:19 PM UTC+5:30, Sujoy Sett wrote:
Hi,

We really appreciate your prompt response, and thank you for it. We have
tested the same with our indexes; the observations follow. What do they
imply? Please suggest if we are doing anything wrong in the settings or
elsewhere.

Initial State:
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "503.1mb",
"size_in_bytes" : 527622079
},
"docs" : {
"count" : 74250,
"deleted" : 2705
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 0,
"query_time" : "0s",
"query_time_in_millis" : 0,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "0b",
"field_size_in_bytes" : 0,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 0,
"query_time" : "0s",
"query_time_in_millis" : 0,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "0b",
"field_size_in_bytes" : 0,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

After hitting query:
{
"query" : {
"match_all" : { }
},
"size" : 0,
"facets" : {
"tag" : {
"terms" : {
"field" : "phrases",
"size" : 100
},
"_cache":false
}
}
}

After single request:
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "6.3gb",
"size_in_bytes" : 6787402724
},
"docs" : {
"count" : 876639,
"deleted" : 56407
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 2,
"query_time" : "21.8s",
"query_time_in_millis" : 21869,
"query_current" : 4,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "3.5gb",
"field_size_in_bytes" : 3834410088,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 4,
"query_time" : "21.8s",
"query_time_in_millis" : 21808,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "2.4gb",
"field_size_in_bytes" : 2653970178,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

After two requests:
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 11,
"query_time" : "1.9m",
"query_time_in_millis" : 116142,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "4.9gb",
"field_size_in_bytes" : 5323063782,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 9,
"query_time" : "49.6s",
"query_time_in_millis" : 49662,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "4.2gb",
"field_size_in_bytes" : 4587853968,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

After three requests:
ES down with heap space error.
No response.

Thanks and Regards,

On Friday, April 27, 2012 4:20:59 PM UTC+5:30, Rafał Kuć wrote:
Hello!

Node statistics provide information about cache usage. For example, run
the following command:

curl 'localhost:9200/_cluster/nodes/stats?pretty=true'

In the output you should find the statistics for both filter and field
data cache, something like the following:

"cache" : {
      "field_evictions" : 0,
      "field_size" : "0b",
      "field_size_in_bytes" : 0,
      "filter_count" : 1,
      "filter_evictions" : 0,
      "filter_size" : "32b",
      "filter_size_in_bytes" : 32
    }

With it you should be able to see how much memory your field data cache
consumes.
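
If you only want the cache section from that output, a simple grep over the
pretty-printed response works on any box with curl and grep available (the
-A 7 just prints the seven lines following each "cache" match, covering the
field and filter entries shown above):

curl -s 'localhost:9200/_cluster/nodes/stats?pretty=true' | grep -A 7 '"cache"'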

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
Elasticsearch

On Friday, April 27, 2012 12:42:35 UTC+2, Sujoy Sett wrote:
Hi,

Can you please explain how to check the field data cache? Do I have to set
anything explicitly to monitor it?
I often use the mobz-elasticsearch-head-24935c4 plugin to monitor cluster
state and health, but I didn't find anything like index.cache.field.max_size
there in the cluster_state details.

Thanks and Regards,

On Friday, April 27, 2012 3:52:04 PM UTC+5:30, Rafał Kuć wrote:
Hello,

Did you look at the size of the field data cache after sending the
example query?

Regards,
Rafał


If you submit a facet query on "nouns" or "phrases", ES loads all unique
terms in the requested fields into memory. Referring to the mapping, as can
be seen, these are analyzed fields. As a consequence, ES has to handle a
vast number of terms, in contrast to not_analyzed fields. It also depends
on the application: string terms use a lot of memory, while integers would
use less. Because the default limit on field data cache memory is unbounded,
you will hit the ceiling and get an OOM if you do not carefully estimate how
many unique string terms you deal with in the faceted fields. You can then
raise the limit if you still have more heap memory available, or, as has
been suggested, you can establish a reasonable cache limit to avoid OOM.
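
As an illustration of the not_analyzed alternative (a sketch only; the
sub-field name "facet" below is made up for this example), a multi_field
mapping could keep the existing analyzed field and add an unanalyzed
sub-field to facet on:

"phrases": {
"type": "multi_field",
"fields": {
"phrases": {
"type": "string",
"analyzer": "keyword1"
},
"facet": {
"type": "string",
"index": "not_analyzed"
}
}
}

The facet query would then target "phrases.facet" instead of "phrases":

{
"query" : {
"match_all" : { }
},
"size" : 0,
"facets" : {
"tag" : {
"terms" : {
"field" : "phrases.facet",
"size" : 100
}
}
}
}

Note that this alone would not reduce the number of unique terms here,
since keyword1 already emits whole values as single terms; the memory win
is largest when the original field is tokenized, so treat this as a
pattern rather than a fix.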

Jörg

On Friday, April 27, 2012 3:07:39 PM UTC+2, Sujoy Sett wrote:

Hi,

The indexes are working fine now. We are running jmeter testing with
multiple uses.
We see the following in the prompt

[2012-04-27 18:28:27,181][WARN ][monitor.jvm ] [es_node_67]
[gc][ParNew][4142][305] duration [1.4s], collections [1]/[4.3s], total
[1.4s]/[21.8s],memory [5.7gb]->[5.7gb]/[5.9gb]

Just out of inquisitiveness, what is ES doing internally? And please can
you explain the settings you suggested in more details?
Specially how segments and shards are related?

Thanks and Regards,

On Friday, April 27, 2012 5:30:46 PM UTC+5:30, Sujoy Sett wrote:

Hi,

We ran ES with settings

index.cache.field.type: soft
index.cache.field.max_size: 1000

And ES cache is showing following results on subsequent requests

"cache" : {
"field_evictions" : 67,
"field_size" : "1.7gb",
"field_size_in_bytes" : 1853666588,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
}

We see that field_size is coming down after hitting the peak.
We are running more tests, will update soon. Thanks for your help.

Regards,
On Friday, April 27, 2012 5:02:17 PM UTC+5:30, Rafał Kuć wrote:

Hello!

Before hitting ES with query you had empty field data cache and after
that your cache was way higher - 3.5gb and 2.4gb. The default settings is
that field data cache is unlimited (in terms of entries). You may want to
do one of the following changes to your Elasticsearch configuration:

  1. Set field data cache type to soft. This will cause this cache to use
    Java soft references and thus will enable GC to release memory used by
    field data cache, when more heap memory is needed. You can do that by
    adding the following line to the configuration:
    index.cache.field.type: soft

  2. Limit field data cache size, by setting its maximum number of
    entries. You have to remember that maximum number of settings is per
    segment, not per index. To set that, add the following line to the
    configuration:
    index.cache.field.max_size: 10000

Treat the above value as an example, I can't predict what setting will
be good for your deployment.

*--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch

Also

following message has been printed
java.lang.OutOfMemoryError: loading field [phrases] caused out of memory
failure
along with lots of stack traces in the ES prompt.

Any help from that?

Thanks and regards,

On Friday, April 27, 2012 4:44:19 PM UTC+5:30, Sujoy Sett wrote:
Hi,

We really appreciate and are thankful to you for your prompt response.
We have tested the same with our indexes. Following are the observations.
What does it imply and please suggest if we are doing anything wrong in
settings or elsewhere.

*Initial State
*{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "503.1mb",
"size_in_bytes" : 527622079
},
"docs" : {
"count" : 74250,
"deleted" : 2705
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 0,
"query_time" : "0s",
"query_time_in_millis" : 0,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "0b",
"field_size_in_bytes" : 0,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 0,
"query_time" : "0s",
"query_time_in_millis" : 0,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "0b",
"field_size_in_bytes" : 0,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

*After hitting query
*{
"query" : {
"match_all" : { }
},
"size" : 0,
"facets" : {
"tag" : {
"terms" : {
"field" : "phrases",
"size" : 100
},
"_cache":false
}
}
}

*After single request
*{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "6.3gb",
"size_in_bytes" : 6787402724
},
"docs" : {
"count" : 876639,
"deleted" : 56407
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 2,
"query_time" : "21.8s",
"query_time_in_millis" : 21869,
"query_current" : 4,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "3.5gb",
"field_size_in_bytes" : 3834410088,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 4,
"query_time" : "21.8s",
"query_time_in_millis" : 21808,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "2.4gb",
"field_size_in_bytes" : 2653970178,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

*After two requests
*{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 11,
"query_time" : "1.9m",
"query_time_in_millis" : 116142,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "4.9gb",
"field_size_in_bytes" : 5323063782,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 9,
"query_time" : "49.6s",
"query_time_in_millis" : 49662,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "4.2gb",
"field_size_in_bytes" : 4587853968,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

*After three requests
ES down with heap space error.
No response.

*Thanks and Regards,

On Friday, April 27, 2012 4:20:59 PM UTC+5:30, Rafał Kuć wrote:
Hello!

Nodes statistics provide information about cache usage. For example run
the following command:

curl 'localhost:9200/_cluster/nodes/stats?pretty=true'

In the output you should find the statistics for both filter and field
data cache, something like the following:

"cache" : {
      "field_evictions" : 0,
      "field_size" : "0b",
      "field_size_in_bytes" : 0,
      "filter_count" : 1,
      "filter_evictions" : 0,
      "filter_size" : "32b",
      "filter_size_in_bytes" : 32
    }

With it you should be able to see how much memory your field data cache
consumes.

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
Elasticsearch

W dniu piątek, 27 kwietnia 2012 12:42:35 UTC+2 użytkownik Sujoy Sett
napisał:
Hi,

Can u please explain how to check the field data cache ? Do I have to
set anything to monitor explicitly?
I often use the mobz-elasticsearch-head-24935c4 plugin to monitor
cluster state and health, I didn't find anything like
index.cache.field.max_size there in the cluster_state details.

Thanks and Regards,

On Friday, April 27, 2012 3:52:04 PM UTC+5:30, Rafał Kuć wrote:
Hello,

Did you look at the size of the field data cache after sending the
example query ?

Regards,
Rafał

W dniu piątek, 27 kwietnia 2012 12:15:38 UTC+2 użytkownik Sujoy Sett
napisał:
Hi,

We have been using elasticsearch 0.19.2 for storing and analyzing data
from social media blogs and forums. The data volume is going up to
500000 documents per index, and size of this volume of data in
Elasticsearch index is going up to 3 GB per index per node (all
shards). We always maintain the number of replicas 1 less than the
total number of nodes to ensure that a copy of all shards should
reside on every node at any instant. The number of shards are
generally 10 for the size of indexes we mentioned above.

We try different queries on these data for advanced visualization
purpose, and mainly facets for showing trend charts or keyword clouds.
Following are some example of the query we execute:
{
"query" : {
"match_all" : { }
},
"size" : 0,
"facets" : {
"tag" : {
"terms" : {
"field" : "nouns",
"size" : 100
},
"_cache":false
}
}
}

{
"query" : {
"match_all" : { }
},
"size" : 0,
"facets" : {
"tag" : {
"terms" : {
"field" : "phrases",
"size" : 100
},
"_cache":false
}
}
}

While executing such queries we often encounter heap space shortage,
and the nodes becomes unresponsive. Our main concern is that the nodes
do not recover to normal state even after dumping the heap to a hprof
file. The node still consumes the maximum allocated memory as shown in
task manager java.exe process, and the nodes remain unresponsive until
we manually kill and restart them.

ES Configuration 1:
Elasticsearch Version 0.19.2
2 Nodes, one on each physical server
Max heap size 6GB per node.
10 shards, 1 replica.

ES Configuration 2:
Elasticsearch Version 0.19.2
6 Nodes, three on each physical server
Max heap size 2GB per node.
10 shards, 5 replica.

Server Configuration:
Windows 7 64 bit
64 bit JVM
8 GB pysical memory
Dual Core processor

For both the configuration mentioned above Elasticsearch was unable to
respond to the facet queries mentioned above, it was also unable to
recover when a query failed due to heap space shortage.

We are facing this issue in our production environments, and request
you to please suggest a better configuration or a different approach
if required.

The mapping of the data is we use is as follows:
(keyword1 is a customized keyword analyzer, similarly standard1 is a
customized standard analyzer)

{
"properties": {
"adjectives": {
"type": "string",
"analyzer": "stop2"
},
"alertStatus": {
"type": "string",
"analyzer": "keyword1"
},
"assignedByUserId": {
"type": "integer",
"index": "analyzed"
},
"assignedByUserName": {
"type": "string",
"analyzer": "keyword1"
},
"assignedToDepartmentId": {
"type": "integer",
"index": "analyzed"
},
"assignedToDepartmentName": {
"type": "string",
"analyzer": "keyword1"
},
"assignedToUserId": {
"type": "integer",
"index": "analyzed"
},
"assignedToUserName": {
"type": "string",
"analyzer": "keyword1"
},
"authorJsonMetadata": {
"properties": {
"favourites": {
"type": "string"
},
"followers": {
"type": "string"
},
"following": {
"type": "string"
},
"likes": {
"type": "string"
},
"listed": {
"type": "string"
},
"subscribers": {
"type": "string"
},
"subscription": {
"type": "string"
},
"uploads": {
"type": "string"
},
"views": {
"type": "string"
}
}
},
"authorKloutDetails": {
"dynamic": "true",
"properties": {
"amplificationScore": {
"type": "string"
},
"authorKloutDetailsFound": {
"type": "string"
},
"description": {
"type": "string"
},
"influencees": {
"dynamic": "true",
"properties": {
"kscore": {
"type": "string"
},
"twitter_screen_name": {
"type": "string"
}
}
},
"influencers": {
"dynamic": "true",
"properties": {
"kscore": {
"type": "string"
},
"twitter_screen_name": {
"type": "string"
}
}
},
"kloutClass": {
"type": "string"
},
"kloutClassDescription": {
"type": "string"
},
"kloutScore": {
"type": "string"
},
"kloutScoreDescription": {
"type": "string"
},
"kloutTopic": {
"type": "string"
},
"slope": {
"type": "string"
},
"trueReach": {
"type": "string"
},
"twitterId": {
"type": "string"
},
"twitterScreenName": {
"type": "string"
}
}
},
"author_media": {
"type": "string",
"analyzer": "keyword1"
},
"brandTerms": {
"type": "string",
"analyzer": "keyword1"
},
"calculatedSentimentId": {
"type": "integer",
"index": "analyzed"
},
"calculatedSentimentName": {
"type": "string",
"analyzer": "keyword1"
},
"categories": {
"properties": {
"category": {
"type": "string",
"analyzer": "keyword1"
},
"categoryWords": {
"type": "string",
"analyzer": "keyword1"
},
"score": {
"type": "double"
}
}
},
"commentCount": {
"type": "integer",
"index": "analyzed"
},
"contentAuthorId": {
"type": "integer",
"index": "analyzed"
},
"contentAuthorName": {
"type": "string",
"analyzer": "keyword1"
},
"contentId": {
"type": "integer",
"index": "analyzed"
},
"contentJsonMetadata": {
"properties": {
"comment Count": {
"type": "string"
},
"dislikes": {
"type": "string"
},
"favourites": {
"type": "string"
},
"likes": {
"type": "string"
},
"retweet Count": {
"type": "string"
},
"views": {
"type": "string"
}
}
},
"contentPublishedTime": {
"type": "date",
"index": "analyzed",
"format": "dateOptionalTime"
},
"contentTextFull": {
"type": "string",
"analyzer": "standard1"
},
"contentTextFullHighlighted": {
"type": "string",
"analyzer": "standard1"
},
"contentTextSnippetHighlighted": {
"type": "string",
"analyzer": "standard1"
},
"contentType": {
"type": "string",
"analyzer": "keyword1"
},
"contentUrlId": {
"type": "integer",
"index": "analyzed"
},
"contentUrlPath": {
"type": "string",
"analyzer": "keyword1"
},
"contentUrlPublishedTime": {
"type": "date",
"index": "analyzed",
"format": "dateOptionalTime"
},
"ctmId": {
"type": "long"
},
"domainName": {
"type": "string",
"analyzer": "keyword1"
},
"domainUrl": {
"type": "string",
"analyzer": "keyword1"
},
"domain_media": {
"type": "string",
"analyzer": "keyword1"
},
"findings": {
"type": "string",
"analyzer": "keyword1"
},
"geographyId": {
"type": "integer",
"index": "analyzed"
},
"geographyName": {
"type": "string",
"analyzer": "keyword1"
},
"kloutScore": {
"type": "object"
},
"languageId": {
"type": "integer",
"index": "analyzed"
},
"languageName": {
"type": "string",
"analyzer": "keyword1"
},
"listListeningObjectiveName": {
"type": "string",
"analyzer": "keyword1"
},
"mediaSourceIconPath": {
"type": "string",
"analyzer": "keyword1"
},
"mediaSourceId": {
"type": "integer",
"index": "analyzed"
},
"mediaSourceName": {
"type": "string",
"analyzer": "keyword1"
},
"mediaSourceTypeId": {
"type": "integer",
"index": "analyzed"
},
"mediaSourceTypeName": {
"type": "string",
"analyzer": "keyword1"
},
"notesCount": {
"type": "integer",
"index": "analyzed"
},
"nouns": {
"type": "string",
"analyzer": "stop2"
},
"opinionWords": {
"type": "string",
"analyzer": "keyword1"
},
"phrases": {
"type": "string",
"analyzer": "keyword1"
},
"profileId": {
"type": "integer",
"index": "analyzed"
},
"profileName": {
"type": "string",
"analyzer": "keyword1"
},
"topicId": {
"type": "integer",
"index": "analyzed"
},
"topicName": {
"type": "string",
"analyzer": "keyword1"
},
"userSentimentId": {
"type": "integer",
"index": "analyzed"
},
"userSentimentName": {
"type": "string",
"analyzer": "keyword1"
},
"verbs": {
"type": "string",
"analyzer": "stop2"
}
}
}

A sample of the structure of the data is as follows:

{
"contentType": "comment",
"topicId": 9,
"mediaSourceId": 3,
"contentId": 34834,
"ctmId": 73322,
"contentTextFull": "The low numbers nationally published by
Corelogic were a result of banks holding off foreclosures until
settlement. \nAs Bloomberg and RealtyTrac stated. this will result in
more foreclosure pain in the short term as some of the foreclosures
that should have happened last year instead happen this year which
will likely result in higher foreclosure numbers in 2012 than
2011.\nThe estimates from Realtytrac and Zillow are hovering around 1
million completed foreclosures, or REOs, in 2012, a 25 percent
increase from 2011. \nThe positive is that the data suggests that
short sales net the banks more money so they should be expected to
increase\nThe bottom line is that in the longer term the bank
settlement will help to more quickly clear the so-called shadow
inventory, which will in turn help the housing market finally bottom
out once and for all. \nMy buddy who bought in Santa Luz in 2006 is
asked every month by his bank when he makes his payment on his $1.2mm
underwater home, do you plan on staying in the house? . Per
Corelogic, there are still large numbers still underwater in SD\n-
3800 underwater in 92127\n- 2700 underwater in 92130\nThe good news is
we only have one last market to get hit, and expect the high end.
The $1mm to $2mm has to get hit next.\nhttp://www.mercurynews.com/
business/ci_19899224\nUnfortunately, we can not avoid the headwinds.",
"contentTextFullHighlighted": null,
"contentTextSnippetHighlighted": "The low numbers nationally
published by Corelogic were a result of banks holding off foreclosures
until settlement. \nAs Bloomberg and RealtyTrac stated. this will
result in more foreclosure pain in the short term as some of the
foreclosures that should have happened last year instead happen...",
"contentJsonMetadata": null,
"commentCount": 117,
"contentUrlId": 13535,
"contentUrlPath": "http://www.bubbleinfo.com/http://www.bubbleinfo.com/2012/02/09/mortgage-settlement-renegade/
2012/02/09/mortgage- http://www.bubbleinfo.com/2012/02/09/mortgage-settlement-renegade/
settlement-renegade/http://www.bubbleinfo.com/2012/02/09/mortgage-settlement-renegade/
",
"domainUrl": "http://www.bubbleinfo.com",
"domainName": null,
"contentAuthorId": 15614,
"contentAuthorName": "Hankster",
"authorJsonMetadata": null,
"authorKloutDetails": null,
"mediaSourceName": "Board Reader Blog",
"mediaSourceIconPath": "BoardReaderBlog.gif",
"mediaSourceTypeId": 1,
"mediaSourceTypeName": "Blog",
"geographyId": 0,
"geographyName": "Unknown",
"languageId": 1,
"languageName": "English",
"topicName": "Bank of America",
"profileId": 3,
"profileName": "USAA_Competition1",
"contentPublishedTime": 1328798840000,
"contentUrlPublishedTime": 1329336423000,
"calculatedSentimentId": 4,
"calculatedSentimentName": "POS",
"userSentimentId": 0,
"userSentimentName": null,
"listListeningObjectiveName": [
"Untagged LO"
],
"alertStatus": "assigned",
"assignedToUserId": 2,
"assignedToUserName": null,
"assignedByUserId": 1,
"assignedByUserName": null,
"assignedToDepartmentId": 0,
"assignedToDepartmentName": null,
"notesCount": 0,
"nouns": [
"bank",
"banks",
"Bloomberg",
"buddy",
"Corelogic",
"data",
"estimates",
"foreclosure",
"foreclosures",
"headwinds",
"home",
"house",
"housing",
"increase",
"inventory",
"line",
"Luz",
"market",
"mm",
"money",
"month",
"net",
"news",
"numbers",
"pain",
"payment",
"percent",
"Realtytrac",
"RealtyTrac",
"REOs",
"result",
"sales",
"Santa",
"SD",
"settlement",
"shadow",
"term",
"turn",
"year",
"Zillow"
],
"verbs": [
"asked",
"avoid",
"bought",
"completed",
"expect",
"expected",
"get",
"happen",
"happened",
"help",
"hit",
"holding",
"hovering",
"increase",
"makes",
"plan",
"published",
"result",
"stated",
"staying",
"suggests"
],
"adjectives": [
"bottom",
"clear",
"finally",
"good",
"high",
"higher",
"instead",
"large",
"last",
"likely",
"longer",
"low",
"nationally",
"next",
"not",
"positive",
"quickly",
"short",
"so-called",
"underwater",
"Unfortunately"
],
"phrases": [
"2012 than 2011",
"25 percent",
"25 percent increase",
"2700 underwater in 92130",
"3800 underwater in 92127",
"92130 The good news",
"asked every month",
"avoid the headwinds",
"bank settlement",
"banks holding off foreclosures",
"banks more money",
"Bloomberg and RealtyTrac",
"bottom line",
"bought in Santa",
"bought in Santa Luz",
"clear the so-called shadow",
"completed foreclosures",
"estimates from Realtytrac",
"foreclosure numbers",
"foreclosure numbers in 2012",
"foreclosure pain",
"foreclosures until settlement",
"good news",
"happen this year",
"happen this year --",
"happened last year",
"help the housing",
"help the housing market",
"higher foreclosure",
"higher foreclosure numbers",
"holding off foreclosures",
"housing market",
"increase from 2011",
"increase The bottom line",
"instead happen this year",
"large numbers",
"last market",
"last year",
"longer term",
"longer term the bank",
"low numbers",
"Luz in 2006",
"makes his payment",
"million completed foreclosures",
"mm underwater home",
"month by his bank",
"nationally published by Corelogic",
"net the banks",
"not avoid the headwinds",
"numbers in 2012",
"percent increase",
"percent increase from 2011",
"published by Corelogic",
"Realtytrac and Zillow",
"result in higher foreclosure",
"result in more foreclosure",
"result of banks",
"sales net",
"sales net the banks",
"Santa Luz",
"Santa Luz in 2006",
"shadow inventory",
"short sales",
"short sales net",
"short term",
"so-called shadow",
"so-called shadow inventory",
"staying in the house",
"suggests that short sales",
"term the bank",
"term the bank settlement",
"turn help the housing",
"underwater home",
"underwater in 92127",
"underwater in 92130",
"underwater in SD",
"year --"
],
"author_media": "15614~Hankster~1~~~Blog",
"domain_media": "http://www.bubbleinfo.com~~~null~~~1~~~Blog",
"categories": [
{
"category": "post closing",
"categoryWords": [
"foreclosure",
"foreclosure"
],
"score": "2.0"
},
{
"category": "pre buy research",
"categoryWords": [
"term",
"term"
],
"score": "2.0"
}
],
"opinionWords": [
"positive",
"good news",
"expect",
"unfortunately"
],
"brandTerms": ,
"findings":
}

Hello!

In addition to what Jörg has written, I suggested using the soft cache
type. A soft field data cache uses Java soft references, so that its
memory can be freed when the GC demands it.

You can read about soft references here: http://docs.oracle.com/javase/6/docs/api/java/lang/ref/SoftReference.html
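A minimal sketch of how this looks in practice (assuming a HotSpot JVM; the flag value below is purely illustrative):

# elasticsearch.yml
index.cache.field.type: soft

# JVM options: HotSpot keeps a softly reachable object alive for roughly
# SoftRefLRUPolicyMSPerMB milliseconds per MB of free heap since its last
# access (default 1000), so lowering it makes soft caches shrink sooner.
-XX:SoftRefLRUPolicyMSPerMB=200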

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch

If you submit a facet query on "nouns" or "phrases", ES loads all unique terms in the requested fields into memory. Referring to the mapping, as can be seen, these are analyzed fields. As a consequence, ES has to handle a vast number of terms, in contrast to not_analyzed fields. It also depends on the application: string terms use a lot of memory, integers would use less.
Because the default ES limit on field cache memory is unbounded, you will hit the ceiling and get an OOM when you do not carefully estimate how many unique string terms you deal with in the faceted fields. You can raise the limit if you still have heap memory to spare, or, as has been suggested, you can establish a reasonable cache limit to avoid OOM.
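As a sketch of that contrast, a facet-only field could be mapped not_analyzed, so the field data cache holds one term per distinct value instead of one per token (a hypothetical variant of the mapping above):

{
  "properties": {
    "phrases": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}

Since keyword1 is itself a keyword analyzer, "phrases" already yields one term per value; the big savings from not_analyzed would show up on fields tokenized by analyzers like standard1 or stop2.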

Jörg

On Friday, April 27, 2012 3:07:39 PM UTC+2, Sujoy Sett wrote:
Hi,

The indexes are working fine now. We are running jmeter tests with multiple users.
We see the following in the prompt:

[2012-04-27 18:28:27,181][WARN ][monitor.jvm ] [es_node_67] [gc][ParNew][4142][305] duration [1.4s], collections [1]/[4.3s], total [1.4s]/[21.8s],memory [5.7gb]->[5.7gb]/[5.9gb]

Just out of curiosity, what is ES doing internally? And could you please explain the settings you suggested in more detail?
Especially, how are segments and shards related?

Thanks and Regards,

On Friday, April 27, 2012 5:30:46 PM UTC+5:30, Sujoy Sett wrote:
Hi,

We ran ES with settings

index.cache.field.type: soft
index.cache.field.max_size: 1000

And the ES cache is showing the following results on subsequent requests:
"cache" : { "field_evictions" : 67, "field_size" : "1.7gb", "field_size_in_bytes" : 1853666588, "filter_count" : 0, "filter_evictions" : 0, "filter_size" : "0b", "filter_size_in_bytes" : 0 }

We see that field_size is coming down after hitting the peak.
We are running more tests, will update soon. Thanks for your help.

Regards,
On Friday, April 27, 2012 5:02:17 PM UTC+5:30, Rafał Kuć wrote:
Hello!

Before hitting ES with the query you had an empty field data cache, and after that your cache was way higher - 3.5gb and 2.4gb. The default setting is that the field data cache is unlimited (in terms of entries). You may want to make one of the following changes to your ElasticSearch configuration:

  1. Set the field data cache type to soft. This will cause the cache to use Java soft references and thus enable the GC to release memory used by the field data cache when more heap memory is needed. You can do that by adding the following line to the configuration:
    index.cache.field.type: soft

  2. Limit the field data cache size by setting its maximum number of entries. You have to remember that the maximum number of entries is per segment, not per index. To set that, add the following line to the configuration:
    index.cache.field.max_size: 10000

Treat the above value as an example; I can't predict what setting will be good for your deployment.
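Both settings can also be combined in elasticsearch.yml, again with illustrative values:

index.cache.field.type: soft
index.cache.field.max_size: 10000

The soft type lets the GC reclaim the cache under memory pressure, while max_size caps its steady-state size per segment.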

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch

Also,

the following message has been printed:
java.lang.OutOfMemoryError: loading field [phrases] caused out of memory failure
along with lots of stack traces in the ES prompt.

Does that give any hints?

Thanks and regards,

On Friday, April 27, 2012 4:44:19 PM UTC+5:30, Sujoy Sett wrote:
Hi,

Thank you for your prompt response; we really appreciate it. We have tested the same with our indexes, and the observations follow. What do they imply? Please suggest if we are doing anything wrong, in the settings or elsewhere.

Initial State
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "503.1mb",
"size_in_bytes" : 527622079
},
"docs" : {
"count" : 74250,
"deleted" : 2705
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 0,
"query_time" : "0s",
"query_time_in_millis" : 0,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "0b",
"field_size_in_bytes" : 0,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 0,
"query_time" : "0s",
"query_time_in_millis" : 0,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "0b",
"field_size_in_bytes" : 0,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

After hitting query
{
"query" : {
"match_all" : { }
},
"size" : 0,
"facets" : {
"tag" : {
"terms" : {
"field" : "phrases",
"size" : 100
},
"_cache":false
}
}
}

After single request
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "6.3gb",
"size_in_bytes" : 6787402724
},
"docs" : {
"count" : 876639,
"deleted" : 56407
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 2,
"query_time" : "21.8s",
"query_time_in_millis" : 21869,
"query_current" : 4,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "3.5gb",
"field_size_in_bytes" : 3834410088,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 4,
"query_time" : "21.8s",
"query_time_in_millis" : 21808,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "2.4gb",
"field_size_in_bytes" : 2653970178,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

After two requests
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 11,
"query_time" : "1.9m",
"query_time_in_millis" : 116142,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "4.9gb",
"field_size_in_bytes" : 5323063782,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 9,
"query_time" : "49.6s",
"query_time_in_millis" : 49662,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "4.2gb",
"field_size_in_bytes" : 4587853968,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}
}
}

After three requests
ES down with heap space error.
No response.

Thanks and Regards,

On Friday, April 27, 2012 4:20:59 PM UTC+5:30, Rafał Kuć wrote:
Hello!

Node statistics provide information about cache usage. For example, run the following command:

curl 'localhost:9200/_cluster/nodes/stats?pretty=true'

In the output you should find the statistics for both filter and field data cache, something like the following:

"cache" : {
      "field_evictions" : 0,
      "field_size" : "0b",
      "field_size_in_bytes" : 0,
      "filter_count" : 1,
      "filter_evictions" : 0,
      "filter_size" : "32b",
      "filter_size_in_bytes" : 32
    }

With it you should be able to see how much memory your field data cache consumes.
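If you just want to watch those numbers change between requests, a crude sketch (assuming a Unix-like shell with grep; any JSON-aware tool works as well):

curl -s 'localhost:9200/_cluster/nodes/stats?pretty=true' | grep -E 'field_size|field_evictions'

Run it before and after a facet query and the growth of the field data cache becomes obvious.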

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

On Friday, 27 April 2012 12:42:35 UTC+2, Sujoy Sett wrote:
Hi,

Can you please explain how to check the field data cache? Do I have to set anything to monitor it explicitly?
I often use the mobz-elasticsearch-head-24935c4 plugin to monitor cluster state and health, but I didn't find anything like index.cache.field.max_size there in the cluster_state details.

Thanks and Regards,

On Friday, April 27, 2012 3:52:04 PM UTC+5:30, Rafał Kuć wrote:
Hello,

Did you look at the size of the field data cache after sending the example query?

Regards,
Rafał


Hi,

We know that facet queries on a keyword-analyzed string array field take a
lot of memory.
But that is specifically what we want, for displaying tag clouds or
keyword clouds on the fly, and applying filters and drill-down capabilities
on them dynamically (a sketch of such a drill-down query is below).
We are currently trying to establish a reasonable cache limit as
suggested by Jörg.

We have applied the soft field type, as suggested by Rafał, and since then
the indexes free their cache when required.
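The drill-down variant we are after looks roughly like this (a sketch; the topicId value is only an example taken from the sample document earlier in the thread):

{
  "query" : {
    "filtered" : {
      "query" : { "match_all" : { } },
      "filter" : { "term" : { "topicId" : 9 } }
    }
  },
  "size" : 0,
  "facets" : {
    "tag" : {
      "terms" : {
        "field" : "phrases",
        "size" : 100
      }
    }
  }
}

Note that the filter only narrows which documents get counted; the field data for "phrases" is still loaded for the whole index, so the cache limits discussed above apply unchanged.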

Thanks,

"notesCount": 0,
"nouns": [
"bank",
"banks",
"Bloomberg",
"buddy",
"Corelogic",
"data",
"estimates",
"foreclosure",
"foreclosures",
"headwinds",
"home",
"house",
"housing",
"increase",
"inventory",
"line",
"Luz",
"market",
"mm",
"money",
"month",
"net",
"news",
"numbers",
"pain",
"payment",
"percent",
"Realtytrac",
"RealtyTrac",
"REOs",
"result",
"sales",
"Santa",
"SD",
"settlement",
"shadow",
"term",
"turn",
"year",
"Zillow"
],
"verbs": [
"asked",
"avoid",
"bought",
"completed",
"expect",
"expected",
"get",
"happen",
"happened",
"help",
"hit",
"holding",
"hovering",
"increase",
"makes",
"plan",
"published",
"result",
"stated",
"staying",
"suggests"
],
"adjectives": [
"bottom",
"clear",
"finally",
"good",
"high",
"higher",
"instead",
"large",
"last",
"likely",
"longer",
"low",
"nationally",
"next",
"not",
"positive",
"quickly",
"short",
"so-called",
"underwater",
"Unfortunately"
],
"phrases": [
"2012 than 2011",
"25 percent",
"25 percent increase",
"2700 underwater in 92130",
"3800 underwater in 92127",
"92130 The good news",
"asked every month",
"avoid the headwinds",
"bank settlement",
"banks holding off foreclosures",
"banks more money",
"Bloomberg and RealtyTrac",
"bottom line",
"bought in Santa",
"bought in Santa Luz",
"clear the so-called shadow",
"completed foreclosures",
"estimates from Realtytrac",
"foreclosure numbers",
"foreclosure numbers in 2012",
"foreclosure pain",
"foreclosures until settlement",
"good news",
"happen this year",
"happen this year --",
"happened last year",
"help the housing",
"help the housing market",
"higher foreclosure",
"higher foreclosure numbers",
"holding off foreclosures",
"housing market",
"increase from 2011",
"increase The bottom line",
"instead happen this year",
"large numbers",
"last market",
"last year",
"longer term",
"longer term the bank",
"low numbers",
"Luz in 2006",
"makes his payment",
"million completed foreclosures",
"mm underwater home",
"month by his bank",
"nationally published by Corelogic",
"net the banks",
"not avoid the headwinds",
"numbers in 2012",
"percent increase",
"percent increase from 2011",
"published by Corelogic",
"Realtytrac and Zillow",
"result in higher foreclosure",
"result in more foreclosure",
"result of banks",
"sales net",
"sales net the banks",
"Santa Luz",
"Santa Luz in 2006",
"shadow inventory",
"short sales",
"short sales net",
"short term",
"so-called shadow",
"so-called shadow inventory",
"staying in the house",
"suggests that short sales",
"term the bank",
"term the bank settlement",
"turn help the housing",
"underwater home",
"underwater in 92127",
"underwater in 92130",
"underwater in SD",
"year --"
],
"author_media": "15614~Hankster~1~~~Blog",
"domain_media": "http://www.bubbleinfo.com~~~null~~~1~~~Blog",
"categories": [
{
"category": "post closing",
"categoryWords": [
"foreclosure",
"foreclosure"
],
"score": "2.0"
},
{
"category": "pre buy research",
"categoryWords": [
"term",
"term"
],
"score": "2.0"
}
],
"opinionWords": [
"positive",
"good news",
"expect",
"unfortunately"
],
"brandTerms": ,
"findings":
}

Hi,

In this case we have observed that "dumping the heap to a hprof
file" takes too much time, because it tries to write some 6GB of heap data
into the file.

Is there any way to stop dumping data into this file and straight away
release the memory once we hit the heap space limit?
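
(We assume the .hprof dump is triggered by the JVM flag
-XX:+HeapDumpOnOutOfMemoryError somewhere in our startup script or service
wrapper; if so, removing it, or explicitly negating it, should skip the dump.
A sketch, assuming the standard scripts honor ES_JAVA_OPTS:

set ES_JAVA_OPTS=-XX:-HeapDumpOnOutOfMemoryError
bin\elasticsearch.bat

Note this only avoids the slow .hprof write; the node may still be in a bad
state after the OOM itself.)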

Thanks & Regards

On Apr 27, 7:58 pm, Sujoy Sett sujoys...@gmail.com wrote:

Hi,

We know that facet queries on a keyword-analyzed string array field take a
lot of memory.
But that is specifically what we want, for displaying tag clouds or
keyword clouds on the fly, and applying filters and drill-down capabilities
on them dynamically.
We are currently trying to establish a reasonable cache limit as
suggested by Jörg.

We have applied the soft field cache type, on the suggestion of Rafał, and
after that the indexes are freeing cache when required.

Thanks,

On Friday, April 27, 2012 7:47:00 PM UTC+5:30, Rafał Kuć wrote:

Hello!

In addition to what Jörg has written I suggested using soft cache
type. Soft type field data cache uses Java soft references in order to
be able to free memory when GC demands that.

You can read about soft references in the java.lang.ref.SoftReference Javadoc.
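
A minimal Java sketch of the idea (illustrative only, not Elasticsearch code):
the GC is allowed to clear softly referenced values under heap pressure before
it would otherwise throw OutOfMemoryError.

import java.lang.ref.SoftReference;

public class SoftCacheDemo {
    public static void main(String[] args) {
        // A large value held only through a soft reference.
        SoftReference<byte[]> entry = new SoftReference<>(new byte[64 * 1024 * 1024]);

        // While memory is plentiful, get() returns the value; under heap
        // pressure the GC may clear it and get() returns null - the cache
        // must then reload the entry instead of the JVM failing with OOM.
        byte[] value = entry.get();
        System.out.println(value != null ? "still cached" : "cleared by GC");
    }
}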

--
Regards,
Rafał Kuć
Sematext ::http://sematext.com/:: Solr - Lucene - Nutch

If you submit a facet query on "nouns" or "phrases", ES loads all unique
terms in the requested fields into memory. Referring to the mapping, as can
be seen, these are analyzed fields. As a consequence, ES has to handle a
vast number of terms, in contrast to not_analyzed fields. It also depends
on the application: string terms use a lot of memory, integers would use
less.
Because the default ES field cache is unbounded, you will hit the ceiling
and get OOM if you do not carefully estimate how many unique string terms
you are dealing with in the faceted fields. You can then raise the heap
limit if you still have memory available, or, as has been suggested, you
can establish a reasonable cache limit to avoid OOM.
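
For a rough feel, with assumed (not measured) numbers: if 500000 documents
contribute ~75 mostly unique phrases each, that is ~37 million cache entries;
a 25-character Java string costs about 50 bytes of character data plus roughly
40 bytes of object and array overhead, so the strings alone come to about
37M x 90 bytes, or roughly 3.3 GB - the same order as the field cache sizes
reported in this thread.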

Jörg

On Friday, April 27, 2012 3:07:39 PM UTC+2, Sujoy Sett wrote:
Hi,

The indexes are working fine now. We are running jmeter tests with
multiple users.
We see the following in the console:

[2012-04-27 18:28:27,181][WARN ][monitor.jvm ] [es_node_67]
[gc][ParNew][4142][305] duration [1.4s], collections [1]/[4.3s], total
[1.4s]/[21.8s],memory [5.7gb]->[5.7gb]/[5.9gb]
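
(If we are reading that line right: a 1.4s ParNew collection reclaimed
essentially nothing, with the heap staying at 5.7gb of a 5.9gb ceiling - which
looks like the signature of a heap almost entirely occupied by live data such
as the field cache.)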

Just out of inquisitiveness, what is ES doing internally? And can you please
explain the settings you suggested in more detail?
Especially, how are segments and shards related?

Thanks and Regards,

On Friday, April 27, 2012 5:30:46 PM UTC+5:30, Sujoy Sett wrote:
Hi,

We ran ES with settings

index.cache.field.type: soft
index.cache.field.max_size: 1000

And ES cache is showing the following results on subsequent requests:
"cache" : {
"field_evictions" : 67,
"field_size" : "1.7gb",
"field_size_in_bytes" : 1853666588,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
}

We see that field_size is coming down after hitting the peak.
We are running more tests, will update soon. Thanks for your help.

Regards,
On Friday, April 27, 2012 5:02:17 PM UTC+5:30, Rafał Kuć wrote:
Hello!

Before hitting ES with the query you had an empty field data cache, and after
that your cache was way higher - 3.5gb and 2.4gb. The default setting is that
the field data cache is unlimited (in terms of entries). You may want to make
one of the following changes to your Elasticsearch configuration:

  1. Set the field data cache type to soft. This will cause the cache to use
    Java soft references and thus will enable the GC to release memory used by
    the field data cache when more heap memory is needed. You can do that by
    adding the following line to the configuration:
    index.cache.field.type: soft
  2. Limit the field data cache size by setting its maximum number of entries.
    You have to remember that the maximum number of entries is per segment, not
    per index. To set that, add the following line to the configuration:
    index.cache.field.max_size: 10000

Treat the above value as an example; I can't predict what setting will be
good for your deployment.
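
(A minimal sketch of where these go, assuming the default config layout: add
both lines to config/elasticsearch.yml on every node and restart the nodes,
since these cache settings are read at node startup:

index.cache.field.type: soft
index.cache.field.max_size: 10000
)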

--
Regards,
Rafał Kuć
Sematext ::http://sematext.com/:: Solr - Lucene - Nutch

Also, the following message has been printed:
java.lang.OutOfMemoryError: loading field [phrases] caused out of memory
failure
along with lots of stack traces in the ES console.

Does that offer any hints?

Thanks and regards,

On Friday, April 27, 2012 4:44:19 PM UTC+5:30, Sujoy Sett wrote:
Hi,

We really appreciate your prompt response, and thank you for it. We have
tested the same with our indexes. The observations follow. What do they
imply? And please point out if we are doing anything wrong, in settings or
elsewhere.

Initial State
{
"cluster_name" : "elasticsearch_local_0_19",
"nodes" : {
"zM7byv_qT7CbTNJprWCl5g" : {
"name" : "es_node_102",
"transport_address" : "inet[/172.29.177.102:9300]",
"hostname" : "01hw445748",
"attributes" : {
"tag" : "es_node_102"
},
"indices" : {
"store" : {
"size" : "503.1mb",
"size_in_bytes" : 527622079
},
"docs" : {
"count" : 74250,
"deleted" : 2705
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 0,
"query_time" : "0s",
"query_time_in_millis" : 0,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "0b",
"field_size_in_bytes" : 0,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
},
"qpvNNHpcQ3i1Bz8BWvq4oA" : {
"name" : "es_node_67",
"transport_address" : "inet[/172.29.181.67:9300]",
"hostname" : "01hw400248",
"attributes" : {
"tag" : "es_node_67"
},
"indices" : {
"store" : {
"size" : "8gb",
"size_in_bytes" : 8615814550
},
"docs" : {
"count" : 1121886,
"deleted" : 65007
},
"indexing" : {
"index_total" : 0,
"index_time" : "0s",
"index_time_in_millis" : 0,
"index_current" : 0,
"delete_total" : 0,
"delete_time" : "0s",
"delete_time_in_millis" : 0,
"delete_current" : 0
},
"get" : {
"total" : 0,
"time" : "0s",
"time_in_millis" : 0,
"exists_total" : 0,
"exists_time" : "0s",
"exists_time_in_millis" : 0,
"missing_total" : 0,
"missing_time" : "0s",
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"query_total" : 0,
"query_time" : "0s",
"query_time_in_millis" : 0,
"query_current" : 0,
"fetch_total" : 0,
"fetch_time" : "0s",
"fetch_time_in_millis" : 0,
"fetch_current" : 0
},
"cache" : {
"field_evictions" : 0,
"field_size" : "0b",
"field_size_in_bytes" : 0,
"filter_count" : 0,
"filter_evictions" : 0,
"filter_size" : "0b",
"filter_size_in_bytes" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size" : "0b",
"current_size_in_bytes" : 0,
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0,
"total_docs" : 0,
"total_size" : "0b",
"total_size_in_bytes" : 0
},
"refresh" : {
"total" : 171,
"total_time" : "0s",
"total_time_in_millis" : 0
},
"flush" : {
"total" : 0,
"total_time" : "0s",
"total_time_in_millis" : 0
}
}
}

...


Hi Sujoy,

Say hi to Ian from Otis please :wink:

And about monitoring - we've used SPM for Elasticsearch to see and
understand behaviour of ES cache(s). Since we can see trend graphs in SPM
for ES, we can see how the cache size changes when we run queries vs. when
we use sort vs. when we facet on field X or X and Y, etc. And we can see
that on the per-node basis, too. So having and seeing this data over time
also helps with your "Just out of inquisitiveness, what is ES doing
internally?" question. :slight_smile:

You can also clear FieldCache for a given field and set TTL on it.
And since you mention using this for tag cloud, normalizing your tags to
reduce their cardinality will also help. We just did all this stuff for a
large client (tag normalization, soft cache, cache clearing, adjustment of
field types to those that use less memory, etc.) and SPM for ES came in
very handy, if I may say so! :slight_smile:
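
(For the cache clearing and TTL mentioned above, a sketch against the 0.19-era
API and settings - worth double-checking against your exact version:

curl -XPOST 'http://localhost:9200/_cache/clear?field_data=true'

and, in elasticsearch.yml, an expiry so idle entries are dropped over time:

index.cache.field.expire: 10m
)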

Otis

On Friday, April 27, 2012 6:42:35 AM UTC-4, Sujoy Sett wrote:

Hi,

Can you please explain how to check the field data cache? Do I have to set
anything to monitor it explicitly?
I often use the mobz-elasticsearch-head-24935c4 plugin to monitor cluster
state and health; I didn't find anything like index.cache.field.max_size
there in the cluster_state details.
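
(We later noticed that the "cache" section comes from the node stats API
rather than the cluster state - on 0.19, presumably:

curl -XGET 'http://localhost:9200/_cluster/nodes/stats?pretty=true'

The per-node "cache" block in the response shows field_size, field_evictions,
filter_size, and so on.)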

Thanks and Regards,

On Friday, April 27, 2012 3:52:04 PM UTC+5:30, Rafał Kuć wrote:

Hello,

Did you look at the size of the field data cache after sending the
example query ?

Regards,
Rafał

On Friday, 27 April 2012 at 12:15:38 UTC+2, Sujoy Sett
wrote:

...

Hi Otis,

Thanks a lot for your response.
We will definitely try the approaches you have suggested and update
you soon.

Thanks and Regards
Jagdeep

On Apr 27, 9:12 pm, Otis Gospodnetic otis.gospodne...@gmail.com
wrote:

...

Hi,

One quick observation: when a single node is maintained for a
cluster, recovery from OOM is happening normally, though it is not that
fast.
But when the cluster has two nodes, upon OOM the nodes come to
a standstill (no response available, CPU usage minimal, memory blocked at
the maximum allowed size). On shutting down one node, the other returns to
a responsive state.
We changed multicast discovery to unicast and played a little with discovery
timeout parameters, to no avail.
What are we missing here? Any suggestions?

Thanks and Regards,

On Friday, April 27, 2012 9:47:54 PM UTC+5:30, jagdeep singh wrote:

...

We have been running JMeter tests against the Elasticsearch queries used
in our application, with both single-user and five-concurrent-user test
plans.
Following are the findings:

  1. Regarding the data sample and the kind of query I posted earlier in the
    mail trail, a node with a 2 GB max heap is able to serve the query on a
    volume of 100,000 documents. On increasing the data volume, the node
    hits OOM. My question is: will dividing the data into more shards, and
    adding more nodes (with the same configuration), help me avoid hitting
    OOM? (See the sketch after this post.)

  2. I have used two configurations here: one, multiple nodes on one
    machine, with less heap space per node; two, a single node on one
    machine, with more heap space. Which one is better in terms of concurrent
    requests and heavy requests (terms facets), and what is the best shard
    configuration?

  3. Regarding recovery from OOM, Elasticsearch shows random behavior.
    We have switched off dumping the heap to a file; still, ES sometimes
    recovers from OOM and sometimes does not. How can we ensure that OOM is
    avoided at the request level? I mean something like: when a query is
    tending toward OOM, identify and abort that query only, without making
    ES unresponsive. Does that sound absurd?

Regards,
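
On question 1 above: because the cluster keeps replicas = nodes - 1, every
node holds a full copy of the index, so each node's field data cache must
still accommodate the terms of all documents; adding nodes without lowering
the replica count does not shrink the per-node heap footprint. A minimal
sketch of lowering the replica count at runtime via the update-settings API
(the index name "myindex" is a placeholder):

curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index" : {
    "number_of_replicas" : 1
  }
}'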

On Saturday, April 28, 2012 11:31:45 PM UTC+5:30, Sujoy Sett wrote:

Hi,

One quick observation: when a single node is maintained for the cluster,
recovery from OOM happens normally, though it is not fast.
But when the cluster has two nodes, upon OOM both nodes come to a
standstill (no response available, CPU usage minimal, memory pinned at the
maximum allowed size); on shutting down one node, the other returns to a
responsive state.
We changed multicast discovery to unicast and played a little with the
discovery timeout parameters, to no avail.
What are we missing here? Any suggestions?

Thanks and Regards,
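
For reference, a minimal sketch of the unicast switch described above,
using the 0.19-era discovery settings in elasticsearch.yml (the host names
and the timeout value are placeholders):

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["server1:9300", "server2:9300"]
# the discovery timeout parameter we experimented with
discovery.zen.ping.timeout: 3s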

On Friday, April 27, 2012 9:47:54 PM UTC+5:30, jagdeep singh wrote:

Hi Otis,

Thanks a lot for your response.
We will definitely try the approaches you have suggested and update
you soon.

Thanks and Regards
Jagdeep

On Apr 27, 9:12 pm, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

Hi Sujoy,

Say hi to Ian from Otis please ;)

And about monitoring - we've used SPM for Elasticsearch to see and
understand the behaviour of the ES cache(s). Since we can see trend graphs
in SPM for ES, we can see how the cache size changes when we run plain
queries vs. when we use sort vs. when we facet on field X or on X and Y,
etc. And we can see that on a per-node basis, too. So having and seeing
this data over time also helps with your "Just out of inquisitiveness,
what is ES doing internally?" question. :)

You can also clear the FieldCache for a given field and set a TTL on it.
And since you mention using this for tag clouds, normalizing your tags to
reduce their cardinality will also help. We just did all this stuff for a
large client (tag normalization, soft cache, cache clearing, adjustment of
field types to those that use less memory, etc.) and SPM for ES came in
very handy, if I may say so! :)

Otis
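
A minimal sketch of the cache knobs Otis refers to, assuming the 0.19-era
index settings in elasticsearch.yml (the values are illustrative, not
recommendations):

# soft references let the JVM reclaim field cache entries under memory pressure
index.cache.field.type: soft
# upper bound on cached entries, plus a TTL so idle entries expire
index.cache.field.max_size: 50000
index.cache.field.expire: 10m

Clearing the field data cache on demand (the field_data flag is our
recollection of that era's clear-cache API, so treat it as an assumption):

curl -XPOST 'http://localhost:9200/_cache/clear?field_data=true'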

On Friday, April 27, 2012 6:42:35 AM UTC-4, Sujoy Sett wrote:

Hi,

Can you please explain how to check the field data cache? Do I have to set
anything to monitor it explicitly?
I often use the mobz-elasticsearch-head-24935c4 plugin to monitor cluster
state and health, but I didn't find anything like
index.cache.field.max_size there in the cluster_state details.

Thanks and Regards,

On Friday, April 27, 2012 3:52:04 PM UTC+5:30, Rafał Kuć wrote:

Hello,

Did you look at the size of the field data cache after sending the example
query?

Regards,
Rafał
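
On the "how do I check it" question above: the cache figures are exposed
per node through the node stats API rather than through cluster state. A
sketch against the pre-1.0 endpoint; the response field names are from
memory of 0.19-era output and should be treated as assumptions:

curl 'http://localhost:9200/_cluster/nodes/stats?pretty=true'

# each node's entry carries an indices.cache section, roughly:
# "cache" : {
#   "field_evictions" : 0,
#   "field_size" : "1.2gb",
#   "field_size_in_bytes" : 1288490188,
#   ...
# }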

Adding one more query:

  1. Our ES installation has some 50 indexes in total. After a shutdown, it
    typically takes some 5-10 minutes to reach green state, and before that,
    queries tend to fail with UnavailableShardsException. Can we control the
    recovery order, or speed up the recovery of some indexes in priority over
    others? (See the sketch below.)

Thanks,
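
As far as we know, 0.19 has no per-index recovery priority, but the
cluster-wide gateway settings below (0.19-era names; values illustrative
for a two-node cluster) control when recovery begins after a full restart
and can shorten the window in which shards are unavailable:

# begin recovery only once enough nodes have joined, instead of
# immediately recovering onto a partial cluster
gateway.recover_after_nodes: 2
gateway.expected_nodes: 2
gateway.recover_after_time: 5m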
