I have 16 servers trying to upload to an Elasticsearch cluster simultaneously, and they are causing it to time out constantly and hit 100% CPU utilization. I believe, although I'm not certain, that this is causing very poor upload performance on my cluster. I believe I had better performance uploading from just one server at a time.
Elasticsearch cluster config:
version 1.5
12 data nodes (m3.medium.elasticsearch on AWS)
3 dedicated master nodes (m3.medium.elasticsearch)
Each data node has 35 GB of SSD storage. I'm uploading using the REST API, since AWS doesn't support the transport client.
Here are the mappings for the Elasticsearch cluster. The number of documents for the company mapping is very small compared to its child document mapping. The documents I'm uploading range from 100 KB to 5 MB in size (for the body field), although they're generally on the smaller side.
PUT /index
{
"settings": {
"analysis": {
"char_filter": {
"special_characters": {
"type":"mapping",
"mappings":[
"> => \u0020RBCindexGTsymbol\u0020",
">= => \u0020RBCindexGTEQsymbol\u0020",
"≥ => \u0020RBCindexGTEQsymbol\u0020",
"< => \u0020RBCindexLTsymbol\u0020",
"≤ => \u0020RBCindexLTEQsymbol\u0020",
"<= => \u0020RBCindexLTEQsymbol\u0020",
"= => \u0020RBCindexEQsymbol\u0020",
"+ => \u0020RBCindexPLsymbol\u0020",
"% => \u0020RBCindexPERCsymbol\u0020",
"~ => \u0020RBCindexTILDsymbol\u0020",
"± => \u0020RBCindexPLMINsymbol\u0020",
"+- => \u0020RBCindexPLMINsymbol\u0020",
"÷ => \u0020RBCindexDIVsymbol\u0020",
"$ => \u0020RBCindexDOLLARsymbol\u0020",
"¢ => \u0020RBCindexCENTSsymbol\u0020",
"€ => \u0020RBCindexEUROsymbol\u0020",
"₠ => \u0020RBCindexEUROCURsymbol\u0020",
"₤ => \u0020RBCindexLIRAsymbol\u0020",
"₨ => \u0020RBCindexRUPEEsymbol\u0020",
"¥ => \u0020RBCindexYENsymbol\u0020"
]
}
},
"analyzer": {
"include_special_chars": {
"type": "custom",
"char_filter": [ "icu_normalizer","special_characters"],
"tokenizer": "icu_tokenizer",
"filter": [ "icu_normalizer" ]
}}
}}}
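To make the intent of the `special_characters` char filter concrete, here is a minimal Python sketch that emulates only the mapping step on a subset of the rules above. It assumes greedy longest-match replacement, which is how the mapping char filter is documented to behave. The `icu_normalizer` and `icu_tokenizer` stages are not modelled, and the function name `apply_char_filter` is purely illustrative.

```python
# Rough emulation of the "special_characters" mapping char filter.
# Only a subset of the rules is listed; "\u0020" in the settings is a space.
MAPPINGS = {
    ">": " RBCindexGTsymbol ",
    ">=": " RBCindexGTEQsymbol ",
    "<": " RBCindexLTsymbol ",
    "<=": " RBCindexLTEQsymbol ",
    "=": " RBCindexEQsymbol ",
    "+": " RBCindexPLsymbol ",
    "+-": " RBCindexPLMINsymbol ",
    "%": " RBCindexPERCsymbol ",
    "$": " RBCindexDOLLARsymbol ",
}


def apply_char_filter(text, mappings=MAPPINGS):
    """Replace each mapped symbol, preferring the longest key at each position."""
    keys = sorted(mappings, key=len, reverse=True)  # try ">=" before ">"
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(mappings[key])
                i += len(key)
                break
        else:
            # No rule matches here, so the character passes through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Splitting on whitespace approximates what a tokenizer would later see.
print(apply_char_filter("margin >= 5%").split())
```

The point of the placeholder words is that symbols such as `>=` survive tokenization as searchable terms instead of being dropped as punctuation.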
PUT ENDPOINT/INDEX/_mapping/company
{
"properties":{
"cik": {
"type":"long"
},
"ticker": {
"type":"string",
"index":"not_analyzed"
},
"assignedSIC": {
"type":"integer"
},
"companyName": {
"type":"string",
"index":"not_analyzed"
}
}
}
PUT ENDPOINT/INDEX/_mapping/special-document
{
"_parent": {
"type":"company"
},
"properties": {
"body": {
"type":"string",
"term_vector":"with_positions_offsets",
"analyzer":"include_special_chars"
},
"filename": {
"type":"string",
"index":"not_analyzed"
},
"sequence": {
"type":"integer",
"index":"not_analyzed"
},
"type": {
"type":"string",
"index":"not_analyzed"
},
"accessionNumber": {
"type":"long"
},
"filingDate": {
"type":"date"
}
}
}
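Because `special-document` declares `_parent`, each child document has to be indexed with a `parent` parameter so that it is routed to the same shard as its company. The sketch below only builds the request (method, URL, JSON body) for one child document and sends nothing. The endpoint, the document id, the field values, and the use of `cik` as the company id are all assumptions made for illustration, as is the helper name `child_index_request`.

```python
import json
from urllib.parse import quote


def child_index_request(endpoint, index, doc_id, parent_id, doc):
    """Return (method, url, body) for indexing one special-document under a company."""
    url = "{}/{}/special-document/{}?parent={}".format(
        endpoint.rstrip("/"),
        quote(index, safe=""),
        quote(str(doc_id), safe=""),
        quote(str(parent_id), safe=""),  # id of the parent company document
    )
    return "PUT", url, json.dumps(doc)


# Hypothetical values; only the field names come from the mapping above.
method, url, body = child_index_request(
    endpoint="https://example.invalid",  # placeholder, not a real cluster
    index="index",
    doc_id="doc-1",
    parent_id=1234,  # assumes the company document id is its cik
    doc={"body": "revenue >= 5%", "filename": "filing.txt", "sequence": 1},
)
print(method, url)
```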
PUT ENDPOINT/INDEX/_mapping/uploaded-document/
{
"properties":{
"body":{
"type":"string",
"term_vector":"with_positions_offsets",
"analyzer":"include_special_chars"
},
"bucket":{
"type":"string",
"index":"not_analyzed"
},
"key":{
"type":"string",
"index":"not_analyzed"
},
"filename":{
"type":"string",
"index":"not_analyzed"
},
"uuid":{
"type":"string",
"index":"not_analyzed"
}
}
}