Very slow upload performance

I have 16 servers uploading to an Elasticsearch cluster simultaneously. The cluster constantly times out and sits at 100% CPU utilization. I believe, although I'm not certain, that this is what causes the very poor upload performance; I think I got better performance uploading from just one server at a time.

Elasticsearch cluster config:
version 1.5
12 data nodes (m3.medium.elasticsearch on AWS)
3 dedicated master nodes (m3.medium.elasticsearch)

Each data node has 35 GB of SSD storage. I'm uploading through the REST API, since AWS doesn't support the transport client.

Here are the mappings for the Elasticsearch cluster. The number of documents in the company mapping is very small compared to its child document mapping. The documents I'm uploading range from 100 KB to 5 MB in size (for the body field), though most are on the smaller side.

PUT /index
{
"settings": {
"analysis": {
"char_filter": {
"special_characters": {
"type":"mapping",
"mappings":[
"> => \u0020RBCindexGTsymbol\u0020",
">= => \u0020RBCindexGTEQsymbol\u0020",
"≥ => \u0020RBCindexGTEQsymbol\u0020",
"< => \u0020RBCindexLTsymbol\u0020",
"≤ => \u0020RBCindexLTEQsymbol\u0020",
"<= => \u0020RBCindexLTEQsymbol\u0020",
"= => \u0020RBCindexEQsymbol\u0020",
"+ => \u0020RBCindexPLsymbol\u0020",
"% => \u0020RBCindexPERCsymbol\u0020",
"~ => \u0020RBCindexTILDsymbol\u0020",
"± => \u0020RBCindexPLMINsymbol\u0020",
"+- => \u0020RBCindexPLMINsymbol\u0020",
"÷ => \u0020RBCindexDIVsymbol\u0020",
"$ => \u0020RBCindexDOLLARsymbol\u0020",
"¢ => \u0020RBCindexCENTSsymbol\u0020",
"€ => \u0020RBCindexEUROsymbol\u0020",
"₠ => \u0020RBCindexEUROCURsymbol\u0020",
"₤ => \u0020RBCindexLIRAsymbol\u0020",
"₨ => \u0020RBCindexRUPEEsymbol\u0020",
"¥ => \u0020RBCindexYENsymbol\u0020"
]
}
},
"analyzer": {
"include_special_chars": {
"type": "custom",
"char_filter": [ "icu_normalizer","special_characters"],
"tokenizer": "icu_tokenizer",
"filter": [ "icu_normalizer" ]
}}
}}}

PUT ENDPOINT/INDEX/_mapping/company
{
"properties":{
"cik": {
"type":"long"
},
"ticker": {
"type":"string",
"index":"not_analyzed"
},
"assignedSIC": {
"type":"integer"
},
"companyName": {
"type":"string",
"index":"not_analyzed"
}
}
}
PUT ENDPOINT/INDEX/_mapping/special-document
{
"_parent": {
"type":"company"
},
"properties": {
"body": {
"type":"string",
"term_vector":"with_positions_offsets",
"analyzer":"include_special_chars"
},
"filename": {
"type":"string",
"index":"not_analyzed"
},
"sequence": {
"type":"integer",
"index":"not_analyzed"
},
"type": {
"type":"string",
"index":"not_analyzed"
},
"accessionNumber": {
"type":"long"
},
"filingDate": {
"type":"date"
}
}
}
PUT ENDPOINT/INDEX/_mapping/uploaded-document/
{
"properties":{
"body":{
"type":"string",
"term_vector":"with_positions_offsets",
"analyzer":"include_special_chars"
},
"bucket":{
"type":"string",
"index":"not_analyzed"
},
"key":{
"type":"string",
"index":"not_analyzed"
},
"filename":{
"type":"string",
"index":"not_analyzed"
},
"uuid":{
"type":"string",
"index":"not_analyzed"
}
}
}
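
With bodies between 100 KB and 5 MB, one option worth trying is to group documents into size-capped `_bulk` requests rather than sending one request per document. The sketch below only builds the newline-delimited request bodies; it does not send anything. The document keys (`id`, `source`, `parent`), the function name, and the 8 MB default cap are assumptions for illustration, and the `_parent` key in the action line should be checked against the bulk API documentation for your version.

```python
import json
from typing import Dict, Iterable, Iterator, List


def bulk_bodies(docs: Iterable[Dict], index: str, doc_type: str,
                max_bytes: int = 8 * 1024 * 1024) -> Iterator[str]:
    """Yield newline-delimited bodies for POST /_bulk, capped at max_bytes.

    Each doc is a dict with the hypothetical keys "id", "source" and an
    optional "parent". A single document larger than max_bytes still gets
    its own request body instead of being dropped.
    """
    lines: List[str] = []
    size = 0
    for doc in docs:
        action = {"_index": index, "_type": doc_type, "_id": doc["id"]}
        if doc.get("parent") is not None:
            # Child documents need their parent id (assumed key: "_parent").
            action["_parent"] = doc["parent"]
        pair = (json.dumps({"index": action}) + "\n"
                + json.dumps(doc["source"]) + "\n")
        pair_size = len(pair.encode("utf-8"))
        # Flush the current batch before it would exceed the cap.
        if lines and size + pair_size > max_bytes:
            yield "".join(lines)
            lines, size = [], 0
        lines.append(pair)
        size += pair_size
    if lines:
        yield "".join(lines)
```

Each yielded string can be sent as the body of one bulk request. The cap is a starting point only; it should stay below whatever request size limit applies to the instance type, and fewer, larger requests from each of the 16 uploaders may or may not help, so it is worth measuring.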

If upgrading Elasticsearch is possible, a newer version should give you better overall performance. Have you used a tool like Marvel to check performance metrics and segment merging under high indexing load?
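
If Marvel is not available, a rough picture of merge activity and bulk rejections can be pulled from the node stats API. The sketch below summarizes an already-fetched stats response; it assumes the JSON has the usual `nodes` → `indices.merges` / `indices.indexing` / `thread_pool.bulk` layout, and the function name and output keys are made up for illustration.

```python
from typing import Dict, List


def summarize_nodes(stats: Dict) -> List[Dict]:
    """Reduce a node stats response to a few indexing-related counters.

    `stats` is the parsed JSON from something like
    GET ENDPOINT/_nodes/stats/indices,thread_pool (assumed request).
    Missing sections default to 0 so partial responses do not raise.
    """
    rows = []
    for node_id, node in stats.get("nodes", {}).items():
        indices = node.get("indices", {})
        merges = indices.get("merges", {})
        bulk = node.get("thread_pool", {}).get("bulk", {})
        rows.append({
            "node": node.get("name", node_id),
            "merges_current": merges.get("current", 0),
            "merge_time_ms": merges.get("total_time_in_millis", 0),
            "index_total": indices.get("indexing", {}).get("index_total", 0),
            "bulk_queue": bulk.get("queue", 0),
            "bulk_rejected": bulk.get("rejected", 0),
        })
    # Nodes rejecting the most bulk requests come first.
    return sorted(rows, key=lambda r: r["bulk_rejected"], reverse=True)
```

Sampling this every few seconds during an upload and comparing the counters between samples would show whether rejections or merge time climb when all 16 servers are writing at once.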
