Elasticsearch indexing tuning

Hello, I have a server with 4 CPUs (4 cores each), 16 GB of RAM, and virtual storage.
I tried to index a 250 MB file, but it took 30 minutes. How can we tune Elasticsearch to make it use all CPU cores and the JVM fully, and index as fast as possible?

How did you go about indexing it? What did CPU usage, disk I/O stats, and iowait look like while you were indexing? Have you followed these guidelines?


Well, I am new to Elasticsearch and I need help understanding these guidelines. How can I increase the number of threads to use maximum CPU capacity and multiple CPUs for indexing?
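For context, the parallelism knobs here live in Logstash rather than Elasticsearch: `pipeline.workers` sets how many threads run the filter and output stages (it defaults to the number of CPU cores), and `pipeline.batch.size` sets how many events each worker collects per bulk request. A minimal `logstash.yml` sketch with illustrative values, to be tuned against your own hardware:

```yaml
# logstash.yml -- illustrative values, not a recommendation
# Workers default to the number of CPU cores; raising this only helps
# if the filter/output stages are the bottleneck rather than the input.
pipeline.workers: 16
# Bigger batches mean bigger bulk requests to Elasticsearch,
# at the cost of more memory held per worker.
pipeline.batch.size: 1000
```

In practice, once all cores are busy, batch size tends to be a bigger lever than extra workers.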

It's unclear to me what you are indexing (the content of your file) and how (which tool or code you are using).

I am indexing through Logstash because they are CSV files. This is the Logstash conf file:

```
input {
  file {
    path => "C:/Users/Fadi/Desktop/internship/OCC*.csv"
    start_position => "beginning"
  }
}

filter {
  csv {
    separator => ","
    columns => ["calltype", "recordtype", "chrononumber", "servedSubscriptionIDNumber", "servedSubscriptionID", "callingcallednumber", "callforwardflag", "callnumberincaseofCF", "calldate", "calltime", "faxdatavoicesms", "teleservicenumber", "number", "imsi", "intermediatecalltype", "inServicekey", "scfaddress", "mSCAddress", "totaloctet", "cdrType", "accountValueBefore", "accountValueAfter", "familyAndFriendsIndicator", "selectedCommunityID", "familyAndFriendsNo", "accountGroupID", "selectionTreeType", "servedAccount", "serviceOfferings", "terminationCause", "chargingContextID", "serviceContextID", "serviceSessionID", "resultCode", "resultCodeExtension", "triggerTime", "nodeName", "partialSequenceNumber", "lastPartialOutput", "correlationIDType", "correlationID", "servingElementType", "servingElement", "usedUnchargedServiceUnitsNumber", "usedUnchargedServiceUnits", "uCounternumber", "uCounter", "mAinbefore", "mAinafter", "mAincharge", "accessPoint", "sgsnaddress", "lastSgsnAddress", "ggsnaddress", "firstGgsnName", "lastGgsnAddress", "lastGgsnName", "imei", "dANumber", "dAList", "pamServiceID", "pamClassID", "scheduleID", "decimals", "currency", "currentPamPeriod", "allpartials", "chargeID", "aCCNumber", "aCCList", "bonusAccNumber", "bonusAccList", "bonusmAinBefore", "bonusmAinAfter", "bonusmAinChange", "bonusDANumber", "bonusDAList", "duration", "nCR", "sPI", "dASharedNumber", "dASharedList", "accumulatedCost", "providerAccount", "providerServiceClassID", "accountGroupID1", "accountValueDeducted", "accumulatedUnits", "accountUnitsDeducted", "treeparameterID", "treeparameterValue", "usedoffers", "sharedOfferID", "sharedFamilyAndfriendsID", "sharedFamilyAndfriendsNO", "providersfamilyandfriendsid", "sharedPamServiceID", "sharedPamClassID", "sharedScheduleID", "sharedCurrentPamPeriod", "sharedOfferProviderID", "uCSharedNumber", "uCSharedList", "treeparameterID1", "treeparameterValue1", "tDFNumber", "tDFList", "offerNumber", "offerList", "firstCellID", "lastCellID", "firstLAC", "lastLAC", "firstTAC", "lastTAC", "firstRATtype", "lastRATtype", "DA40", "BONUSDA9"]
  }

  ruby {
    # Split the '|'-separated dAList field into an array of hashes,
    # one per entry, with the '~'-separated parts as named subfields
    code => "
      x = event.get('dAList').split('|').collect { |t|
        c = t.split '~'
        {
          'DA_ID'      => c[0].to_i,
          'DA_Before'  => c[1].to_i,
          'DA_After'   => c[2].to_i,
          'DA_Change'  => c[3].to_i,
          'DA_ExpDate' => c[4]
        }
      }
      event.set('DAList', x)
    "
  }

  date {
    match => [ "[DAList][0][DA_ExpDate]", "YYYYMMdd", "YYYYMdd", "YYYYMMd", "YYYYMd" ]
    target => "[DAList][0][DA_ExpDate]"
    timezone => "UTC"
  }
  date {
    match => [ "[DAList][1][DA_ExpDate]", "YYYYMMdd", "YYYYMdd", "YYYYMMd", "YYYYMd" ]
    target => "[DAList][1][DA_ExpDate]"
    timezone => "UTC"
  }
  date {
    match => [ "[DAList][2][DA_ExpDate]", "YYYYMMdd", "YYYYMdd", "YYYYMMd", "YYYYMd" ]
    target => "[DAList][2][DA_ExpDate]"
    timezone => "UTC"
  }
  date {
    match => [ "[DAList][3][DA_ExpDate]", "YYYYMMdd", "YYYYMdd", "YYYYMMd", "YYYYMd" ]
    target => "[DAList][3][DA_ExpDate]"
    timezone => "UTC"
  }
  date {
    match => [ "[DAList][4][DA_ExpDate]", "YYYYMMdd", "YYYYMdd", "YYYYMMd", "YYYYMd" ]
    target => "[DAList][4][DA_ExpDate]"
    timezone => "UTC"
  }
  date {
    match => [ "[DAList][5][DA_ExpDate]", "YYYYMMdd", "YYYYMdd", "YYYYMMd", "YYYYMd" ]
    target => "[DAList][5][DA_ExpDate]"
    timezone => "UTC"
  }
}

output {
  elasticsearch {
    hosts => "localhost"
    index => "occ"
    document_type => "data"
  }
  stdout {
    codec => rubydebug
  }
}
```

What seems to be limiting performance?

Are you saturating the CPU while indexing? If so, how much of it is used by Elasticsearch and by Logstash respectively?

What does disk I/O look like? Do you have very slow storage that could limit throughput? Elasticsearch is often quite I/O intensive during indexing.
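While a load is running, node-level CPU and disk stats can also be pulled from Elasticsearch itself; a minimal check from Kibana Dev Tools (or the equivalent curl), assuming the default local node. Note that the `io_stats` section under `fs` is only reported on Linux, so on Windows you would read disk figures from Resource Monitor instead:

```
GET /_nodes/stats/os,fs
```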

I will set up Metricbeat to view them; I don't have it installed yet, and I will reply with the results. All I did was change the JVM heap and set the number of shards to 1; all the remaining settings are at their defaults, and I only have the Elastic Stack on my server.
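For reference, two per-index settings from the guidelines linked above go beyond heap and shard count; a sketch against the `occ` index from the config earlier, with illustrative values (on a single node, replicas cannot be allocated anyway, and a longer refresh interval means fewer tiny segments get created during the bulk load):

```
PUT /occ/_settings
{
  "index": {
    "refresh_interval": "30s",
    "number_of_replicas": 0
  }
}
```

Setting `refresh_interval` back to its default (`1s`) after the load restores near-real-time search.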
