Elasticsearch indexing tuning

Hello, I have a server with 4 CPUs (4 cores each), 16 GB of RAM, and virtual storage.
I tried to index a 250 MB file, but it took 30 minutes. How can we tune Elasticsearch to make it use all CPU cores and the JVM fully, and index as fast as possible?

How did you go about indexing it? What did CPU usage, disk I/O stats, and iowait look like while you were indexing? Have you followed these guidelines?


Well, I am new to Elasticsearch and I need help understanding these guidelines. How can I increase the number of threads to use maximum CPU capacity and multiple CPUs for indexing?
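For context, the parallelism knobs here live in Logstash rather than Elasticsearch: `pipeline.workers` sets how many threads run the filter and output stages (it defaults to the number of CPU cores), and `pipeline.batch.size` sets how many events each worker collects per bulk request. A minimal `logstash.yml` sketch with illustrative values, to be tuned against your own hardware:

```yaml
# logstash.yml -- illustrative values, not a recommendation
# Workers default to the number of CPU cores; raising this only helps
# if the filter/output stages are the bottleneck rather than the input.
pipeline.workers: 16
# Bigger batches mean bigger bulk requests to Elasticsearch,
# at the cost of more memory held per worker.
pipeline.batch.size: 1000
```

In practice, once all cores are busy, batch size tends to be a bigger lever than extra workers.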

It's unclear to me what you are indexing (the content of your file) and how (which tool or code you are using).

I am indexing through Logstash because they are CSV files. This is the Logstash conf file:

```
input {
  file {
    path => "C:/Users/Fadi/Desktop/internship/OCC*.csv"
    start_position => "beginning"
  }
}

filter {
  csv {
    separator => ","
    columns => ["calltype", "recordtype", "chrononumber", "servedSubscriptionIDNumber", "servedSubscriptionID", "callingcallednumber", "callforwardflag", "callnumberincaseofCF", "calldate", "calltime", "faxdatavoicesms", "teleservicenumber", "number", "imsi", "intermediatecalltype", "inServicekey", "scfaddress", "mSCAddress", "totaloctet", "cdrType", "accountValueBefore", "accountValueAfter", "familyAndFriendsIndicator", "selectedCommunityID", "familyAndFriendsNo", "accountGroupID", "selectionTreeType", "servedAccount", "serviceOfferings", "terminationCause", "chargingContextID", "serviceContextID", "serviceSessionID", "resultCode", "resultCodeExtension", "triggerTime", "nodeName", "partialSequenceNumber", "lastPartialOutput", "correlationIDType", "correlationID", "servingElementType", "servingElement", "usedUnchargedServiceUnitsNumber", "usedUnchargedServiceUnits", "uCounternumber", "uCounter", "mAinbefore", "mAinafter", "mAincharge", "accessPoint", "sgsnaddress", "lastSgsnAddress", "ggsnaddress", "firstGgsnName", "lastGgsnAddress", "lastGgsnName", "imei", "dANumber", "dAList", "pamServiceID", "pamClassID", "scheduleID", "decimals", "currency", "currentPamPeriod", "allpartials", "chargeID", "aCCNumber", "aCCList", "bonusAccNumber", "bonusAccList", "bonusmAinBefore", "bonusmAinAfter", "bonusmAinChange", "bonusDANumber", "bonusDAList", "duration", "nCR", "sPI", "dASharedNumber", "dASharedList", "accumulatedCost", "providerAccount", "providerServiceClassID", "accountGroupID1", "accountValueDeducted", "accumulatedUnits", "accountUnitsDeducted", "treeparameterID", "treeparameterValue", "usedoffers", "sharedOfferID", "sharedFamilyAndfriendsID", "sharedFamilyAndfriendsNO", "providersfamilyandfriendsid", "sharedPamServiceID", "sharedPamClassID", "sharedScheduleID", "sharedCurrentPamPeriod", "sharedOfferProviderID", "uCSharedNumber", "uCSharedList", "treeparameterID1", "treeparameterValue1", "tDFNumber", "tDFList", "offerNumber", "offerList", "firstCellID", "lastCellID", "firstLAC", "lastLAC", "firstTAC", "lastTAC", "firstRATtype", "lastRATtype", "DA40", "BONUSDA9"]
  }

  ruby {
    # Split the '|'-separated dAList field into an array of hashes,
    # one per entry, with the '~'-separated parts as named subfields
    code => "
      x = event.get('dAList').split('|').collect { |t|
        c = t.split '~'
        {
          'DA_ID'      => c[0].to_i,
          'DA_Before'  => c[1].to_i,
          'DA_After'   => c[2].to_i,
          'DA_Change'  => c[3].to_i,
          'DA_ExpDate' => c[4]
        }
      }
      event.set('DAList', x)
    "
  }

  date {
    match => [ "[DAList][0][DA_ExpDate]", "YYYYMMdd", "YYYYMdd", "YYYYMMd", "YYYYMd" ]
    target => "[DAList][0][DA_ExpDate]"
    timezone => "UTC"
  }
  date {
    match => [ "[DAList][1][DA_ExpDate]", "YYYYMMdd", "YYYYMdd", "YYYYMMd", "YYYYMd" ]
    target => "[DAList][1][DA_ExpDate]"
    timezone => "UTC"
  }
  date {
    match => [ "[DAList][2][DA_ExpDate]", "YYYYMMdd", "YYYYMdd", "YYYYMMd", "YYYYMd" ]
    target => "[DAList][2][DA_ExpDate]"
    timezone => "UTC"
  }
  date {
    match => [ "[DAList][3][DA_ExpDate]", "YYYYMMdd", "YYYYMdd", "YYYYMMd", "YYYYMd" ]
    target => "[DAList][3][DA_ExpDate]"
    timezone => "UTC"
  }
  date {
    match => [ "[DAList][4][DA_ExpDate]", "YYYYMMdd", "YYYYMdd", "YYYYMMd", "YYYYMd" ]
    target => "[DAList][4][DA_ExpDate]"
    timezone => "UTC"
  }
  date {
    match => [ "[DAList][5][DA_ExpDate]", "YYYYMMdd", "YYYYMdd", "YYYYMMd", "YYYYMd" ]
    target => "[DAList][5][DA_ExpDate]"
    timezone => "UTC"
  }
}

output {
  elasticsearch {
    hosts => "localhost"
    index => "occ"
    document_type => "data"
  }
  stdout {
    codec => rubydebug
  }
}
```

What seems to be limiting performance?

Are you saturating the CPU while indexing? If so, how much of it is used by Elasticsearch and by Logstash respectively?

What does disk I/O look like? Do you have very slow storage that could limit throughput? Elasticsearch is often quite I/O intensive during indexing.
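While a load is running, node-level CPU and disk stats can also be pulled from Elasticsearch itself; a minimal check from Kibana Dev Tools (or the equivalent curl), assuming the default local node. Note that the `io_stats` section under `fs` is only reported on Linux, so on Windows you would read disk figures from Resource Monitor instead:

```
GET /_nodes/stats/os,fs
```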

I will set up Metricbeat to view them; I don't have it installed yet, and I will reply with the results. All I did was change the JVM heap and set the number of shards to 1; all the remaining settings are at their defaults, and I only have the Elastic Stack on my server.
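For reference, two per-index settings from the guidelines linked above go beyond heap and shard count; a sketch against the `occ` index from the config earlier, with illustrative values (on a single node, replicas cannot be allocated anyway, and a longer refresh interval means fewer tiny segments get created during the bulk load):

```
PUT /occ/_settings
{
  "index": {
    "refresh_interval": "30s",
    "number_of_replicas": 0
  }
}
```

Setting `refresh_interval` back to its default (`1s`) after the load restores near-real-time search.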
