HOW MANY CONNECTIONS DO YOUR NDBENCH CLIENTS USE CONCURRENTLY TO INDEX INTO THE CLUSTER?
Each of the 30 ndbench clients has 50 writer threads.
On start-up they send requests to the target cluster at a rate of 20 requests per second,
ramping up to 40 requests per second over 3 minutes as long as the error rate stays below 10%.
But before I even get to 20 requests per second I start getting timeouts on almost all the ndbench clients,
so the effective request rate is much lower than 20 per second per client.
Currently I am ignoring these timeouts, since I am simulating a high load where such timeouts could happen
in production. But if this could be causing a performance issue on the
server, I would go back and make my benchmark client back off a little more, so as to minimize
the occurrence of timeouts.
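For reference, here is a minimal shell sketch of the ramp-up schedule described above (hypothetical; the actual ndbench client implements this logic internally):

  #!/bin/bash
  # Sketch: start at 20 requests/sec, grow linearly to 40 requests/sec
  # over 3 minutes, and stop ramping if the error rate reaches 10%.
  startRate=20
  endRate=40
  rampSeconds=180
  errorRatePct=0   # in reality, measured from client-side failure counters
  for elapsed in `seq 0 10 $rampSeconds`; do
    if [ $errorRatePct -ge 10 ]; then
      break        # hold the current rate instead of ramping further
    fi
    rate=`python -c "print($startRate + ($endRate - $startRate) * $elapsed / ($rampSeconds*1.0))"`
    echo "t=${elapsed}s target rate = $rate requests/sec"
  done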
ARE CONNECTIONS DISTRIBUTED EVENLY ACROSS NODES IN THE CLUSTER?
Yes, as described above: each client behaves in the same way.
HOW MANY INDICES ARE YOU INDEXING INTO?
1
HOW MANY PRIMARY AND REPLICA SHARDS FOR EACH?
360 primary shards, 2 replicas
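(With 2 replicas, each primary has 3 physical copies, so the cluster holds 360 x 3 = 1,080 shard copies in total; spread across the 120 data nodes used in the benchmark, that averages 9 shards per node.)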
HOW MANY INDICES DO YOU EXPECT TO INDEX INTO IN PRODUCTION?
1 index being actively written to for the current day's log data,
plus indices for the 4 previous days.
All will have light search activity, with the main activity centered on the most recent index.
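For illustration, a hypothetical sketch of that daily layout (the logs- prefix and naming scheme are made up; only the 1-active-plus-4-previous-days retention comes from the description above):

  # Hypothetical daily index names: the first is the actively written
  # index, the rest are the 4 previous days' indices (GNU date syntax).
  for daysAgo in 0 1 2 3 4; do
    echo "logs-`date -d "-$daysAgo days" +%Y.%m.%d`"
  done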
YOU MENTIONED THAT INDEXING THROUGHPUT SLOWS DOWN AFTER 60GB HAS BEEN INDEXED.
IS THAT IN TOTAL, PER NODE, PER INDEX OR PER SHARD?
My bad. I should have written that the slowdown occurs after 60 TERABYTES of data have been written
to the primary shards (across the whole cluster).
DO THE INDEXING RATES MENTIONED PER NODE INCLUDE INDEXING INTO REPLICA SHARDS OR
IS IT JUST DOCUMENTS INDEXED INTO PRIMARY SHARDS? HOW IS THIS MEASURED?
The rate given is for the total number of documents indexed: primary copies as well as replicas.
For details, please see THROUGHPUT MEASUREMENT SCRIPT, below.
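As a worked example of that calculation (the numbers are illustrative, not measured): if the primary-shard document count grows by 40,000 docs/sec, then with 2 replicas the cluster as a whole indexes 40,000 x (2 + 1) = 120,000 docs/sec, which across 120 data nodes averages 1,000 docs/sec per node.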
ARE YOU USING NESTED DOCUMENTS?
Yes.
IF SO, HOW MANY NESTED ELEMENTS ARE THERE ON AVERAGE PER DOCUMENT?
Please see FULL EXAMPLE OF DOCUMENT, below.
WHY ARE YOU PADDING DATA UP TO A SPECIFIC LENGTH?
To bring the total number of bytes per document to 2,300, which is the
average document size in the production cluster whose load the benchmark is trying to simulate.
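As a minimal sketch of the padding idea (hypothetical; the real generator lives inside the ndbench client, and docJson here is just a stand-in):

  # Sketch: compute how many random alphanumeric characters to append
  # so the serialized document reaches the 2,300-byte target size.
  targetBytes=2300
  docJson='{"long0": 138, "text0": ""}'    # stand-in for the generated document
  currentBytes=`echo -n "$docJson" | wc -c`
  padLen=`python -c "print(max(0, $targetBytes - $currentBytes))"`
  padding=`tr -dc 'a-zA-Z0-9' < /dev/urandom | head -c $padLen`
  echo "appending $padLen characters of padding"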
FULL EXAMPLE OF DOCUMENT
"object0" : {
"text0" : "1YJj9HJ0goMOvrXQxkNEAX68aGP1lZ5KjnRbxerpG2Dga0tz9b60FeHztdrcmPh0gzRqVCLjLBkcNO3v4lT4iI5wMs5vPsP4EsqqHV4eeMMdlHgd5AVyWSaOXDNpiCyFrMsz1geu4xySKBT4B5NuA2HhlNFcBbnekPbuhAgWmLcCafhHKqByQWlclq8ob5bVUesKyVGbWOTGzAbi9CJLFBSSpMIXSLb67uoatQ",
"long0" : 138,
"keyword0" : "fdYGTeeYwEWn0CEt8aindoGHrNPYe9TWyqTMPMZIlS9zSyLSAey1Frb5mMmJclQuSwXsPYOHm7I1wTU0SOnKBUHLm0vVyFcyzyF4BTrHRARJ1PRF3o7gMPdVVM8NYAtMCo2g1zYp9c915EbPBNL93DgEeEJHZDFGhrVo4riu7eQrKM1VWiTjQSI5qKAL8hmW1kE8yzHH520B4i8XIm48XrmHFQLdDK1tznhrUL"
},
"keyword3" : "uJcQ3i6ugx3sM5dntK8oYp51NCyGFI0mRdK8qXQaRYltjSVUxue2XrhquoHeIAhbG0hpF9zQPx3SVQIrCbs55dLQD1UuFfRnhwxHQadh0T74c0qcBvdzJVWHmcIB2gzMLJFHpBvQ8j7HaovDWTn78cUfDSutpTHFF60KBTVLaNvOyR2mK8WpbitWkTVraWHL6ExDaPgKIZi4v0Ch1XxhLBV94PyZ274aHMOYPl",
"keywords" : "WodASGAIhVgQBDCqibR5qd30jnvBXVg6icUAU3OJvhNdr6qd84XKCFDz4xNMghTilIHuZj6kyF9yNl9sjvz6mdlzTdjKMV4J5yxQejhlzkQEsyWxJtlbeDTgjSrbWcWYW91xlOVMELqxoz3opFo88Gs6IANWr3Ezh9S4zuAcwRtpfqZalvXq56mE8CHtZ0DG8i57r7Bq2cAciBLpdca1R1WsxZ9oqb5yBRVnhm",
"keyword1" : "4pYUeclgzkUdy9KVh4m9nIEuCLHIDc5ldoayJhXsoL8tln1NkC68ifMDxAqatIeJu7XpqPzfn7suTmHquJTxYouTmkvrkbsqslz9HoyDt6Rc3oGyTwRWgxiky5VHLJ9kn4me4x1P8xuHkovW5XsbET9hmUW9sVnooTl1sAuzCpQmoRdVep1xspHoPtrcXIX5ZwrOefOUlHLuIeDBFEed8sUSjvMWL4yL5GWxCG",
"keyword2" : "8YK5nKI5pL7DtxMaZdrb3XQBlKnVwXxaHiKdujq5oYLwvZImKEKssF5jLJbbte9Y6UCUV8uZcSXu4scnka8fZwL0g20I639SDmLUkM6qch7sbfb7oEcsPyd33G81XMALHAibVX9vJHN7tFp9HIl7s8a68QmOqw5L4aXgGK04qVyFnHmk0uyftSEocBLkT5rDwpdMtyGjTkDPmCtF1YJrKwVXTjsFgu1peRe7Qq",
"nested0" : {
"text0" : "ygxwDCokFJmhiJI4ZpXx8urIN8n1nahqeehvlMAcm4nuzLNp3WvBG4tQDk7RSMIJHM8t5DS0Oeb70FoqSj2w2Sh5LzNCGIMcviHjdQkPctNDETYHr2SuKEJkLvubj6aYZSkyQcDhSZY57nzkduHJHOtLCWDwvHgsJULvQFSJTJcdml2NzzwMKW8CA4eF600746H5tO46tRYG0GRpAbCnh0VDnmF4zMH9s2Yn0r",
"long0" : 480,
"keyword0" : "w3XVJYIO7P1wBZxIKZ5NMEqTSvnvUadEEGRxaoRDkSBOYRsJ9OwTU41rA8JQnQkAqQlA0mOG5xJ6ca88rGifkq6R3alLy3tWcxBFWaTRq3z9Kfrek6S377yV9IglEzohzEjV5ufS0rbteUZSmEWnkOrAYXqnZo8Sq0SR2dY9uK0W60OkaDqTgD22QHRtJkRvvRZv2p8STlM13MqnG1p9vI3ppaeaa3YyRZ1jKC"
},
"date2" : 1504118461705,
"date1" : "Aug 30, 2017 6:41:01 PM",
"long2" : 134,
"long3" : 557,
"long1" : 221,
"isBulkWrite" : "true",
"text1" : "2fC33x3oVXwE6XxxppzS43CtwymjAS6UlOEL1vpOAk8Vn2spzr4hz3GvCopgafMw2FotzzQn5nZRwQBb6wOM76GGyz45egHt1iDHRnLbmaAR1Tj6f6L3VpH05p0HZCuaHC8xnJ5Ftd03c5Jbw18zbp86fdy3otRGDOVLqBRM8l5Ysb0wxeQdtelbLabl2FY9JDmpGDaYCiSqBEFxxdHLlssz6OmxSdtaiLR1w4",
"text2" : "pBoDvylOlngTT2huceIh2eKT3fo0EJ7BnRBwkMs4T3y6Nr9UnWrfhTjcFbU55CJJCseNkLXK5YmFilpvJ6sCrKnJxSTjNsh8otAtaWMHhWoTxmAqA0LeGjbebWlo7ScuoTHSjbIiJWkXQiVovBZ974l71thK1gFeODLTzEhBcSK5t6yvFvHNXgsRxffOqjPloINmXLrYUSrflOICsg8GJt7OAVS0zJ6xtgfzbK",
"long4" : 555
}
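For context on the nested0 field above, a mapping fragment along these lines is what declares it as a true nested type in Elasticsearch (hypothetical: the doc type name and exact field list are assumptions, not our actual mapping):

  # Hypothetical mapping fragment declaring nested0 as a nested type.
  curl -XPUT "http://host:<port>/ndbench_index" -d '
  {
    "mappings": {
      "doc": {
        "properties": {
          "nested0": {
            "type": "nested",
            "properties": {
              "text0":    { "type": "text" },
              "long0":    { "type": "long" },
              "keyword0": { "type": "keyword" }
            }
          }
        }
      }
    }
  }'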
THROUGHPUT MEASUREMENT SCRIPT
SCRIPT (calc_rate.sh)
#!/bin/bash
# Usage: calc_rate.sh <statsUrl> <numReplicas> <numDataNodes> <samplePeriodSeconds>
countUrl=$1
numReplicas=$2
numDataNodes=$3
samplePeriodSeconds=$4

# Sample the primary-shard document count twice, $samplePeriodSeconds apart.
beginCount=`curl -s "$countUrl" | jq '.indices.ndbench_index.primaries.indexing.index_total'`
sleep $samplePeriodSeconds
endCount=`curl -s "$countUrl" | jq '.indices.ndbench_index.primaries.indexing.index_total'`
perSecondIncrease=`python -c "print(($endCount - $beginCount) / ($samplePeriodSeconds*1.0))"`
echo document count growth rate per second is $perSecondIncrease

# Multiply the increase in document count by R, which is:
# replica count + 1 (where the 'plus 1' accounts for the original doc).
R=$(($numReplicas + 1))
totDocsGrowthPerSec=`python -c "print($R * $perSecondIncrease)"`
echo document count growth rate per second including replicas is $totDocsGrowthPerSec

indexRatePerNode=`python -c "print($totDocsGrowthPerSec / ($numDataNodes*1.0))"`
echo indexRatePerNode is $indexRatePerNode
EXAMPLE OF HOW SCRIPT IS CALLED
dataNodes=120
samplePeriod=120
bash ~/dev/scripts/calc_rate.sh "http://host:<port>/_stats?pretty" 2 $dataNodes $samplePeriod
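Note that the jq path in the script assumes the index is named ndbench_index. A quick way to double-check the index names that _stats reports before sampling:

  # List the index names present in the _stats output, so the jq path
  # in calc_rate.sh can be adjusted if the index name differs.
  curl -s "http://host:<port>/_stats" | jq '.indices | keys'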