Testing the distributed characteristics of Elasticsearch

Hi guys,
Don't get me wrong. This is absolutely not another post about benchmark of
First, I am pretty new to ES, so please be patient if I ask dumb questions. I
am running a test, for academic use only, to show that ES's distributed
nature is an improvement over plain Lucene, which ES is built on. I
want to verify that with more than 1 node, a search query returns
faster. With 2 nodes (2 hard disks) we should get double the disk
bandwidth in theory (each ordinary disk peaks at about 50 MB/s, which is
below the roughly 125 MB/s of 1 Gb Ethernet, so the network should not be the bottleneck).
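To sanity-check that arithmetic, here is a small sketch. It assumes a decimal gigabit (1 Gb/s = 10^9 bits/s) and takes the ~50 MB/s per-disk peak as a given; both numbers are rough estimates, not measurements.

```python
# Back-of-the-envelope check of disk throughput vs. network throughput.
ETHERNET_BITS_PER_S = 1_000_000_000  # 1 Gb/s link, decimal gigabit (assumption)
DISK_MB_PER_S = 50                   # rough peak per disk (assumption)
NODES = 2                            # one disk per node

# Convert bits/s to megabytes/s: 8 bits per byte, 10^6 bytes per MB.
ethernet_mb_per_s = ETHERNET_BITS_PER_S / 8 / 1_000_000
aggregate_disk_mb_per_s = NODES * DISK_MB_PER_S

# True when the link can carry more than both disks can deliver together.
network_has_headroom = aggregate_disk_mb_per_s < ethernet_mb_per_s

print(ethernet_mb_per_s, aggregate_disk_mb_per_s, network_has_headroom)
```

Under these assumptions even the combined output of both disks stays under what the link can carry, though real throughput on either side will likely be lower than the peak.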

I have 2 physical nodes (ordinary laptops) connected directly through their 1 Gb
Ethernet ports, with no router in between. My data is 20 GB (plus a 20 GB replica),
made up of 3 million records like this: http://pastebin.com/FDhfy6C3
(the data was generated with http://www.mockaroo.com/67e33320)

My strategy is to run as many different search queries as possible while
clearing the cache between them. Something like

curl -XPOST ""

curl -XPOST ""

curl -XGET "" -d '{
  "query": {
    "bool": {
      "should": [
        { "match": { "first_name": "Clarence" }},
        { "match": { "last_name": "Fernandez" }},
        { "match": { "country": "uk" }},
        { "match": { "amount": "$9001.19" }},
        { "match": { "password_hash": "Th94hnXtaYtZ" }}
      ]
    }
  }
}'

I am writing a script to generate as many of those match clauses as possible, but I still want to ask: is what I am doing the right approach?
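A minimal sketch of what such a generator could look like, in Python. The `SAMPLE_VALUES` table and the function names `make_query` and `generate_queries` are made up for illustration; real values would have to come from the indexed records. It only builds the JSON bodies and does not talk to ES.

```python
import json
import random

# Hypothetical sample values per field; replace with values from the real data.
SAMPLE_VALUES = {
    "first_name": ["Clarence", "Alice", "Maria"],
    "last_name": ["Fernandez", "Nguyen", "Smith"],
    "country": ["uk", "fr", "de"],
}


def make_query(rng, min_clauses=1):
    """Build one bool/should body from a random subset of the known fields."""
    fields = list(SAMPLE_VALUES)
    chosen = rng.sample(fields, rng.randint(min_clauses, len(fields)))
    should = [{"match": {field: rng.choice(SAMPLE_VALUES[field])}} for field in chosen]
    return {"query": {"bool": {"should": should}}}


def generate_queries(count, seed=0):
    """Return `count` query bodies; a fixed seed makes a run repeatable."""
    rng = random.Random(seed)
    return [make_query(rng) for _ in range(count)]


if __name__ == "__main__":
    # Print one JSON body per line, ready to be passed to curl -d.
    for body in generate_queries(3):
        print(json.dumps(body))
```

Seeding the generator means the same query set can be replayed against the 1-node and 2-node setups, so the timings are comparable.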
Any comments or opinions are really appreciated.
