Understanding my Index using HEAD plugin

I am trying to understand my index (screenshot attached) and how I can
improve its size and performance.
The goal is to index 5 million docs, so I started small by indexing
421,000 docs, as shown in the image.

  • I am using two nodes (1 & 2); node 1 is the master

Q1) My index size is 3.99 GB with 421,627 docs so far, so I am guessing it
will be over 40 GB with 5 million docs. Does that size sound too big? I do
have store = YES for the PDF docs.
Q2) What is the (8 GB) in the image? Is this the size on the 2 nodes?
Also, what is (526,428)?
Q3) Should I use more nodes, or more/fewer shards?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/073c71bf-7499-4abc-8da6-5a381834c1bc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hi,

- To approximate the size, try running some more tests and you should be
able to get an idea. The size also depends very much on the type of data
you are trying to index.
- ElasticHQ (www.elastichq.org) can give you more insight into the details
of the cluster. The size per index can be seen under the 'node
diagnostics' tab.

Thanks and Regards
Sri

On Friday, June 27, 2014 11:12:32 AM UTC-4, IronMan2014 wrote:

[quoted original post trimmed]

Great. One of the stats is "deleted docs", or merge rate, which shows 18%
in my example. It says that if this number is high, it means slow I/O.
I am not really sure whether 19% is high. How can I control this number?

On Friday, June 27, 2014 11:34:32 AM UTC-4, sri wrote:

[quoted reply and original post trimmed]

See my answers inline.

On Fri, Jun 27, 2014 at 8:42 PM, IronMan2014 sabdalla80@gmail.com wrote:

Q1) My index size is 3.99 GB with 421,627 docs so far, so I am guessing it
will be over 40 GB with 5 million docs. Does that size sound too big? I do
have store = YES for the PDF docs.

Total size depends on the kind of docs you'll index. So, it depends!
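
As a rough illustration of the guess in Q1, a linear extrapolation from the
numbers in the question can be sketched as below. This assumes every future
doc is about as large as the ones indexed so far, which may well not hold
for stored PDFs; the function name is made up for the example.

```python
def extrapolate_primary_size_gb(current_size_gb, current_docs, target_docs):
    """Linearly scale the primary index size to a target doc count."""
    return current_size_gb / current_docs * target_docs


# Figures taken from the question: 3.99 GB for 421,627 docs, goal of 5 million.
estimate = extrapolate_primary_size_gb(3.99, 421_627, 5_000_000)
print(f"Estimated primary size: {estimate:.1f} GB")
```

Under that assumption the primaries alone would come to roughly 47 GB, so
"over 40 GB" is in the right ballpark, but only more test indexing will
confirm it.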

Q2) What is the (8 GB) in the image? Is this the size on the 2 nodes?
Also, what is (526,428)?

The total size of the primary shards is 3.99 GB in your case, so 3.99 GB
would be the total size if you had zero replicas. Since you have 1
replica configured, the actual disk space used is 8 GB.
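
The relationship between primary size and disk usage is simple arithmetic.
A minimal sketch, assuming each replica is a full copy of the primaries and
takes about the same space on disk (the helper name is hypothetical):

```python
def total_disk_gb(primary_size_gb, replicas):
    """Disk used by the primaries plus the given number of full replica copies."""
    return primary_size_gb * (1 + replicas)


print(total_disk_gb(3.99, 0))  # zero replicas: primaries only
print(total_disk_gb(3.99, 1))  # one replica: roughly the 8 GB in the image
```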

Regarding the number of documents, 421,627 is the total number of docs
present in your index. 526,428 is the max_docs value, which still
includes deleted docs that merges have not yet removed.
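
To put numbers on this, the deleted-doc count and its share of max_docs can
be worked out from the two figures in the screenshot. This is only a sketch
that assumes max_docs is simply live docs plus deleted docs not yet merged
away; the variable names are made up for the example.

```python
num_docs = 421_627   # live docs reported for the index
max_docs = 526_428   # figure shown in parentheses in the image

deleted_docs = max_docs - num_docs
deleted_ratio = deleted_docs / max_docs

print(f"deleted docs: {deleted_docs}")
print(f"share of max_docs: {deleted_ratio:.1%}")
```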

Q3) Should I use more nodes, or more/fewer shards?

That really depends. I would suggest running some tests to find out what
works best for you. Even 2 shards would work in your case, as you
have 2 nodes and each primary shard would go to a different node. But
that would limit your options if you need to add more nodes later.
(Of course, there are workarounds.)
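
One way to see the limit: the number of shard copies caps how many nodes
can hold any part of the index. A minimal sketch of that arithmetic,
assuming a primary and its replica are never placed on the same node (the
helper name is hypothetical, and 5 is just an arbitrary comparison value):

```python
def max_nodes_holding_index(primary_shards, replicas):
    """Upper bound on nodes that can hold at least one shard copy of the index."""
    return primary_shards * (1 + replicas)


for primaries in (2, 5):
    print(primaries, "primaries, 1 replica ->",
          max_nodes_holding_index(primaries, 1), "nodes at most")
```

Any node added beyond that bound would hold no shard copy of this index.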

--
Cheers,
Abhijeet Rastogi (shadyabhi)
