RE: calculating amount of disk space used

Hi, good day,

I was wondering how I can calculate the amount of disk space used by
Elasticsearch.

Here's an example:

I collected 1 million tweets into the index "tweets" and indexed the
"text" and "users" keys.

So my 2 questions are:

  1. In this situation, how do I find out the amount of disk space used
    by the "tweets" index?
  2. Is there any way I can find out the amount of disk space used by
    tweets that contain the hashtag "#YOLO", for example?

Thanks everyone!

Best.

  1. The Indices Stats API
    (http://www.elasticsearch.org/guide/reference/api/admin-indices-stats.html)
    can help you out there. Note that this is only the size of the
    primaries, not the replicas too (although that's easy to calculate on
    your own; see the sketch after the example output below).

$ curl localhost:9200/test/_stats
{
  "ok": true,
  "_shards": {
    "total": 6,
    "successful": 6,
    "failed": 0
  },
  [...]
  "indices": {
    "test": {
      "primaries": {
        "docs": {
          "count": 7356989,
          "deleted": 628485
        },
        "store": {
          "size": "8.7gb",
          "size_in_bytes": 9443315781,
          "throttle_time": "0s",
          "throttle_time_in_millis": 0
        }
        [...]
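
To make "easy to calculate on your own" concrete: assuming every replica
is fully allocated, each replica shard holds roughly the same bytes as
its primary, so a back-of-the-envelope total is primaries * (1 +
number_of_replicas). Using the primaries figure above with 1 replica
configured:

$ # 9443315781 bytes of primaries, 1 replica
$ echo $((9443315781 * (1 + 1)))
18886631562    # ~17.6gb total on disk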

  2. I do not believe this is possible, at least not through the API. If
    you know the average size of each tweet, you could use the Count API
    and multiply the count by the average doc size (see the sketch below).
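
A minimal sketch of that estimate, with hypothetical numbers and the
index/field names from the question. The %23 is just a URL-encoded "#";
depending on your analyzer, the "#" may be stripped at index time, in
which case you would search for "YOLO" instead:

$ # count the matching tweets
$ curl -s 'localhost:9200/tweets/_count?q=text:%23YOLO'
{"count":31337,"_shards":{"total":5,"successful":5,"failed":0}}   # hypothetical output

$ # if the whole index were, say, 2.5gb for 1,000,000 tweets, that is
$ # roughly 2700 bytes per doc, so the estimate is count * avg doc size:
$ echo $((31337 * 2700))
84609900    # ~80mb attributable to "#YOLO" tweets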

-Zach

Hi,

thanks for the tip!

Best Regards,
Eugene

Hi again,

I was wondering: does the size of the index reflect the total amount of
disk space used by the dataset, including the raw _source data?

Thanks again!

Elastic Noob wrote:

I was wondering: does the size of the index reflect the total amount
of disk space used by the dataset, including the raw _source data?

You want the total -> store -> size(_in_bytes). It should reflect
the total usage of your index's shards across all disks (primaries +
replicas).

Here's a simple way to get this number without looking through the
JSON.

% curl -s download.elasticsearch.org/es2unix/es >~/bin/es; chmod +x ~/bin/es
% es indices -v wik
status name pri rep size bytes docs
green wiki 5 1 5.3gb 5731796207 753816
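
If you'd rather not install anything, the same number can be pulled out
of the stats JSON directly. A minimal sketch, assuming the 0.90-era
layout shown earlier in the thread (indices -> <name> -> total -> store
-> size_in_bytes) and a python on your PATH:

$ curl -s 'localhost:9200/wiki/_stats' | python -c '
import json, sys
stats = json.load(sys.stdin)
# "total" covers primaries + replicas, matching the "bytes" column above
print(stats["indices"]["wiki"]["total"]["store"]["size_in_bytes"])
'
5731796207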

-Drew

Wow, cool! Thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.