Elasticsearch index size check

Hi,
I'm working on a project that uses Elasticsearch to store log files line by
line. I have been reading the ES docs for weeks and have some concerns:

  1. Is there a limit on index size, and a way to check it? Because disk space
    is limited to about 20G, we plan to delete some indices when the disk is
    full. Can ES detect when a size limit is reached?
  2. What's the advantage of creating indices based on date, like one index
    per day, which makes for a lot of indices? We could also create just one
    index. Will the two differ in performance as the data grows?
  3. If the _source field is set to false, will the raw data not be stored in
    ES, only the index? And if the ES nodes restart for some reason, will all
    the data be lost?

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hello,

On Mon, Jul 1, 2013 at 2:48 PM, lijionly@gmail.com wrote:

Hi,
I'm working on a project that uses Elasticsearch to store log files line by
line. I have been reading the ES docs for weeks and have some concerns:

  1. Is there a limit on index size, and a way to check it? Because disk space
    is limited to about 20G, we plan to delete some indices when the disk is
    full. Can ES detect when a size limit is reached?

I'm not aware of such a feature built into ES.
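Since there's no built-in limit, a common workaround is a small script that runs periodically, reads per-index sizes (for example from the "store" section of the index stats API) and deletes the oldest indices until the total fits under a budget. A minimal sketch of just the selection logic, assuming daily indices named like `logs-2013.07.01` (a hypothetical naming scheme) and sizes already fetched into a dict; fetching the stats and sending the DELETE requests are left out:

```python
from datetime import datetime


def indices_to_delete(sizes, limit_bytes, prefix="logs-"):
    """Pick the oldest daily indices to delete so the total fits under limit_bytes.

    sizes: dict of index name -> size on disk in bytes (assumed to come from
    the index stats API). Names are assumed to follow a hypothetical
    "<prefix>YYYY.MM.DD" scheme; any other index is never selected.
    """
    dated = []
    for name, size in sizes.items():
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d")
        except ValueError:
            continue  # not one of our daily indices; leave it alone
        dated.append((day, name, size))
    dated.sort()  # oldest first

    total = sum(sizes.values())  # every index counts toward disk usage
    victims = []
    for _, name, size in dated:
        if total <= limit_bytes:
            break
        victims.append(name)
        total -= size
    return victims


GB = 1024 ** 3
sizes = {
    "logs-2013.06.29": 8 * GB,
    "logs-2013.06.30": 7 * GB,
    "logs-2013.07.01": 6 * GB,
}
# Budget kept below the ~20G disk, since merges need temporary extra space.
print(indices_to_delete(sizes, limit_bytes=18 * GB))  # ['logs-2013.06.29']
```

Keeping the budget somewhat below the physical disk size is probably wise, because ES writes merged segments before removing the old ones.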

  2. What's the advantage of creating indices based on date, like one index
    per day, which makes for a lot of indices? We could also create just one
    index. Will the two differ in performance as the data grows?

Usually, time-based indices make sense for logs, because you're most often
looking at the newest ones. Having one index per time interval (like a day)
should make most searches faster, because ES will be looking in a smaller
chunk of data. Indexing should also be faster, because ES has less data to
manage while indexing (for things like merging).
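To illustrate why searches on recent data get cheaper: with one index per day, the application can work out which indices a time range touches and search only those, using the comma-separated multi-index syntax in the URL (e.g. `/logs-2013.06.30,logs-2013.07.01/_search`). A minimal sketch, again assuming the hypothetical `logs-YYYY.MM.DD` naming:

```python
from datetime import date, timedelta


def daily_index(day, prefix="logs-"):
    """Name of the (hypothetical) daily index holding logs from `day`."""
    return prefix + day.strftime("%Y.%m.%d")


def indices_for_range(start, end, prefix="logs-"):
    """Daily indices covering start..end inclusive; only these need searching."""
    names = []
    day = start
    while day <= end:
        names.append(daily_index(day, prefix))
        day += timedelta(days=1)
    return names


# The last two days touch only two small indices instead of all the data.
print(",".join(indices_for_range(date(2013, 6, 30), date(2013, 7, 1))))
```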

On the other hand, more indices will mean more shards, which imply a memory
overhead. And when you're searching in all data, ES will have more shards
to aggregate results from, so you can expect those searches to be slower.

In the end, it's a trade-off. For most situations, it's worth having
time-based indices, as long as you don't end up with a gazillion indices.
For example, if you hold data for a year, you might be better off with
weekly/monthly indices than daily.
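Some rough arithmetic shows how quickly the shard count grows with the interval. This sketch assumes every index uses 5 primary shards and 1 replica (the defaults at the time); your own settings may differ:

```python
def total_shards(num_indices, shards_per_index=5, replicas=1):
    """Shards the cluster must hold: every primary plus its replica copies."""
    return num_indices * shards_per_index * (1 + replicas)


# One year of retention under different index intervals.
for interval, count in [("daily", 365), ("weekly", 52), ("monthly", 12)]:
    print(interval, count, "indices ->", total_shards(count), "shards")
```

Under those assumptions a year of daily indices means 3650 shards to keep in memory and fan out to, versus 120 for monthly ones.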

  3. If the _source field is set to false, will the raw data not be stored in
    ES, only the index? And if the ES nodes restart for some reason, will all
    the data be lost?

A restart won't lose anything: the index is persisted on disk either way.
The real issue is that not storing _source means you can only search for
logs, without actually getting their content back, unless you explicitly
store individual fields, in which case you can retrieve those fields.
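As a rough picture of "disable _source but store some fields", here is a mapping and a matching search body built as Python dicts. The type name `log` and the fields `message` and `host` are hypothetical, and the field syntax is the 0.90-era form (`"type": "string"`, `"store": "yes"`); it has changed in later versions, so check the mapping docs for yours:

```python
import json

# Hypothetical mapping for a "log" type: _source is disabled, so ES keeps the
# inverted index plus only the fields explicitly marked as stored.
mapping = {
    "log": {
        "_source": {"enabled": False},
        "properties": {
            "message": {"type": "string", "store": "yes"},  # searchable and retrievable
            "host": {"type": "string", "index": "not_analyzed"},  # searchable only
        },
    }
}

# With no _source to return, a search has to ask for the stored fields.
search_body = {
    "query": {"match": {"message": "timeout"}},
    "fields": ["message"],
}

print(json.dumps(mapping, indent=2))
print(json.dumps(search_body))
```

The mapping would be supplied when the index is created, since (as it turns out later in this thread) _source can't be switched off on an existing mapping.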

If you need ES to give your original logs back when searching, you'd have
to leave _source enabled. It's worth disabling it when you have your logs
in a separate place where you can retrieve them by ID. This way you can use
ES only for searching, and do a second call to your other data store to get
the actual logs. This would be slower and more difficult to manage than
having everything in ES, but in some use-cases it might be useful.
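The search-then-fetch pattern boils down to two lookups. A toy sketch of it, where a dict-based inverted index stands in for ES (it keeps only term -> IDs, like an index with _source disabled) and another dict stands in for the separate log store; both are hypothetical stand-ins, not ES API calls:

```python
from collections import defaultdict


def build_index(docs):
    """Toy stand-in for ES with _source disabled: term -> set of doc IDs only."""
    index = defaultdict(set)
    for doc_id, line in docs.items():
        for term in line.lower().split():
            index[term].add(doc_id)
    return index


def search_then_fetch(index, raw_store, term):
    """Step 1: get matching IDs from the index. Step 2: fetch originals elsewhere."""
    ids = sorted(index.get(term.lower(), set()))
    return [raw_store[doc_id] for doc_id in ids]


raw_store = {"1": "ERROR db timeout", "2": "INFO retry ok", "3": "ERROR disk full"}
index = build_index(raw_store)
print(search_then_fetch(index, raw_store, "error"))
```

The second round trip per search is where the extra latency and management overhead mentioned above come from.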

Best regards,
Radu

Thank you for your kind explanation.
I tested the _source setting:

  1. I created an index with a type and a document, and ES generated the
    mapping automatically, with _source enabled by default.
  2. I used curl to update the mapping, setting _source to false.
  3. I fetched the mapping and tested by adding a new document: the _source
    is still stored.

Is this normal? Can _source not be changed once it's set?

Hello,

Yes, the behavior you're reporting is normal, in the sense that you can't
change the mapping that's already there; you can only extend it. Otherwise,
the existing documents would have to be re-indexed to fit the new mapping,
which is something you can do manually:
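One way to do that by hand: create a new index with the _source-disabled mapping, read all documents out of the old index (whose _source is still enabled, e.g. with a scrolling search), and send them to the new index through the bulk API. A minimal sketch of the last step only, turning hits shaped like search-response hits (`_type`, `_id`, `_source`) into a bulk request body; the target index name `logs-v2` is hypothetical and the HTTP calls are omitted:

```python
import json


def bulk_body(hits, target_index):
    """Build a bulk API request body that re-indexes `hits` into target_index.

    Each document becomes two newline-delimited JSON lines: an action line
    naming the target index, type and ID, then the document source. The
    body must end with a newline.
    """
    lines = []
    for hit in hits:
        action = {"index": {"_index": target_index,
                            "_type": hit["_type"],
                            "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(hit["_source"]))
    return "\n".join(lines) + "\n"


hits = [
    {"_type": "log", "_id": "1", "_source": {"message": "ERROR db timeout"}},
    {"_type": "log", "_id": "2", "_source": {"message": "INFO retry ok"}},
]
print(bulk_body(hits, "logs-v2"))
```

The body would be POSTed to `/_bulk`; once everything is copied, the application can switch to the new index and the old one can be deleted.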

Best regards,
Radu

On Tue, Jul 2, 2013 at 6:31 AM, lijionly@gmail.com wrote:

Thank you for your kind explanation.
I tested the _source setting:

  1. I created an index with a type and a document, and ES generated the
    mapping automatically, with _source enabled by default.
  2. I used curl to update the mapping, setting _source to false.
  3. I fetched the mapping and tested by adding a new document: the _source
    is still stored.

Is this normal? Can _source not be changed once it's set?

--
http://sematext.com/ -- Elasticsearch -- Solr -- Lucene

Thank you, I get it.