Hello,
On Mon, Jul 1, 2013 at 2:48 PM, lijionly@gmail.com wrote:
Hi,
I'm working on a project that uses elasticsearch to store log files line by
line. I have been reading the ES docs for weeks and have some concerns:
- Is there a limit or check on index size? Because our disk is limited to
about 20 GB, we plan to delete some indices when the disk is full.
Can ES detect when the size limit is reached?
I'm not aware of such a feature built into ES.
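If you handle this outside ES, the cleanup logic could look like the minimal sketch below. It assumes date-based index names such as logs-2013.07.01 (a hypothetical naming scheme, chosen so that sorting by name is also sorting by age) and that you already know each index's size in bytes. How you obtain the sizes and how you issue the delete request for each index are left out.

```python
def pick_indices_to_delete(index_sizes, limit_bytes):
    """Return the oldest index names to drop so the total fits under limit_bytes.

    index_sizes maps index name -> size in bytes. Names are assumed to embed
    the date as YYYY.MM.DD, so lexicographic order equals chronological order.
    """
    total = sum(index_sizes.values())
    to_delete = []
    for name in sorted(index_sizes):  # oldest first
        if total <= limit_bytes:
            break
        to_delete.append(name)
        total -= index_sizes[name]
    return to_delete


GB = 1024 ** 3
sizes = {
    "logs-2013.06.30": 9 * GB,
    "logs-2013.07.01": 8 * GB,
    "logs-2013.07.02": 7 * GB,
}
# 24 GB in total with a 20 GB budget: only the oldest index has to go.
doomed = pick_indices_to_delete(sizes, 20 * GB)
```

You would run something like this periodically (from cron, for example) and then delete each returned index.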
- What's the advantage of creating indices based on date, such as one index
per day, which results in a lot of indices, when we could also create just
one index? Will they differ in performance as the data gets bigger?
Usually, having time-based indices makes sense for logs, because you're
most often looking at the newest ones. Having one index per time interval
(like a day) should make most searches faster, because ES will be looking
in a smaller chunk of data. Also, indexing should be faster, because ES
will have less data to manage when indexing (for things like merging).
On the other hand, more indices will mean more shards, which implies a memory
overhead. And when you're searching in all data, ES will have more shards
to aggregate results from, so you can expect those searches to be slower.
In the end, it's a trade-off. For most situations, it's worth having
time-based indices, as long as you don't end up with a gazillion indices.
For example, if you hold data for a year, you might be better off with
weekly/monthly indices than daily.
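To make the trade-off concrete, here is a small sketch of how the index name could be derived from a log line's date at different granularities. The logs- prefix and the name formats are assumptions for illustration, not something ES requires.

```python
from datetime import date, timedelta


def index_name(day, granularity="daily", prefix="logs-"):
    """Build a time-based index name for the given date."""
    if granularity == "daily":
        return prefix + day.strftime("%Y.%m.%d")
    if granularity == "weekly":
        # ISO year and week, so a week spanning New Year stays in one index.
        iso_year, iso_week, _ = day.isocalendar()
        return "%s%04d.w%02d" % (prefix, iso_year, iso_week)
    if granularity == "monthly":
        return prefix + day.strftime("%Y.%m")
    raise ValueError("unknown granularity: %r" % granularity)


def count_indices(start, days, granularity):
    """Number of distinct indices needed to hold `days` days of logs."""
    names = {index_name(start + timedelta(d), granularity) for d in range(days)}
    return len(names)


daily_count = count_indices(date(2013, 1, 1), 365, "daily")      # 365
monthly_count = count_indices(date(2013, 1, 1), 365, "monthly")  # 12
```

With a year of retention, daily indices give you 365 indices (each with its own shards), while monthly ones give you 12.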
- For the _source field, if it is set to false, the raw data will not be
stored in ES, only the index? So if the ES nodes are restarted for some
reason, will all the data be lost?
This isn't specific to restarts: not storing _source means you can only
search for logs, without actually getting their content back, unless you
explicitly store individual fields, in which case you can retrieve the
content of those fields.
If you need ES to give your original logs back when searching, you'd have
to leave _source enabled. It's worth disabling it when you have your logs
in a separate place where you can retrieve them by ID. This way you can use
ES only for searching, and do a second call to your other data store to get
the actual logs. This would be slower and more difficult to manage than
having everything in ES, but in some use-cases it might be useful.
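A rough sketch of that two-step flow is below. The mapping fragment disables _source for a hypothetical "logline" type (the exact mapping layout depends on your ES version), and a plain dict stands in for your other data store. The search call itself is left out; the function only shows the second step, turning the IDs returned by ES into the original lines.

```python
# Mapping fragment that turns off _source; "logline" is a hypothetical type name.
NO_SOURCE_MAPPING = {"logline": {"_source": {"enabled": False}}}


def fetch_logs(matching_ids, log_store):
    """Resolve document IDs returned by a search into the original log lines.

    log_store is any mapping from ID to raw log line (a dict here, standing in
    for a database or file-based store). IDs missing from the store are skipped.
    """
    return [log_store[doc_id] for doc_id in matching_ids if doc_id in log_store]


log_store = {
    "1": "2013-07-01 14:48:00 INFO service started",
    "2": "2013-07-01 14:48:05 ERROR connection refused",
}
# Pretend ES matched documents "2" and "9"; "9" is not in the store.
lines = fetch_logs(["2", "9"], log_store)
```

The important part is that the ID you index each log line under in ES is the same key you use in the other store.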
Best regards,
Radu