Looking for a good tutorial on setting up a basic index template to allow faceting of syslog files

Hi Team,

I am new to Elasticsearch and am looking for guidance on how to do
faceting on our fairly large log file collection (1.2 billion syslog
records per month), which we are currently loading into ES. We only need to
keep 3 months' worth of logs (6 billion records at most). My schema for
each syslog line is just three fields: timestamp, host IP address, and
message. But I definitely need to do reporting (ranking) of the busiest
host IPs and the top 20 or even 50 log messages. The counts behind these
rankings can run into the hundreds of millions (and I only have a 48GB RAM
server), and I have read that there is a way to do this off heap
(http://www.elasticsearch.org/blog/disk-based-field-data-a-k-a-doc-values/).
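For reference, the ranking described above corresponds to terms aggregations, which replace facets in Elasticsearch 1.x. Below is a minimal sketch of a request body for `GET /syslog-*/_search`, assuming a hypothetical index pattern `syslog-*`, an `ip` field, and a `not_analyzed` sub-field `message.raw` that holds each whole message. It is an illustration under those assumptions, not a tested report.

```json
{
  "size": 0,
  "aggs": {
    "busiest_hosts": {
      "terms": { "field": "ip", "size": 50 }
    },
    "top_messages": {
      "terms": { "field": "message.raw", "size": 50 }
    }
  }
}
```

`"size": 0` skips returning individual hits so that only the bucket counts come back. Aggregating on the analyzed `message` field would rank individual words instead of whole messages, which is why the `not_analyzed` sub-field is used.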

I have tried following the instructions here,
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-templates.html,
but I am looking for more examples and help with the initial setup, especially
since our ES installation is basically new and I have not set up any mappings yet.

Any advice or recommended links would be helpful.

Thanks

--rleyba

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/957ae8d4-7a04-4263-b9ad-6e06cdb405da%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

You could start with a mapping like this and then test on a small subset
of data to get a feel for how much heap your report queries use (you can
check this with the _cat/nodes API). From there, you should be able to
work out how many nodes, and how much RAM per node, you will need.

{
  "mappings": {
    "log": {
      "_all": { "enabled" : false },
      "properties": {
        "timestamp": {
          "type": "date"
        },
        "ip": {
          "type": "ip"
        },
        "message": {
          "type" : "string", "index" : "analyzed", "omit_norms" : true,
          "fields" : {
            "raw" : {"type": "string", "index" : "not_analyzed",
                     "ignore_above" : 256}
          }
        }
      }
    }
  }
}
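To apply a mapping like this automatically to new indices, and to move field data for the ranked fields off heap, it could be wrapped in an index template. The following is a hedged sketch for Elasticsearch 1.x, to be sent with `PUT /_template/syslog` before the first matching index is created. The template name `syslog`, the index pattern `syslog-*` (for example, one index per month), and enabling `doc_values` on `timestamp`, `ip`, and `message.raw` are all assumptions. Doc values cannot be enabled on the analyzed `message` field itself, so check the mapping reference for your version before relying on this.

```json
{
  "template": "syslog-*",
  "mappings": {
    "log": {
      "_all": { "enabled": false },
      "properties": {
        "timestamp": { "type": "date", "doc_values": true },
        "ip": { "type": "ip", "doc_values": true },
        "message": {
          "type": "string",
          "index": "analyzed",
          "omit_norms": true,
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed",
              "ignore_above": 256,
              "doc_values": true
            }
          }
        }
      }
    }
  }
}
```

With monthly indices, one common way to keep only three months of logs is to delete the oldest index each month rather than deleting individual documents. Because of `ignore_above`, messages longer than 256 characters are not indexed in `message.raw` and so will not appear in rankings built on that field.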


Thanks Binh, let me read up on that API.