I am new to Elasticsearch and am looking for guidance on how to do
faceting on our fairly large log collection (1.2 billion syslog records
per month), which we are currently loading into ES. We only need to keep
3 months' worth of logs (a maximum of 6 billion records). My schema for
each syslog line is just a timestamp, a host IP address, and a message
field. But I definitely need to do reporting (ranking) of the busiest
host IPs and the top 20 or even 50 log messages. Understanding that the
term counts being ranked can run into the hundreds of millions (and I
only have a 48 GB RAM server), I have read that there is a way to do this
off heap
(http://www.elasticsearch.org/blog/disk-based-field-data-a-k-a-doc-values/).
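To make the reporting goal concrete, this is roughly the kind of query I
have in mind; just a sketch, where the index name and the field names
host_ip and message are placeholders based on my schema description:

# rank the busiest host IPs and the most frequent log messages
# over one month's index (index/field names are assumptions)
curl -XGET 'localhost:9200/syslog-2014.06/_search' -d '{
  "size": 0,
  "aggs": {
    "busiest_hosts": { "terms": { "field": "host_ip", "size": 20 } },
    "top_messages":  { "terms": { "field": "message", "size": 50 } }
  }
}'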
You could start with a mapping along these lines (see the sketch below)
and then do some testing on a small subset of the data to get a feel for
how much heap your report queries use (check with the _cat/nodes API).
From there, you should be able to determine how many nodes, and how much
RAM per node, you will need.
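A minimal sketch of such a mapping, assuming an ES 1.x index named
syslog-2014.06 with fields timestamp, host_ip, and message (the names are
placeholders); doc_values keeps the field data on disk instead of the
heap, and for string fields it requires index: not_analyzed:

# create the index with doc_values enabled on the fields you aggregate on
curl -XPUT 'localhost:9200/syslog-2014.06' -d '{
  "mappings": {
    "syslog": {
      "properties": {
        "timestamp": { "type": "date",   "doc_values": true },
        "host_ip":   { "type": "string", "index": "not_analyzed", "doc_values": true },
        "message":   { "type": "string", "index": "not_analyzed", "doc_values": true }
      }
    }
  }
}'

While running your report queries against the test subset, you can watch
heap and field data usage with something like:

curl 'localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max,fielddata.memory_size'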