Small Scale elasticsearch implementation


(Erica) #1

I am currently using Logstash and Elasticsearch to parse just one log file
(it would not be uncommon for the file to be 1 GB). A 15 MB file is taking 2
minutes to parse with the configuration I have posted below (I have also
tried using no filter, which takes approximately 1 minute to parse). I have
a lot more grok patterns and filtering that I want to implement as well,
which have caused the parse time to jump to 25 minutes. Aside from changing
the grok patterns and such, is there a way to make things faster? Even for
the more basic configuration, if you scale that 2 minutes to a 1 GB file,
the time it takes to parse is extremely high. I am running both Logstash and
Elasticsearch on a machine with 4 GB of RAM and an AMD Athlon II dual-core
M300 2.00 GHz processor.
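
For a rough sense of scale, here is a minimal sketch that extrapolates the timings quoted above linearly to a 1 GB file. It assumes throughput stays constant as the file grows and that 1 GB means 1024 MB; both are assumptions, and the name `scaled_minutes` is only illustrative.

```python
# Linear extrapolation of parse time from the figures quoted in the post.
SAMPLE_MB = 15    # size of the file that was timed
TARGET_MB = 1024  # assumption: 1 GB = 1024 MB


def scaled_minutes(sample_minutes, sample_mb=SAMPLE_MB, target_mb=TARGET_MB):
    """Scale a measured parse time linearly to a larger file."""
    return sample_minutes * target_mb / sample_mb


# Timings in minutes for the 15 MB file, as reported in the post.
for label, minutes in [("no filter", 1), ("basic grok", 2), ("full filter set", 25)]:
    total = scaled_minutes(minutes)
    print(f"{label}: about {total:.0f} min ({total / 60:.1f} h)")
```

Under these assumptions the basic configuration would need a little over two hours for 1 GB, and the 25-minute configuration more than a day.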

I am looking to keep things on a small scale implementation, as I only want
to read 1 log file every so often for troubleshooting purposes. Any help
with making logstash/elasticsearch quicker would be much appreciated.

    input {
      file {
        path => "C:\Users\Attendee\Desktop\Log Files\2013-10-10\nexus20-ANALYTICS_8080.log.2013-10-10"
        start_position => "beginning"

        # If the log line doesn't have a timestamp at the beginning, this will
        # merge the line with the previous line (which would have a timestamp).
        codec => multiline {
          pattern => "^%{TIMESTAMP_ISO8601},"
          negate => true
          what => previous
        }
      }
    }

    filter {
      grok {
        match => [
          # default
          "message", "%{TIMESTAMP_ISO8601:time} [(%{WORD:soapService},|,)m=(%{WORD:m},|,)x=(%{GREEDYDATA:x},|,)c=(%{DATA:c},|,)i=(%{GREEDYDATA:i}|)] %{WORD:loglevel}\s*%{DATA:service}\s*-\s*%{GREEDYDATA:description}"
        ]
      }
    }

    output {
      elasticsearch {
        host => localhost
        embedded => true
      }
    }
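
To illustrate what the multiline codec settings above ask for (`negate => true`, `what => previous`), here is a minimal Python sketch of the same merging rule: any line that does not start with a timestamp followed by a comma is appended to the previous event. The regular expression is a simplified stand-in for `%{TIMESTAMP_ISO8601}`, not the real grok pattern, and the function name and sample lines are hypothetical.

```python
import re

# Simplified stand-in for "^%{TIMESTAMP_ISO8601}," (assumption: timestamps
# look like "2013-10-10 12:00:00,123"; the real grok pattern accepts more forms).
STARTS_EVENT = re.compile(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2},")


def merge_multiline(lines):
    """Join each line lacking a leading timestamp onto the previous event."""
    events = []
    for line in lines:
        line = line.rstrip("\n")
        if STARTS_EVENT.match(line) or not events:
            # A timestamped line (or the very first line) starts a new event.
            events.append(line)
        else:
            # Continuation line, e.g. part of a stack trace.
            events[-1] += "\n" + line
    return events


# Hypothetical sample input: one error with a two-line trace, then one info line.
sample = [
    "2013-10-10 12:00:00,123 ERROR svc - boom",
    "java.lang.Exception: x",
    "    at Foo.bar",
    "2013-10-10 12:00:01,000 INFO svc - ok",
]
events = merge_multiline(sample)
print(len(events))
```

This only mirrors the grouping behaviour; it says nothing about how Logstash implements the codec or how fast it runs.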

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/491b0a16-7159-4b20-a354-ab118d1766a1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #2

This doesn't appear to be an ES-specific issue, but I can see you've
cross-posted this to the LS list, so I'll reply there :)

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 2 July 2014 00:40, Erica areckanp@gmail.com wrote:



