Processing Solr logs using Logstash

Hi,

I am using Solr with multiple collections. Solr writes the logs of all collections to a single file (solr.log), and I want only the log lines for one particular collection.

Here is a sample of the log format:

INFO - 2015-11-19 10:55:36.389; org.apache.solr.core.SolrCore; [example1_shard1_replica1] webapp=/solr path=/select params={some params} hits=3219 status=0 QTime=361
INFO - 2015-11-19 10:55:37.085; org.apache.solr.core.SolrCore; [example1_shard2_replica1] webapp=/solr path=/select params={some params} status=0 QTime=20
INFO - 2015-11-19 10:55:37.218; org.apache.solr.core.SolrCore; [example2_shard1_replica1] webapp=/solr path=/select params={some params} status=0 QTime=153
INFO - 2015-11-19 10:55:37.310; org.apache.solr.core.SolrCore; [example1_shard1_replica1] webapp=/solr path=/select params={some params} hits=9612 status=0 QTime=1286
INFO - 2015-11-19 10:56:08.597; org.apache.solr.core.SolrCore; [example3_shard1_replica1] webapp=/solr path=/select params={some params} hits=84 status=0 QTime=577
INFO - 2015-11-19 10:56:09.002; org.apache.solr.core.SolrCore; [example2_shard2_replica1] webapp=/solr path=/select params={some params} hits=26 status=0 QTime=318
INFO - 2015-11-19 10:56:09.390; org.apache.solr.core.SolrCore; [example3_shard2_replica1] webapp=/solr path=/select params={some params} status=0 QTime=36
INFO - 2015-11-19 10:56:09.407; org.apache.solr.core.SolrCore; [example1_shard1_replica1] webapp=/solr path=/select params={some params} hits=84 status=0 QTime=727

I want only the lines where the collection name contains "example1".

Please let me know any ideas or suggestions.

You need to send the log to Logstash and then filter the events so that only the ones you want are kept.
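As a minimal sketch, the input and output side of such a pipeline could look like this. The log path is an assumption; point it at wherever your Solr instance actually writes solr.log.

```
input {
  file {
    # Hypothetical location; adjust to your actual solr.log
    path => "/var/solr/logs/solr.log"
    # On the first run, read the file from the start instead of only tailing new lines
    start_position => "beginning"
  }
}

output {
  # Pretty-print events while developing the filters
  stdout { codec => rubydebug }
}
```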

  • Be very careful about using DATA or GREEDYDATA in more than one place in the same pattern. It can create unexpected matches that are hard to debug.
  • You should be able to use the kv filter to parse the key/value data.
  • Use conditionals to apply filters and outputs depending on the contents of fields:
if "example1" in [collection] {
  ...
}
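For example, to keep only events from the example1 cores and discard everything else, a drop filter inside a conditional should do it. This sketch assumes an earlier grok filter has already stored the bracketed core name in a field called collection.

```
filter {
  # Assumes [collection] was populated by a preceding grok filter
  if "example1" not in [collection] {
    # Discard events that don't belong to the example1 collection
    drop {}
  }
}
```

Since `in` performs a substring test on string fields, this would also keep names such as example10_shard1_replica1; a regex comparison like `[collection] =~ /^example1_/` is stricter if that matters.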

The key is to pick patterns that aren't too wide (i.e. could match too much) but still match all the possible values. Let's look at some of the strings that you wanted to match with DATA:

  • org.apache.solr.core.SolrCore: Use %{JAVACLASS}.
  • example3_shard1_replica1: I don't know what this string means, but it's a reasonable assumption that it can't contain a closing square bracket, so we can use that as a delimiter: (?<collection>[^\]]+).
  • /solr: It's probably safe to assume that there can't be any spaces here (and if there are spaces in the URL they're probably still encoded with %20 or +). So: %{NOTSPACE:webapp}
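Putting those suggestions together, a grok filter for the sample lines above might look like the following sketch. The field names loglevel, logtime, class and path are illustrative choices, not names that Solr or Logstash requires.

```
filter {
  grok {
    # LOGLEVEL, TIMESTAMP_ISO8601, JAVACLASS and NOTSPACE are stock grok patterns;
    # the collection capture stops at the first closing square bracket
    match => {
      "message" => "%{LOGLEVEL:loglevel} - %{TIMESTAMP_ISO8601:logtime}; %{JAVACLASS:class}; \[(?<collection>[^\]]+)\] webapp=%{NOTSPACE:webapp} path=%{NOTSPACE:path}"
    }
  }
}
```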

Actually, the latter part of each line consists of key=value pairs, so parsing them with the kv filter should be convenient.
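One way to sketch that: let grok capture everything after the core name into a temporary field, then hand that field to the kv filter. The field name kvdata is hypothetical. Because kv splits fields on whitespace by default, a params={...} value containing spaces will probably not come out as a single value and may need extra handling.

```
filter {
  grok {
    # Capture the bracketed core name, then everything after it (GREEDYDATA used once, at the end)
    match => { "message" => "\[(?<collection>[^\]]+)\] %{GREEDYDATA:kvdata}" }
  }
  kv {
    # Split webapp=..., path=..., hits=..., status=..., QTime=... into separate fields
    source => "kvdata"
    # Remove the temporary field after it has been parsed
    remove_field => ["kvdata"]
  }
}
```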

What do you get? Does it extract the collection field properly? So it's the second grok filter that doesn't work?

The grok expression extracts the collection field correctly for me. I have no idea why it doesn't work for you.

$ echo 'INFO - 2015-11-19 10:55:36.389; org.apache.solr.core.SolrCore; [example1_shard1_replica1] webapp=/solr path=/select params={some params} hits=3219 status=0 QTime=361' | /opt/logstash/bin/logstash -f test.config 
{
       "message" => "INFO - 2015-11-19 10:55:36.389; org.apache.solr.core.SolrCore; [example1_shard1_replica1] webapp=/solr path=/select params={some params} hits=3219 status=0 QTime=361",
      "@version" => "1",
    "@timestamp" => "2015-12-14T08:22:16.151Z",
          "host" => "lnxolofon",
    "collection" => "example1_shard1_replica1"
}
Logstash startup completed
Logstash shutdown completed
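For reference, a test.config along these lines should produce output like the above. This is a hypothetical reconstruction, since the actual test.config wasn't posted: it reads the piped line from stdin, extracts only the collection field, and prints the event with the rubydebug codec.

```
input {
  # Read the test line piped in with echo
  stdin {}
}

filter {
  grok {
    # Extract only the bracketed core name into [collection]
    match => { "message" => "\[(?<collection>[^\]]+)\]" }
  }
}

output {
  # Pretty-print each event in the format shown above
  stdout { codec => rubydebug }
}
```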

Next time, please copy/paste the command output. Don't attach screenshots of text that can be copy/pasted.