Ad hoc query regex (this time with sample data)

Here is all the consolidated information from the many splintered threads I've posted over the last week or two:

Here are my software versions:

  • logstash-5.2.2-1.noarch
  • elasticsearch-5.2.1-1.noarch
  • kibana-5.2.1-1.x86_64

Here is my as-basic-as-I-can-make-it Logstash config:

input {
  file {
    path => ["/var/local/test-logs/alb/alb-core.log"]
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter { grok { match => { "message" => "%{GREEDYDATA}" } } }
output { elasticsearch { hosts => ["localhost:9200"] } }

Here is the single line in the input file:

Feb 24 03:48:11 myServer alb-core: 2017-02-24 03:48:02;149 INFO T[pool-32-thread-1] net.myproject.api.messaging.RedisService: Redis Service Message Received - Host: dummy.server.com:6379 Channel: bigbluebutton:meeting:participants Message: {"timestamp":"1487908082148","externalUserId":"1234567890@foo","internalUserId":"1234567890@foo","meetingId":"ea02e4418fd0709572417991578c281913f2085c296486c0c1d40f284fd33d9c-1487905970320","guest":"false","role":"MODERATOR","messageId":"UserJoinedEvent","fullname":"Doe, John"}

Here are the analyzers I have tried, set by going into Kibana Management - Advanced Settings and editing query:queryString:options:

  • Standard
  • Simple
  • Whitespace
  • English

Here is my first problem:

  1. Super basic regex queries flat out don't work, regardless of analyzer.

This works:

+"meetingId"

This returns 0 results:

+/meetingId/

I have no idea how to make this any simpler to isolate the problem.

Could you try using /.*meetingId.*/ to search for it and check if that reveals any results?

/.*meetingId.*/

Returns 0 results

The field value seems rather long. I think it might be over the default ignore_above length and thus not indexed for search. If you index a shorter document instead (below 256 characters), is it found when you search for that string?

If so, you should most likely adjust the ignore_above value for that field in that index via the mapping for that index, if you know it will contain long values.
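As a rough illustration of the ignore_above behaviour described above, here is a minimal sketch in Python. It is a simplified model, not Elasticsearch code: the function name is hypothetical, the strings are placeholders, and it assumes the setting applies to a keyword field (an analyzed text field is not subject to it).

```python
def indexed_as_keyword(value: str, ignore_above: int = 256) -> bool:
    """Simplified model of ignore_above on a keyword field.

    A value longer than ignore_above characters is skipped at index
    time, so it cannot be found through that field.
    """
    return len(value) <= ignore_above


# Placeholder strings, not the real log line.
short_value = "x" * 215
long_value = "x" * 600

print(indexed_as_keyword(short_value))       # True: under the limit
print(indexed_as_keyword(long_value))        # False: skipped with the default of 256
print(indexed_as_keyword(long_value, 2048))  # True: fits once the limit is raised
```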

I reindexed .kibana to tmp, deleted the .kibana index, used the mappings API to change every ignore_above from 256 to 2048, and checked the tmp index to verify the change. Then I reindexed tmp back to .kibana and checked my new .kibana index, and the values are back to 256!

Reindex forces ignore_above back to 256. That can't possibly be by design, can it?

OK, I cut down my single input line to this:

"internalUserId":"1234567890@foo","meetingId":"ea02e4418fd0709572417991578c281913f2085c296486c0c1d40f284fd33d9c-1487905970320","guest":"false","role":"MODERATOR","messageId":"UserJoinedEvent","fullname":"Doe, John"

215 characters total. The basic regex still returns zero results:

+/meetingId/

OK I isolated the problem:

1.) I took out literally everything from the log message but this

meeting

Both queries work for this log message:

  • "meeting"
  • /meeting/

2.) I changed the log message to

meetingI

This query does NOT work:

  • /meetingI/

This query DOES work:

  • /meeting./

Something about the capital letter I is screwing up the regex query.

3.) I put the log line back to a medium length

"internalUserId":"1234567890@foo","meetingId":"ea02e4418fd0709572417991578c281913f2085c296486c0c1d40f284fd33d9c-1487905970320","guest":"false","role":"MODERATOR","messageId":"UserJoinedEvent","fullname":"Doe, John"

This search works:

  • /meeting.d/

This search does NOT work:

  • /meetingId/

The analyzer lowercases everything!

This works:

  • meetingid
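The findings above can be reproduced with a small simulation. This is only a sketch under stated assumptions: analyze() is a hypothetical stand-in that splits on non-alphanumeric characters and lowercases, which approximates but does not replicate the Standard analyzer, and Python's re module stands in for Lucene's regex syntax (the two agree for patterns this simple). The key point it models is that a regex query is matched against whole indexed terms, not against the original text.

```python
import re
from typing import List


def analyze(text: str) -> List[str]:
    """Rough stand-in for the Standard analyzer: tokenize, then lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def regex_hits(pattern: str, text: str) -> bool:
    """A regex query must match an entire indexed term, hence fullmatch."""
    return any(re.fullmatch(pattern, term) for term in analyze(text))


line = (
    '"internalUserId":"1234567890@foo",'
    '"meetingId":"ea02e4418fd0709572417991578c281913f2085c296486c0c1d40f284fd33d9c-1487905970320",'
    '"guest":"false","role":"MODERATOR","messageId":"UserJoinedEvent","fullname":"Doe, John"'
)

print(regex_hits("meetingId", line))      # False: the indexed term is "meetingid"
print(regex_hits(".*meetingId.*", line))  # False: no indexed term contains a capital I
print(regex_hits("meeting.d", line))      # True: "." matches the lowercased "i"
print(regex_hits("meetingid", line))      # True
```

The same model explains the shorter tests: "meetingI" is indexed as "meetingi", so /meetingI/ finds nothing while /meeting./ matches.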

Hey Brandon,

Yeah, if the Standard analyzer is always applied (it includes the lowercase token filter), everything will be lowercased.

You can check whether the index has a message.keyword or message.raw field. That field would contain the unanalyzed (and thus not lowercased) value for you to run the regex on.

Tim
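For anyone who wants to try that suggestion, here is a minimal sketch of a regexp query body built in Python. The field name message.keyword is an assumption that depends on your mapping, and the helper name is hypothetical. Because an unanalyzed field holds the whole log line as a single term, the pattern needs a leading and trailing .* to match it, and a value longer than the field's ignore_above would not be indexed there at all.

```python
import json


def keyword_regexp_query(field: str, pattern: str) -> dict:
    """Build a search body that runs a regexp query against one field."""
    return {"query": {"regexp": {field: pattern}}}


# "message.keyword" is assumed; check the index mapping for the real name.
body = keyword_regexp_query("message.keyword", ".*meetingId.*")
print(json.dumps(body, indent=2))
```

The equivalent in the Kibana query bar would be along the lines of message.keyword:/.*meetingId.*/, again assuming that field exists.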
