This is a typical log line from Apache being stored in AWS Elasticsearch. I'd like to be able to add a viz to my dashboard showing top referrers. The problem is that many static files have referrers from the site's own domain, which prevents me from seeing the data I want.
Is it possible to have a search expression like "where REFERRER does not contain VHOST"?
123.456.78.9 - - [15/Feb/2017:18:33:25 +0000] example.com "GET / HTTP/1.1" 200 42766 "http://facebook.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 10_0_2 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) Version/10.0 Mobile/14A456 Safari/602.1" Server=aws8 SSL=- 8868 0
123.456.78.9 - - [15/Feb/2017:18:33:25 +0000] example.com "GET /js/lib/jquery-ui/jquery-ui.js HTTP/1.1" 200 42766 "http://example.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 10_0_2 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) Version/10.0 Mobile/14A456 Safari/602.1" Server=aws8 SSL=- 8868 0
This is obviously a custom log format; here is the mapping.
You could create a boolean scripted field for the index which would be true if VHOST is in REFERRER and then use that to filter those out in Discover, Visualize, etc.
Select painless as the script language, boolean as the type and a script something like;
Although you can use regular expressions in Painless scripted fields, they are disabled in Elasticsearch by default because of potential performance implications. You should instead try something like: if (doc['referrer'].value.indexOf(doc['vhost'].value) >= 0) { return true; } else { return false; }
(That's probably not exactly right, but you basically want to check whether the index of vhost in referrer is >= 0, since indexOf returns -1 when there is no match.)
Can you check the Fields tab of your index pattern and look at your referrer and vhost fields? Are they both searchable and aggregatable? I think they need to be in order to use them in a scripted field.
Do you have the language set to painless and the type set to boolean?
Does every document in this index have a value for both of those fields? You could use the Discover tab to check by using the * icon on those fields;
and then negate that exists filter by clicking the magnifying glass with the - in it;
If there are docs where either of those fields doesn't have a value, you'll have to use a slightly more complex script for the scripted field. I can try to find a good example of that if you find that's the case.
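In case it helps, here is a rough sketch of what such a guarded script might look like. It is untested against your index and assumes the field names used in this thread (referrer and vhost); substitute whichever variants show up as aggregatable in your index pattern. It simply returns false when either field has no value instead of letting the script fail.

```painless
// Sketch only: field names are taken from this thread and may differ in your mapping.
// If either field has no value in this document, report "not a self-referral".
if (doc['referrer'].size() == 0 || doc['vhost'].size() == 0) {
  return false;
}
def referrer = doc['referrer'].value;
def vhost = doc['vhost'].value;
// indexOf returns -1 when vhost does not occur anywhere in referrer.
return referrer.indexOf(vhost) >= 0;
```

With the type set to boolean, documents whose referrer is just "-" would come out as false here, since "-" does not contain the vhost.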
An intermediate step you could try to help debug your scripted field would be to make another scripted field of type number and just try to find the index of vhost in referrer like this;
doc['referrer'].value.indexOf(doc['vhost'].value)
Then check in Discover if you're getting that numeric value. You'll have to delete the existing scripted field so that you don't get the shards failed error on it.
Neither field is set to aggregatable, both are set to searchable. How do I fix that? I do have fields called vhost.keyword and referrer.keyword which are both set to aggregatable and searchable.
Yes on painless/boolean.
Yes on values in both fields. Apache puts a dash "-" in the log line when there is no data. I checked as you specified too.
The last test with "doc['referrer'].value.indexOf(doc['vhost'].value)" as a new script with number as the type gave the same error "Courier Fetch: 5 of 5 shards failed."
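One thing that might be worth trying, given that only the .keyword sub-fields are aggregatable: scripted fields read values through doc values, which analyzed text fields do not have by default, and that could explain the shard failures. This is a guess rather than a confirmed diagnosis. A version of the numeric debug field pointed at the sub-fields you listed (referrer.keyword and vhost.keyword) would look roughly like this:

```painless
// Debug field (type: number): position of vhost inside referrer, or -1 if absent.
// Assumes the aggregatable sub-fields are named referrer.keyword and vhost.keyword,
// as reported in this thread.
doc['referrer.keyword'].value.indexOf(doc['vhost.keyword'].value)
```

If that returns sensible numbers in Discover, the boolean field can use the same two sub-fields and compare the result with >= 0.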
I have no errors which is a step in the right direction! BUT I can't seem to use that field in Discover. I've called the new scripted field vhostIsReferrer. I've tried the following in Discover and Visualizer: