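Using multiple DATA and/or GREEDYDATA patterns in one expression can be slow and give incorrect results: each of them matches any characters, so the regex engine has to backtrack to work out where one field ends and the next begins. It looks like most of your DATA patterns (except for the timestamp field) could be replaced with the more selective NOTSPACE pattern, which only matches a run of non-whitespace characters. To get the URL as well as the component, capture the full URL in your current pattern, and then apply a second grok or dissect filter to just that URL field to extract the components.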
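To illustrate both points outside Logstash, here is a rough Python sketch. It relies on DATA being `.*?` and NOTSPACE being `\S+` in the stock grok patterns. The log layout (method, URL, status), the sample lines, and the helper names `parse_line` and `split_url` are hypothetical, since your actual log format is not shown, and "component" is assumed to mean the first path segment.

```python
import re
from urllib.parse import urlsplit

# Regex equivalents of the grok patterns: DATA is `.*?`, NOTSPACE is `\S+`.
# The three-field layout below is an assumed example, not your real format.
DATA_STYLE = re.compile(r"^(?P<method>.*?) (?P<url>.*?) (?P<status>.*?)$")
NOTSPACE_STYLE = re.compile(r"^(?P<method>\S+) (?P<url>\S+) (?P<status>\S+)$")


def parse_line(line, pattern=NOTSPACE_STYLE):
    """First stage: capture the whole URL as a single field."""
    match = pattern.match(line)
    return match.groupdict() if match else None


def split_url(url):
    """Second stage: break only the captured URL field into its parts."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    return {
        "host": parts.netloc,
        "path": parts.path,
        # Assumption: the "component" is the first path segment.
        "component": segments[0] if segments else None,
    }


good = "GET http://example.com/orders/42 200"
bad = "GET http://example.com/orders/42 200 trailing junk"

fields = parse_line(good)
print(fields)
print(split_url(fields["url"]))

# The DATA-style pattern still matches the malformed line and quietly
# stuffs the extra text into the status field.
print(parse_line(bad, DATA_STYLE))

# The NOTSPACE-style pattern refuses to match instead of guessing.
print(parse_line(bad, NOTSPACE_STYLE))
```

In Logstash terms, the first stage corresponds to your existing grok filter with the URL captured as one NOTSPACE field, and the second stage to an additional grok or dissect filter whose source is that URL field.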