Regex-based query help


(Subbu v) #1

Hi All,

Please advise on how to construct a query for the scenario below. I tried multiple aggregations, and the query performs very slowly.

  1. I have thousands of log files, and I am using Logstash to index the data into Elasticsearch.
  2. Each log file may contain more than 5,000 lines.
  3. In my search application, I should allow the user to enter a regex-based expression to search within the file (like .*.+([0-9][0-9]:[0-9][0-9]:[0-9][0-9] [A-Z][a-z]{1,2} [0-9][0-9] 20[0-9][0-9]).+.), a file name (like ABC), and a date range.
  4. At first, I don't have to display the matches within the files; I only have to display the names of the matching files.
  5. With regular bool queries and aggregations I am able to get the results, but the performance is very slow.
  6. So I am wondering whether there is a way to tell Elasticsearch to return only the first occurrence of the regex per file (since all I need are the file names). I tried to achieve this with an aggregation, but the aggregation is applied to the full result set it receives, which again hurts performance.
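To make the requirement in points 3 to 6 concrete, here is a minimal Python sketch of the desired result computed outside Elasticsearch: return the name of every file that contains at least one match, and stop scanning a file at its first occurrence. It assumes the files are already loaded into a dict of name to content; the function name matching_files and the sample data are hypothetical. It keeps only the core timestamp part of the question's pattern, because Python's re.search is unanchored and does not need the surrounding .*.+ and .+. wrappers.

```python
import re
from typing import Dict, List

# Core timestamp part of the pattern from the question, e.g. "12:34:56 Jan 05 2016".
TIMESTAMP = re.compile(r"[0-9][0-9]:[0-9][0-9]:[0-9][0-9] [A-Z][a-z]{1,2} [0-9][0-9] 20[0-9][0-9]")


def matching_files(files: Dict[str, str], pattern: re.Pattern, name_filter: str = "") -> List[str]:
    """Return the names of files whose content contains at least one match.

    re.search returns at the first occurrence, so the rest of a file is not
    scanned once a match has been found. name_filter is a plain substring
    check on the file name (hypothetical stand-in for the "ABC" filter).
    """
    result = []
    for name, content in files.items():
        if name_filter and name_filter not in name:
            continue  # skip files whose name does not contain the filter
        if pattern.search(content):
            result.append(name)
    return result
```

For example, with files = {"ABC_1.log": "ERROR 12:34:56 Jan 05 2016 disk full\n", "ABC_2.log": "no timestamp here\n"}, calling matching_files(files, TIMESTAMP, "ABC") gives ["ABC_1.log"]. Doing this over whole files is exactly the per-file regex scan that becomes slow at this scale.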

The file contents are indexed as not_analyzed fields.

Please help.
Thanks


(Christian Dahlqvist) #2

Searching with regular expressions on large not_analyzed fields, as you are describing, is not what Elasticsearch was designed to do. If you are not going to leverage any of the search capabilities of Elasticsearch, why use it at all?

Using Elasticsearch for log analytics is very common, but files are generally broken up into individual events, and the relevant information is parsed out and enriched before the events are indexed into Elasticsearch, rather than inserting whole files. This makes it possible to search and aggregate across the log events without using any regular expressions, e.g. using Kibana.
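A minimal Python sketch of what "breaking files up into individual events" could look like: one event per non-empty line, with the timestamp parsed into a proper date and the file name attached. In a Logstash setup this parsing would normally happen in the pipeline itself (for example with its grok and date filters); the Python below only illustrates the shape of the resulting documents. The field names file, line_number, timestamp and message, and the function to_events, are hypothetical, and it assumes one log event per line with timestamps like the one in the question.

```python
import re
from datetime import datetime
from typing import Dict, List, Optional

# Same timestamp shape as in the question, captured as a group.
TIMESTAMP = re.compile(r"([0-9][0-9]:[0-9][0-9]:[0-9][0-9] [A-Z][a-z]{1,2} [0-9][0-9] 20[0-9][0-9])")


def parse_timestamp(text: str) -> Optional[str]:
    """Convert e.g. "12:34:56 Jan 05 2016" to ISO 8601, or None if it does not parse."""
    try:
        # %b expects an abbreviated month name such as "Jan" (C/English locale).
        return datetime.strptime(text, "%H:%M:%S %b %d %Y").isoformat()
    except ValueError:
        return None


def to_events(file_name: str, content: str) -> List[Dict]:
    """Split one log file into per-line event documents (hypothetical field names)."""
    events = []
    for number, line in enumerate(content.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        match = TIMESTAMP.search(line)
        events.append({
            "file": file_name,        # enrichment: which file the event came from
            "line_number": number,    # position of the line within the file
            "timestamp": parse_timestamp(match.group(1)) if match else None,
            "message": line,          # the raw line, kept for full-text search
        })
    return events
```

With one document per line, the file-name and date-range conditions from the question can be applied as filters on the file and timestamp fields instead of as regular expressions over whole files.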

