Hit a log only if all specific keywords exist?


(Hacksign) #1

I have a search statement like this:

{
    "size" : 10,
    "query" : {
        "bool" : {
            "must" : [
                {
                    "match" : {
                        "log._kv_" : {
                            "query" : "officecheck macros vba_code cmd.exe",
                            "operator" : "and"
                        }
                    }
                }
            ]
        }
    },
    "highlight" : {
        "fields" : {
            "log._kv_" : {}
        }
    }
}

I want to pick out logs which contain all the keywords I mentioned. But the result is not what I expected: some of the keywords do not appear in the highlight part of the result (cmd does not appear in the record below):

      "highlight" : {
        "log._kv_" : ["additional_info <em>officecheck</em> ole <em>macros</em> <em>vba</em>_<em>code</em> Public Sub Main()\nConst ProcName As String = \"\"\nOn", "additional_info <em>officecheck</em> ole <em>macros</em> subfilename r:\\sav6\\work_channel0_9\\11793486", "additional_info <em>officecheck</em> ole <em>macros</em> <em>vba</em>_filename Module1.bas", "additional_info <em>officecheck</em> ole <em>macros</em> stream_path _VBA_PROJECT_CUR/VBA/Module1" ]
      }

So, can anyone tell me how to write a query that picks out a log only when all keywords exist in a specific field?


(Christoph) #2

Hi,
I think the query looks okay, but with the default highlighter settings you only see a limited number of fragments, each about 100 characters long. Have you tried specifying different values for fragment_size and number_of_fragments, as described in the Highlighting API?
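
As a rough illustration of that suggestion, here is a minimal Python sketch that builds the same search body as in post #1 but with explicit highlighter settings. The helper name build_search_body and the chosen values (200 characters, 20 fragments) are assumptions for the example, not recommendations; sending the request to the cluster is left out.

```python
import json


def build_search_body(field, keywords, fragment_size=200, number_of_fragments=20):
    """Build a match query (operator "and") with explicit highlighter settings.

    build_search_body is a hypothetical helper; the default values are
    arbitrary and only meant to be larger than the highlighter defaults.
    """
    return {
        "size": 10,
        "query": {
            "bool": {
                "must": [
                    {
                        "match": {
                            field: {
                                # All keywords must match because of "and".
                                "query": " ".join(keywords),
                                "operator": "and",
                            }
                        }
                    }
                ]
            }
        },
        "highlight": {
            "fields": {
                field: {
                    "fragment_size": fragment_size,
                    "number_of_fragments": number_of_fragments,
                }
            }
        },
    }


body = build_search_body(
    "log._kv_", ["officecheck", "macros", "vba_code", "cmd.exe"]
)
print(json.dumps(body, indent=4))
```

Setting number_of_fragments to 0 should make the highlighter return the whole field content with highlights instead of fragments, which can be handy for checking that every keyword really matched.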


(Hacksign) #3

Thanks, this is the solution.
BTW, is there any length restriction on a string type field? My log string could be very long.


(Christoph) #4

This thread suggests a theoretical limit on the maximum document size in Lucene of 2GB; the size of a single token also seems to be limited to approx. 32kB.
Still, it might be useful to split very large documents into logical subsections (e.g. a book into chapters or pages). What sizes are we talking about?


(Hacksign) #5

32 kB = 32 * 1024 = 32768 characters for each field?
That seems enough for my situation.
I'm pretty sure a single document will not exceed 2GB.

Thanks for your answer!


(Christoph) #6

To clarify, the 32kB is a limit on the length of each term in a field, so if your field is analyzed, each term (or word or token, depending on your terminology) can't be longer than that. However, you can still have as many terms in that field as you like. The limit is likely to matter only if your field is not analyzed, because then the whole string is indexed as a single term.
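
To make the difference concrete, here is a small Python sketch that checks a string against a per-term limit. It assumes the limit is 32766 UTF-8 bytes (my understanding of Lucene's maximum term length; the thread only says approx. 32kB), and it approximates analysis with a plain whitespace split, which is much simpler than a real analyzer. The function name oversized_terms is made up for the example.

```python
# Assumed per-term limit in UTF-8 bytes; the thread only says "approx. 32kB".
MAX_TERM_BYTES = 32766


def oversized_terms(text, analyzed=True):
    """Return the terms that would exceed the assumed per-term limit.

    analyzed=True  -> the text is split on whitespace (a crude stand-in
                      for a real analyzer), so each word is one term.
    analyzed=False -> the whole string is treated as a single term.
    """
    terms = text.split() if analyzed else [text]
    return [t for t in terms if len(t.encode("utf-8")) > MAX_TERM_BYTES]


# A long log line made of many short words, roughly 120 kB in total.
long_log = " ".join(["officecheck"] * 10000)

print(len(oversized_terms(long_log, analyzed=True)))   # 0: every term is short
print(len(oversized_terms(long_log, analyzed=False)))  # 1: the whole string is one term
```

So a very long log string is fine in an analyzed field as long as no single word is huge, while the same string in a not-analyzed field would hit the limit.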

