Failure to get results when querying specific field


(Oded Ben-Ozer) #1

I'm trying to understand a search issue with Graylog2 (http://graylog2.org/).
This is an example of how Graylog2 stores data (taken from a successful query):
    {
      "_id": "Z5y6mxR-QBejYdDI07Ax3A",
      "_index": "graylog2",
      "_score": 1.4142135,
      "_source": {
        "_Comp": "app",
        "_Env": "production",
        "_Short_path": "some_file.log",
        "created_at": 1348465507.609,
        "facility": "logstash-gelf",
        "file": "file:/usr/local/logstash/logstash-1.1.1-monolithic.jar!/logstash/outputs/gelf.rb",
        "full_message": "Stacktrace:\norg.apache.jasper.JasperException: Exception in JSP: /jsp/mobile/some_file.jsp:31",
        "host": "some_host",
        "level": 7,
        "line": 138,
        "message": "Stacktrace:\norg.apache.jasper.JasperException: Exception in JSP: /jsp/mobile/some_file.jsp:31",
        "streams": [
          "50558899fb7f611830000019"
        ]
      },
      "_type": "message"
    }

This is how Graylog2 tries (and fails) to search for the data when I search
for JasperException (it's a full-text search, as this string is
not separated by whitespace):

    {
      "query": {
        "bool": {
          "must": [
            {
              "query_string": {
                "query": "message:JasperException"
              }
            },
            {
              "range": {
                "created_at": {
                  "gt": 1348465507,
                  "lt": 1348465508
                }
              }
            }
          ]
        }
      },
      "size": 5
    }

But if I change the query_string from "query": "message:JasperException"
to "query": "_all:JasperException", it works.
Since the substring 'JasperException' is clearly present in the message field,
I don't understand why the query Graylog2 uses doesn't work.
Can anybody shed some light on this?

--


(Ivan Brusic) #2

Can you supply (gist) your mapping? I've never used Graylog, but
perhaps allow_leading_wildcard is set to false for the message field but
not for the _all field (which I also do not use). The field might also not
be analyzed.
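
To make the option concrete, here is a minimal sketch of a request body with allow_leading_wildcard set explicitly on the query_string clause. It assumes the search really uses a wildcard pattern such as *JasperException*; that pattern and the helper name build_search_body are illustrative and not taken from Graylog2.

```python
import json


def build_search_body(field, pattern, start, end,
                      allow_leading_wildcard=True, size=5):
    """Build a bool query: one query_string clause plus a created_at range.

    Hypothetical helper for illustration only; Graylog2 builds its own queries.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "query_string": {
                            "query": f"{field}:{pattern}",
                            # Spelled out so the setting is visible in the request
                            "allow_leading_wildcard": allow_leading_wildcard,
                        }
                    },
                    {"range": {"created_at": {"gt": start, "lt": end}}},
                ]
            }
        },
        "size": size,
    }


body = build_search_body("message", "*JasperException*", 1348465507, 1348465508)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent with curl to the index's _search endpoint to compare results with the option on and off.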

--
Ivan

(In reply to Oded Ben-Ozer's message of Thu, Sep 27, 2012 at 8:42 AM.)



(Oded Ben-Ozer) #3

So, this is the mapping, more or less (I removed some of the
"properties" entries):

    curl -XGET 'http://localhost:9200/graylog2/_mapping?pretty'
    {
      "graylog2" : {
        "message" : {
          "dynamic_templates" : [ {
            "store_generic" : {
              "mapping" : {
                "index" : "not_analyzed"
              },
              "match" : "*"
            }
          } ],
          "properties" : {
            "_Comp" : {
              "type" : "string",
              "index" : "not_analyzed"
            },
            "_ZONE" : {
              "type" : "string",
              "index" : "not_analyzed"
            },
            "_message" : {
              "type" : "string",
              "index" : "not_analyzed"
            },
            "_timestampMs" : {
              "type" : "string",
              "index" : "not_analyzed"
            },
            "_verb" : {
              "type" : "string",
              "index" : "not_analyzed"
            },
            "created_at" : {
              "type" : "double",
              "ignore_malformed" : false
            },
            "full_message" : {
              "type" : "string",
              "analyzer" : "whitespace"
            },
            "message" : {
              "type" : "string",
              "analyzer" : "whitespace"
            }
          }
        }
      }
    }

And some more info:
both queries (with 'message' and with '_all') take about the same amount of
time, more than 5 minutes, during which the CPU of the data nodes is very busy.
It's almost 1 TB of text, and the indexes get hit hard because I'm using a
leading wildcard: the field is analyzed with the whitespace analyzer and I'm
searching for a pattern inside a long 'word'.
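
A rough sketch of why a plain term search misses here. It approximates the whitespace analyzer with Python's str.split(), which is an assumption (Lucene's tokenizer may treat some characters differently), and applies it to the message value from the sample document.

```python
# The "message" value from the sample document above
message = (
    "Stacktrace:\norg.apache.jasper.JasperException: "
    "Exception in JSP: /jsp/mobile/some_file.jsp:31"
)


def whitespace_tokens(text):
    """Approximate a whitespace analyzer: split on runs of whitespace,
    keep case and punctuation untouched."""
    return text.split()


tokens = whitespace_tokens(message)
print(tokens)

# No token is exactly "JasperException", so a plain term lookup finds nothing
print("JasperException" in tokens)

# The string only occurs inside a longer token, hence the need for wildcards
print([t for t in tokens if "JasperException" in t])
```

The only token containing the string is the whole class name with its trailing colon, so the search has to be written as a wildcard over that long 'word'.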

(In reply to Ivan Brusic's message of Thu, Sep 27, 2012 at 6:30 PM.)


(Ivan Brusic) #4

The whitespace analyzer does not lowercase tokens, while the query_string
parser lowercases wildcard terms by default. Setting
lowercase_expanded_terms to false might help.
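
A small simulation of that mismatch, as a sketch only. It uses fnmatchcase from Python's standard library as a stand-in for Lucene wildcard matching (an approximation: fnmatch also understands [...] ranges), and the helper wildcard_hits is hypothetical. The token list assumes the field was split on whitespace with case preserved.

```python
from fnmatch import fnmatchcase

# Tokens as a case-preserving whitespace analyzer would likely index them
indexed_tokens = [
    "Stacktrace:",
    "org.apache.jasper.JasperException:",
    "Exception",
]


def wildcard_hits(pattern, tokens, lowercase_expanded_terms=True):
    """Return the tokens matched by a wildcard pattern.

    When lowercase_expanded_terms is on, the pattern is lowercased before
    matching, mimicking the query_string default described above.
    """
    if lowercase_expanded_terms:
        pattern = pattern.lower()
    # fnmatchcase compares case-sensitively on every platform
    return [t for t in tokens if fnmatchcase(t, pattern)]


# Default: the pattern becomes "*jasperexception*" and misses the mixed-case token
print(wildcard_hits("*JasperException*", indexed_tokens))

# With lowercasing disabled, the pattern keeps its case and matches
print(wildcard_hits("*JasperException*", indexed_tokens,
                    lowercase_expanded_terms=False))
```

A field whose analyzer lowercases tokens at index time would still match the lowercased pattern, which may be why the _all query returns results.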

--
Ivan

(In reply to Oded Ben-Ozer's message of Thu, Sep 27, 2012 at 10:35 AM.)

