How to get Elasticsearch boolean match working for multiple fields


(Dominic Nicholas) #1

Hi,

I need some expert guidance on trying to get a bool match working. I'd like
the query to only return a successful search result if both 'message'
matches 'Failed password for', and 'path' matches '/var/log/secure'.

This is my query :

curl -s -XGET 'http://localhost:9200/logstash-2015.05.07/syslog/_search?pretty=true' -d '{
  "filter" : { "range" : { "@timestamp" : { "gte" : "now-1h" } } },
  "query" : {
    "bool" : {
      "must" : [
        { "match_phrase" : { "message" : "Failed password for" } },
        { "match_phrase" : { "path" : "/var/log/secure" } }
      ]
    }
  }
}'

Here is the start of the output from the search :

{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 46,
"max_score" : 13.308596,
"hits" : [ {
"_index" : "logstash-2015.05.07",
"_type" : "syslog",
"_id" : "AU0wzLEqqCKq_IPSp_8k",
"_score" : 13.308596,
"_source":{"message":"May 7 16:53:50 s_local@logstash-02 sshd[17970]: Failed password for fred from 172.28.111.200 port 43487 ssh2","@version":"1","@timestamp":"2015-05-07T16:53:50.554-07:00","type":"syslog","host":"logstash-02","path":"/var/log/secure"}
}, ...

The problem is that if I change '/var/log/secure' to just 'var', say, and
rerun the query, I still get a result, just with a lower score. I understood
that the bool...must construct meant both match clauses here would need to
succeed. What I'm after is no result if 'path' doesn't exactly match
'/var/log/secure'...

{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 46,
"max_score" : 10.354593,
"hits" : [ {
"_index" : "logstash-2015.05.07",
"_type" : "syslog",
"_id" : "AU0wzLEqqCKq_IPSp_8k",
"_score" : 10.354593,
"_source":{"message":"May 7 16:53:50 s_local@logstash-02 sshd[17970]: Failed password for fred from 172.28.111.200 port 43487 ssh2","@version":"1","@timestamp":"2015-05-07T16:53:50.554-07:00","type":"syslog","host":"logstash-02","path":"/var/log/secure"}
},...

I checked the mappings for these fields to confirm that they are not
analyzed:

curl -X GET 'http://localhost:9200/logstash-2015.05.07/_mapping?pretty=true'

I think these fields are not analyzed, so I believe the search text will not
be analyzed either (based on some training documentation I read recently from
elasticsearch). Here is a snippet of the _mapping output for this index
below.

  ....
  "message" : {
    "type" : "string",
    "norms" : {
      "enabled" : false
    },
    "fields" : {
      "raw" : {
        "type" : "string",
        "index" : "not_analyzed",
        "ignore_above" : 256
      }
    }
  },
  "path" : {
    "type" : "string",
    "norms" : {
      "enabled" : false
    },
    "fields" : {
      "raw" : {
        "type" : "string",
        "index" : "not_analyzed",
        "ignore_above" : 256
      }
    }
  },
  ....

Where am I going wrong (in a bunch of places, I'm sure), and what am I
misunderstanding here (probably a lot!)?

Any help would be much appreciated!

Thanks

--
Please update your bookmarks! We moved to https://discuss.elastic.co/

You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0470f9df-8d9a-48ef-9dbd-a90c8f2db194%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jason Wee) #2

What ES version is that?



(Dominic Nicholas) #3

Hi - Elasticsearch version 1.5.0, Lucene 4.10.4.

Dom



(Allan Mitchell) #4

Hi

Have a look at the below and see if it is what you want.

DELETE /testingindex

PUT /testingindex
{
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings" : {
    "mytesttype" : {
      "_source" : { "enabled" : false },
      "properties" : {
        "message" : { "type" : "string", "index" : "analyzed" },
        "path" : { "type" : "string", "index" : "analyzed" }
      }
    }
  }
}

POST /testingindex/mytesttype/1
{
  "message": "Failed password for some user or another",
  "path": "/wrong/path/"
}
POST /testingindex/mytesttype/2
{
  "message": "Not the right message but the right path",
  "path": "/var/log/secure"
}
POST /testingindex/mytesttype/3
{
  "message": "Failed password for some user or another",
  "path": "/var/log/secure"
}
POST /testingindex/mytesttype/4
{
  "message": "Nothing is right here",
  "path": "/wrong/path/too"
}

GET /testingindex/mytesttype/_search

GET /testingindex/mytesttype/_search
{
  "query": {
    "bool": {
      "must": [
        { "match_phrase" : { "message" : "Failed password for some" } },
        { "match_phrase" : { "path" : "/var/log/secure" } }
      ]
    }
  }
}



(Dominic Nicholas) #5

Hi Allan, I really appreciate the thoughtful response. One comment before I
try what you are suggesting... Our path and message field mappings
indicate not_analyzed, and we don't want to change them at this point.
Someone suggested using the .raw versions of the fields (path.raw and
message.raw), which does work. However, it leaves me with the question: if
the original field mappings indicate the fields are not_analyzed, why is it
necessary to use the .raw version?
Cheers
Dom



(Allan Mitchell) #6

Dominic

The usual convention is that "Field" is analyzed and "Field.raw" is not
analyzed. I'm not sure why you would have both as not analyzed, given they
would do the same thing, all else being equal.

When I run your original query above on fields I know are not_analyzed,
I get no results, because there are no strings in the fields that match
those terms exactly.
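
To make the analyzed versus not_analyzed difference concrete, here is a minimal Python sketch. It does not talk to Elasticsearch; `analyze` and `phrase_matches` are hypothetical helpers, and the tokenizer is only a rough stand-in for the default analyzer (split on non-alphanumeric characters, then lowercase). It shows why a phrase query for 'var' can match a path of '/var/log/secure' on an analyzed field but not on a not_analyzed one.

```python
import re

def analyze(text):
    """Rough stand-in for the default analyzer (an assumption, not the real
    implementation): split on non-alphanumerics and lowercase each token."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def phrase_matches(query_tokens, field_tokens):
    """True if query_tokens appear as one contiguous run inside field_tokens."""
    n = len(query_tokens)
    return n > 0 and any(
        field_tokens[i:i + n] == query_tokens
        for i in range(len(field_tokens) - n + 1)
    )

stored = "/var/log/secure"

# Analyzed field ("path"): the stored value and the query text are both tokenized.
analyzed_terms = analyze(stored)
print(analyzed_terms)                                       # ['var', 'log', 'secure']
print(phrase_matches(analyze("var"), analyzed_terms))       # True
print(phrase_matches(analyze(stored), analyzed_terms))      # True

# not_analyzed field ("path.raw"): the whole string is indexed as a single term.
raw_terms = [stored]
print(phrase_matches(["var"], raw_terms))                   # False
print(phrase_matches([stored], raw_terms))                  # True
```

On the analyzed field, 'var' is one of the indexed terms, so the phrase clause succeeds and the bool...must still returns the document. On the not_analyzed field only the complete string matches.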

I could of course also use a regexp query:

GET /testingindex/mytesttype/_search
{
  "query": {
    "bool": {
      "must": [
        { "regexp" : { "message" : ".*Failed password for.*" } },
        { "regexp" : { "path" : ".*/var/log/secure.*" } }
      ]
    }
  }
}
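
A note on why the regexp approach depends on the field being not_analyzed: regexp queries are evaluated against individual indexed terms, and the pattern has to cover a whole term. The sketch below imitates that with Python's `re.fullmatch`. The helper `regexp_hits` and the simplified tokenizer are assumptions for illustration only, and Python's regex syntax is not identical to Lucene's, although these particular patterns mean the same in both.

```python
import re

def regexp_hits(pattern, terms):
    """Return the terms that the pattern matches in full (anchored at both ends)."""
    return [t for t in terms if re.fullmatch(pattern, t)]

message = "Failed password for some user or another"

# Analyzed field: one lowercased term per word (simplified tokenizer, an assumption).
analyzed_terms = [t.lower() for t in re.findall(r"[A-Za-z0-9]+", message)]

# not_analyzed field: the complete string is the only term.
raw_terms = [message]

pattern = r".*Failed password for.*"
print(regexp_hits(pattern, analyzed_terms))  # []
print(regexp_hits(pattern, raw_terms))       # ['Failed password for some user or another']
```

No single word-sized term can contain the three-word phrase, so the pattern only has a chance against the not_analyzed term.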



(Dominic Nicholas) #7

Hi - thanks again - I was misunderstanding the following :

"path" : {
  "type" : "string",
  "norms" : {
    "enabled" : false
  },
  "fields" : {
    "raw" : {
      "type" : "string",
      "index" : "not_analyzed",
      "ignore_above" : 256
    }
  }
}

This says that the path field itself is analyzed (it uses the default
analyzer, because no 'index: not_analyzed' is set on it), and that only the
sub-field 'path.raw' is not analyzed. One solution for me will be to simply
query path.raw instead of path. I'll also try the regexp. Thanks again for
the help!
Dom
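
For anyone landing here later, this is a minimal sketch of the request body that the path.raw approach leads to. It assumes the mapping shown above (an analyzed 'message' field and a not_analyzed 'path.raw' sub-field) and swaps the match_phrase on 'path' for a term query on 'path.raw'. The helper name build_failed_login_query is hypothetical, and the script only builds and prints the JSON; it does not talk to a cluster.

```python
import json


def build_failed_login_query(path="/var/log/secure", phrase="Failed password for"):
    """Build a search body that requires both the phrase and the exact path."""
    return {
        # Top-level range filter kept as in the original request.
        "filter": {"range": {"@timestamp": {"gte": "now-1h"}}},
        "query": {
            "bool": {
                "must": [
                    # Analyzed field: phrase match against the tokenized message.
                    {"match_phrase": {"message": phrase}},
                    # not_analyzed sub-field: the stored term is the whole
                    # string, so only an identical path matches.
                    {"term": {"path.raw": path}},
                ]
            }
        },
    }


body = build_failed_login_query()
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with -d exactly like the original query. With this body, changing the path to just 'var' should return no hits, because 'var' is not identical to the stored term '/var/log/secure'. Note that the mapping sets ignore_above to 256, so values longer than that are not indexed in path.raw at all.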

On Fri, May 8, 2015 at 10:35 AM, Allan Mitchell casfanallan@gmail.com
wrote:

Dominic

Normal nomenclature is that "Field" is analyzed and "Field.raw" is not
analyzed. Not sure why you would have both as not analyzed, given they
would do the same thing, all else being equal.

When performing your original query above on fields I know are
not_analyzed I get no results because there are no strings in the fields
that match those terms exactly.

I could of course look to do a regex query

GET /testingindex/mytesttype/_search
{
  "query": {
    "bool": {
      "must": [
        { "regexp" : { "message" : ".*Failed password for.*" } },
        { "regexp" : { "path" : ".*/var/log/secure.*" } }
      ]
    }
  }
}
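
A rough way to see why the leading and trailing '.*' are needed: a regexp query is matched against whole indexed terms, and the pattern is anchored to the entire term. The sketch below imitates that with Python's re.fullmatch. It is only an illustration, not Elasticsearch code; Python and Lucene regular expression syntax differ in general, although these simple patterns behave the same in both. The list of analyzed terms is an assumption about how a default analyzer would split the path.

```python
import re


def regexp_matches_term(pattern, term):
    """Imitate an anchored regexp query: the pattern must cover the whole term."""
    return re.fullmatch(pattern, term) is not None


# A not_analyzed field stores the complete value as one term.
raw_term = "/var/log/secure"
# Assumed terms for the same value in an analyzed field.
analyzed_terms = ["var", "log", "secure"]

# Without wildcards the pattern has to equal the entire term.
print(regexp_matches_term("var", raw_term))                   # False
print(regexp_matches_term(".*var.*", raw_term))               # True
print(regexp_matches_term(".*/var/log/secure.*", raw_term))   # True

# Against the analyzed terms, no single term contains the slashes,
# so the full-path pattern finds nothing.
print(any(regexp_matches_term(".*/var/log/secure.*", t) for t in analyzed_terms))  # False
```

So the regexp route behaves as intended on not_analyzed fields, while on an analyzed field each pattern is tested against one token at a time.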

On 8 May 2015 at 15:03, Dominic Nicholas dominic.s.nicholas@gmail.com
wrote:

Hi Alan, I really appreciate the thoughtful response. One comment before
I try what you are suggesting... Our path and message field mappings
indicate not_analyzed, and we don't want to change them at this point.
Someone suggested using the .raw versions of the fields (path.raw and
message.raw), which does work. However, it leaves me with the question : If
the original field mappings indicate the fields are not_analyzed, why is it
necessary to use the .raw version ?
Cheers
Dom

On Fri, May 8, 2015 at 6:37 AM, Allan Mitchell casfanallan@gmail.com
wrote:

Hi

Have a look at the below and see if it is what you want.

DELETE /testingindex

PUT /testingindex
{
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings" : {
    "mytesttype" : {
      "_source" : { "enabled" : false },
      "properties" : {
        "message" : { "type" : "string", "index" : "analyzed" },
        "path" : { "type" : "string", "index" : "analyzed" }
      }
    }
  }
}

POST /testingindex/mytesttype/1
{
  "message" : "Failed password for some user or another",
  "path" : "/wrong/path/"
}

POST /testingindex/mytesttype/2
{
  "message" : "Not the right message but the right path",
  "path" : "/var/log/secure"
}

POST /testingindex/mytesttype/3
{
  "message" : "Failed password for some user or another",
  "path" : "/var/log/secure"
}

POST /testingindex/mytesttype/4
{
  "message" : "Nothing is right here",
  "path" : "/wrong/path/too"
}

GET /testingindex/mytesttype/_search

GET /testingindex/mytesttype/_search
{
  "query": {
    "bool": {
      "must": [
        { "match_phrase" : { "message" : "Failed password for some" } },
        { "match_phrase" : { "path" : "/var/log/secure" } }
      ]
    }
  }
}
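
To make the behaviour of this test concrete, here is a small simulation of the four documents in plain Python. It is a sketch under stated assumptions, not Elasticsearch itself: analyze() is a crude stand-in for an analyzer (lowercase, split on anything that is not a letter or digit), phrase_match() imitates match_phrase, and bool_must() requires both clauses to match. All three function names are made up for this example.

```python
import re


def analyze(text):
    """Crude analyzer stand-in: lowercase, then split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def phrase_match(field_text, query_text):
    """True if the query tokens appear consecutively among the field tokens."""
    field, query = analyze(field_text), analyze(query_text)
    n = len(query)
    return n > 0 and any(field[i:i + n] == query for i in range(len(field) - n + 1))


# The four test documents from the example above.
DOCS = {
    1: {"message": "Failed password for some user or another", "path": "/wrong/path/"},
    2: {"message": "Not the right message but the right path", "path": "/var/log/secure"},
    3: {"message": "Failed password for some user or another", "path": "/var/log/secure"},
    4: {"message": "Nothing is right here", "path": "/wrong/path/too"},
}


def bool_must(message_phrase, path_phrase):
    """Return ids of documents where both phrase clauses match."""
    return [
        doc_id
        for doc_id, doc in DOCS.items()
        if phrase_match(doc["message"], message_phrase)
        and phrase_match(doc["path"], path_phrase)
    ]


print(bool_must("Failed password for some", "/var/log/secure"))  # [3]
# 'var' is one of the path tokens, so the analyzed field still matches.
print(bool_must("Failed password for some", "var"))              # [3]
# Comparing the whole string, as a not_analyzed field would, finds nothing.
print([i for i, d in DOCS.items() if d["path"] == "var"])        # []
```

Both clauses of the bool must are required in every case. The partial path still matches only because the analyzed field is compared token by token rather than as a whole string.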


(system) #8