Exact phrase search query failing

  • deleted -

You should turn on explain for the query and check what is happening.
I have an idea, because I had a similar problem and I think the same
thing is happening here.
Your query searches for "Aug" in the "message" field and searches for
every other word in the default field (_all, which covers all fields).
I think you wanted to search for "Aug 12 23:59:58" in the message field.
The field prefix applies only up to the first space, because after the
space the parser expects a new expression. If you want to search only
in the message field, your query should be: message:"Aug 12 23:59:58"
Explain will tell you whether this is what is happening, so I suggest
checking your query with it.
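To make the point above concrete, here is a rough sketch of how the field prefix ends at the first space. It is a simplified model written for illustration, not Lucene's actual query parser: the class and method names are made up, the default field is assumed to be _all, and escaped quotes inside a phrase are not handled.

```java
import java.util.ArrayList;
import java.util.List;

public class ClauseSplitSketch {
    // Assumption: the default field is _all, as the explain output in this thread suggests.
    static final String DEFAULT_FIELD = "_all";

    /** Splits the query on whitespace outside double quotes and labels each clause with its field. */
    static List<String> clauses(String query) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : query.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            }
            if (Character.isWhitespace(c) && !inQuotes) {
                addClause(current, out);   // whitespace outside quotes ends the clause
            } else {
                current.append(c);
            }
        }
        addClause(current, out);
        return out;
    }

    /** Adds the pending clause, prefixing the default field when it has no field of its own. */
    static void addClause(StringBuilder current, List<String> out) {
        if (current.length() == 0) {
            return;
        }
        String clause = current.toString();
        current.setLength(0);
        out.add(fieldSeparator(clause) > 0 ? clause : DEFAULT_FIELD + ":" + clause);
    }

    /** Index of the first unescaped ':' before any quote, or -1 if there is no field prefix. */
    static int fieldSeparator(String clause) {
        for (int i = 0; i < clause.length(); i++) {
            char c = clause.charAt(i);
            if (c == '\\') {
                i++;               // skip the escaped character, e.g. "\:"
            } else if (c == '"') {
                return -1;
            } else if (c == ':') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Unquoted: only the first word stays on the message field.
        System.out.println(clauses("message:Aug 12 23\\:59\\:58"));
        // Quoted: the whole phrase stays on the message field.
        System.out.println(clauses("message:\"Aug 12 23:59:58\""));
    }
}
```

With the unquoted input the sketch yields three clauses, of which only the first is on message. With the quoted input it yields a single clause on message.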

Regards,
Tamas

On Aug 16, 12:35 pm, "kranti.vns" kranti....@gmail.com wrote:

Hi,

I am using the below query for search :

QueryBuilder qb = QueryBuilders.queryString(queryString)
        .defaultOperator(Operator.AND)
        .allowLeadingWildcard(false)
        .analyzer(getDefaultAnalyzer())
        .analyzeWildcard(true);

When I pass the query string "message:Aug 12 23:59:58", I get back the
following data set:

2011/08/12 23:59:58 : Aug 12 23:59:58 hallmark-snrt-008 snort[14387]:
[1:402:8] ICMP Destination Unreachable Port Unreachable [Classification:
Misc activity] [Priority: 3] {ICMP} 10.27.32.46 -> 172.17.1.26
2011/08/12 23:59:58 : Aug 12 23:59:58 hallmark-snrt-008 snort[14387]:
[1:402:8] ICMP Destination Unreachable Port Unreachable [Classification:
Misc activity] [Priority: 3] {ICMP} 10.27.32.46 -> 172.17.1.26
2011/08/12 23:58:59 : Aug 12 23:58:59 hallmark-snrt-008 snort[14387]:
[1:384:5] ICMP PING [Classification: Misc activity] [Priority: 3] {ICMP}
192.168.114.189 -> 10.15.1.103
2011/08/12 23:58:59 : Aug 12 23:58:59 hallmark-snrt-008 snort[14387]:
[1:384:5] ICMP PING [Classification: Misc activity] [Priority: 3] {ICMP}
192.168.114.189 -> 10.15.1.103
2011/08/10 19:58:12 : <131>Aug 10 19:58:12 10.72.60.121
APF-3-USER_DEL_FAILEDWD_MTG_WLC_2: *Aug 10 19:59:46.598:
%APF-3-USER_DEL_FAILED: apf_ms.c:5092 Unable to delete username
WHVCMTG...@winndixieus.wd.com for mobile 00:17:23:00:10:9c
2011/08/10 15:21:58 : <13>Aug 10 15:21:58 10.211.194.34 Wed Aug 10 15:21:58
2011 : Auth: Login OK: [WHVCMTG...@winndixieus.wd.com] (from client
10.x.x.x_Network port 29 cli 00-17-23-00-17-2c) Wed Aug 10 15:21:58 2011 :
Auth: Login OK: [WHVCMTG...@winndixieus.wd.com] (from client
10.x.x.x_Network port 29 cli 00-17-23-00-12-ca) Wed Aug 10 15:21:59 2011 :
Auth: Login OK: [WHVCMIA...@winndixieus.wd.com] (from client
10.x.x.x_Network port 29 cli 00-17-23-00-15-ef)
2011/08/10 15:15:59 : <13>Aug 10 15:15:59 10.211.194.34 Wed Aug 10 15:15:58
2011 : Auth: Login OK: [WHFLORL000@WINNDIXIEUS] (from client
10.x.x.x_Network port 29 cli 00-15-70-d3-70-a2) Wed Aug 10 15:15:59 2011 :
Auth: Login OK: [WHVCMIA...@winndixieus.wd.com] (from client
10.x.x.x_Network port 29 cli 00-17-23-00-12-b6) Wed Aug 10 15:15:59 2011 :
Auth: Login OK: [WHVCMTG...@winndixieus.wd.com] (from client
10.x.x.x_Network port 29 cli 00-17-23-00-10-d7) Wed Aug 10 15:15:59 2011 :
Auth: Login OK: [WHVCMTG...@winndixieus.wd.com] (from client
10.x.x.x_Network port 29 cli 00-17-23-00-12-d4) Wed Aug 10 15:15:59 2011 :
Auth: Login OK: [WHVCMTG...@winndixieus.wd.com] (from client
10.x.x.x_Network port 29 cli 00-17-23-00-12-df)
2011/08/10 14:00:45 : <13>Aug 10 14:00:45 10.211.194.34 Wed Aug 10 14:00:44
2011 : Auth: Login OK: [WHFLHDQ000@WINNDIXIEUS] (from client
10.x.x.x_Network port 29 cli 00-15-70-d5-9f-24) Wed Aug 10 14:00:44 2011 :
Auth: Login OK: [WHFLORL000@WINNDIXIEUS] (from client 10.x.x.x_Network port
29 cli 00-15-70-d3-26-58) Wed Aug 10 14:00:44 2011 : Auth: Login OK:
[WHFLMTG000@WINNDIXIEUS] (from client 10.x.x.x_Network port 29 cli
00-15-70-d3-29-59) Wed Aug 10 14:00:44 2011 : Auth: Login OK:
[WHVCMIA...@winndixieus.wd.com] (from client 10.x.x.x_Network port 29 cli
00-17-23-00-12-1d)
2011/08/10 12:59:58 : <131>Aug 10 12:59:58 10.12.60.121
APF-3-USER_DEL_FAILEDWD_JAX_WLC_2: *Aug 10 12:59:58.026:
%APF-3-USER_DEL_FAILED: apf_ms.c:5092 Unable to delete username
WHVCJAX...@winndixieus.wd.com for mobile 00:17:23:00:1c:62
2011/08/10 12:59:58 : <13>Aug 10 12:59:58 10.211.194.34 Wed Aug 10 12:59:58
2011 : Auth: Login OK: [WHFLHAM000@WINNDIXIEUS] (from client
10.x.x.x_Network port 29 cli 00-15-70-d3-22-5f) Wed Aug 10 12:59:58 2011 :
Auth: Login OK: [WHFLORL000@WINNDIXIEUS] (from client 10.x.x.x_Network port
29 cli 00-15-70-d3-2a-97) Wed Aug 10 12:59:58 2011 : Auth: Login OK:
[WHVCHDQ...@winndixieus.wd.com] (from client 10.x.x.x_Network port 29 cli
00-17-23-03-14-2c)

Please note that the query returns entries such as "Aug 10 12:59:58", which
do not match the search string.
Can anyone suggest why this is happening?

Thanks

--
View this message in context:http://elasticsearch-users.115913.n3.nabble.com/exact-phrase-search-q...
Sent from the Elasticsearch Users mailing list archive at Nabble.com.

  • deleted -

Can you post the first part of what you get in the explanation? Can you
also run your query with curl against Elasticsearch (through the REST API)?

On Aug 17, 1:37 pm, "kranti.vns" kranti....@gmail.com wrote:

Hi Tamas,

Thanks for the reply. I am escaping the double quotes in the literal string
"Aug 12 23:59:58" so that the quotes are also passed in the query string.
For example, if the user is looking for

String str = "Aug 12 23:59:58";

then before putting the variable str into the query string I escape the
double quotes, so that str becomes \"Aug 12 23:59:58\". This means that the
quotes are now part of str.

I then set the query string to message:str and run the query, but that
does not help either.
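For reference, a minimal sketch of the difference between the two query strings, using plain Java string handling only. The class and helper names are hypothetical and nothing here calls Elasticsearch; the resulting string is what would then be handed to QueryBuilders.queryString(...).

```java
public class PhraseClause {
    /** Returns field:"phrase", escaping any backslash or double quote inside the phrase. */
    static String fieldPhrase(String field, String phrase) {
        String escaped = phrase.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        String str = "Aug 12 23:59:58";
        // No quotes in the query string: only "Aug" is tied to the message field.
        System.out.println("message:" + str);            // message:Aug 12 23:59:58
        // Quotes are part of the query string: the whole phrase is tied to message.
        System.out.println(fieldPhrase("message", str)); // message:"Aug 12 23:59:58"
    }
}
```

Printing the final query string like this before sending it is a cheap way to confirm that the quotes really reach the query.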

As you suggested, I enabled explain, and it shows the same behavior
you described.

Could it be related to the way indices are analyzed?

Regards
Kranti


  • deleted -

Hi Kranti,

Your score explanation shows me that the query was not sent as an
exact phrase query. Are you sure you put your phrase between double
quotes (")?
I think the result from your curl example is correct, as it returned the
2 results in which the phrase Aug 12 23:59:58 can be found. It searched
every field (through _all) because no field was specified.
I'm not sure, but your explanation seems to be the explanation of a
different query, which looks like this: message:Aug 12 23:59:58. The
explanation shows that the first term is matched against the message
field, but every other term is matched against the _all field. This
can happen if you forget to put your phrase between double quotes.

I just created a quick example to show you the difference and to test
your problem.
This is the query and the explanation if you search for an exact
phrase:
{
  "query" : {
    "query_string" : {
      "query" : "message:\"Aug 12 23\\:59\\:58\""
    }
  },
  "explain" : true
}

0.28767452 = (MATCH) weight(message:"aug 12 23 59 58" in 0), product of:
  1.0 = queryWeight(message:"aug 12 23 59 58"), product of:
    1.5342641 = idf(message: aug=1 12=1 23=1 59=1 58=1)
    0.6517783 = queryNorm
  0.28767452 = fieldWeight(message:"aug 12 23 59 58" in 0), product of:
    1.0 = tf(phraseFreq=1.0)
    1.5342641 = idf(message: aug=1 12=1 23=1 59=1 58=1)
    0.1875 = fieldNorm(field=message, doc=0)

And this is the query and the explanation without the quotes, which looks
similar to your example:
{
  "query" : {
    "query_string" : {
      "query" : "message:Aug 12 23\\:59\\:58"
    }
  },
  "explain" : true
}

0.24987036 = (MATCH) sum of:
  0.042657703 = (MATCH) weight(message:aug in 0), product of:
    0.3826651 = queryWeight(message:aug), product of:
      0.5945349 = idf(docFreq=2, maxDocs=2)
      0.6436378 = queryNorm
    0.11147529 = (MATCH) fieldWeight(message:aug in 0), product of:
      1.0 = tf(termFreq(message:aug)=1)
      0.5945349 = idf(docFreq=2, maxDocs=2)
      0.1875 = fieldNorm(field=message, doc=0)
  0.035548083 = (MATCH) weight(_all:12 in 0), product of:
    0.3826651 = queryWeight(_all:12), product of:
      0.5945349 = idf(_all: 12=2)
      0.6436378 = queryNorm
    0.092896074 = (MATCH) fieldWeight(_all:12 in 0), product of:
      1.0 = (MATCH) btq, product of:
        1.0 = tf(phraseFreq=1.0)
        1.0 = allPayload(...)
      0.5945349 = idf(_all: 12=2)
      0.15625 = fieldNorm(field=_all, doc=0)
  0.17166457 = (MATCH) sum of:
    0.1005684 = (MATCH) weight(_all:23 in 0), product of:
      0.6436378 = queryWeight(_all:23), product of:
        1.0 = idf(_all: 23=1)
        0.6436378 = queryNorm
      0.15625 = (MATCH) fieldWeight(_all:23 in 0), product of:
        1.0 = (MATCH) btq, product of:
          1.0 = tf(phraseFreq=1.0)
          1.0 = allPayload(...)
        1.0 = idf(_all: 23=1)
        0.15625 = fieldNorm(field=_all, doc=0)
    0.035548083 = (MATCH) weight(_all:59 in 0), product of:
      0.3826651 = queryWeight(_all:59), product of:
        0.5945349 = idf(_all: 59=2)
        0.6436378 = queryNorm
      0.092896074 = (MATCH) fieldWeight(_all:59 in 0), product of:
        1.0 = (MATCH) btq, product of:
          1.0 = tf(phraseFreq=1.0)
          1.0 = allPayload(...)
        0.5945349 = idf(_all: 59=2)
        0.15625 = fieldNorm(field=_all, doc=0)
    0.035548083 = (MATCH) weight(_all:58 in 0), product of:
      0.3826651 = queryWeight(_all:58), product of:
        0.5945349 = idf(_all: 58=2)
        0.6436378 = queryNorm
      0.092896074 = (MATCH) fieldWeight(_all:58 in 0), product of:
        1.0 = (MATCH) btq, product of:
          1.0 = tf(phraseFreq=1.0)
          1.0 = allPayload(...)
        0.5945349 = idf(_all: 58=2)
        0.15625 = fieldNorm(field=_all, doc=0)

So I recommend checking your query again. If you still have a
problem with it, can you send the query that illustrates the problem
together with its explanation? I would recommend testing with curl
through the REST interface, just to check whether the mistake is in
your Java code. I also recommend always checking the generated JSON
to see what was actually sent.
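As a rough illustration of checking the generated JSON, the sketch below builds the request body by hand so the escaping is visible. It is only an assumption about how one might do it: the class and method names are made up, it does not use the Elasticsearch Java API, and the escaping covers only backslashes and double quotes (not control characters).

```java
public class QueryBodySketch {
    /** Escapes backslashes and double quotes so the text can sit inside a JSON string. */
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Wraps a query_string query in a search body with explain turned on. */
    static String explainBody(String queryString) {
        return "{\"query\":{\"query_string\":{\"query\":\""
                + jsonEscape(queryString)
                + "\"}},\"explain\":true}";
    }

    public static void main(String[] args) {
        // The phrase quotes must survive as \" inside the JSON string.
        System.out.println(explainBody("message:\"Aug 12 23:59:58\""));
    }
}
```

If the printed body shows the phrase without \" around it, the quotes were lost before the request was sent, and the query will be parsed as separate terms.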

Regards,
Tamas

On Aug 19, 2:38 pm, "kranti.vns" kranti....@gmail.com wrote:

Hi Tamas,

When I query using curl, the following data is fetched:

$ curl -XGET http://172.18.175.168:9200/_search?pretty=1 -d '{"query" :
{"query_string" : {"query" : "\"Aug 12 23:59:58\""}}}'
{
"took" : 20,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.7003107,
"hits" : [ {
"_index" : "20110812",
"_type" : "syslog",
"_id" : "1313209533-195322",
"_score" : 1.7003107, "_source" : { "date" : "2011/08/12 23:59:58",
"host" : "hallmark-snrt-008", "message" : "Aug 12 23:59:58
hallmark-snrt-008 snort[14387]: [1:402:8] ICMP Destination Unreachable Port
Unreachable [Classification: Misc activity] [Priority: 3] {ICMP} 10.27.32.46
-> 172.17.1.26", "service" : "snort", "sip" : "10.27.32.46", "dip" :
"172.17.1.26", "sigid" : "1:402", "signame" : "ICMP Destination
Unreachable Port Unreachable", "app" : "Snort IDS", "ips-severity" : "3",
"ips-category" : "Misc activity" }
}, {
"_index" : "20110812",
"_type" : "syslog",
"_id" : "1313209533-195324",
"_score" : 1.7003107, "_source" : { "date" : "2011/08/12 23:59:58",
"host" : "hallmark-snrt-008", "message" : "Aug 12 23:59:58
hallmark-snrt-008 snort[14387]: [1:402:8] ICMP Destination Unreachable Port
Unreachable [Classification: Misc activity] [Priority: 3] {ICMP} 10.27.32.46
-> 172.17.1.26", "service" : "snort", "sip" : "10.27.32.46", "dip" :
"172.17.1.26", "sigid" : "1:402", "signame" : "ICMP Destination
Unreachable Port Unreachable", "app" : "Snort IDS", "ips-severity" : "3",
"ips-category" : "Misc activity" }
} ]
}

}

Here is what I get in the explanation:

1.0035685 = sum of:
  0.14217784 = weight(message:aug in 4660), product of:
    0.14217927 = queryWeight(message:aug), product of:
      0.99999 = idf(docFreq=100001, maxDocs=100001)
      0.1421807 = queryNorm
    0.99999 = fieldWeight(message:aug in 4660), product of:
      1.0 = tf(termFreq(message:aug)=1)
      0.99999 = idf(docFreq=100001, maxDocs=100001)
      1.0 = fieldNorm(field=message, doc=4660)
  0.01777223 = weight(_all:12 in 4660), product of:
    0.14217927 = queryWeight(_all:12), product of:
      0.99999 = idf(_all: 12=100001)
      0.1421807 = queryNorm
    0.12499875 = fieldWeight(_all:12 in 4660), product of:
      1.0 = btq, product of:
        1.0 = tf(phraseFreq=1.0)
        1.0 = allPayload(...)
      0.99999 = idf(_all: 12=100001)
      0.125 = fieldNorm(field=_all, doc=4660)
  0.8436185 = sum of:
    0.11617984 = weight(_all:23 in 4660), product of:
      0.363522 = queryWeight(_all:23), product of:
        2.5567605 = idf(_all: 23=21081)
        0.1421807 = queryNorm
      0.31959507 = fieldWeight(_all:23 in 4660), product of:
        1.0 = btq, product of:
          1.0 = tf(phraseFreq=1.0)
          1.0 = allPayload(...)
        2.5567605 = idf(_all: 23=21081)
        0.125 = fieldNorm(field=_all, doc=4660)
    0.3486262 = weight(_all:59 in 4660), product of:
      0.6297169 = queryWeight(_all:59), product of:
        4.42899 = idf(_all: 59=3241)
        0.1421807 = queryNorm
      0.55362374 = fieldWeight(_all:59 in 4660), product of:
        1.0 = btq, product of:
          1.0 = tf(phraseFreq=1.0)
          1.0 = allPayload(...)
        4.42899 = idf(_all: 59=3241)
        0.125 = fieldNorm(field=_all, doc=4660)
    0.3788125 = weight(_all:58 in 4660), product of:
      0.65641344 = queryWeight(_all:58), product of:
        4.616755 = idf(_all: 58=2686)
        0.1421807 = queryNorm
      0.5770944 = fieldWeight(_all:58 in 4660), product of:
        1.0 = btq, product of:
          1.0 = tf(phraseFreq=1.0)
          1.0 = allPayload(...)
        4.616755 = idf(_all: 58=2686)
        0.125 = fieldNorm(field=_all, doc=4660)

Thanks
Kranti
