Match exact substring in not analyzed field


(Maya) #1

Hi,

I have a multi_field mapping:
"testMulti": {
  "type": "multi_field",
  "fields": {
    "testMulti": {
      "type": "string",
      "index": "analyzed",
      "analyzer": "english"
    },
    "exact": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}

I would like to use the testMulti.exact field for exact matches and also for exact substring matches.
For example, if the field contains:
"There is a dog"
the document should be returned for:
"query_string": {
  "query": "\"is a dog\"",
  "fields": [
    "testMulti.exact"
  ]
}
and also for "query": "\"There is\"", but not for "query": "\"Is a dog\"", "query": "\"are a dog\"", etc.

Currently the document is returned only for a full match, "query": "\"There is a dog\"", and not for a partial match.

How can I achieve a partial match?

Thanks.


(Brian Yoder) #2

This is an interesting problem. Typically, my view of stop words is dim. I
would prefer that the client side avoids searching on them if that is
desired, rather than the engine ignores them. Then, phrase matching can
work properly. And queries such as The Wall can look for just Wall (ignoring
The as a stop word), but then the Google-like +The Wall can look for The
Wall. Yeah, I know that ES is not Google; I only look to Google for ideas
that are nice and for hints about their implementation based upon their
external behavior.

Then, your problem could be solved using a phrase query with no slop.

Maybe your testMulti field is analyzed but no stop words are ignored. Or,
maybe testMulti.raw is analyzed but with no stop words ignored. Either way,
you'd have the full set of words indexed for a phrase query to quickly find
the sub-match. At least, much, much more quickly than a grep-style wildcard
search against a non-analyzed form of the field.
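
A minimal sketch of such a phrase query, assuming the testMulti.exact sub-field has been re-mapped with an analyzer that keeps every word. That re-mapping is an assumption; with the not_analyzed mapping from the original post the whole value is indexed as a single term, and this query would not find a sub-match.

```json
{
  "query": {
    "match_phrase": {
      "testMulti.exact": {
        "query": "is a dog",
        "slop": 0
      }
    }
  }
}
```

With a slop of 0 the terms must be adjacent and in the given order, so "is a dog" and "There is" should match "There is a dog", while "are a dog" should not. Whether "Is a dog" matches depends on whether the chosen analyzer lowercases its tokens.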

I also used phrases within my own table-based synonym matching. Instead of
using ES synonyms, I create a separate type with lists of synonyms. A query
for a synonym is first directed to that type to fetch a list of synonyms;
then an OR query is generated. This has proven to be fast enough. It has
the benefit of allowing the synonyms to be updated with no changes to the
97-million documents that are already indexed. And, synonyms can be phrases,
for example: HUGE -> "VERY BIG". So now a synonym query for HUGE can find The
Very Big Dog. Likewise, a synonym query for the phrase "VERY BIG" can find The
Huge Dog. Really cool; just a matter of Java coding on the front end. And
ES does the heavy lifting underneath. But I digress a little...
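
The generated OR query could be shaped roughly like the sketch below. The field name title is hypothetical (the real index is not shown in this thread), and the two phrases are the HUGE and "VERY BIG" synonyms from the example above, as they might come back from the synonym lookup.

```json
{
  "query": {
    "bool": {
      "should": [
        { "match_phrase": { "title": "huge" } },
        { "match_phrase": { "title": "very big" } }
      ]
    }
  }
}
```

Because the bool query contains only should clauses, a document has to satisfy at least one of them, which gives the OR behaviour. This assumes the field's analyzer lowercases its tokens, so that "very big" matches The Very Big Dog.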

Hope this helps.

Brian



(Maya) #3

Thanks for the reply.
In the meantime, I switched the exact sub-field to the whitespace analyzer, and it gives pretty good results.
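
A sketch of what that mapping change might look like. This is assumed from the description above and the mapping in the first post; the actual updated mapping was not posted. The fragment would sit under the type's "properties".

```json
{
  "testMulti": {
    "type": "multi_field",
    "fields": {
      "testMulti": {
        "type": "string",
        "index": "analyzed",
        "analyzer": "english"
      },
      "exact": {
        "type": "string",
        "index": "analyzed",
        "analyzer": "whitespace"
      }
    }
  }
}
```

The whitespace analyzer only splits on whitespace; it does not lowercase or drop stop words. A quoted phrase against testMulti.exact therefore stays case-sensitive, so "is a dog" matches "There is a dog" but "Is a dog" does not.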

