Matching Token Subsets

govind201 · January 18, 2013, 9:10am

I'm trying to write a query that satisfies the following scenario:

Query: Apple iPhone 4S

Documents:

Apple iPhone 5 [SHOULD NOT MATCH]

4S iPhone [SHOULD MATCH]

Two documents should match if either one contains all of the words/tokens
contained in the other. In this case, all of the tokens of "4S iPhone"
("4S" and "iPhone") are contained in "Apple iPhone 4S" ("Apple", "iPhone"
and "4S"). But "Apple iPhone 4S" and "Apple iPhone 5" each contain one token
("4S" and "5" respectively) that is not present in the other. Any suggestions
on how I can frame my query to satisfy this case?
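The matching rule described above is set containment over tokens, checked in both directions. Below is a minimal Python sketch that applies the rule locally to the two example documents. It assumes a simple lowercase-and-split tokenizer as a rough stand-in for the standard analyzer. That tokenizer is an assumption and not Elasticsearch's actual analysis chain.

```python
import re


def tokens(text: str) -> set[str]:
    # Rough stand-in for a standard-style analyzer (assumption):
    # lowercase, then split on anything that is not a letter or digit.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def subset_match(a: str, b: str) -> bool:
    # Match when either token set is fully contained in the other.
    ta, tb = tokens(a), tokens(b)
    return ta <= tb or tb <= ta


query = "Apple iPhone 4S"
for doc in ["Apple iPhone 5", "4S iPhone"]:
    print(doc, "->", subset_match(query, doc))
```

"Apple iPhone 5" fails because neither {apple, iphone, 4s} nor {apple, iphone, 5} contains the other. "4S iPhone" passes because {4s, iphone} is contained in the query's tokens.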

Note that for the reverse scenario, where the query is "4S iPhone" and "iPhone
4S by Apple" is a field in a document, a match query (using the standard
tokenizer) with the default operator set to AND works. I'm finding it more difficult
when the document field itself is made up of fewer tokens than the query.
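For reference, the reverse case corresponds to a match query whose `operator` is `"and"`. The sketch below builds that request body as a Python dict and emulates the AND semantics locally. The field name `title` is hypothetical. The sketch also reuses the rough lowercase-and-split tokenizer, which is an assumption. It shows why AND only covers one direction: AND requires every query token to appear in the field. It does not require every field token to appear in the query.

```python
import json
import re


def tokens(text):
    # Rough stand-in for the standard analyzer (assumption):
    # lowercase, split on non-alphanumeric characters.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_and_body(field, text):
    # Match query body in which every analyzed query term is required.
    return {"query": {"match": {field: {"query": text, "operator": "and"}}}}


def and_matches(query_text, field_text):
    # Local emulation of operator "and": query tokens must be a subset of field tokens.
    return tokens(query_text) <= tokens(field_text)


print(json.dumps(match_and_body("title", "4S iPhone")))
print(and_matches("4S iPhone", "iPhone 4S by Apple"))  # every query token is in the field
print(and_matches("Apple iPhone 4S", "4S iPhone"))     # "apple" is missing from the field
```

The second call returns True and the third returns False. The third case is the one the question is about.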

Thanks in advance!

--

dewadkar · January 18, 2013, 9:36am

Query expansion will work to some extent. For your short documents, you can
also make parts of the query optional and require an exact match on the
rest. For the query above, expansion would treat "Apple" as the organization,
"iPhone" as the product, and the product's sub-type as its version (here 4S
and 5), which helps decide how to expand it.
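The labelling idea suggested above can be sketched as follows. The role lexicon `ROLES` and the functions `label` and `expanded_match` are hypothetical illustrations. They are not an Elasticsearch feature. The sketch treats product and version as required fields that must agree. Organization is optional and only has to agree when both sides mention one.

```python
# Hypothetical role lexicon (assumption): in practice these labels would have
# to come from a curated list or some entity tagger.
ROLES = {"apple": "organization", "iphone": "product", "4s": "version", "5": "version"}


def label(text):
    # Map each known token to its role; unknown tokens are ignored in this sketch.
    labelled = {}
    for tok in text.lower().split():
        role = ROLES.get(tok)
        if role:
            labelled[role] = tok
    return labelled


def expanded_match(query, doc):
    q, d = label(query), label(doc)
    # Required roles: product and version must agree exactly.
    for role in ("product", "version"):
        if q.get(role) != d.get(role):
            return False
    # Optional role: organization only has to agree when both sides name one.
    if "organization" in q and "organization" in d:
        return q["organization"] == d["organization"]
    return True


print(expanded_match("Apple iPhone 4S", "Apple iPhone 5"))
print(expanded_match("Apple iPhone 4S", "4S iPhone"))
```

The first call returns False because the versions differ. The second returns True because the missing organization is optional. The approach depends entirely on having such a lexicon for every product.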

--
Mr. Dnyaneshwar Dewadkar
Dept. of Information Technology

--

govind201 · January 18, 2013, 6:32pm

Unfortunately, my data is unstructured. Labelling the different parts of
the query at scale is not an option for me.

