Matching Token Subsets

I'm trying to write a query that satisfies the following scenario:

Query: Apple iPhone 4S

Documents:

Apple iPhone 5 [SHOULD NOT MATCH]

4S iPhone [SHOULD MATCH]

Two strings should match if either one contains every token of the other.
So in this case, all of the tokens of "4S iPhone" ("4S" and "iPhone") are
contained in "Apple iPhone 4S" ("Apple", "iPhone" and "4S"). But "Apple
iPhone 4S" and "Apple iPhone 5" each contain one token ("4S" and "5"
respectively) that is not present in the other. Any suggestions on how I
can frame my query to satisfy this case?
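
To make the rule concrete, here is a minimal sketch in plain Python (not
Elasticsearch code). It assumes that lowercasing and splitting on whitespace
is a good enough stand-in for the real analyzer; the function names are
made up for illustration.

```python
def tokens(text):
    # Assumption: lowercase + whitespace split approximates the analyzer.
    return set(text.lower().split())


def should_match(a, b):
    # Match when either token set is fully contained in the other.
    ta, tb = tokens(a), tokens(b)
    return ta <= tb or tb <= ta


print(should_match("Apple iPhone 4S", "4S iPhone"))       # True
print(should_match("Apple iPhone 4S", "Apple iPhone 5"))  # False
```

Note that an empty string would match everything under this rule, since the
empty set is a subset of any set.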

Note, for the reverse scenario, where the query is "4S iPhone" and "iPhone
4S by Apple" is a field in a document, a match query (using the standard
tokenizer) with the default operator set to AND works. I'm finding it more
difficult when the document field itself is made up of fewer tokens than
the query.
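
For reference, the reverse case that already works could be built roughly
like this. It is only a sketch of the request body as a Python dict: the
field name "title" and the helper name are placeholders, not names from my
actual index.

```python
import json


def build_and_match_query(field, text):
    # Hypothetical helper: a match query in which every query term
    # must be present in the field (operator "and").
    return {"query": {"match": {field: {"query": text, "operator": "and"}}}}


# "title" is an assumed field name, used here for illustration only.
body = build_and_match_query("title", "4S iPhone")
print(json.dumps(body, indent=2))
```

This only covers the case where the query tokens are a subset of the
document tokens, not the other way round.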

Thanks in advance!

--

Query expansion will work to some extent. With short documents like yours,
you can also make part of the query optional and part an exact match. For
the query above, expansion would treat "Apple" as the organization,
"iPhone" as the product, and the version ("4S" or "5") as a product
sub-type, which will help to expand it.

(In reply to Govind Chandrasekhar's post of Fri, Jan 18, 2013, 2:40 PM.)

--
Mr. Dnyaneshwar Dewadkar
Dept. of Information Technology

--

Unfortunately, my data is unstructured. Labelling the different parts of
the query at scale is not an option for me.

(In reply to DD's post of Friday, January 18, 2013, 1:36:27 AM UTC-8.)