I couldn't find a document that explains how the double quotes (exact string matches) in query strings work, so I did a bit of an experiment.
Both the analyzer and the search analyzer of my document field were created with this chain:
standard tokenizer -> lowercase -> stopwords -> shingles (min:2, max:3)
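For reference, a chain like this could be written roughly as the index body below. This is only a sketch: the names my_stop, my_shingle, my_analyzer and the field name body are hypothetical, and the "_english_" stopword list is an assumption, since the actual list is not given here.

```python
import json

# Sketch of an index body for the chain:
# standard tokenizer -> lowercase -> stopwords -> shingles (min 2, max 3).
# All "my_*" names and the field name "body" are hypothetical.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "filter": {
                # Assumption: the predefined English list; the real list is unknown.
                "my_stop": {"type": "stop", "stopwords": "_english_"},
                "my_shingle": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 3,
                },
            },
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_stop", "my_shingle"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "body": {
                "type": "text",
                # Same analyzer at index time and at search time.
                "analyzer": "my_analyzer",
                "search_analyzer": "my_analyzer",
            }
        }
    },
}

if __name__ == "__main__":
    # Print the body as JSON so it can be pasted into an index-creation request.
    print(json.dumps(INDEX_BODY, indent=2))
```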
I then indexed a document with:
" Typhoon Lekima, known in the Philippines as the Typhoon Hanna, was the second-costliest typhoon in Chinese history, only behind Fitow in 2013.[1] The ninth named storm of the 2019 Pacific typhoon season, Lekima originated from a tropical depression that formed east of the Philippines on July 30. It gradually organized, became a tropical storm and was named on August 4. Lekima intensified under favourable environmental conditions and peaked as a Category 4–equivalent super typhoon. However, an eyewall replacement cycle caused the typhoon to weaken before it made landfall in Zhejiang late on August 9, as a Category 2–equivalent typhoon. Lekima weakened subsequently while moving across the East China, and made its second landfall in Shandong on August 11."
Below are the query strings and their results:
- " Lekima intensified under favourable environmental conditions and peaked"
  Hit.
- "Lekima intensified on favourable environmental conditions or peaked"
  Hit. (Changed the stopwords)
- " Lekima intensified under under favourable environmental conditions and peaked"
  No hit. (Added a stopword)
- "intensified Lekima under favourable environmental conditions and peaked"
  No hit. (Changed the order of the tokens)
So it seems to me that the double quotes simply skip the tokenizer? The query also seems to do a simple text match rather than relying on the tokens and their frequencies: since both the document and the query are fed through the rest of the filters, changing the stopwords returns the same result. The stopwords filter replaces them with an underscore, so adding more stopwords gives no hit. The shingles filter is probably skipped too in this case, because no tokens are fed into the chain?
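To check whether the four results really require the tokenizer to be skipped, here is a small simulation. It is not Elasticsearch code, and it rests on loud assumptions: the quoted text is tokenized and lowercased as usual, stopwords are dropped but still occupy a position, and a phrase matches when the remaining tokens appear in the document at the same relative positions. Shingles are left out to keep it short, and the stopword list is hypothetical ("under" is in it only because the experiment treats it as a stopword).

```python
import re

# Hypothetical stopword list; the real one is not known.
STOPWORDS = {"under", "on", "and", "or", "as", "a"}

# Short excerpt of the indexed document, enough for the four queries.
DOC = ("Lekima intensified under favourable environmental conditions "
       "and peaked as a Category 4-equivalent super typhoon.")


def analyze(text):
    """Tokenize and lowercase; drop stopwords but keep the original positions."""
    words = re.findall(r"\w+", text.lower())
    return [(pos, word) for pos, word in enumerate(words) if word not in STOPWORDS]


def phrase_match(doc, query):
    """True if the query tokens occur in doc at the same relative positions."""
    query_tokens = analyze(query)
    if not query_tokens:
        return False
    # Positional index: token -> set of positions in the document.
    positions = {}
    for pos, word in analyze(doc):
        positions.setdefault(word, set()).add(pos)
    first_pos, first_word = query_tokens[0]
    for start in positions.get(first_word, ()):
        if all(start + (pos - first_pos) in positions.get(word, ())
               for pos, word in query_tokens):
            return True
    return False


QUERIES = [
    " Lekima intensified under favourable environmental conditions and peaked",
    "Lekima intensified on favourable environmental conditions or peaked",
    " Lekima intensified under under favourable environmental conditions and peaked",
    "intensified Lekima under favourable environmental conditions and peaked",
]

if __name__ == "__main__":
    for query in QUERIES:
        print(phrase_match(DOC, query), repr(query))
```

Under these assumptions the four queries come out as hit, hit, no hit, no hit, the same as in the experiment. So the results alone do not tell a skipped tokenizer apart from normal tokenization plus position matching.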
But I wonder how this was indexed, or whether it was indexed at all. When I tried it with a larger database (ca. 30,000 documents), it felt a bit too fast for the system to first process everything and then search through it with the query.
Thanks!