Determine diacritics sensitivity at query time

Sisu_Alexandru · November 24, 2011, 4:09pm

Hi,

What would be the best solution for the following problem?

Assuming that I have 2 documents like:
doc1. "word1 électricité word3"
doc2. "word2 electricite word3"

I would like to offer both a general search and a diacritic-sensitive search.
General search:

search for: "électricité" -> [doc1, doc2]
search for: "electricite" -> [doc1, doc2]
(that's easy: I'm applying asciifolding and also keeping the original
terms).

Diacritic-sensitive search:
search for: "électricité" -> [doc1]
search for: "electricite" -> [doc2]

How can I offer both?

The obvious solution would be to add duplicate fields to the
docs, but that will not work for large amounts of documents (>200 GB),
because the indexes would grow too much.

Has anybody else encountered this problem? What solutions
did you find?

Thanks,

Alex

kimchy · November 24, 2011, 6:32pm

If it's on a specific field, you can use a multi field mapping, one with ASCII
folding and one without, and pick and choose which one to query.
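
A minimal sketch of that idea in Python, building the index definition and the two kinds of query as plain dicts. The names "body", "folded" and "folding" are hypothetical, and the mapping uses the `fields` sub-field syntax of later Elasticsearch versions, which is an assumption here (releases from the time of this thread declared this with a `multi_field` type instead). The `asciifolding` token filter is the one already mentioned in the question.

```python
import json

# Hypothetical names: "body" (field), "folded" (sub-field), "folding" (analyzer).
# The main field keeps diacritics; the sub-field folds them to ASCII.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "analyzer": {
                "folding": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "body": {
                "type": "text",
                "analyzer": "standard",
                "fields": {
                    "folded": {"type": "text", "analyzer": "folding"}
                },
            }
        }
    },
}


def build_query(text, diacritic_sensitive):
    """Return a match query against the exact field or the folded sub-field."""
    field = "body" if diacritic_sensitive else "body.folded"
    return {"query": {"match": {field: text}}}


if __name__ == "__main__":
    # Diacritic-sensitive: only terms indexed with their accents can match.
    print(json.dumps(build_query("électricité", True), ensure_ascii=False))
    # General: query text and indexed terms are both folded before matching.
    print(json.dumps(build_query("electricite", False), ensure_ascii=False))
```

The sensitivity is then chosen per request by picking the field name, with no change to the documents themselves. Note that the sub-field is still indexed in addition to the main field, so it does add to the index size.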
