What would be the best solution for the following problem?
Assuming that I have 2 documents like:
doc1. "word1 électricité word3"
doc2. "word2 electricite word3"
I would like to provide general, and also specific search.
General search:
search for: "électricité" -> [doc1, doc2]
search for: "electricite" -> [doc1, doc2]
(That's easy: I'm applying ASCII folding and also keeping the original
terms.)
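To make the general search above concrete, here is a minimal sketch in plain Python of folding plus keeping the original terms. It is not Lucene or Solr code; the names fold, analyze, build_index and general_search are made up for illustration, and the folding is approximated with Unicode decomposition rather than the real ASCII folding filter.

```python
import unicodedata
from collections import defaultdict


def fold(term):
    # Approximate ASCII folding: decompose accented characters,
    # then drop the combining marks (assumption: good enough for French).
    decomposed = unicodedata.normalize("NFKD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    # Emit each lowercased token, plus its folded form when the two differ.
    terms = []
    for token in text.lower().split():
        terms.append(token)
        folded = fold(token)
        if folded != token:
            terms.append(folded)
    return terms


def build_index(docs):
    # Toy inverted index: term -> set of document ids.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def general_search(index, query):
    # The query goes through the same analysis, so accented and
    # unaccented spellings both reach the folded term.
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return sorted(hits)


docs = {
    "doc1": "word1 électricité word3",
    "doc2": "word2 electricite word3",
}
index = build_index(docs)
print(general_search(index, "électricité"))
print(general_search(index, "electricite"))
```

Both queries return both documents, because the accented term is stored in the same index next to its folded form.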
The obvious solution would be to add duplicate fields to the
docs, but that will not work for large amounts of documents (>200 GB),
because the index size would grow too much.
Has anybody else encountered this problem? What solutions
did you find?