I want to use Elasticsearch for searches that tell whether a particular
token occurs within a document or not. I don't want any sort of scoring,
frequency or positioning info. For this I've set omit_norms=true and
index_options=docs.
In the data directory, I can see .pos and .doc files. The .doc files account
for 17% of the storage and the .pos files take 1.5%. Is there any way we
can skip these files?
The .doc files contain the postings list. The postings list and the term
dictionary (.tim files) are used for basic filtering / querying; without
those files, simple filtering / querying wouldn't work.
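A breakdown like the 17% / 1.5% figures in the question can be reproduced by
summing file sizes per extension. The following is a minimal Python sketch,
not part of the original thread. The directory you pass in is an assumption
(the Lucene index directory of one shard), and the helper names
size_by_extension, shares and print_report are made up for illustration. If
segments are stored as compound files (.cfs), the individual .doc / .pos /
.tim files will not show up separately.

```python
import os
from collections import defaultdict


def size_by_extension(index_dir):
    """Return {extension: total_bytes} for all files under index_dir."""
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            # Files without an extension (e.g. segments_N) are grouped together.
            ext = os.path.splitext(name)[1] or "(none)"
            totals[ext] += os.path.getsize(os.path.join(root, name))
    return dict(totals)


def shares(totals):
    """Convert byte totals into percentages of the overall size."""
    overall = sum(totals.values())
    if overall == 0:
        return {}
    return {ext: 100.0 * size / overall for ext, size in totals.items()}


def print_report(index_dir):
    """Print one line per extension, largest share first."""
    pct = shares(size_by_extension(index_dir))
    for ext, value in sorted(pct.items(), key=lambda kv: -kv[1]):
        print("%-8s %5.1f%%" % (ext, value))
```

Calling print_report on the index directory of a shard lists each extension
with its share of the total, which makes it easy to see how much .doc and
.pos contribute before and after a mapping change.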
The .pos file holds data from the ID / Version field, since we use
payloads for versioning. This will go away in 1.0, since we use DocValues
for this in 1.0.
simon
On Friday, September 13, 2013 11:46:03 AM UTC+2, Martijn v Groningen wrote:
If all your fields are configured with index_options: docs, then there
shouldn't be a .pos file. If you want to know more about the Lucene files, I
recommend reading: org.apache.lucene.codecs.lucene41 (Lucene 4.4.0 API)
Martijn
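To make the "all your fields" condition concrete, here is a small Python
sketch, not from the original thread. It builds a mapping with the two
settings mentioned in the question and checks that no string field is left
without an explicit index_options of docs. The type name "doc", the field
name "body" and the helper fields_not_docs_only are hypothetical, and the
"type": "string" form assumes the mapping syntax of the Elasticsearch
versions discussed here. The check is deliberately strict: it flags any
string field where index_options is not explicitly set to docs.

```python
import json

# Hypothetical type ("doc") and field ("body"); adjust to your own schema.
mapping = {
    "doc": {
        "properties": {
            "body": {
                "type": "string",
                "index_options": "docs",  # index document IDs only
                "omit_norms": True,       # do not store norms
            }
        }
    }
}


def fields_not_docs_only(properties, prefix=""):
    """Return paths of string fields not explicitly set to index_options=docs."""
    offenders = []
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:
            # Object field: recurse into its sub-fields.
            offenders += fields_not_docs_only(spec["properties"], path + ".")
        elif spec.get("type") == "string" and spec.get("index_options") != "docs":
            offenders.append(path)
    return offenders


# Serialize the mapping as the JSON body you would send when creating it.
mapping_json = json.dumps(mapping, indent=2)
```

An empty list from fields_not_docs_only(mapping["doc"]["properties"]) means
every string field in that type asks for document IDs only.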
On 13 September 2013 08:34, Anand Nalya <anand...@gmail.com> wrote:
Hi,
I want to use Elasticsearch for searches that tell whether a particular
token occurs within a document or not. I don't want any sort of scoring,
frequency or positioning info. For this I've set omit_norms=true and
index_options=docs.
In the data directory, I can see .pos and .doc files. The .doc files account
for 17% of the storage and the .pos files take 1.5%. Is there any way we
can skip these files?
Thanks,
Anand
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.