Remove features


(Anand Nalya) #1

Hi,

I want to use elasticsearch for searches that tell whether a particular
token occurs within a document or not. I don't want any sort of scoring,
frequency or position info. For this I've set omit_norms=true and
index_options=docs.
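A mapping along the lines Anand describes can be sketched as below. This is a minimal sketch that assumes the 0.90-era `string` field type and its `index_options` and `omit_norms` options. The type name `doc` and the field names `tags` and `body` are hypothetical. The printed JSON would be the body of a put-mapping request. Newer Elasticsearch versions use different field types and a `norms` setting instead.

```python
import json

def docs_only_mapping(doc_type, fields):
    """Build a mapping body where every listed string field indexes doc IDs only."""
    properties = {
        name: {
            "type": "string",          # 0.90-era string type (assumption)
            "index_options": "docs",   # index doc numbers only: no freqs, no positions
            "omit_norms": True,        # drop length norms, which only matter for scoring
        }
        for name in fields
    }
    return {doc_type: {"properties": properties}}

# Hypothetical type and field names, for illustration only.
body = docs_only_mapping("doc", ["tags", "body"])
print(json.dumps(body, indent=2))
```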

In the data directory, I can see .pos and .doc files. The .doc files account
for 17% of the storage and the .pos files take 1.5%. Is there any way we
can skip these files?
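A per-file-type breakdown like the one above can be reproduced with a small script that totals file sizes by extension. `size_share_by_extension` is a hypothetical helper, not an Elasticsearch tool. Point it at a shard's Lucene index directory under the node's data path; the exact layout varies by version. Lucene 4.x postings files have names such as `_0_Lucene41_0.doc`, so grouping by suffix separates `.doc`, `.pos`, `.tim`, and so on.

```python
from collections import Counter
from pathlib import Path

def size_share_by_extension(index_dir):
    """Return {extension: fraction of total bytes}, largest share first."""
    sizes = Counter()
    for path in Path(index_dir).rglob("*"):
        if path.is_file():
            # Group by suffix (".doc", ".pos", ".tim", ...); files without one,
            # such as "segments_1", are grouped under their full name.
            key = path.suffix or path.name
            sizes[key] += path.stat().st_size
    total = sum(sizes.values())
    if total == 0:
        return {}
    return {key: n / total for key, n in sizes.most_common()}
```

Calling `size_share_by_extension("/path/to/shard/index")` on a docs-only index should show no `.pos` share unless some field (for example the internal ID / version field discussed later in this thread) still indexes positions or payloads.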

Thanks,
Anand

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Martijn Van Groningen) #2

The .doc files contain the postings lists. The postings lists and the term
dictionary (.tim files) are used for basic filtering / querying; without
those files even simple filtering / querying wouldn't work.

If all your fields are configured with index_options : docs, then there
shouldn't be a .pos file. If you want to know more about the Lucene files, I
recommend reading:
http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/codecs/lucene41/package-summary.html
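A toy model may help show why the .doc postings can't be dropped even with index_options=docs. The sketch below uses a hypothetical pure-Python class, `DocsOnlyIndex`. It keeps only a term dictionary that maps each term to a sorted list of doc IDs. That is roughly the information left once frequencies, positions, and norms are omitted. A term filter is a dictionary lookup followed by reading that list, and an AND filter intersects the lists. It is an illustration only: Lucene's real .tim/.tip and .doc files are block-compressed and considerably more involved.

```python
from collections import defaultdict

class DocsOnlyIndex:
    """Toy inverted index: term dictionary -> postings of doc IDs only."""

    def __init__(self):
        self._postings = defaultdict(list)  # term -> sorted list of doc IDs
        self._last_doc = -1

    def add(self, doc_id, tokens):
        # Doc IDs must arrive in increasing order so every postings list stays sorted.
        if doc_id <= self._last_doc:
            raise ValueError("doc IDs must be added in increasing order")
        self._last_doc = doc_id
        for term in set(tokens):  # presence only: each term recorded once per doc
            self._postings[term].append(doc_id)

    def docs_with(self, term):
        """Term filter: look the term up, then read its postings list."""
        return list(self._postings.get(term, []))

    def docs_with_all(self, *terms):
        """AND filter: intersect the sorted postings lists of all terms."""
        if not terms:
            return []
        result = list(self._postings.get(terms[0], []))
        for term in terms[1:]:
            result = _intersect(result, self._postings.get(term, []))
        return result

def _intersect(a, b):
    """Merge-style intersection of two sorted lists of doc IDs."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

idx = DocsOnlyIndex()
idx.add(0, ["red", "apple"])
idx.add(1, ["green", "apple"])
idx.add(2, ["red", "car"])
print(idx.docs_with("apple"))             # [0, 1]
print(idx.docs_with_all("red", "apple"))  # [0]
```

Without the postings lists, every one of these filters would have nothing to read.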

Martijn

(In reply to Anand Nalya, 13 September 2013 08:34)


--
Kind regards,

Martijn van Groningen



(simonw-2) #3

There is still a .pos file because it holds data for the ID / version field:
we use payloads for versioning. This will go away in 1.0, since we use
DocValues for this in 1.0.

simon

(In reply to Martijn van Groningen, Friday, September 13, 2013 11:46:03 AM UTC+2)



(Martijn Van Groningen) #4

Ah, I forgot about the _version field! Thanks, Simon.

(In reply to simonw, 13 September 2013 12:42)


--
Kind regards,

Martijn van Groningen


