Remove features

Anand_Nalya · September 13, 2013, 6:34am

Hi,

I want to use Elasticsearch for searches that tell whether a particular
token occurs within a document or not. I don't need any scoring,
frequency, or position information. For this I've set omit_norms=true and
index_options=docs.
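
For reference, a minimal sketch of a mapping carrying these two settings, written as a Python dict and serialized to JSON. The type name `doc` and the field name `body` are hypothetical, and the `omit_norms` / `index_options` spelling assumes the string field mapping of the 0.90-era releases current at the time of this thread; later versions may spell these options differently.

```python
import json

# Hypothetical type ("doc") and field ("body") names, for illustration only.
mapping = {
    "doc": {
        "properties": {
            "body": {
                "type": "string",
                # Index document IDs only: no term frequencies, no positions.
                "index_options": "docs",
                # Do not store norms (length/boost normalization factors).
                "omit_norms": True,
            }
        }
    }
}

# Serialize to the JSON body that would be sent as the mapping definition.
mapping_json = json.dumps(mapping, indent=2)
print(mapping_json)
```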

In the data directory, I can see .pos and .doc files. The .doc files account for
17% of the storage and the .pos files take 1.5%. Is there any way we
can skip these files?
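
Percentages like these can be measured with a small script that walks the data directory and sums file sizes per extension. This is only a sketch: the function name is made up for illustration, and the directory to scan depends on the installation.

```python
import os
from collections import defaultdict

def size_share_by_extension(root):
    """Return {extension: percent of total bytes} for all files under root."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Lucene segment files end in .doc, .pos, .tim, and so on.
            ext = os.path.splitext(name)[1] or "(none)"
            totals[ext] += os.path.getsize(os.path.join(dirpath, name))
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {ext: 100.0 * size / grand_total for ext, size in totals.items()}

def print_report(root):
    """Print one line per extension, largest share first."""
    shares = size_share_by_extension(root)
    for ext, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{ext:10s} {pct:5.1f}%")
```

Calling `print_report("/path/to/data")` with the node's data directory (the path here is a placeholder) lists each extension with its share of the total size.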

Thanks,
Anand

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

mvg · September 13, 2013, 9:46am

The .doc files contain the postings lists. The postings lists and the term
dictionary (.tim files) are used for basic filtering / querying; without
those files, even simple filtering / querying wouldn't work.

If all your fields are configured with index_options: docs, then there
shouldn't be a .pos file. If you want to know more about the Lucene files, I
recommend reading:
http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/codecs/lucene41/package-summary.html
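
To illustrate why the postings lists cannot be dropped, here is a toy in-memory model of a term dictionary pointing at docs-only postings lists. It is a conceptual sketch, not Lucene's on-disk format; the function names and sample documents are invented for the example.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to a sorted list of IDs of the documents containing it.

    Only document IDs are recorded (no frequencies or positions), which is
    roughly the information kept with index_options=docs.
    """
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def docs_containing(index, term):
    """Look the term up in the dictionary and return its postings list."""
    return index.get(term.lower(), [])

docs = {1: "quick brown fox", 2: "lazy brown dog", 3: "quick dog"}
index = build_index(docs)
print(docs_containing(index, "brown"))  # [1, 2]
print(docs_containing(index, "cat"))    # []
```

Even a pure "does this token occur in this document" check needs both structures: the dictionary to find the term, and the postings list to find the documents.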

Martijn

--
Kind regards,

Martijn van Groningen

simonw_2 · September 13, 2013, 10:42am

The .pos file holds data from the ID / Version field, since we use
payloads for versioning. This will go away in 1.0, where we use DocValues
for this instead.

simon

mvg · September 13, 2013, 10:50am

Ah, I forgot about the _version field! Thanks, Simon.

--
Kind regards,

Martijn van Groningen

