Is it possible to delete data/field without affecting the index?

Hi,

I have several terabytes of documents (PDF, Office, etc.) stored in a system
outside of ES. Suppose I want to make them searchable with ES; however, I
will never serve the original documents from ES, only from that other system.
Is it possible to send the documents to ES (e.g. via a base64-encoded field
and the attachment type mapping), have ES index them, and afterwards delete
that base64 field so that the "real content" of my documents is not stored
in ES (for cost reasons)?
Queries would then be served by ES, but the actual documents would be served
by the other system.

Regards,
Dieter

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d084274d-8d50-4fb7-8357-8d53f5177e1f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
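
For illustration, here is a minimal Python sketch of the request body the question describes: the raw file bytes are base64-encoded into a single JSON field. The field name `file` is a hypothetical choice. For ES to extract and index the text, that field would need to be mapped with the `attachment` type from the mapper-attachments plugin.

```python
import base64
import json


def attachment_body(raw_bytes, field="file"):
    """Build a JSON request body that carries a document as base64.

    `field` is a hypothetical field name; it is assumed to be mapped with
    the mapper-attachments plugin's "attachment" type.
    """
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return json.dumps({field: encoded})


if __name__ == "__main__":
    # Stand-in bytes; in practice these would be read from the external system.
    print(attachment_body(b"%PDF-1.4 example bytes"))
```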

You could maybe use source excludes: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-source-field.html#include-exclude
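
The following sketch shows how the source-exclude suggestion might look as a type mapping. It is written as a Python dict so it can be serialized to JSON. It assumes Elasticsearch 1.x with the mapper-attachments plugin installed. The type name `document` and the fields `file` and `external_id` are hypothetical. Fields listed under `_source.excludes` are still indexed, so they remain searchable. They are only left out of the stored `_source`.

```python
import json

# Hypothetical type and field names; "attachment" is the mapping type
# provided by the mapper-attachments plugin (Elasticsearch 1.x era).
mapping = {
    "document": {
        "_source": {
            # Keep the raw base64 payload out of the stored _source;
            # the field is still indexed and therefore searchable.
            "excludes": ["file"]
        },
        "properties": {
            "file": {"type": "attachment"},
            # Hypothetical pointer back to the external system.
            "external_id": {"type": "string", "index": "not_analyzed"},
        },
    }
}

if __name__ == "__main__":
    # This JSON would be the body of a put-mapping request for "document".
    print(json.dumps(mapping, indent=2))
```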

Though I think it would be better to extract the content yourself using Tika (if you are using Java) and only send what you need to ES.
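
Here is a rough sketch of the alternative suggested above. Text extraction runs on the client side, and only the extracted text plus a pointer back to the external system is sent. `extract_text` is a hypothetical stand-in for a real extractor such as Apache Tika, and the field names are assumptions.

```python
import json
from typing import Callable


def build_text_document(external_id: str, raw_bytes: bytes,
                        extract_text: Callable[[bytes], str]) -> str:
    """Return a JSON body holding only the extracted text and a pointer
    back to the external system, instead of the base64-encoded original.

    `extract_text` is a hypothetical extractor (e.g. a wrapper around Tika).
    """
    return json.dumps({
        "external_id": external_id,  # used to fetch the original elsewhere
        "content": extract_text(raw_bytes),
    })


def toy_extract(raw: bytes) -> str:
    # Toy extractor for illustration only: treats the bytes as UTF-8 text.
    return raw.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    print(build_text_document("doc-42", b"quarterly report", toy_extract))
```

The `content` field could be listed under `_source.excludes` in the same way, so the text is indexed for search but not kept in the stored `_source`.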

David

On 12 Feb 2015, at 22:39, warpkanal@gmail.com wrote:

Thanks a lot, that sounds exactly like what I was looking for!
Why would you suggest extracting the content myself? Because of the
"experimental" state of the attachment type plugin?
Even if I extracted the content myself, I wouldn't want to store it in ES
(as I'd never request it from ES). The only benefit I can think of is the
ability to reindex inside ES without having my external system feed the
content in again.
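
This example makes the reindexing trade-off concrete. Once the content is excluded from `_source`, ES cannot hand it back, so a reindex has to pull the content from the external system again. The sketch below builds a `_bulk` request body for that case. `fetch_text` is a hypothetical function that reads a document's text from the outer system, and the index and type names are placeholders.

```python
import json
from typing import Callable, Iterable


def bulk_reindex_body(index: str, doc_type: str, ids: Iterable[str],
                      fetch_text: Callable[[str], str]) -> str:
    """Build a newline-delimited _bulk request body for a reindex.

    Because the content is not in _source, it cannot be read back out of
    Elasticsearch; `fetch_text` (hypothetical) re-reads it from the
    external system instead.
    """
    lines = []
    for doc_id in ids:
        # Each document is an action line followed by its source line.
        lines.append(json.dumps(
            {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps({"content": fetch_text(doc_id)}))
    # The bulk API expects every line, including the last, to end in "\n".
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    external_store = {"a": "first document", "b": "second document"}
    print(bulk_reindex_body("docs-v2", "document", external_store,
                            external_store.__getitem__), end="")
```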

On Thursday, February 12, 2015 at 11:02:51 PM UTC+1, David Pilato wrote:
