I'm sorry if this is documented somewhere, but I cannot find it, although I think I remember reading about it.
I want to index some email messages for search, but they are already in a database, so I do not want ES to save a copy of all the data (for example, the body). Is there any way I can index an ENTIRE doc that looks like:
Wow. The all or nothing approach doesn't work for me. I need to be able to at least get back the document ID of the thing I am indexing. What I really need is the ability to throw away only certain fields in the _source.
Am I being stupid? If I bulk index 100k email messages and don't include the _source then how can I later fetch these emails after doing a search? Do I have to store the IDs generated by the indexing operation as they map to my original IDs? I don't love that idea and I'm not even sure how to do that in a bulk indexing operation.
I don't follow what it is you are trying to do. Whether you index or
bulk_index, you get back the ID (either the ID that you specify, or an
autogenerated ID).
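
To illustrate the point about specifying your own ID, here is a rough sketch of building a bulk request body in which each document's `_id` is the primary key from the existing database. The index name `emails`, the `db_id` key and the helper `build_bulk_body` are made up for the example, and some older Elasticsearch versions also expect a `_type` in the action line, so treat this as an assumption-laden sketch rather than the exact call for your setup.

```python
import json


def build_bulk_body(index_name, emails):
    """Build a newline-delimited bulk body that reuses database IDs as _id.

    `db_id` is a hypothetical key holding the primary key from the
    existing database; it is moved into the action line, not the document.
    """
    lines = []
    for email in emails:
        # Action line: tells the bulk API which index and which _id to use.
        action = {"index": {"_index": index_name, "_id": str(email["db_id"])}}
        # Document line: everything except the database key itself.
        doc = {k: v for k, v in email.items() if k != "db_id"}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


emails = [
    {"db_id": 101, "subject": "Hello", "body": "First message"},
    {"db_id": 102, "subject": "Re: Hello", "body": "Second message"},
]
payload = build_bulk_body("emails", emails)
print(payload)
```

With IDs assigned this way, every search hit carries an `_id` that can be looked up directly in the database, so there is no separate mapping of generated IDs to keep.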
Why don't you want the _source? Because it contains too much
information? What about deleting the information that you don't want to
store before indexing the email?
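
A small sketch of what that could look like, under a couple of assumptions. `strip_fields` is a hypothetical helper that drops fields before the document is sent, which also means those fields are not searchable. If the body should stay searchable but not be kept in `_source`, Elasticsearch mappings accept an `excludes` list under `_source`; the layout below is the typeless form, and older versions nest it under a type name, so check the mapping docs for your version.

```python
import json


def strip_fields(doc, unwanted):
    """Return a copy of doc without the unwanted fields (hypothetical helper)."""
    return {k: v for k, v in doc.items() if k not in unwanted}


email = {"subject": "Hello", "sender": "a@example.com", "body": "Long text..."}

# Option 1: remove the field client-side. It is neither stored nor indexed.
slim = strip_fields(email, {"body"})

# Option 2 (assumed mapping layout): keep indexing `body` for search,
# but leave it out of the stored _source.
index_settings = {"mappings": {"_source": {"excludes": ["body"]}}}

print(json.dumps(slim))
print(json.dumps(index_settings))
```

Either way the document ID still comes back with each hit, so the full message can be fetched from the database.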