I am using the ingest attachment plugin for indexing PDF, Excel, etc. documents. I am doing it through the REST client. Is it possible to update an existing index?
I mean that when I do the first POST request, it indexes my first PDF document, and in a second POST request to the same index I should be able to index a second PDF document. In other words, it should append the second PDF document to the first one.
In the first POST request I create the pipeline and then index the Base64-encoded document. Later, according to my use case, I want to upload another Base64 PDF document into the same index, so that users can upload documents at any time, and when they search, it shows how many files contain the search/query keyword.
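For reference, the flow described above can be sketched in Dev Tools console syntax. The pipeline, index, and field names here are placeholders, not taken from the original posts:

```
PUT _ingest/pipeline/attachment
{
  "description": "Extract text from Base64-encoded files",
  "processors": [
    { "attachment": { "field": "data" } }
  ]
}

PUT my-index/_doc/1?pipeline=attachment
{
  "filename": "first.pdf",
  "data": "<base64-encoded file contents>"
}
```

The attachment processor decodes the `data` field and stores the extracted text under `attachment.content`.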
As of now I am doing each POST request with a different index value, and I search documents using the respective index value.
When I use the same index for all POST requests (each request sending a different PDF document to be indexed), this updating only works sometimes. By "sometimes" I mean that sometimes it completely replaces the document from my second POST request, and sometimes the one from my third.
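One likely cause, assuming each request reuses the same document id: indexing to an existing `_id` replaces that document entirely. Giving each file its own id, or letting Elasticsearch auto-generate one by POSTing without an id, adds documents instead of replacing them. A sketch with placeholder names:

```
# Same id: the second request replaces the first document
PUT my-index/_doc/1?pipeline=attachment
{ "filename": "first.pdf",  "data": "..." }

PUT my-index/_doc/1?pipeline=attachment
{ "filename": "second.pdf", "data": "..." }

# Auto-generated id: each request adds a new document
POST my-index/_doc?pipeline=attachment
{ "filename": "second.pdf", "data": "..." }
```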
Is this the right way to do it?
Could you suggest the right way to index documents into the same index across all POST requests? I am using the ingest attachment plugin to parse my PDF documents.
Yes, I have all the binary documents available when I upload a new one, but re-sending them all causes a huge performance hit.
My use case depends on the ingest attachment plugin, because users upload new files (PDF, DOC, etc.) to the cloud whenever they want, and if I re-encode every file that has already been uploaded each time, it results in a performance penalty.
Should I manually parse files with the Tika parser and use the old API, instead of the ingest attachment plugin, for this index-update case?
In my cloud application I have many different users. I thought of storing all documents from the same user in the same index.
You are right; I can use a different id for each of the same user's documents.
Now, following your suggested example, I can store the ids in a separate variable (only for tracking during search/query operations), and when a user wants to search for a document I can iterate over all the ids in that index and return the information for the document with the matching id.
The last part of my comment was a question. I am creating a single index per user; when a user uploads a new PDF or other file, I increment my id and store the Base64-encoded binary document in that index. The next time the same user stores a document, I increment the id again, and so on.
While searching, I iterate in a loop from 1 to the maximum id number for that user and search the documents.
Is the above approach right, or am I missing anything?
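For what it's worth, looping over ids should not be necessary: a single search against the user's index matches all of that user's documents at once, and `hits.total` tells you how many files contain the keyword. A sketch, assuming the default `attachment.content` field produced by the attachment processor and a placeholder index name:

```
GET user-1-index/_search
{
  "query": {
    "match": { "attachment.content": "keyword" }
  }
}
```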
These may be silly questions; I am new to Elasticsearch.
Just now I stored documents with different ids in the same index.
When I performed a search/query operation I did not give an id number; I just gave the index and type, and it returned a JSON response containing the id number along with other information such as the file name.
Basically, if you want to search for documents for a given user, just wrap your query inside a bool query using a must clause, then add a filter clause which filters the results with the user id.
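A minimal sketch of that query body in Python, assuming the user id is stored in a `user_id` field and the extracted text lives in the default `attachment.content` field (both names are assumptions; adjust them to your mapping):

```python
def build_user_search(user_id, keyword):
    """Build a bool query: `must` scores documents whose extracted
    text matches the keyword, while `filter` restricts the results
    to a single user without affecting scoring."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"attachment.content": keyword}}
                ],
                "filter": [
                    {"term": {"user_id": user_id}}
                ]
            }
        }
    }
```

The resulting dict can be sent as the body of a `_search` request; because `filter` clauses are cacheable and not scored, this is cheaper than folding the user id into the `must` clause.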