Ingest attachment plugin (index files multiple times to the same _id)

I have a document stored in Elasticsearch under a unique id. It contains an array of indexed PDF files along with other information (name, phone number, etc.). Now I want to update the same document (_id) by adding a new PDF file while keeping the old files too. How can I update the document?

I believe you need to reindex the whole document with old PDFs + the new one.

Hi,
Is there any alternative to this? I want to keep the old files and add new ones.
Reindexing will take too much time. Consider a document that already has 50 files: to add one new file I need to index 51 files.
Is there any possible solution for this issue?
Thanks

Or is there any other way to store all files under the same document (_id)?

The solution IMO (and this is what I did in the past) is to denormalize the content.

Instead of indexing something like:

PUT attachments/_doc/bar
{
  "foo": "bar",
  "attachments": [
    { "id": "bar1", "content": "BASE64"},
    { "id": "bar2", "content": "BASE64"},
    { "id": "bar3", "content": "BASE64"}
  ]
}

I was indexing:

PUT attachment/_doc/bar1
{
  "foo": "bar",
  "content": "BASE64"
}
PUT attachment/_doc/bar2
{
  "foo": "bar",
  "content": "BASE64"
}
PUT attachment/_doc/bar3
{
  "foo": "bar",
  "content": "BASE64"
}
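
To make the denormalized layout above concrete, here is a minimal Python sketch that turns one document holding an array of attachments into one request per attachment. The function name split_attachments is hypothetical, and the sketch only builds the request line and body; it does not talk to a cluster.

```python
import json


def split_attachments(doc):
    """Build one (request line, body) pair per attachment.

    Every field except "attachments" is copied into each body, so each
    file becomes its own document that still carries the shared fields.
    """
    shared = {k: v for k, v in doc.items() if k != "attachments"}
    requests = []
    for att in doc["attachments"]:
        body = dict(shared)
        body["content"] = att["content"]
        requests.append((f"PUT attachment/_doc/{att['id']}", body))
    return requests


parent = {
    "foo": "bar",
    "attachments": [
        {"id": "bar1", "content": "BASE64"},
        {"id": "bar2", "content": "BASE64"},
    ],
}

for line, body in split_attachments(parent):
    print(line)
    print(json.dumps(body, indent=2))
```

With this layout, adding a 51st file later means sending one new request for that file only; the 50 existing documents are left untouched.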

Thanks a lot.

So does that mean there is no way to store all files under the same document id over time?

My main concern here is to keep the document id the same.

Multiple documents cannot share the same id.
You can look at the parent/child feature if really needed.

thanks

Hi,

I made some changes to my Dockerfile: I changed only the base image and added a few lines to install the ingest attachment plugin.
I think I lost my data. Is that expected? I did not change the volume mounts.
How can I recover the lost data, and is there a quick way to reindex it?

Thanks

No. Probably a bad usage of Docker IMHO.

How can I recover the lost data, and is there a quick way to reindex it?

I don't know...

Hi David,
I have to index thousands of files (PDF, DOCX, etc.) and I am worried about indexing the same file two or more times. Is there any way to compare file contents before indexing?
Thanks

The only way to do that IMO is to compute a fingerprint before indexing the document based on whatever fields you have and index that fingerprint as a field or as the _id.

If you are using it as the _id, you can use the _create API, which will reject any document whose _id has already been indexed.
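
As a rough illustration of the fingerprint idea, the Python sketch below hashes the raw file bytes with SHA-256 and uses the digest as the _id in a _create request. The choice of SHA-256, the index name attachment, and the helper names are assumptions made for this example; a fingerprint built from other fields would work the same way.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a hex SHA-256 digest of the file content."""
    return hashlib.sha256(data).hexdigest()


def create_request_line(index: str, data: bytes) -> str:
    """Build the request line for the _create endpoint.

    Using the fingerprint as the _id means a second upload of the same
    bytes targets an _id that already exists.
    """
    return f"PUT {index}/_create/{fingerprint(data)}"


first = b"%PDF-1.4 sample resume"
second = b"%PDF-1.4 sample resume"   # same bytes, uploaded again
other = b"%PDF-1.4 cover letter"

print(create_request_line("attachment", first))
print(fingerprint(first) == fingerprint(second))  # identical content, identical _id
print(fingerprint(first) == fingerprint(other))   # different content, different _id
```

When the _id already exists, the _create request fails with a version conflict (HTTP 409) instead of overwriting the document. Note that hashing raw bytes only catches byte-identical files; two PDFs with the same text but different metadata would still get different fingerprints.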

Hi,
In the image you can see that two documents are indexed.
I have to write a join query that matches the attachment.content field from one document against user-defined input, with a where-like condition that matches a field from the other document (the Employer_id field), like a join.
I have a join relationship.
Any suggestions how to make a join query?
Thanks in advance

My use case and proposed design are basically as follows:
-Index every document (resume, cover letter, etc.) that comes into our system for each user (candidate). I index each file separately with some user information (this is my parent document).

-When a user applies for a job, I need to create another document to track that this document is now referred to this job or employer. I index a new document that has the information related to the employer. This is my child document.

-I can get all parents from a child (by searching inside the child) or all children from a parent (by searching inside the parent).

-Is there any way to search in both parent and child in one search query, so that it returns results only if the join matches? In my case I have to match one field from the parent document and another from the child document, and get results only if both match.

Please don't post images of text as they are hardly readable and not searchable.

Instead paste the text and format it with </> icon. Check the preview window.

Any suggestions how to make a join query?

Elasticsearch does not support joins, unless you are using parent/child, but then you need to understand the pros and cons of using that.

Instead, I'd index within each document all the information about the user. Basically I'd do joins at index time and index one single document.

Not sure if it fits your use case though but the important thing to recall is that Elasticsearch is not a relational system.

My 0.02 cents.
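
For the parent/child route mentioned above, a query of roughly the following shape might cover the "match one field in the parent and another in the child" case. This is only a sketch: it assumes the index has a join field mapping, that the child relation is called application (a made-up name), and that Employer_id is mapped so that a term query works on it. The helper only builds the search body as a Python dict.

```python
import json


def parent_with_child_query(text, employer_id, child_type="application"):
    """Build a search body that returns parent documents only when
    the parent matches `text` AND at least one child matches the employer.

    `child_type` is a hypothetical relation name from the join mapping.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    # Condition on the parent document itself
                    {"match": {"attachment.content": text}},
                    # Condition on the children of that parent
                    {
                        "has_child": {
                            "type": child_type,
                            "query": {"term": {"Employer_id": employer_id}},
                        }
                    },
                ]
            }
        }
    }


print(json.dumps(parent_with_child_query("java developer", "42"), indent=2))
```

The hits are the parent documents; the has_child clause only filters them. The index-time join suggested above avoids this query shape entirely, at the cost of repeating the employer data in each document.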

Hi, I am getting an error while adding filter => lowercase to an analyzer of type => keyword. Is it possible to have that?
Actually, I want to apply a lowercase filter to a particular field.
Thanks
