Question about structure

yan014 · May 22, 2019, 10:31am

Hello everyone,

I'm developing a program that uses Elasticsearch as its search engine.

I'd like to get the community's opinion on how to structure my data.

The project is quite simple: we have documents (a document can exist in several languages and have several versions) that are categorized and have dynamic metadata depending on their category.

I have already done some tests with the Elasticsearch ingest attachment plugin to send the document and parse it directly.

But I do not see how to deal with metadata that is dynamic depending on the category of the document.

Can you advise me on it?

Thank you in advance,

Have a good day,

dadoonet · May 22, 2019, 2:01pm

Are you providing the metadata yourself?

yan014 · May 22, 2019, 2:44pm

Hello

Yes, these are fields that I enter myself, such as dates, text, numbers, multiple values...

dadoonet · May 22, 2019, 3:16pm

So you can do something like:

PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data"
      }
    }
  ]
}

PUT my_index/_doc/my_id?pipeline=attachment
{
  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
  "meta": {
    "foo1": "bar1",
    "foo2": "bar2"
  }
}

And that should work.
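
For reference, here is a minimal Python sketch of how the body of the second request could be assembled on the client side, assuming the file is read as bytes and base64-encoded into `data`, which is the form the attachment processor reads. The helper name `build_attachment_doc` and the sample bytes are illustrative assumptions, not part of any Elasticsearch API; sending the request is left to whatever HTTP client you use.

```python
import base64
import json


def build_attachment_doc(file_bytes, meta):
    """Return a request body for PUT my_index/_doc/<id>?pipeline=attachment.

    build_attachment_doc is a hypothetical helper, not an Elasticsearch API.
    """
    return {
        # The attachment processor reads base64-encoded content from "data".
        "data": base64.b64encode(file_bytes).decode("ascii"),
        # Free-form metadata supplied by the caller.
        "meta": dict(meta),
    }


# Sample RTF content; it encodes to the "data" value used in the request above.
sample = b"{\\rtf1\\ansi\r\nLorem ipsum dolor sit amet\r\n\\par }"
doc = build_attachment_doc(sample, {"foo1": "bar1", "foo2": "bar2"})
print(json.dumps(doc, indent=2))
```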

yan014 · May 23, 2019, 8:20am

Thank you, that is what I was planning to do.

And for categories, I add them like this:

PUT my_index/_doc/my_id?pipeline=attachment
{
  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
  "category": {
      "id": 1,
      "name": "Category name "
  },
  "meta": {
    "foo1": "bar1",
    "foo2": "bar2"
  }
}

OR

PUT my_index/_doc/my_id?pipeline=attachment
{
  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
  "category": "Category name ",
  "meta": {
    "foo1": "bar1",
    "foo2": "bar2"
  }
}

Or like this, and is there a way to create a relationship with another index containing the categories?

PUT my_index/_doc/my_id?pipeline=attachment
{
  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
  "category_id":  1,
  "meta": {
    "foo1": "bar1",
    "foo2": "bar2"
  }
}

Thanks a lot

dadoonet · May 23, 2019, 1:02pm

I'd not do that. I'd do:

PUT my_index/_doc/my_id?pipeline=attachment
{
  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
  "category": "Category name ",
  "meta": {
    "foo1": "bar1",
    "foo2": "bar2"
  }
}

yan014 · May 23, 2019, 1:22pm

Thanks

If my documents and my categories are multilingual, do I index the document once for each language?

So there is no need for this:

PUT my_index/_doc/my_id?pipeline=attachment
{
  "data": {
    "en": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
    "fr": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
  },
  "category": {
    "en": "Category name EN",
    "fr": "Category name FR"
  },
  "meta": {
    "foo1": "bar1",
    "foo2": "bar2"
  }
}

but rather this:

PUT my_index/_doc/my_id?pipeline=attachment
{
  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
  "category": "Category name EN ",
  "meta": {
    "foo1": "bar1",
    "foo2": "bar2"
  }
}

And the file in French:

PUT my_index/_doc/my_id?pipeline=attachment
{
  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
  "category": "Category name FR ",
  "meta": {
    "foo1": "bar1",
    "foo2": "bar2"
  }
}

dadoonet · May 23, 2019, 1:36pm

I prefer the latter form, but I'd use 2 indices, my_index_fr and my_index_en, unless there is a need to have an absolute relationship between both versions of the same document.

yan014 · May 23, 2019, 1:49pm

What are the advantages of using 2 indices?

And what do you mean by "an absolute relationship"? How can I do that with ES?

dadoonet · May 23, 2019, 2:07pm

I mean that if you want to have both versions within the same Elasticsearch document, then you have to do something like:

PUT my_index/_doc/my_id?pipeline=attachment
{
  "data": {
    "en": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
    "fr": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
  },
  "category": {
    "en": "Category name EN",
    "fr": "Category name FR"
  },
  "meta": {
    "foo1": "bar1",
    "foo2": "bar2"
  }
}
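
Note that the pipeline from earlier in the thread reads the single field `data`, so with this nested shape it would presumably need one attachment processor per language. Here is a hedged Python sketch that builds such a pipeline body, assuming the processor's `target_field` option is used to keep the extracted content of each language apart. The pipeline name `attachment_multilang` and the helper are hypothetical.

```python
import json


def build_multilang_pipeline(languages):
    """Build an ingest pipeline body with one attachment processor per language.

    Hypothetical helper: each processor reads data.<lang> and writes the
    extracted content under attachment.<lang> via the target_field option.
    """
    return {
        "description": "Extract attachment information per language",
        "processors": [
            {
                "attachment": {
                    "field": f"data.{lang}",
                    "target_field": f"attachment.{lang}",
                }
            }
            for lang in languages
        ],
    }


pipeline = build_multilang_pipeline(["en", "fr"])
# Body for a request such as: PUT _ingest/pipeline/attachment_multilang
print(json.dumps(pipeline, indent=2))
```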

yan014 · May 23, 2019, 2:13pm

Okay

No, I don't think I need it; otherwise, I also have a MySQL database behind it that can handle that.

And what are the advantages of using 2 indices?

Thanks a lot for all the information and your time

dadoonet · May 23, 2019, 2:40pm

"And what are the advantages of using 2 indices?"

Well, it reduces the number of fields within one index. Not a big deal here, I guess, as you only have a few of them.
What I'd think about is reindexing needs. If something goes wrong with a specific language, e.g. FR, and you need to change the text analyzer (which means reindexing), do you want to reindex both languages or only one?

I'd prefer to reindex only one, but it's really up to you.

Just giving my 2 cents here
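
To make the two-index option concrete, here is a small Python sketch of how a client might pick the index from the document's language. The `my_index_<lang>` naming follows the thread; the helper name, the set of supported languages, and the id value are assumptions for illustration.

```python
SUPPORTED_LANGUAGES = {"en", "fr"}  # assumption: the languages mentioned in the thread


def index_request_path(lang, doc_id, pipeline="attachment"):
    """Return the request path for indexing one language version of a document."""
    lang = lang.lower()
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {lang!r}")
    # One index per language, so each one can be reindexed independently.
    return f"my_index_{lang}/_doc/{doc_id}?pipeline={pipeline}"


print(index_request_path("fr", "my_id"))
print(index_request_path("en", "my_id"))
```

Because ids are scoped to an index, the same `my_id` could be reused in both indices, which would keep a loose link between the two language versions without storing them in one document.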

yan014 · June 3, 2019, 10:35am

Thanks a lot for the information

Dummy question about "PUT _ingest/pipeline/attachment": do I need to call it only once, or before each "PUT my_index/_doc/my_id?pipeline=attachment" request?

Thanks

dadoonet · June 3, 2019, 10:47am

Only once.
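
In other words, the pipeline definition is stored in the cluster, and each indexing request only refers to it by name. Here is a Python sketch of the resulting request sequence, using plain tuples rather than a real HTTP client; the document ids and the helper name are hypothetical.

```python
def plan_requests(doc_ids, pipeline="attachment"):
    """Return the (method, path) sequence: one pipeline PUT, then one PUT per document."""
    # The pipeline is created once ...
    requests = [("PUT", f"_ingest/pipeline/{pipeline}")]
    # ... and every document request just references it by name.
    for doc_id in doc_ids:
        requests.append(("PUT", f"my_index/_doc/{doc_id}?pipeline={pipeline}"))
    return requests


for method, path in plan_requests(["doc_1", "doc_2", "doc_3"]):
    print(method, path)
```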

system · July 1, 2019, 10:47am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
