Best way to add a new field to an existing index

My scenario is that I have 5 fields in an existing index "emp_data" in Elasticsearch. One of these fields is employee_name. I have to add two new fields, a sixth and a seventh, "department" and "company", describing where each employee belongs. The "emp_data" dataset is quite big, around 20 million documents, and keeps getting updated on an hourly basis.
I have a small CSV file, only a few KB in size, containing "employee_name", "company", and "department" information.
I can load this file into a separate index, say "corp_data", in Elasticsearch.

My question is: what is the best way in Elasticsearch to:

  1. Join emp_data (big dataset) with corp_data (very small dataset)?

My restriction is: I can't do the join outside ES because emp_data is loaded by an application that I don't have access to. I can only do it after emp_data is already populated.

Previously, in a SQL database, I used an ETL job to join these two datasets and populate all 7 fields into a new table "emp_record", separate from emp_data and corp_data.

I am new to ES.

Hey,

you cannot join two indices together in Elasticsearch, as it does not work like a SQL database. What you can do instead is add those fields to your existing documents. One possible approach would be to use the Update API to update parts of a document.
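For example, a partial update looks roughly like this with the Python client (just a sketch with a 7.x-style body parameter; the document id and field values are placeholders, not from your data):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust host/auth as needed

# Partial update: only the new fields are sent, the rest of the document stays untouched.
es.update(
    index="emp_data",
    id="employee-123",  # placeholder -- use the real _id of the document
    body={"doc": {"department": "Sales", "company": "Acme"}},  # example values
)
```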

hope this helps!

--Alex

I think the Update API is useful when I know each and every document, but in my case there is a big volume of data in one index and a very small volume in the other.
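Just to spell out my understanding: driving updates from the small file would mean something like one update-by-query per CSV row (a rough, untested sketch with the Python client; the name, department, and company values are made-up examples, and the keyword sub-field is an assumption about my mapping):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust host/auth as needed

# One update-by-query per row of the small corp_data CSV:
# tag every emp_data document of this employee with the two new fields.
es.update_by_query(
    index="emp_data",
    conflicts="proceed",  # emp_data keeps getting hourly updates, so skip version conflicts
    body={
        "query": {
            # assumes a keyword sub-field for exact matching; value is an example row
            "term": {"employee_name.keyword": "Jane Doe"}
        },
        "script": {
            "source": "ctx._source.department = params.dep; ctx._source.company = params.comp",
            "params": {"dep": "Sales", "comp": "Acme"},  # example values from one CSV row
        },
    },
)
```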

I tried to read up on a few potentially useful options:

Parent-join: https://www.elastic.co/guide/en/elasticsearch/reference/current/parent-join.html

Nested objects: https://www.elastic.co/guide/en/elasticsearch/guide/2.x/nested-objects.html

Terms lookup query (joining while querying data): https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html

But I am not sure which of these is more suitable in ES from a performance point of view, because I will be handling data on the order of 50 million documents while querying and sorting.

Is it better

to store all fields in a single document and then query?

OR

to join the two documents at query time with the terms lookup mechanism?
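To make option 2 concrete, my understanding of a terms lookup against emp_data is roughly this (my own sketch, untested; it assumes corp_data has one lookup document per department with an employee_names array, which is not exactly how my CSV is structured today, and a keyword sub-field on employee_name):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust host/auth as needed

# Option 2 sketch: pull the list of names from one corp_data document and
# use it to filter emp_data at query time (a join done while querying).
resp = es.search(
    index="emp_data",
    body={
        "query": {
            "terms": {
                "employee_name.keyword": {   # assumes a keyword sub-field for exact matching
                    "index": "corp_data",
                    "id": "dept-sales",      # hypothetical id of the corp_data lookup document
                    "path": "employee_names" # hypothetical array field holding the names
                }
            }
        }
    },
)
print(resp["hits"]["total"])
```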

Another point: is it possible to transform an index into another index within Elasticsearch using Logstash? I mean, can I read both indices stored in Elasticsearch (the ones to be joined) in Logstash, join them there, and store the result in a new index that has fields from both?
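To illustrate what I mean by the join, this is roughly the transformation I hope Logstash (or a small script) could do, sketched here with the Python client's scan and bulk helpers (just my illustration, untested; the host, field names, and the idea of updating emp_data in place are my assumptions):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # adjust host/auth as needed

# 1) Build an in-memory lookup from the small corp_data index (only a few KB).
lookup = {}
for hit in helpers.scan(es, index="corp_data"):
    src = hit["_source"]
    lookup[src["employee_name"]] = {
        "department": src["department"],
        "company": src["company"],
    }

# 2) Stream emp_data and emit partial updates adding the two new fields.
def actions():
    for hit in helpers.scan(es, index="emp_data"):
        extra = lookup.get(hit["_source"].get("employee_name"))
        if extra:
            yield {
                "_op_type": "update",
                "_index": "emp_data",
                "_id": hit["_id"],
                "doc": extra,
            }

helpers.bulk(es, actions())
```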

Storing and searching within single documents will always be faster (at the expense of more complex ingestion).

If I had the choice I would always try to go for the fastest solution on query time first.
