My scenario is that I have an existing index "emp_data" in Elasticsearch with 5 fields, one of which is employee_name. I have to add two new fields, "department" and "company", describing where each employee belongs. The "emp_data" index is large, around 20 million documents, and it keeps getting updated on an hourly basis.
I have a small CSV file, a few KB in size, containing the "employee_name", "company", and "department" information.
I can load this file into a separate index, say "corp_data", in Elasticsearch.
My question is: what is the best way in Elasticsearch to join emp_data (the big dataset) with corp_data (the very small dataset)?
My restriction is that I can't do the join outside ES, because emp_data is loaded by an application I don't have access to. I can only do the join after emp_data is already populated.
Previously, in a SQL database, I used an ETL job to join these two datasets and populate all 7 fields into a new table, "emp_record", separate from emp_data and corp_data.
You cannot join two indices together in Elasticsearch, as it does not work like a SQL database. What you can do instead is add those fields to your existing documents. One possible approach would be to use the Update API to update parts of a document.
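For illustration, here is a minimal sketch of that approach with the Python client (elasticsearch-py 8.x assumed). It walks the small CSV and issues one Update By Query call per employee to set the two new fields on the matching emp_data documents; the host, file name, and the assumption that employee_name is mapped as a keyword field are mine, not from the thread.

```python
import csv
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# The corp_data CSV is only a few KB, so it can be walked row by row;
# each row drives one Update By Query call against emp_data.
with open("corp_data.csv", newline="") as f:  # hypothetical file name
    for row in csv.DictReader(f):
        es.update_by_query(
            index="emp_data",
            # assumes employee_name is a keyword field; use
            # "employee_name.keyword" if it is text with a keyword sub-field
            query={"term": {"employee_name": row["employee_name"]}},
            script={
                "source": (
                    "ctx._source.department = params.dept;"
                    "ctx._source.company = params.co"
                ),
                "params": {"dept": row["department"], "co": row["company"]},
            },
            conflicts="proceed",  # tolerate the hourly loader touching the same docs
        )
```

Because emp_data keeps receiving new documents every hour, a job like this would have to be re-run (or scheduled) after each load so the newly arrived documents also get the two fields.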
But here I am not sure which is more suitable in ES from a performance point of view, because I will be handling on the order of 50 million documents when querying and sorting them.
Is it better
to store all fields in a single document and then query?
OR
to join the two documents at query time with the terms lookup mechanism (sketched below)?
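To make the second option concrete: a terms lookup query filters one index using a list of values stored in a single document of another index, so it only works if corp_data is reshaped so that, for example, each department document carries the array of its employees' names. A hedged sketch with the same Python client, with the document id and array field invented for illustration:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# Terms lookup: pull the list of employee names out of one corp_data
# document and use it as the set of terms to filter emp_data with.
resp = es.search(
    index="emp_data",
    query={
        "terms": {
            "employee_name": {            # field to filter on in emp_data
                "index": "corp_data",
                "id": "dept-finance",      # hypothetical per-department document
                "path": "employee_names",  # hypothetical array field in that document
            }
        }
    },
    sort=[{"employee_name": "asc"}],       # assumes a keyword mapping for sorting
)
print(resp["hits"]["total"])
```

Note that a terms lookup only filters emp_data by values taken from corp_data; it does not copy department or company into the hits, so sorting or aggregating on those two fields still requires them to be stored on the emp_data documents themselves.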
Another point: is it possible to transform an index into another index within Elasticsearch using Logstash? I mean, can I read both indices stored in Elasticsearch (the ones to be joined) into Logstash, join them there, and store the result in a new index that has fields from both?
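Logstash can do this kind of pass: the elasticsearch input plugin reads emp_data, the elasticsearch filter plugin looks up each event in corp_data, and the elasticsearch output plugin writes the merged events to a new index. Purely to illustrate the same join done client-side, here is a sketch with the Python helpers (index names taken from the question, everything else assumed):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# 1) corp_data is tiny: load it entirely into an in-memory lookup table
#    keyed by employee_name.
lookup = {
    hit["_source"]["employee_name"]: hit["_source"]
    for hit in helpers.scan(es, index="corp_data")
}

# 2) Stream every emp_data document, attach department/company from the
#    lookup table, and retarget it at a new emp_record index.
def merged_docs():
    for hit in helpers.scan(es, index="emp_data"):
        doc = hit["_source"]
        corp = lookup.get(doc.get("employee_name"), {})
        doc["department"] = corp.get("department")
        doc["company"] = corp.get("company")
        yield {"_index": "emp_record", "_id": hit["_id"], "_source": doc}

# 3) Bulk-index the merged stream.
helpers.bulk(es, merged_docs())
```

Like the update approach above, this would need to be repeated (or run incrementally) as emp_data is reloaded every hour.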