Replace the indexed data with new data

Hi,

I have an index named "sanjay_data" and I want to replace the data already in the index with new data.

Is there any way to do this without deleting the index and recreating it?

Please help me with this.

Thanks & Regards,
Sanjay Reddy.


You can update a single document atomically, but for a multi-document index you should look into using index aliases as described in the Changing Mapping with Zero Downtime blog post. Unfortunately, I don't think Logstash has any built-in support for swapping indexes like that.
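Here's a rough sketch of the alias approach, done outside of Logstash: index the new data into a fresh index, then atomically switch an alias from the old index to the new one. The index names and alias below are made up for illustration:

curl -XPOST 'http://localhost:9200/_aliases' -d '
{
  "actions": [
    { "remove": { "index": "sanjay_data_v1", "alias": "sanjay_data" } },
    { "add":    { "index": "sanjay_data_v2", "alias": "sanjay_data" } }
  ]
}'

Searches and Kibana would point at the alias, so readers never see a partially rebuilt index, and the old index can be deleted once the switch has happened.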

@magnusbaeck

"sanjay_data" is a single document index. If I update, the new data will be added up to the old data. But, I want to replace all the old data with the new one.

Can we do this?


If you set the elasticsearch output's document_id parameter to a fixed value, Logstash will atomically update the existing document instead of piling on another document with an automatically chosen document id.
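As a minimal sketch (the field name unique_key is hypothetical and stands for whatever field uniquely identifies each event):

output
{
  elasticsearch_http
  {
    host  => "localhost"
    index => "sanjay_data"
    # Hypothetical field; substitute whatever field is unique per record.
    # Re-running Logstash then overwrites the document with the same id
    # instead of creating a new one.
    document_id => "%{unique_key}"
  }
}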

@magnusbaeck

When I tried giving the document_id, only the last record in the file got indexed. The remaining records don't show up.

I have attached screenshots of Kibana and the Head plugin showing that only one record is indexed.

This is the output section of the config file that I used.
output
{
  elasticsearch_http
  {
    host          => "localhost"
    index         => "sanjay_data"
    index_type    => "sanjay_data"
    document_id   => "%{[@metadata][_id]}"
    template      => "Q:/softwares/ElasticSearch/logstash-1.3.3-flatjar/elasticsearch-template-sanjay_data.json"
    template_name => "sanjay_data"
  }
  stdout
  {
    codec => "json"
    debug => true
  }
}

Should I change anything?
Please help.


I thought that's what you wanted; update the existing document (singular).

document_id => "%{[@metadata][_id]}"

As your Kibana screenshot shows, there is no [@metadata][_id] field, so the id of every document ends up being the same literal string %{[@metadata][_id]}, which explains why there's only one document. What inputs do you have?

@magnusbaeck

I have an input file with 20 records which is already indexed. Now the 20 records in the file have changed, so I want to replace all the records in that index.

Here is the config file that I'm using

input
{
  file
  {
    path           => "Q:/sanjay/sanjay-data.psv"
    type           => "all"
    start_position => "beginning"
  }
}

filter
{
  csv
  {
    columns   => ["IPID","AdmissionNumber","PatientID","RegCode","FirstName","Middlename","LastName","FirstName2l","Middlename2l","LastName2l","PatientName","PatientName2l","Age","AgeUoM","AgeUoM2l","FullAge","FullAge2l","Gender","Gender2L","BedID","BedName","BedName2l","BedTypeId","BedType","Room","WardID","Ward","Ward2l","Status","AdmitDate","AgeUoMID","ConsultantID","Consultant","Consultant2l","GenderId","CompanyID","CompanyName","CompanyName2l","PatientType","TariffID","BillBedTypeID","ParentIPID","DOB","EpisodeID","DischargeDate","DischargeReason","DischargeReason2l","IsVIP","NameNoTitle","NameNoTitle2l","IsNewBorn","IsRefDocExternal","RefDocID","RefDoctorName","RefDoctorName2l","ExRefDocID","ExRefDoctorName","ExRefDoctorName2l","City","City2l","PhoneNo","Address","Address2l","HospitalID","SpecialiseID","Specialisation","Specialisation2L","LetterID","BillType","EligibleBedType","CityID","ExpiredDate","ENDDATE","Remarks","NationalityID","Clearence","ClearenceRemarks","TransferID","BLOCKED","GradeId","EmpNo","VisitID","VisitDate","VisitType","PassportNo","SSN","MrNo","WorkPermitID","AdmSourceID","AdmSourceName","RoomId","Title","DischargeReasonID","DischargeRemarks","CALAGE","CALUOMID","RefDocCode","ExRefDocCode","ConsultantCode","RefDocNo","ConsultantNo"]
    separator => "|"
  }
  grok
  {
    patterns_dir => "Q:/softwares/ElasticSearch/logstash-1.3.3-flatjar/patterns"
    match        => ["AdmitDate", "%{YEAR:al_year}-%{MONTHNUM:al_month}-%{MONTHDAY:al_monthday} %{TIME:al_time}"]
    add_field    => ["LogTime", "%{al_year}-%{al_month}-%{al_monthday} %{al_time}"]
  }
  date
  {
    match => ["LogTime", "YYYY-MM-dd HH:mm:ss.SSS"]
  }
  mutate
  {
    convert => ["PatientID", "integer"]
    convert => ["Age", "integer"]
  }
}

output
{
  elasticsearch_http
  {
    host          => "localhost"
    index         => "sanjay_data"
    index_type    => "sanjay_data"
    document_id   => "%{[@metadata][_id]}"
    template      => "Q:/softwares/ElasticSearch/logstash-1.3.3-flatjar/elasticsearch-template-hcg-sanjay-data.json"
    template_name => "sanjay_data"
  }
  stdout
  {
    codec => "json"
    debug => true
  }
}

Okay. The value you assign to document_id should be something that's unique to each log entry. You're currently using [@metadata][_id], which doesn't work since there's no such field. Perhaps the admission number would be more appropriate? Or the patient id? Whatever the primary key of each entry is will do.

Is it like document_id => "PatientID", or is it some other format?

document_id => "%{name-of-field}"

See the documentation.
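For example, assuming PatientID really is unique for each record (pick whichever column is the true primary key of your data), the output would reference the field like this:

output
{
  elasticsearch_http
  {
    host       => "localhost"
    index      => "sanjay_data"
    index_type => "sanjay_data"
    # Assumes PatientID is unique per record; re-indexing the changed file
    # then overwrites each existing document instead of adding duplicates.
    document_id => "%{PatientID}"
  }
}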


Thanks @magnusbaeck, it worked! :)