Replace the indexed data with new data


(Sanjay Reddy) #1

Hi,

I have an index named "sanjay_data" and I want to replace the data already in the index with new data.

Is there any way to do this without deleting the index and creating it again?

Please help me with this.

Thanks & Regards,
Sanjay Reddy.


(Magnus Bäck) #2

You can update a single document atomically, but for a multi-document index you should look into using index aliases, as described in the Changing Mapping with Zero Downtime blog post. Unfortunately, I don't think Logstash has any built-in support for updating indexes to cover this use case.
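The idea, roughly, is to load the new data into a fresh index and then atomically repoint an alias at it; the index names below are only illustrative:

# Load the new data into sanjay_data_v2, then swap the alias in one atomic call
curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    { "remove": { "index": "sanjay_data_v1", "alias": "sanjay_data" } },
    { "add":    { "index": "sanjay_data_v2", "alias": "sanjay_data" } }
  ]
}'

Searches against "sanjay_data" then hit the new index, and the old index can be deleted afterwards.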


(Sanjay Reddy) #3

@magnusbaeck

"sanjay_data" is a single document index. If I update, the new data will be added up to the old data. But, I want to replace all the old data with the new one.

Can we do this?


(Magnus Bäck) #4

If you set the elasticsearch output's document_id parameter to a fixed value, Logstash will atomically update the existing document instead of piling on another document with an automatically chosen document id.
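Something like this, just as a sketch (the fixed id "1" is only an example):

output
{
    elasticsearch_http
    {
        host => "localhost"
        index => "sanjay_data"
        document_id => "1"    # fixed id, so every event overwrites the same single document
    }
}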


(Sanjay Reddy) #5

@magnusbaeck

When I tried setting document_id, only the last record in the file got indexed; the remaining data doesn't show up.

I have attached screenshots from Kibana and the Head plugin showing that only one record is indexed.

This is the output section of the config file that I used:
output
{
    elasticsearch_http
    {
        host => "localhost"
        index => "sanjay_data"
        index_type => "sanjay_data"
        document_id => "%{[@metadata][_id]}"
        template => "Q:/softwares/ElasticSearch/logstash-1.3.3-flatjar/elasticsearch-template-sanjay_data.json"
        template_name => "sanjay_data"
    }
    stdout
    {
        codec => "json"
        debug => true
    }
}

Should I change anything?
Please help.


(Magnus Bäck) #6

I thought that's what you wanted: to update the existing document (singular).

document_id => "%{[@metadata][_id]}"

As your Kibana screenshot shows, there is no [@metadata][_id] field, so the id of each document becomes the literal string %{[@metadata][_id]}, which explains why there's only one document. What inputs do you have?


(Sanjay Reddy) #7

@magnusbaeck

I have an input file with 20 records that is already indexed. Now the 20 records in the file have changed, so I want to replace all the records in that index.

Here is the config file that I'm using:

input
{
    file
    {
        path => "Q:/sanjay/sanjay-data.psv"
        type => "all"
        start_position => "beginning"
    }
}
filter
{
    csv
    {
	columns =>["IPID","AdmissionNumber","PatientID","RegCode","FirstName","Middlename","LastName","FirstName2l","Middlename2l","LastName2l","PatientName","PatientName2l","Age","AgeUoM","AgeUoM2l","FullAge","FullAge2l","Gender","Gender2L","BedID","BedName","BedName2l","BedTypeId","BedType","Room","WardID","Ward","Ward2l","Status","AdmitDate","AgeUoMID","ConsultantID","Consultant","Consultant2l","GenderId","CompanyID","CompanyName","CompanyName2l","PatientType","TariffID","BillBedTypeID","ParentIPID","DOB","EpisodeID","DischargeDate","DischargeReason","DischargeReason2l","IsVIP","NameNoTitle","NameNoTitle2l","IsNewBorn","IsRefDocExternal","RefDocID","RefDoctorName","RefDoctorName2l","ExRefDocID","ExRefDoctorName","ExRefDoctorName2l","City","City2l","PhoneNo","Address","Address2l","HospitalID","SpecialiseID","Specialisation","Specialisation2L","LetterID","BillType","EligibleBedType","CityID","ExpiredDate","ENDDATE","Remarks","NationalityID","Clearence","ClearenceRemarks","TransferID","BLOCKED","GradeId","EmpNo","VisitID","VisitDate","VisitType","PassportNo","SSN","MrNo","WorkPermitID","AdmSourceID","AdmSourceName","RoomId","Title","DischargeReasonID","DischargeRemarks","CALAGE","CALUOMID","RefDocCode","ExRefDocCode","ConsultantCode","RefDocNo","ConsultantNo"]
    separator => "|"
}
grok    
{
    patterns_dir => "Q:/softwares/ElasticSearch/logstash-1.3.3-flatjar/patterns"
   
     match => ["AdmitDate", "%{YEAR:al_year}-%{MONTHNUM:al_month}-%{MONTHDAY:al_monthday} %{TIME:al_time}" ]
    add_field => [ "LogTime", "%{al_year}-%{al_month}-%{al_monthday} %{al_time}" ]
}
date 
{
    match => [ "LogTime", "YYYY-MM-dd HH:mm:ss.SSS"]
}
mutate 
{
	convert => ["PatientID", "integer"]
	convert => ["Age", "integer"]
}

}

output
{
    elasticsearch_http
    {
        host => "localhost"
        index => "sanjay_data"
        index_type => "sanjay_data"
        document_id => "%{[@metadata][_id]}"
        template => "Q:/softwares/ElasticSearch/logstash-1.3.3-flatjar/elasticsearch-template-hcg-sanjay-data.json"
        template_name => "sanjay_data"
    }
    stdout
    {
        codec => "json"
        debug => true
    }
}


(Magnus Bäck) #8

Okay. The value you assign to document_id should be unique to each log entry. You're currently using [@metadata][_id], which doesn't work since there's no such field. Perhaps the admission number would be more appropriate? Or the patient id? Whatever serves as the primary key of each entry will do.


(Sanjay Reddy) #9

Is it like document_id => "PatientID" or any other format?


(Magnus Bäck) #10
document_id => "%{name-of-field}"

See the documentation.
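For example, assuming PatientID uniquely identifies each record in your file, the output could look like this:

output
{
    elasticsearch_http
    {
        host => "localhost"
        index => "sanjay_data"
        index_type => "sanjay_data"
        document_id => "%{PatientID}"    # one document per PatientID; re-running the file overwrites each record in place
    }
}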


(Sanjay Reddy) #11

Thanks @magnusbaeck, it worked :slight_smile:

