Duplicate data when using the http_poller input

I am using the http_poller input with Elasticsearch as the output to fetch data.
The input returns the data as XML, so I use the xml filter in the filter section. Everything arrives in Elasticsearch correctly, but when Logstash runs a second time I get duplicate records.

I tried document_id, but it did not work for me; I suspect that is because the data arrives as XML.
Can you please suggest what I am doing wrong? My conf file is below.

> input {
>   http_poller {
>     urls => {
>       soap_request => {
>         method => post
>         url => "https://demos.com:443/acndataService/dataservice"
>         headers => {
>           "Content-Type" => "text/xml; charset=utf-8"
>           "SOAPAction" => "http://xmlns.xyz.com/apps/scm/customerNeedsManagement/datas/dataservice/getdata"
>           "Authorization" => "Basic xxxxxxxxxxxxxxxxxxxx"
>           "Host" => "demos.com:443"
>           "Accept-Encoding" => "gzip,deflate"
>         }
> body => '<?xml version="1.0" encoding="UTF-8"?>
> <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:typ="http://xmlns.xyz.com/apps/customerNeedsManagement/datas/dataservice/types/" xmlns:typ1="http://xmlns.xyz.com/adf/svc/types/">
>    <soapenv:Header/>
>    <soapenv:Body>
>    <typ:finddataFinddataByName>
>    <typ:findCriteria>
>    <typ1:fetchStart>0</typ1:fetchStart>
>    <typ1:fetchSize>-1</typ1:fetchSize>
>    <typ1:filter>
> 	<typ1:conjunction>And</typ1:conjunction>
> 	<typ1:group>
> 	<typ1:conjunction>And</typ1:conjunction>
> 	<typ1:upperCaseCompare>false</typ1:upperCaseCompare>
> 	<typ1:item>
> 	<typ1:conjunction>And</typ1:conjunction>
> 	<typ1:upperCaseCompare>false</typ1:upperCaseCompare>
> 	<typ1:attribute>Name</typ1:attribute>
> 	<typ1:operator>STARTSWITH</typ1:operator>
> 	<typ1:finddataFinddataByName>%</typ1:finddataFinddataByName>
> 	</typ1:item>
> 	</typ1:group>
> 	</typ1:filter>
> 	<typ1:sortOrder>
> 	<typ1:sortAttribute>
> 	<typ1:name>CreationDate</typ1:name>
> 	<typ1:descending>true</typ1:descending>
> 	</typ1:sortAttribute>
> 	</typ1:sortOrder>
> 	<typ1:findAttribute>Name</typ1:findAttribute>
> 	<typ1:findAttribute>DataType</typ1:findAttribute>
> 	<typ1:findAttribute>Status</typ1:findAttribute>
> 	<typ1:findAttribute>CreatedBy</typ1:findAttribute>
> 	<typ1:findAttribute>CreationDate</typ1:findAttribute>
> 	<typ1:findAttribute>DataID</typ1:findAttribute>

>     </soapenv:Body>
> 	</soapenv:Envelope>'
>         }
>     }
>     request_timeout => 60
>     interval => 60
>     codec => "plain"
>   }
> } 
> filter {
>   xml {
>     source => "message"
>     target => "xmldata"
>     store_xml => false
>     remove_namespaces => true
>     xpath => ["//Value", "value"]
>     remove_field => "message"
>   }
>   split {
>     field => "value"
>   }
>   xml {
>     source => "value"
>     target => "data"
>     force_array => false
>     remove_field => "value"
>   }
> }
> output {
>   stdout { codec => rubydebug }
>   elasticsearch {
>     action => "index"
>     hosts => "127.0.0.1:9200"
>     index => "dataservice"
>     workers => 1
>   }
> }

Thanks in advance

Seems to me that if you call the SOAP endpoint multiple times you will get the same data each time, i.e. duplicates.

What, in the SOAP body, do you think is directing the endpoint to respond only with objects Logstash has never seen before?

The idea with the document_id option is that a particular document is given a consistent ID in Elasticsearch so that the next time the same data is fetched the old document will be updated. How would a consistent ID be constructed for these documents?
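
As a minimal sketch of that idea: assuming each parsed record ends up with a stable identifier in a field such as `[data][DataID]` (a hypothetical path; the real field name depends on the actual SOAP response and may carry a namespace prefix, since the second xml filter does not set remove_namespaces), the output could derive the Elasticsearch ID from it:

```
output {
  elasticsearch {
    action => "index"
    hosts => "127.0.0.1:9200"
    index => "dataservice"
    # Assumption: [data][DataID] exists on every event and is unique per record.
    # Verify the real field name in the rubydebug output first.
    document_id => "%{[data][DataID]}"
  }
}
```

If the referenced field is missing from an event, Logstash leaves the `%{...}` reference unresolved and uses the literal text as the ID, so every such event would overwrite the same single document. Checking the stdout output for the exact field path is worth doing before relying on this.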

I am using the DataID below as the document_id.

But I don't see any such value in the SOAP response you posted.

> <typ1:findAttribute>Name</typ1:findAttribute>
> <typ1:findAttribute>DataType</typ1:findAttribute>
> <typ1:findAttribute>Status</typ1:findAttribute>
> <typ1:findAttribute>CreatedBy</typ1:findAttribute>
> <typ1:findAttribute>CreationDate</typ1:findAttribute>
> <typ1:findAttribute>DataID</typ1:findAttribute>

I am taking DataID as the document_id.

Oh, the XML you posted was the response schema, not an example of a response. Please show:

  • An example XML document (not the schema).
  • The equivalent JSON that you want to store in ES.
