I am running the Elasticsearch image in an Azure Container App. I have pushed data into ES with Logstash via JDBC from an MSSQL database. In Azure, I can see that there are 2 replicas of the image. When I query the data, I get only about half of the expected results.
Similarly, if I run this curl command:
curl -X GET "https://container-app-url/<index_name>/_count"
I get this: {"count":204000,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}
and if I run it again, I get this: {"count":203940,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}
These two counts add up to the total number of documents I would expect based on the number of rows in the database (204000 + 203940 = 407940).
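To confirm that the two numbers come from two different copies of the index (rather than from indexing still in progress), the count can be polled a few times and the distinct values summed. This is a minimal Python sketch. It assumes the endpoint needs no authentication, as with the curl call above, and that each replica reports a different count. `ES_URL` is a hypothetical environment variable holding the container app URL, and `summarize` is an illustrative helper:

```python
import json
import os
import urllib.request

INDEX = "inx_individuals"   # index name from the Logstash output block
EXPECTED_ROWS = 407940      # rows in SearchIndividuals


def fetch_count(base_url: str, index: str) -> int:
    """Return the "count" field of GET <base_url>/<index>/_count."""
    with urllib.request.urlopen(f"{base_url}/{index}/_count", timeout=10) as resp:
        return json.load(resp)["count"]


def summarize(counts: list[int], expected: int) -> dict:
    """Collapse repeated counts and compare their sum with the expected row count."""
    distinct = sorted(set(counts))
    total = sum(distinct)
    return {
        "distinct_counts": distinct,
        "sum_of_distinct": total,
        "matches_expected": total == expected,
    }


if __name__ == "__main__":
    # ES_URL (hypothetical) holds the container app URL; nothing is polled unless it is set.
    base_url = os.environ.get("ES_URL")
    if base_url:
        observed = [fetch_count(base_url, INDEX) for _ in range(10)]
        print(summarize(observed, EXPECTED_ROWS))
```

If the ingress spreads requests across both replicas, ten polls will very likely show both values, and `sum_of_distinct` should then equal 407940. Two replicas that happened to hold exactly the same count would collapse into one value, so a single distinct count does not rule out a split on its own.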
I do not understand why the index is being split like this or why not all of the results are included in the search, but I am guessing it has to do with the Azure replicas.
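The replica guess can be made concrete with a toy model. Suppose each Azure replica runs its own single-node Elasticsearch with its own storage, and the ingress spreads requests across replicas. Then each Logstash bulk request lands on one replica, and each `_count` request is answered by one replica. The sketch below is plain Python, not Elasticsearch or Azure code. Strict round-robin routing and a batch size of 125 (Logstash's default `pipeline.batch.size`) are both assumptions:

```python
def simulate(num_docs: int, num_replicas: int, batch_size: int) -> list[int]:
    """Return the document count each independent replica would report."""
    # Each replica keeps its own document store; nothing is shared or replicated.
    stores = [{} for _ in range(num_replicas)]
    doc_ids = range(1, num_docs + 1)
    for batch_no, start in enumerate(range(0, num_docs, batch_size)):
        # Assumed round-robin: each bulk request goes to exactly one replica.
        target = stores[batch_no % num_replicas]
        for doc_id in doc_ids[start:start + batch_size]:
            # document_id is the primary key, so a re-sent row overwrites, not duplicates.
            target[doc_id] = {"rep_info_indvlpk": doc_id}
    # A _count request is answered by a single replica, so it only sees that store.
    return [len(store) for store in stores]
```

Under these assumptions, `simulate(407940, 2, 125)` returns `[204000, 203940]`, the same two counts as above. That agreement is consistent with the replica explanation but does not prove it, because the real load-balancing policy and batch boundaries may differ.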
I don't think this is relevant, but here is the pipeline.conf just in case:
input {
  jdbc {
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => ****
    jdbc_user => ****
    jdbc_password => ****
    jdbc_paging_enabled => true
    jdbc_page_size => 100000
    jdbc_fetch_size => 100000
    statement => "SELECT
        REP_Info_indvlPK,
        REP_Info_firstNm,
        REP_Info_lastNm,
        FullName,
        S65,
        S63,
        S66,
        Chartered_Financial_Consultant,
        Chartered_Investment_Counselor,
        Chartered_Financial_Analyst,
        Personal_Financial_Specialist,
        Certified_Financial_Planner,
        Current_Employer,
        Current_Employer_State,
        Current_Employer_City,
        Branch_City,
        Branch_State,
        REP_DRP_hasRegAction,
        REP_DRP_hasCriminal,
        REP_DRP_hasBankrupt,
        REP_DRP_hasCivilJudc,
        REP_DRP_hasBond,
        REP_DRP_hasJudgment,
        REP_DRP_hasInvstgn,
        REP_DRP_hasCustComp,
        REP_DRP_hasTermination
      FROM SearchIndividuals"
    lowercase_column_names => true
    jdbc_validate_connection => true
    record_last_run => false
    clean_run => false
    tracking_column => "rep_info_indvlpk"
    tracking_column_type => "numeric"
    use_column_value => true
  }
}
output {
  elasticsearch {
    hosts => "https://container-app-url"
    index => "inx_individuals"
    action => "index"
    document_id => "%{rep_info_indvlpk}"
  }
}
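Because `hosts` points at the single container app URL, Logstash cannot tell how many Elasticsearch nodes sit behind it. Elasticsearch's root endpoint (`GET /`) reports the answering node's `name` and its `cluster_uuid`, so polling it a few times shows whether more than one independent cluster answers that URL. This is a minimal sketch, assuming the URL needs no authentication (as with the curl call above). `ES_URL` is a hypothetical environment variable and `distinct_clusters` is an illustrative helper:

```python
import json
import os
import urllib.request


def fetch_root(base_url: str) -> dict:
    """GET / returns node and cluster identity (name, cluster_name, cluster_uuid, version)."""
    with urllib.request.urlopen(f"{base_url}/", timeout=10) as resp:
        return json.load(resp)


def distinct_clusters(responses: list[dict]) -> dict[str, set[str]]:
    """Map each cluster_uuid seen to the set of node names that reported it."""
    clusters: dict[str, set[str]] = {}
    for body in responses:
        clusters.setdefault(body["cluster_uuid"], set()).add(body["name"])
    return clusters


if __name__ == "__main__":
    # ES_URL (hypothetical) holds the container app URL; nothing is polled unless it is set.
    base_url = os.environ.get("ES_URL")
    if base_url:
        seen = distinct_clusters([fetch_root(base_url) for _ in range(10)])
        for uuid, nodes in seen.items():
            print(uuid, sorted(nodes))
```

More than one `cluster_uuid` in the output would indicate that the replicas are separate clusters that do not share documents, rather than nodes of a single cluster.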
Thanks for your help!