Elasticsearch Splitting in Azure Container Apps

I am running the Elasticsearch image in an Azure Container App. I pushed data into ES with Logstash, using the JDBC input to read from an MSSQL database. In Azure, I can see that there are 2 replicas of the image. When I query the data, I get about half of the expected results.

Similarly, if I run this curl:

curl -X GET "https://container-app-url/<index_name>/_count"

I get this: {"count":204000,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}

and if I run it again, I get this: {"count":203940,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}

These two counts add up to the total number of documents I would expect based on the number of rows in the db (407940).
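As a quick check of that arithmetic, here is a minimal Python sketch that parses the two `_count` responses quoted above and sums them. The function name `total_across_responses` is made up for illustration, and adding the counts is only meaningful under the assumption that the two responders hold disjoint sets of documents. Note also that both responses report `"_shards":{"total":1,...}`, so each responder sees a complete single-shard index rather than half of a two-shard one.

```python
import json

# The two response bodies exactly as quoted in the post.
responses = [
    '{"count":204000,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}',
    '{"count":203940,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}',
]

# Row count reported for the source table in the post.
EXPECTED_ROWS = 407940


def total_across_responses(bodies):
    """Sum the "count" field of several _count response bodies.

    Assumption: each body comes from a different, non-overlapping copy
    of the index, so the counts can simply be added.
    """
    return sum(json.loads(body)["count"] for body in bodies)


total = total_across_responses(responses)
print(total, total == EXPECTED_ROWS)
```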

I do not understand why the index is being split like this and why all of the results are not included in the search, but am guessing it has to do with the azure replicas.

I don't think this is relevant but here is the pipeline.conf just in case:

 input { 
  jdbc { 
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver" 
    jdbc_connection_string => ****
    jdbc_user => ****
    jdbc_password => ****
    jdbc_paging_enabled => true        
    jdbc_page_size => 100000
    jdbc_fetch_size => 100000
    statement => "SELECT
                    REP_Info_indvlPK,
                    REP_Info_firstNm,
                    REP_Info_lastNm,
                    FullName,
                    S65,
                    S63,
                    S66,
                    Chartered_Financial_Consultant,
                    Chartered_Investment_Counselor,
                    Chartered_Financial_Analyst,
                    Personal_Financial_Specialist,
                    Certified_Financial_Planner,
                    Current_Employer,
                    Current_Employer_State,
                    Current_Employer_City,
                    Branch_City,
                    Branch_State,
                    REP_DRP_hasRegAction,
                    REP_DRP_hasCriminal,
                    REP_DRP_hasBankrupt,
                    REP_DRP_hasCivilJudc,
                    REP_DRP_hasBond,
                    REP_DRP_hasJudgment,
                    REP_DRP_hasInvstgn,
                    REP_DRP_hasCustComp,
                    REP_DRP_hasTermination
                FROM SearchIndividuals"
    lowercase_column_names => true 
    jdbc_validate_connection => true 
    record_last_run => false  
    clean_run => false
    tracking_column => "rep_info_indvlpk"
    tracking_column_type => "numeric"
    use_column_value => true
  } 

} 
output {
  elasticsearch {
    hosts => "https://container-app-url"
    index => "inx_individuals"
    action => "index"
    document_id => "%{rep_info_indvlpk}"
  }
}

Thanks for your help!

Quick update: I limited my azure container app to one replica and now all the records are there and it searches perfectly.

Will limiting the replicas to one have an impact on scaling queries?

TBH I don't understand your architecture.

I can see that there are 2 replicas of the image
...
I limited my azure container app to one replica

What does this mean?

How is Elasticsearch deployed?

It is deployed as a Docker image in Azure. A replica is a duplicate instance of the image, which is created to scale functionality.

I'm not sure, but I suspect that this way you are creating 2 clusters with one node each instead of 1 cluster with 2 nodes.

I guess this needs to be set up correctly instead.
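One way to test the "2 clusters with one node each" hypothesis is to call the Elasticsearch root endpoint (`GET /`) several times and compare the `cluster_uuid` and `name` fields it returns. The sketch below is only an illustration: the helper names (`fetch_root_info`, `summarize_clusters`, `probe`) are made up here, it assumes the endpoint is reachable without authentication as in the curl example above, and it assumes the Azure ingress spreads separate requests across the replicas.

```python
import json
import urllib.request


def fetch_root_info(base_url, timeout=10):
    """Return the parsed JSON body of GET <base_url>/ (the Elasticsearch root endpoint)."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/", timeout=timeout) as resp:
        return json.load(resp)


def summarize_clusters(root_infos):
    """Group node names by cluster_uuid.

    root_infos is an iterable of parsed root-endpoint responses; each one
    is expected to carry the "name" and "cluster_uuid" fields.
    """
    clusters = {}
    for info in root_infos:
        clusters.setdefault(info["cluster_uuid"], set()).add(info["name"])
    return clusters


def probe(base_url, attempts=10):
    """Hit the root endpoint several times so different replicas get a chance to answer."""
    return summarize_clusters(fetch_root_info(base_url) for _ in range(attempts))
```

Calling `probe("https://container-app-url")` returns a dict keyed by `cluster_uuid`. More than one key means the replicas are independent clusters, which would match the alternating counts. A single key with two node names means the nodes did join one cluster.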