Elastic Search Splitting in Azure Container Apps

mcmichaelau · July 31, 2024, 12:40am

I am running the Elasticsearch image in an Azure Container App. I pushed data into Elasticsearch with Logstash, using the JDBC input to read from an MSSQL database. In Azure, I can see that there are 2 replicas of the image. When I query the data, I get only about half of the expected results.

Similarly, if I run this curl command:

curl -X GET "https://container-app-url/<index_name>/_count"

I get this: {"count":204000,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}

and if I run it again, I get this: {"count":203940,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}

These two counts add up to the total number of documents I would expect based on the number of rows in the database (407940).
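One way to make the inconsistency explicit is to collect the raw `_count` bodies from several identical requests and compare them. The sketch below uses a hypothetical helper (`summarize_count_responses` is not part of any Elasticsearch client); it only parses JSON that has already been fetched, for example with the curl command above. Summing the distinct counts is a heuristic that assumes each distinct value comes from a separate copy of the index.

```python
import json


def summarize_count_responses(bodies, expected_total):
    """Summarize raw `_count` response bodies from repeated identical requests.

    Against a single cluster every body should report the same count. Several
    distinct values suggest requests are landing on different copies of the
    index. Summing the distinct values is only a heuristic: it assumes each
    value comes from a separate copy and that no two copies happen to agree.
    """
    counts = [json.loads(body)["count"] for body in bodies]
    distinct = sorted(set(counts))
    return {
        "consistent": len(distinct) == 1,
        "distinct_counts": distinct,
        "distinct_sum_matches_expected": sum(distinct) == expected_total,
    }


# The two responses quoted in the question.
bodies = [
    '{"count":204000,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}',
    '{"count":203940,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}',
]
print(summarize_count_responses(bodies, expected_total=407940))
# {'consistent': False, 'distinct_counts': [203940, 204000], 'distinct_sum_matches_expected': True}
```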

I do not understand why the index is being split like this, or why not all of the results are included in the search, but I am guessing it has to do with the Azure replicas.
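A toy model of what the replicas may be doing: if each replica runs its own single-node Elasticsearch with its own storage, and the ingress alternates requests between them, each bulk request from Logstash lands in only one of the two copies. The strict round-robin routing, the batch size of 125 (Logstash's default `pipeline.batch.size`), and one bulk request per batch are all assumptions; this is an in-memory simulation, not Elasticsearch code.

```python
from itertools import cycle


def simulate_split_indexing(doc_ids, num_replicas, batch_size):
    """Return the document count held by each independent replica.

    Each replica is modelled as a separate single-node cluster (a dict keyed
    by document _id). Batches are routed strictly round-robin, which is a
    simplifying assumption about the load balancer.
    """
    replicas = [{} for _ in range(num_replicas)]
    route = cycle(replicas)
    for start in range(0, len(doc_ids), batch_size):
        target = next(route)  # this whole batch goes to one replica only
        for doc_id in doc_ids[start:start + batch_size]:
            target[doc_id] = {"rep_info_indvlpk": doc_id}
    return [len(replica) for replica in replicas]


doc_ids = list(range(1, 407941))  # 407,940 rows, as in the question
counts = simulate_split_indexing(doc_ids, num_replicas=2, batch_size=125)
print(counts, sum(counts))  # [204000, 203940] 407940
print(simulate_split_indexing(doc_ids, num_replicas=1, batch_size=125))  # [407940]
```

Under these assumptions the model happens to reproduce the two counts from the question exactly. That is consistent with, though not proof of, requests being split between two independent instances; with a single replica, every document lands in one copy.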

I don't think this is relevant, but here is the pipeline.conf just in case:

 input { 
  jdbc { 
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver" 
    jdbc_connection_string => ****
    jdbc_user => ****
    jdbc_password => ****
    jdbc_paging_enabled => true        
    jdbc_page_size => 100000
    jdbc_fetch_size => 100000
    statement => "SELECT
                    REP_Info_indvlPK,
                    REP_Info_firstNm,
                    REP_Info_lastNm,
                    FullName,
                    S65,
                    S63,
                    S66,
                    Chartered_Financial_Consultant,
                    Chartered_Investment_Counselor,
                    Chartered_Financial_Analyst,
                    Personal_Financial_Specialist,
                    Certified_Financial_Planner,
                    Current_Employer,
                    Current_Employer_State,
                    Current_Employer_City,
                    Branch_City,
                    Branch_State,
                    REP_DRP_hasRegAction,
                    REP_DRP_hasCriminal,
                    REP_DRP_hasBankrupt,
                    REP_DRP_hasCivilJudc,
                    REP_DRP_hasBond,
                    REP_DRP_hasJudgment,
                    REP_DRP_hasInvstgn,
                    REP_DRP_hasCustComp,
                    REP_DRP_hasTermination
                FROM SearchIndividuals"
    lowercase_column_names => true 
    jdbc_validate_connection => true 
    record_last_run => false  
    clean_run => false
    tracking_column => "rep_info_indvlpk"
    tracking_column_type => "numeric"
    use_column_value => true
  } 

} 
output {
  elasticsearch {
    hosts => "https://container-app-url"
    index => "inx_individuals"
    action => "index"
    # use the primary key as the document _id so re-runs overwrite instead of duplicating
    document_id => "%{rep_info_indvlpk}"
  }
}
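The `document_id => "%{rep_info_indvlpk}"` setting is worth noting: because each row's primary key becomes the document `_id`, re-running the pipeline against a single cluster overwrites documents rather than duplicating them. The in-memory sketch below (`index_rows` is a hypothetical helper, not the Logstash or Elasticsearch API) models that, and also shows that a re-run against two independent copies can leave the same `_id` in both if batches are routed differently the second time.

```python
def index_rows(clusters, rows, batch_size, first_target=0):
    """Index rows in batches, alternating between clusters.

    Each cluster is a dict keyed by _id, so writing an existing _id replaces
    the stored document (like the `index` action with an explicit document_id).
    """
    for batch_no, start in enumerate(range(0, len(rows), batch_size)):
        cluster = clusters[(first_target + batch_no) % len(clusters)]
        for row in rows[start:start + batch_size]:
            cluster[row["rep_info_indvlpk"]] = row


rows = [{"rep_info_indvlpk": pk} for pk in range(1, 11)]

single = [{}]
index_rows(single, rows, batch_size=5)
index_rows(single, rows, batch_size=5)  # second run overwrites by _id
print(len(single[0]))  # 10

pair = [{}, {}]
index_rows(pair, rows, batch_size=5, first_target=0)
index_rows(pair, rows, batch_size=5, first_target=1)  # routing flips on re-run
print([len(c) for c in pair])  # [10, 10]
```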

Thanks for your help!

mcmichaelau · July 31, 2024, 12:49am

Quick update: I limited my Azure Container App to one replica, and now all the records are there and search works perfectly.

Will limiting the replicas to one have an impact on scaling queries?

dadoonet · July 31, 2024, 6:54am

TBH I don't understand your architecture.

I can see that there are 2 replicas of the image
...
I limited my azure container app to one replica

What does this mean?

How is Elasticsearch deployed?

mcmichaelau · July 31, 2024, 9:06pm

It is deployed as a Docker image in Azure. A replica is a duplicate instance of the image, created to scale the app out.
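One way to check whether the replicas have actually joined into a single cluster is to ask Elasticsearch itself: `GET /_cluster/health` returns a `number_of_nodes` field. If Azure reports 2 replicas but every response reports 1 node, the replicas are running as separate clusters. The helper below is a hypothetical sketch that only parses response bodies already fetched (e.g. with curl against the container app URL); the sample bodies are illustrative and trimmed to the one relevant field, not captured from this deployment.

```python
import json


def replicas_form_one_cluster(health_bodies, expected_nodes):
    """Check `_cluster/health` bodies collected through the load balancer.

    Returns True only if every response reports the expected node count.
    """
    node_counts = {json.loads(body)["number_of_nodes"] for body in health_bodies}
    return node_counts == {expected_nodes}


# Illustrative bodies trimmed to the relevant field, not real output.
separate = ['{"number_of_nodes":1}', '{"number_of_nodes":1}']
joined = ['{"number_of_nodes":2}', '{"number_of_nodes":2}']
print(replicas_form_one_cluster(separate, expected_nodes=2))  # False
print(replicas_form_one_cluster(joined, expected_nodes=2))  # True
```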

dadoonet · July 31, 2024, 9:39pm

I'm not sure, but I have the feeling that this way you are creating 2 clusters with one node each instead of 1 cluster with 2 nodes.

I guess this needs to be set up correctly instead.
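For reference, nodes form one cluster when they share a cluster name and can discover each other. The sketch below builds per-node settings in the style of the Docker examples in the Elasticsearch documentation, where settings are passed as environment variables. The host names `es01`/`es02` and the cluster name `es-cluster` are hypothetical; each node must be able to reach the others by those names on the transport port and needs its own persistent data. Whether identical, autoscaled Container Apps replicas can provide distinct node names and stable addresses is something to verify for a given setup.

```python
def cluster_env(node_names, cluster_name):
    """Build Elasticsearch settings, one dict per node, for a single cluster.

    Every node shares cluster.name, lists the *other* nodes as seed hosts, and
    lists all nodes as initial master candidates (used only at first bootstrap).
    """
    envs = {}
    for name in node_names:
        others = [n for n in node_names if n != name]
        envs[name] = {
            "node.name": name,
            "cluster.name": cluster_name,
            "discovery.seed_hosts": ",".join(others),
            "cluster.initial_master_nodes": ",".join(node_names),
        }
    return envs


for node, env in cluster_env(["es01", "es02"], "es-cluster").items():
    print(node, env)
```

Two nodes are used here only to mirror the two replicas in this thread; the Elasticsearch documentation recommends at least three master-eligible nodes for a resilient cluster.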
