Elasticsearch configuration for Windows Server

I have about 20 million records that I want to push into a single Elasticsearch instance.

  • What should the server configuration be? I will be using Windows Server.
  • Is a single instance enough?
  • Is creating an Elasticsearch cluster mandatory?

May I suggest you look at the following resources about sizing:

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

Hi Dadoonet sir,

I have one more question. I want to update data in Elasticsearch. If there are 1 million documents to update, which API should I use, and which tool (Logstash or something else)?

I believe it depends on which tool you used for the initial load.

I am using Logstash with the JDBC input, and it is running on Windows Server.

Then use the same tool to update. Note that if you are going to update a lot of documents, it might be better to reindex the whole dataset instead.

Hi dadoonet sir,

Are there any documentation links for the Update API and for reindexing documents?

Updating a document uses the same API as creating one.
When I say reindex, I mean index everything again, as you did the first time.
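
As a rough illustration of the point above: with the Logstash elasticsearch output, one way to make a second run overwrite existing documents rather than add duplicates is to set a fixed document id. This is only a sketch. The field name `businessentityid` is an assumption (a unique key column of the source table, lowercased by the jdbc input's default settings), and the host and index name are taken from the pipeline posted in this thread.

```
output {
    elasticsearch {
        hosts => "http://localhost:9200"
        index => "humanresources"
        # Assumption: every row carries a unique key in the field businessentityid.
        # With a fixed document_id, indexing the same row again replaces the
        # existing document instead of creating a new one.
        document_id => "%{businessentityid}"
    }
}
```

Without a `document_id`, Elasticsearch generates a new id for every event, so running the same load twice would leave two copies of each row.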

Hi dadoonet sir,

How can I speed up document inserts into Elasticsearch using Logstash?

Can you give me some suggestions, sir?

I am using the JDBC plugin. My configuration is below.

input {
    # Read rows from SQL Server through the Microsoft JDBC driver
    jdbc {
        jdbc_driver_library => "D:\sqljdbc_6.4.0.0_enu\sqljdbc_6.4\enu\mssql-jdbc-6.4.0.jre8.jar"
        jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
        jdbc_connection_string => "jdbc:sqlserver://SAI-PC;user=sa;password=Pass$123;"
        jdbc_user => "sa"
        jdbc_password => "Pass$123"
        # Each row returned by this query becomes one event
        statement => "SELECT * FROM [AdventureWorks2008R2].[HumanResources].[Employee]"
    }
}
filter {}
output {
    # Print every event to the console as one JSON document per line
    stdout {
        codec => json_lines
    }
    # Index every event into the local Elasticsearch instance
    elasticsearch {
        hosts => "http://localhost:9200"
        index => "humanresources"
    }
}

In my experience, most of the time is spent reading from the source database.
In that case, you could add a WHERE clause to your query so that it selects only a subset of your documents, and then run multiple Logstash pipelines in parallel.
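
A minimal sketch of what one such pipeline might look like, reusing the settings from the configuration posted above. The split on `[BusinessEntityID] % 2` is only an assumption about a numeric key column in the table. Any condition that divides the rows into non-overlapping subsets would do. The stdout output is left out here because printing every event is not needed for a bulk load.

```
# Pipeline A: reads only the rows whose key is even.
# Pipeline B would be a copy of this file with "% 2 = 1" in the WHERE clause.
input {
    jdbc {
        jdbc_driver_library => "D:\sqljdbc_6.4.0.0_enu\sqljdbc_6.4\enu\mssql-jdbc-6.4.0.jre8.jar"
        jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
        jdbc_connection_string => "jdbc:sqlserver://SAI-PC;user=sa;password=Pass$123;"
        jdbc_user => "sa"
        jdbc_password => "Pass$123"
        # Assumption: BusinessEntityID is a numeric key column of the table
        statement => "SELECT * FROM [AdventureWorks2008R2].[HumanResources].[Employee] WHERE [BusinessEntityID] % 2 = 0"
    }
}
output {
    elasticsearch {
        hosts => "http://localhost:9200"
        index => "humanresources"
    }
}
```

The two files can be declared as separate pipelines in `pipelines.yml`, or run as two Logstash processes, in which case each process needs its own `--path.data` directory.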