We have set up a self-managed App Search instance on a Linux machine. It runs fine at first, but it starts crashing after a few million records. Below are the issues:
1. We were able to write up to 14 million records; after that, App Search crashes with "too many open connections" errors. I have rebooted the server, but no luck.
2. We are writing data into App Search with a .NET Core application that runs on the same machine. The job does two parts: it fetches data from a document-based DB, which is very quick, and then writes the data to App Search, which is taking far too long. Even more frustrating, it is processing only 100,000 records per day.
3. logstasher.log is growing very rapidly. Is there any way we can restrict this?
We were able to write up to 14 million records; after that, App Search crashes with "too many open connections" errors. I have rebooted the server, but no luck.
Are you able to provide us with a more specific error message?
which is taking far too long. Even more frustrating, it is processing only 100,000 records per day.
How are you writing the records into App Search?
logstasher.log is growing very rapidly. Is there any way we can restrict this?
EDITED: Removed my advice in favor of @orhantoy's response below
Regarding 1: The documents are indexed into Elasticsearch - have you looked into the health of your Elasticsearch cluster?
Regarding 3: What version of App Search are you on? I recall this being fixed not too long ago.
Sorry you're experiencing issues with indexing large amounts of data into App Search. I hope we can help you resolve those issues ASAP.
As Jason and Orhan have already pointed out, we may need a bit more information to troubleshoot your specific situation.
We were able to write up to 14 million records; after that, App Search crashes with "too many open connections" errors. I have rebooted the server, but no luck.
This is concerning since we've never seen anything like that before and it'd be extremely helpful to get more information on the following:
What specific errors are you seeing and where?
Could you run the following commands on the server running App Search while the issue is happening?

$ ps axuww | grep java | grep app-search

Run this and note the process id (the numeric value in the second column). Then run the following and share the results with us if possible (PROCESS_ID is the process id value from the previous command):

$ lsof -n -p PROCESS_ID
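If it helps, and assuming there is exactly one matching App Search java process, the two steps can be combined into a single command:

$ lsof -n -p "$(ps axuww | grep java | grep app-search | grep -v grep | awk '{print $2}')"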
it is processing only 100,000 records per day
Ingest rates are extremely dependent on the code performing the ingestion. If you need to ingest a lot of data relatively quickly, you need to ensure you're using batch indexing requests (up to 100 documents in one batch), using multiple parallel indexing requests (processes, threads, etc., depending on the code that does the ingestion), and monitoring the health of your Elasticsearch cluster to ensure it is able to keep up with the data you're pushing into it. I'm fairly certain App Search can handle a lot more in a day than the numbers you're seeing, so I'd recommend not settling for that number and looking into ways to dramatically increase it.
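To make that concrete, here is a minimal sketch of the batch-plus-parallelism pattern in C# (not your actual code). It posts documents in batches of up to 100 to the App Search documents endpoint over HTTP; the host, port, engine name, and API key are placeholders you would replace with your own:

// Minimal sketch: batched, moderately parallel indexing into App Search.
// Host, port, engine name, and API key below are placeholders.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

class BulkIndexer
{
    // Reuse a single HttpClient so the job doesn't exhaust sockets/connections.
    private static readonly HttpClient Client = new HttpClient();

    static async Task Main()
    {
        Client.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", "private-xxxxxxxx"); // placeholder private API key

        List<Dictionary<string, object>> documents = FetchDocuments();

        // App Search accepts up to 100 documents per indexing request.
        var batches = documents
            .Select((doc, i) => new { doc, i })
            .GroupBy(x => x.i / 100, x => x.doc)
            .Select(g => g.ToList())
            .ToList();

        // Limit concurrency to something your Elasticsearch cluster can keep up with.
        var throttler = new SemaphoreSlim(4);
        var tasks = batches.Select(async batch =>
        {
            await throttler.WaitAsync();
            try
            {
                var json = JsonSerializer.Serialize(batch);
                using var content = new StringContent(json, Encoding.UTF8, "application/json");
                var response = await Client.PostAsync(
                    "http://localhost:3002/api/as/v1/engines/my-engine/documents", content);
                response.EnsureSuccessStatusCode();
            }
            finally
            {
                throttler.Release();
            }
        });

        await Task.WhenAll(tasks);
    }

    // Stand-in for the part of the job that reads from the document-based DB.
    static List<Dictionary<string, object>> FetchDocuments() =>
        new List<Dictionary<string, object>>();
}

A single shared HttpClient and a small, fixed degree of parallelism are deliberate: they keep the number of open connections bounded, which also matters given the "too many open connections" errors you're seeing. Note that the documents API reports per-document errors in the response body, so real code should inspect that as well as the HTTP status.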
Is there any limit in App Search?
There are no limits on the number of records indexed into the system or on the ingestion rate. It all depends on available resources and proper sizing of components (mainly Elasticsearch).
Oh, and one more thing: if you are able to upgrade to the latest version (Enterprise Search 7.8.0 at the moment), please do. The product is relatively young and we've made tremendous improvements over the past few minor versions (for example, the logstasher log no longer exists, and all other logs now have proper automatic log rotation configured).
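If upgrading isn't immediately possible, a standard logrotate rule is one way to keep logstasher.log in check in the meantime. This is only a sketch: the path below is a placeholder for wherever your logstasher.log actually lives, and copytruncate is used because App Search keeps the file open while it writes:

/path/to/app-search/log/logstasher.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}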