In the JSON above, the CustomerID inside _source.m is unique for every customer. However, since CustomerID is not indexed as a separate field, it is not possible to count distinct customers with the Unique Count metric in Visualize.

Is there any way to get the number of unique customers without changing the Logstash filters? (We have already processed a huge amount of data using the same Logstash grok filters.)
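One option that avoids touching the Logstash pipeline is a Kibana scripted field that extracts the CustomerID at query time with a Painless regex. This is only a sketch: the field name (`message.keyword`) and the `CustomerID=(\d+)` pattern are assumptions and need to be adapted to the actual document layout.

```painless
// Kibana scripted field (Painless), type: string.
// Requires script.painless.regex.enabled: true on every node.
// ASSUMPTION: the raw log line is stored in a keyword field called
// "message.keyword" and contains text like "CustomerID=12345".
def m = /CustomerID=(\d+)/.matcher(doc['message.keyword'].value);
return m.find() ? m.group(1) : "unknown";
```

Once the scripted field exists in the index pattern, the Unique Count (cardinality) metric in Visualize can aggregate on it directly.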
Using this approach, I am now able to get the number of unique customers. However, to get this result I had to add the following setting to elasticsearch.yml on all my Elasticsearch nodes, which also meant restarting every node. The restart caused no problems in my local test environment, but is it recommended to apply this change and restart the nodes in production? Also, will it affect the existing Elasticsearch data in any way?
```yaml
script.painless.regex.enabled: true
```
My logs are indexed by date, with a new index created each day. This means I would have to run this query manually every day to achieve the same result.
Is there a better way to approach the issue?
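One alternative I am considering, to avoid the daily manual step, is extracting CustomerID at index time with an ingest pipeline and attaching it to each new daily index via a template. This is a sketch only: the pipeline name, index pattern, source field, and grok pattern are all assumptions.

```json
PUT _ingest/pipeline/extract-customer-id
{
  "description": "Sketch: pull CustomerID out of the raw log line at index time",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["CustomerID=%{NUMBER:CustomerID}"],
        "ignore_failure": true
      }
    }
  ]
}

PUT _template/daily-logs
{
  "index_patterns": ["logstash-*"],
  "settings": {
    "index.default_pipeline": "extract-customer-id"
  }
}
```

With `index.default_pipeline` set in the template, every newly created daily index runs the pipeline automatically, so documents arrive with CustomerID as a real field and the Unique Count metric works without any scripted field or nightly query.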
I have also faced a problem when executing the POST request without the "query" field: it went through all the documents (~1,300,000) and failed with a 503 timeout error. When I added the query field, it only had to parse and update around 200 documents. My concern is that a timeout might still occur even with the query field, given the large number of documents in production. Is there any way to overcome this issue?
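If a long run is unavoidable, running `_update_by_query` asynchronously may sidestep the 503: with `wait_for_completion=false` Elasticsearch returns a task ID immediately instead of holding the HTTP connection open until the update finishes. A sketch, in which the index pattern, query, field name, and regex are assumptions:

```json
POST logstash-*/_update_by_query?wait_for_completion=false&conflicts=proceed&scroll_size=500
{
  "query": {
    "match_phrase": { "message": "CustomerID" }
  },
  "script": {
    "lang": "painless",
    "source": "def m = /CustomerID=(\\d+)/.matcher(ctx._source.message); if (m.find()) { ctx._source.CustomerID = m.group(1) }"
  }
}
```

The response contains a task ID, and progress can then be polled with the Tasks API (`GET _tasks/<task_id>`). `conflicts=proceed` keeps the run going past version conflicts, and lowering `scroll_size` reduces the work done per batch.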