I'm currently working on procurement data relevant to the current COVID-19 crisis. Essentially, I have a block of data that I'd like to filter down to COVID-relevant documents via a keyword list.
If a document does not contain one of these keywords or key phrases in any field value, I'd like it to be excluded from my data set.
What's the best way to do this? I have been looking into Elasticsearch filtering, but it seems you have to specify the field you're filtering on, and the KQL search bar seems to crash with any more than a few dozen OR terms.
Assuming you want to explore your data in Kibana's Discover:
You could switch from KQL to Lucene (toggle to the right of the query input).
Then the query can simply be a space-separated list of your keywords:
keyword1 keyword2 keyword3
(Lucene's default operator is OR, so no explicit OR is needed; put quotes only around multi-word phrases, since quoting the whole list would turn it into a single phrase search.)
Your documents will be filtered by those keywords, and only documents that include at least one of them in any searchable field will be shown.
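As a sketch (the keyword list here is hypothetical, not from your data set), such a Lucene query in the Discover search bar could look like:

```
covid coronavirus ppe ventilator "personal protective equipment"
```

Unquoted terms match anywhere in any searchable field, and the quoted phrase matches only as a whole phrase; a document matching any one of them is kept.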
Any help with this? Thanks, I really appreciate it. The error:
type":"illegal_argument_exception","reason":"The length of regex [1005] used in the [query_string] has exceeded the allowed maximum of [1000]. This maximum can be set by changing the [index.max_regex_length] index level setting."}}}]},"status":400
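For context, this limit is hit when the query sent to the query_string query contains regex (or wildcard) patterns whose combined length exceeds index.max_regex_length (default 1000 characters). A hypothetical request that could trigger it might look like this in Kibana Dev Tools (index name and keywords are placeholders):

```json
POST /procurement-data/_search
{
  "query": {
    "query_string": {
      "query": "/covid.*/ OR /coronavirus.*/ OR /ventilator.*/"
    }
  }
}
```

With enough of these slash-delimited regex terms the total pattern length exceeds 1000, producing the illegal_argument_exception above; plain (non-regex) terms are not subject to this limit.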
To do this in Kibana:
Management -> Elasticsearch -> Index Management -> pick the index, Manage -> Edit index settings -> add "index.max_regex_length": "<new value greater than 1000>" -> Save
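If you prefer the REST API, the same change can be made with a dynamic index-settings update (the index name procurement-data is a placeholder; pick a new limit large enough for your pattern, e.g. 2000):

```json
PUT /procurement-data/_settings
{
  "index": {
    "max_regex_length": 2000
  }
}
```

index.max_regex_length is a dynamic setting, so this should take effect without closing and reopening the index.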