Setting up tags for documents in a collection searched with Elasticsearch


(Oleg S) #1

I have a collection in MongoDB with about 5 million documents; the number may grow to hundreds of millions. I have to implement search and aggregation over this collection.

I installed Elasticsearch with the MongoDB river to synchronize the collection.

For searching, I need to mark documents with different tags. For instance, a document whose user_agent field contains the string ios must be marked with the ios tag. These tags will then be used for aggregation and search.

The question is how to set up tags for documents efficiently. ES doesn't support batch update by query (only with a plugin, and that works slowly). The source documents in MongoDB should also contain the tags, not only the ES index, for cases when the ES index has to be recreated.


(Mark Harwood) #2

It's best to fix this "on the way in" rather than thinking about indexing then batch updating later (an elasticsearch update is a delete followed by a re-index internally).
One approach would be to look at encoding the user-agent categorisation logic as part of the elasticsearch analysis pipeline using regular expressions [1] to mine out the bits of interest. This is a configuration exercise and so would probably not be too much disruption to the existing pipeline using the mongodb river.
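
As a rough illustration of that configuration approach, here is a sketch of index settings written as a Python dict and printed as JSON. The filter type `pattern_capture` and its `patterns` / `preserve_original` options are the ones described in [1]; the names `ua_platform` and `ua_tags` and the patterns themselves are made up for this example, and the settings have not been tested against any particular Elasticsearch version.

```python
import json

# Hypothetical names: "ua_platform" (filter) and "ua_tags" (analyzer) are
# invented for illustration. The regex patterns are assumptions too.
settings = {
    "analysis": {
        "filter": {
            "ua_platform": {
                "type": "pattern_capture",
                "preserve_original": False,
                "patterns": ["(iphone|ipad|ios)", "(android)", "(windows)"],
            }
        },
        "analyzer": {
            "ua_tags": {
                "type": "custom",
                # Keep the whole user-agent string as one token, lowercase it,
                # then let the capture groups emit the pieces of interest.
                "tokenizer": "keyword",
                "filter": ["lowercase", "ua_platform"],
            }
        },
    }
}

# Body that could be sent when creating the index.
print(json.dumps({"settings": settings}, indent=2))
```

Note that the emitted tokens are the matched text itself, so "iphone" and "ios" would stay separate tokens here, and this only changes what ends up in the index, not the source documents in MongoDB.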

However, you'd have to do all the hard work of figuring out user-agents, platforms, etc. and encoding them as regex patterns. Thankfully, several people have already done this (e.g. [2]), so it would probably make more sense to re-route your raw content through an enriching pipeline like this before insertion into elasticsearch.
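
To make the "enrich before insertion" idea concrete, here is a minimal Python sketch that derives tags from the user_agent field before a document is written anywhere. The rule table `TAG_RULES`, the helper names `derive_tags` and `enrich`, and the `tags` field name are all hypothetical, and the two patterns are a toy stand-in for a real user-agent parser such as [2].

```python
import re

# Hypothetical rules: tag name -> pattern matched against user_agent.
# A real pipeline would use a maintained user-agent database instead.
TAG_RULES = {
    "ios": re.compile(r"iphone|ipad|ipod|\bios\b", re.IGNORECASE),
    "android": re.compile(r"android", re.IGNORECASE),
}


def derive_tags(doc):
    """Return the sorted list of tags whose pattern matches doc['user_agent']."""
    ua = doc.get("user_agent") or ""
    return sorted(tag for tag, pattern in TAG_RULES.items() if pattern.search(ua))


def enrich(doc):
    """Return a copy of doc with a 'tags' field added; the input is not modified."""
    enriched = dict(doc)
    enriched["tags"] = derive_tags(doc)
    return enriched


doc = {"_id": 1, "user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X)"}
print(enrich(doc)["tags"])  # ['ios']
```

If the enriched document is what gets stored in MongoDB, the tags travel with the source data and are still there when the ES index has to be rebuilt.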

Cheers
Mark

[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-capture-tokenfilter.html
[2] https://www.elastic.co/guide/en/logstash/current/plugins-filters-useragent.html

