Suitability of Elastic for web search solution


(Mark Radford) #1

Hello, Elastic community!

I'm looking into the suitability of various search technologies as a replacement for a client's current Google Mini search implementation. Note that this is for a website search solution, so it's fairly general-purpose rather than part of a more sophisticated application.

I'm trying to figure out whether Elasticsearch would be a good solution, and I have a few questions I'm hoping someone can shed some light on:

It seems that Beats are the way to index documents. Does anyone know of a Community Beat that either parses URLs (i.e. something like a web crawler) or will read HTML and other assets from a file system? It looks like FS Crawler might do the trick, but I'm wondering if anyone has any other suggestions.

We want to add filtering based on a taxonomy we're applying to the pages. Previously we've done this by pulling out integers from a tag in the document. I'm assuming this would need to be part of the Beat we use to index the data?
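To illustrate the kind of extraction I mean, here is a rough Python sketch using only the standard library. The tag itself is an assumption for the example: I've used a hypothetical `<meta name="taxonomy" content="12,45,7">` tag, and `extract_taxonomy_ids` is just an illustrative helper name, not part of any Beat or of FS Crawler.

```python
import re
from html.parser import HTMLParser


class TaxonomyMetaParser(HTMLParser):
    """Collects integer IDs from a meta tag such as
    <meta name="taxonomy" content="12,45,7"> (hypothetical tag name)."""

    def __init__(self, meta_name="taxonomy"):
        super().__init__()
        self.meta_name = meta_name
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if attr.get("name") == self.meta_name:
            # Pull every run of digits out of the content attribute.
            content = attr.get("content") or ""
            self.ids.extend(int(n) for n in re.findall(r"\d+", content))


def extract_taxonomy_ids(html, meta_name="taxonomy"):
    """Return the list of taxonomy IDs found in the given HTML string."""
    parser = TaxonomyMetaParser(meta_name)
    parser.feed(html)
    return parser.ids
```

The resulting list of integers would then be stored alongside the page content as a field on the indexed document, so it can be used for filtering later.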

We're hoping we can then use the taxonomy metadata for faceting our results. In other words, once we've performed a search, we'd like to know which subset of the taxonomy applies to that result set. Is that something Elasticsearch can handle? I'm assuming aggregations would provide what I'm after, but I'm getting a bit muddled. Do aggregations provide that sort of information?

The collection size isn't massive: we're looking at about 20k documents (mainly a mix of HTML, PDF and Word documents). Are there any suggestions on cluster size if we were to go for a hosted / managed Elastic Cloud solution? Any thoughts on how long it would take to reindex an index of that size? (I expect it's going to depend largely on the type of Beat used, but any thoughts are welcome.)

I hope this is the right place to post this. If not, feel free to suggest another forum. Thanks in advance!


(Mark Walkom) #2

Use FSCrawler.

Yep.
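On the faceting part specifically, a `terms` aggregation run alongside the query returns the taxonomy values present in the matching documents, with a count for each. Below is a minimal sketch of the request body and of reading the buckets back, written as plain Python dictionaries so it does not depend on any client library. The field names `content` and `taxonomy_ids`, the aggregation name `taxonomy_facets`, and both helper functions are assumptions for illustration; use whatever your mapping actually defines.

```python
def build_faceted_search(text, selected_ids=None, max_facets=50):
    """Build a search body that matches `text` and asks for taxonomy facets.

    `content` and `taxonomy_ids` are hypothetical field names.
    """
    query = {"bool": {"must": [{"match": {"content": text}}]}}
    if selected_ids:
        # Restrict results to documents tagged with any of the selected IDs.
        query["bool"]["filter"] = [{"terms": {"taxonomy_ids": list(selected_ids)}}]
    return {
        "query": query,
        "aggs": {
            "taxonomy_facets": {
                "terms": {"field": "taxonomy_ids", "size": max_facets}
            }
        },
    }


def facet_counts(response):
    """Map each taxonomy ID in the search response to its document count."""
    buckets = response["aggregations"]["taxonomy_facets"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

The body would be sent to the index's `_search` endpoint, and `facet_counts` applied to the JSON that comes back. Because the aggregation is computed over the documents matching the query, the counts describe only the current result set.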

You really need to test this yourself. It depends on your documents, analysis type, queries, etc.


(Mark Radford) #3

Thanks very much Mark.

