Suitability of Elastic for web search solution

mradford · January 13, 2017, 2:40pm

Hello, Elastic community!

I'm looking into the suitability of various search technologies as a replacement for a client's current Google Mini search implementation. Note that this is for a website search solution, so it's fairly "general" rather than part of a more sophisticated application.

I'm trying to figure out whether Elasticsearch would be a good solution, and I have a few questions I'm hoping someone can shed some light on:

It seems that Beats are the way to index documents. Does anyone know of a Community Beat that either crawls URLs (i.e. something like a web crawler) or reads HTML and other assets from a file system? It looks like FSCrawler might do the trick, but I'm wondering if anyone has any other suggestions.

We want to add filtering based on a taxonomy we're applying to the pages. Previously we've done this by pulling out integers from a tag in the document. I'm assuming this would need to be part of the Beat we use to index the data?
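For illustration, here is a minimal sketch of that "pull integers out of a tag" step, using only the Python standard library. It assumes the taxonomy IDs live in a `<meta name="taxonomy" content="...">` tag; the tag name, the separator handling, and the helper names are all hypothetical, and the real markup may differ. The resulting list could then be stored as a field on the indexed document.

```python
import re
from html.parser import HTMLParser


class TaxonomyMetaParser(HTMLParser):
    """Collects integer IDs from <meta name="..."> tags (the tag name is an assumption)."""

    def __init__(self, meta_name="taxonomy"):
        super().__init__()
        self.meta_name = meta_name
        self.taxonomy_ids = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if attr_map.get("name") != self.meta_name:
            return
        content = attr_map.get("content") or ""
        # Keep every run of ASCII digits; separators (commas, spaces) are ignored.
        self.taxonomy_ids.extend(int(tok) for tok in re.findall(r"[0-9]+", content))


def extract_taxonomy_ids(html, meta_name="taxonomy"):
    """Return the taxonomy IDs found in the page, in document order."""
    parser = TaxonomyMetaParser(meta_name)
    parser.feed(html)
    parser.close()
    return parser.taxonomy_ids


if __name__ == "__main__":
    page = '<html><head><meta name="taxonomy" content="12, 45 7"></head></html>'
    print(extract_taxonomy_ids(page))
```

Whether this runs inside the indexing tool or as a separate pre-processing step depends on the tool chosen, so treat it as a starting point only.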

We're hoping we can then use the taxonomy metadata for faceting our results. In other words, once we've performed a search, we'd like to know the subset of the taxonomy that applies to that result set. Is that something Elasticsearch can handle? I'm assuming aggregations would provide what I'm after, but I'm getting a bit muddled. Do aggregations provide that sort of information?
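As a rough sketch of what that faceting could look like: a `terms` aggregation sent alongside the query returns one bucket per taxonomy value found in the matching documents, each with a document count. The code below only builds the request body and reads the response, so it needs no client library. The field names `content` and `taxonomy_ids` and both helper functions are assumptions for illustration, not part of any particular indexing setup.

```python
def build_faceted_search(text, selected_ids=None,
                         text_field="content", taxonomy_field="taxonomy_ids",
                         max_facets=50):
    """Build a search body: full-text match, optional taxonomy filter, terms aggregation."""
    query = {"bool": {"must": [{"match": {text_field: text}}]}}
    if selected_ids:
        # Restrict results to documents tagged with any of the selected taxonomy IDs.
        query["bool"]["filter"] = [{"terms": {taxonomy_field: list(selected_ids)}}]
    return {
        "query": query,
        "aggs": {
            # One bucket per taxonomy ID present in the result set.
            "taxonomy": {"terms": {"field": taxonomy_field, "size": max_facets}}
        },
    }


def facet_counts(response, agg_name="taxonomy"):
    """Map each taxonomy ID in the aggregation buckets to its document count."""
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


if __name__ == "__main__":
    body = build_faceted_search("annual report", selected_ids=[12])
    print(body)
    # A hand-written response in the shape a terms aggregation returns.
    sample = {"aggregations": {"taxonomy": {"buckets": [
        {"key": 12, "doc_count": 9}, {"key": 45, "doc_count": 4}]}}}
    print(facet_counts(sample))
```

The body returned by `build_faceted_search` would be sent to the index's `_search` endpoint; the counts that come back are computed over the documents matching the query, which is the "subset of taxonomy that applies to that result set" described above.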

The collection size isn't massive: we're looking at roughly 20k documents (mainly a mix of HTML, PDF and Word documents). Are there any suggestions on cluster size if we were to go for a hosted / managed Elastic Cloud solution? Any thoughts on how long it would take to reindex an index of that size? (I expect it will largely depend on the type of Beat used, but any thoughts are welcome.)

I hope this is the right place to post this; feel free to suggest another forum. Thanks in advance!

warkolm · January 16, 2017, 12:00am

Use FSCrawler.

Yep.

You need to test this yourself really. It depends on your documents, analysis type, queries etc.

mradford · January 16, 2017, 10:20am

Thanks very much Mark.

system · February 13, 2017, 10:20am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
