I searched on Google but didn't find much information about this, so any help would be greatly appreciated.
I previously worked with the Google Search Appliance, but it was decommissioned, and Elastic seems to be the best option on the market right now.
I need to:
- Crawl & index a number of websites (around 50)
- Serve the results as XML to a web application
Does the current version of Elasticsearch not include a crawler? Do I need to install something else?
I am using the Elastic Cloud in AWS.
*We will crawl only public websites, such as Twitter accounts.
https://swiftype.com/ can do most of that!
Swiftype is OK, but unfortunately we need near-real-time crawling (every 5 minutes), and their cheap plan (under $100/month) offers only one crawl every 3 days or so.
I am now looking at other solutions such as 80legs.com, and if I find something I will post it here.
Can Elasticsearch be used for this type of live crawling, indexing & serving?
Elasticsearch can be the backend for storing the data collected from crawlers, but it has no crawling capabilities.
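To illustrate that split of responsibilities, here is a minimal sketch of what the crawler side could hand to Elasticsearch. It assumes the pages have already been fetched by some external crawler; the `PageText`, `page_to_doc` and `bulk_body` names and the `pages` index name are made up for this example. The only Elasticsearch-specific part is the newline-delimited body format expected by the `_bulk` endpoint.

```python
import hashlib
import json
from html.parser import HTMLParser


class PageText(HTMLParser):
    """Collects the <title> and the visible text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip and data.strip():
            self.chunks.append(data.strip())


def page_to_doc(url, html):
    """Turn one fetched page into a flat document (hypothetical field names)."""
    parser = PageText()
    parser.feed(html)
    parser.close()
    return {"url": url, "title": parser.title.strip(), "body": " ".join(parser.chunks)}


def bulk_body(docs, index="pages"):
    """Build the NDJSON body for POST /_bulk: one action line, then one source line."""
    if not docs:
        return ""
    lines = []
    for doc in docs:
        # Hashing the URL gives a stable _id, so re-crawling a page
        # overwrites the existing document instead of duplicating it.
        doc_id = hashlib.sha1(doc["url"].encode("utf-8")).hexdigest()
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline
```

The resulting string would be sent to the cluster's `_bulk` endpoint with the `Content-Type: application/x-ndjson` header. Fetching, scheduling (for example every 5 minutes) and politeness rules stay the crawler's job.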
Maybe you can have a look at https://github.com/DigitalPebble/storm-crawler; it has some integration with ES. (I haven't used it myself, though.)
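As for the "serve as XML" requirement from the original question: Elasticsearch returns search results as JSON, so the web application (or a thin layer in front of it) would have to do the conversion. A rough sketch, assuming the standard search response shape (`hits.hits[]._source`) and assuming every field name in `_source` is also a valid XML element name; `hits_to_xml` and the `results`/`result` element names are invented for this example.

```python
import xml.etree.ElementTree as ET


def hits_to_xml(response):
    """Convert a parsed Elasticsearch search response (a dict) to an XML string."""
    root = ET.Element("results")
    for hit in response.get("hits", {}).get("hits", []):
        item = ET.SubElement(root, "result", id=str(hit.get("_id", "")))
        for field, value in hit.get("_source", {}).items():
            # Assumption: field names are valid XML element names.
            ET.SubElement(item, field).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

ElementTree escapes characters such as `&` in URLs, which is easy to get wrong when building the XML by string concatenation.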
@warkolm Yes. I was a bit shocked to see that there is no official crawler (at least in the Cloud version).
It's like selling only the engine and some other parts of a car, but you have to find the wheels on your own.
I guess I was too accustomed to the Google Search Appliance.
I will keep researching and post here whatever solution I find!
@lukas_vlcek Thanks! I will test it.