Getting data into Elasticsearch?


(T.T. Nguyen) #1

Hello! This is my first large project working with aggregating and searching data. I understand how to query Elasticsearch, but I'm unsure how to actually get the data I want into it.

I'm looking to index data from various sources (Confluence, GitHub, Google Docs, Box, etc.). So far, I've learned that I can get data in JSON format through their respective APIs, but how would I go about actually scripting the indexing over HTTP(S)? While testing how Elasticsearch would work on the server, I had to call each API in my browser to retrieve the JSON data, then paste it into a curl -XPOST call to index it in Elasticsearch. There's definitely a better way to do this, haha.
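A minimal sketch of scripting that indexing step instead of copy-pasting, written in Python with only the standard library for brevity (the same request works from the Java, Ruby or PHP clients). It assumes an unauthenticated cluster at http://localhost:9200 and documents that carry an "id" field; the index name "docs" and the function names are hypothetical. It packs documents into the bulk API's newline-delimited format (an action line followed by the document itself), so one HTTP request can index many documents. Recent Elasticsearch versions accept this form; older releases also expected a "_type" in each action line.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local, unauthenticated cluster


def to_bulk_ndjson(index, docs):
    """Build a _bulk body: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        # The "index" action creates the document, or overwrites it if the _id exists.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc["id"])}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def bulk_request(index, docs):
    """Prepare (but do not send) a POST to the _bulk endpoint."""
    body = to_bulk_ndjson(index, docs).encode("utf-8")
    return urllib.request.Request(
        ES_URL + "/_bulk",
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


def send(req):
    """Send the request and return the parsed JSON response."""
    # The response contains "errors": true if any individual item failed.
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Usage would be send(bulk_request("docs", docs)), where docs is the list you built from an API response. It is worth checking the "errors" flag in the result, because _bulk answers with HTTP 200 even when some documents were rejected.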

I'm proficient in Java and Ruby, as well as HTML, JavaScript and PHP. My end-goal is to be able to search for documents from those various services!


(Colin Goodheart-Smithe) #2

There are official Elasticsearch clients for many languages, including Ruby, JavaScript, PHP and Java. Take a look at the documentation for your preferred language's client at the following link (under the Clients section): https://www.elastic.co/guide/index.html

Hope that helps


(T.T. Nguyen) #3

Thanks for the reply!

I've looked through some of the client documentation, and I guess I'm looking for a recommendation on how to start this project. My first goal is to index documents from Confluence's API (via HTTP(S) calls), but I'm not sure whether that step is easier with Logstash plus plugins or with the Java client.
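For the Confluence step, a small script can stand in for Logstash plus plugins. Below is a Python sketch that assumes Confluence's REST content endpoint (/rest/api/content with expand=body.storage) and the results / id / title / body.storage.value fields in its response; check those against your Confluence version before relying on them. The base URL, the output document fields (id, source, title, body) and the function names are hypothetical, and authentication is left out.

```python
import json
import urllib.parse
import urllib.request
from html.parser import HTMLParser

BASE_URL = "https://confluence.example.com"  # hypothetical Confluence base URL


class _TextExtractor(HTMLParser):
    """Collects text nodes so pages are indexed as plain text, not markup."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(markup):
    """Join text nodes with spaces and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def page_url(start, limit=25):
    # Assumption: Confluence REST API content listing with the storage body expanded.
    query = urllib.parse.urlencode(
        {"type": "page", "expand": "body.storage", "start": start, "limit": limit}
    )
    return BASE_URL + "/rest/api/content?" + query


def fetch_json(url, headers=None):
    # Add whatever auth header your instance needs; not shown here.
    req = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def pages_to_docs(payload):
    """Map one page of API results to Elasticsearch documents (hypothetical schema)."""
    docs = []
    for page in payload.get("results", []):
        storage = page.get("body", {}).get("storage", {}).get("value", "")
        docs.append({
            "id": "confluence-" + str(page["id"]),
            "source": "confluence",
            "title": page.get("title", ""),
            "body": strip_tags(storage),
        })
    return docs
```

Prefixing each ID with the source name is a deliberate choice: it keeps IDs from different services from colliding if everything ends up in one index, and the "source" field lets you filter searches by service later.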


(Colin Goodheart-Smithe) #4

There is a certain amount of personal preference here. I think you could do it either way, but I haven't personally used the http input for Logstash, so I don't have an opinion on which would be easier. Sorry.

Maybe other people will be able to comment


(Mark Walkom) #5

I'd try both and see what works best for you :smile:


(T.T. Nguyen) #6

Thanks everyone! I've tried both the PHP and Ruby clients and found that the documentation really pushed me towards the former (PHP). multi_match queries are working, and converting between JSON and PHP arrays is simple enough! I can send queries and display the hits on the web page!
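The nested arrays the PHP client takes are just JSON underneath, which is why the conversion is straightforward. For comparison, a Python sketch that builds the same kind of multi_match body and reads titles out of the hits; the index name "docs", the title/body fields, the ^2 boost and the localhost URL are all assumptions.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local, unauthenticated cluster


def multi_match_body(text, fields, size=10):
    """Search body that matches `text` against several fields at once."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                # e.g. ["title^2", "body"]: "^2" doubles the weight of title matches
                "fields": fields,
            }
        },
    }


def search_request(index, text, fields):
    """Prepare (but do not send) a POST to the index's _search endpoint."""
    body = json.dumps(multi_match_body(text, fields)).encode("utf-8")
    return urllib.request.Request(
        ES_URL + "/" + index + "/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def hit_titles(response):
    """Pull the title of each hit out of a parsed _search response for display."""
    return [hit["_source"].get("title", "") for hit in response["hits"]["hits"]]
```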

My next step is to aggregate data from those various sites. Has anyone tried the Elasticsearch River App to crawl for content? I've looked at some threads here that discuss web crawlers, but the most recent comment was from four years ago, so the suggestions there are probably outdated.


(Mark Walkom) #7

Rivers are deprecated anyway, so it's not worth using them.
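Without rivers, one common replacement is a small script of your own that pulls from each source on a schedule (cron, for example) and sends what it finds to the bulk API. A Python sketch of the pulling side, assuming a source that pages its results with an offset and a limit; fetch_page is a hypothetical callback you would write once per service, and APIs that page differently (by page number or "next" links) need a different loop. The batch size is arbitrary.

```python
from typing import Callable, Iterable, Iterator, List


def paginate(fetch_page: Callable[[int, int], List[dict]], limit: int = 25) -> Iterator[dict]:
    """Yield every record from an offset/limit API, stopping at the first short page."""
    start = 0
    while True:
        page = fetch_page(start, limit)
        yield from page
        if len(page) < limit:
            return
        start += limit


def batched(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group records into lists of at most `size`, e.g. one list per _bulk request."""
    batch: List[dict] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Because both functions are generators, records stream from the source API into bulk-sized batches without loading a whole service's content into memory at once.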

