Cannot connect to AWS elasticsearch cluster with flask app and Python elasticsearch wrapper

This is a repeat of an earlier request but with a lot more information.

I have installed elasticsearch 6.5.3 on an aws ec2 server. Inbound rules are:

SSH TCP 0.0.0.0 PORTS RANGE 22
HTTP TCP 0.0.0.0 PORTS RANGE 80
HTTPS TCP 0.0.0.0 PORTS RANGE 443

In the /etc/elasticsearch/elasticsearch.yml file, network.host is set as follows:

network.host: 0.0.0.0

otherwise the file has not been changed.

The Python file is:

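from flask import Flask
from elasticsearch import Elasticsearch

application = Flask(__name__)
es = Elasticsearch([{'host': 'http://ec2-54-145-137-99.compute-1.amazonaws.com', 'port': 9200, 'use_ssl': False}])

@application.route('/', methods=['GET', 'POST'])
def index():
    # Create the index; ignore=400 suppresses the "index already exists" error.
    es.indices.create(index='my-index', ignore=400)
    return 'ok'  # a Flask view has to return a response

# Runs at import time, so a failed connection shows up as soon as the app starts.
print(es.cluster.health())

if __name__ == '__main__':
    application.debug = True
    application.run()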

I have tried multiple configurations in the elasticsearch.yml file in /etc/elasticsearch/ including:

network.host: 0.0.0.0 (current configuration)
network.host: my ec2 ip address
network.host: 127.0.0.1
network.host: full ec2 DNS

I have also tried multiple configurations in the application.py file including:

es = Elasticsearch("http://54.145.137.99")  
es = Elasticsearch()
es = Elasticsearch([{'host': 'https://ec2-54-145-137-99.compute-1.amazonaws.com', 'port': 443, 'use_ssl': True}])
es = Elasticsearch([{'host': 'http://ec2-54-145-137-99.compute-1.amazonaws.com', 'port': 9200, 'use_ssl': False}]) (current)

The error message has variously been

connection timed out or connection refused

This must be something that a huge number of people do successfully, but I cannot find a solution anywhere. I have googled many, many times and I have seen threads describing the problem but no solution.

Can somebody please help me? It's extremely frustrating! Thank you.

Elasticsearch listens on port 9200 by default, and you do not seem to have that port open for inbound traffic. If you open it on a public IP address, make sure you secure Elasticsearch so that not just anyone can get to it.
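
A quick way to tell which of the two errors you are getting, and why, is to probe the port directly. The sketch below uses only Python's standard library; `probe_port` is a hypothetical helper name, not part of any Elasticsearch client. As a rule of thumb, a timeout usually means a firewall or security group is silently dropping the packets, while a refusal usually means the packets reached the host but nothing is listening on that port and address.

```python
import socket


def probe_port(host, port, timeout=3.0):
    """Try a plain TCP connection and report what happened.

    Returns "open", "refused", "timed out", or "error: <detail>".
    """
    try:
        # create_connection only performs the TCP handshake; no HTTP is sent.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timed out"
    except OSError as exc:
        # Covers DNS failures, unreachable networks, and similar problems.
        return "error: %s" % exc
```

For example, `probe_port("ec2-54-145-137-99.compute-1.amazonaws.com", 9200)` run from the machine that hosts the Flask app shows whether port 9200 is reachable from there at all, before the Elasticsearch client is involved.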

Thank you, I have now fixed this and I hope the following solution will help others.

Open new port on EC2 instance as follows:

CustomTCP TCP 9200 0.0.0.0

and in the application.py file the connection string is as follows:

es = Elasticsearch("http://00.000.111.22")  # public IP found on the EC2 dashboard
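
For anyone who prefers the dict form used in the question: as far as I can tell, the 'host' value there has to be a bare hostname, with the scheme expressed through 'use_ssl'. The following is a minimal sketch of a helper that splits a URL into that shape. `es_host_from_url` is a hypothetical name, and the fallback to port 9200 is my assumption, based on it being the Elasticsearch default.

```python
from urllib.parse import urlsplit


def es_host_from_url(url):
    """Split an http(s) URL into a {'host', 'port', 'use_ssl'} dict."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("expected an http(s) URL, got %r" % url)
    return {
        "host": parts.hostname,             # bare hostname, no scheme
        "port": parts.port or 9200,         # assumption: default to 9200
        "use_ssl": parts.scheme == "https",
    }


# Usage (needs the elasticsearch package and a reachable node):
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch([es_host_from_url("http://ec2-54-145-137-99.compute-1.amazonaws.com:9200")])
#   print(es.ping())  # True if the node answers
```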

Thank you.

Please make sure you are not exposing your cluster to the internet.

I realize this is important. How can this be done while keeping it open for a web app which allows a) users to input stuff and b) search?

Your advice will be most welcome.

How can this be done while keeping it open for a web app which allows a) users to input stuff and b) search?

Some options:

  • Consider Elasticsearch the same way you would consider your SQL database. Do you really need the SQL database to be exposed to the internet, or only to the machine that is running the webapp? The same goes for Elasticsearch: only expose it to the machine that needs to access it.
  • Secure elasticsearch using elasticsearch security (commercial license needed)
  • Use elasticsearch as a service available from https://cloud.elastic.co. Easier to deploy and maintain, contains everything you need including security.
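
To make the first option concrete on EC2: instead of opening port 9200 to 0.0.0.0, the inbound rule can name only the machine that runs the web app. The sketch below builds such a rule in the `IpPermissions` shape that boto3's `authorize_security_group_ingress` accepts. `es_ingress_rule`, the example address, and the group id in the usage comment are hypothetical and only there for illustration.

```python
import ipaddress


def es_ingress_rule(app_server_ip, port=9200):
    """Build an inbound rule that admits exactly one IPv4 address."""
    # ip_address raises ValueError for anything that is not a valid address.
    addr = ipaddress.ip_address(app_server_ip)
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    if addr.is_unspecified:
        raise ValueError("0.0.0.0 would open the port to everyone")
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        # /32 means "this single address and nothing else".
        "IpRanges": [{"CidrIp": "%s/32" % addr, "Description": "web app only"}],
    }


# Usage (needs boto3 and AWS credentials; the group id is a placeholder):
#   import boto3
#   ec2 = boto3.client("ec2")
#   ec2.authorize_security_group_ingress(
#       GroupId="sg-EXAMPLE",
#       IpPermissions=[es_ingress_rule("10.0.0.12")],
#   )
```

If the web app and Elasticsearch run on the same instance, no inbound rule for 9200 is needed at all, because the app can reach Elasticsearch on localhost.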

Hope this helps.

I'm sorry, but the distinction you are making is not clear to me. Users will need to access the database to carry out searches (like Wikipedia, for example) and, if it were a blog, to enter and edit data. So machines all around the world need access. How can I secure it to stop them from accessing data in ways I don't want them to? Only with an SSL connection and the use of user and password credentials for deleting or modifying some data? I have noted ES as a service, but I would like to understand this anyway.

I don't know what your application is but imagine that you have an application layer.

Basically

USER -> APPLICATION -> ELASTICSEARCH

Where APPLICATION is your blog application or wikipedia application or whatever.
So the user does not have direct access to ELASTICSEARCH.
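
In code, the APPLICATION box can be as small as one function that accepts the user's text and decides itself what is sent to Elasticsearch. This is only a sketch under assumptions: the field name `description` and the helper names are made up, and `es_client` stands for an `Elasticsearch` client object created inside the application.

```python
def build_search_body(user_text, max_hits=10):
    """Turn free text from the user into a fixed, read-only query."""
    # The user supplies only the words to search for. The query type,
    # the field, and the result size are decided by the application.
    return {
        "size": max_hits,
        "query": {"match": {"description": user_text}},  # hypothetical field
    }


def handle_search(es_client, user_text, index="my-index"):
    """Run the search on the user's behalf and return only the documents."""
    text = (user_text or "").strip()
    if not text:
        return []
    response = es_client.search(index=index, body=build_search_body(text))
    return [hit["_source"] for hit in response["hits"]["hits"]]


# In a Flask view this could be called roughly like:
#   return jsonify(handle_search(es, request.args.get("q")))
```

The browser only ever talks to the web application. The application is the only thing that holds a connection to Elasticsearch, so the user cannot delete an index or run an arbitrary query.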

But if you really want to open access to your elasticsearch instance for all users it basically means that you want this:

USER -> ELASTICSEARCH

Which means that you will share with all users the login and password... Would you want to do that?

You can imagine securing elasticsearch with different credentials, like one write role and one read only role. In which case you would have a design like:

# Write time
USER -> APPLICATION -> ELASTICSEARCH

# Read time
USER -> ELASTICSEARCH

But in that case, I'm not sure why you would not use the APPLICATION layer at search time as well...

Also if you are just after indexing your public website, you may want to look at https://www.elastic.co/solutions/site-search

HTH

thank you. This is at a level of abstraction which is too high for me. We agree on the principles, the problem for me is the steps involved in making it work. I need to read the documentation more thoroughly but, as I am not a computer scientist, some of it which would be clear to others is not always clear to me.

So the easiest path for you would be IMO to use a service which is totally ready to use:

thank you. Can you at least answer my first question?

How can I have an ES cluster that is accessible from the web (for people to search, say) that is otherwise secure?

Is opening port 9200 a big mistake? Should I, or is it even possible to, set the http.host (in the config file) to 443, for example, and only connect over SSL?

Use the Elasticsearch security features, use cloud.elastic.co, put nginx in front of your cluster...

Is opening port 9200 a big mistake?

Yes!

If you want users to be able to access Elasticsearch directly from browsers you will need to secure your cluster and limit what users can do once they are logged in. As you will need to share credentials with the users, you have to assume that anyone can get hold of them and access the cluster using these credentials. You therefore need authentication as well as role-based access controls so you can create a user that is only allowed to read and not modify data.

As this is hard to secure, most users have an application layer in between the user and Elasticsearch to control access.

Yes, this will open up your cluster to the entire internet so that anyone can access it. Using SSL will not help as you still do not require any login and can not control what the user is allowed to do.

I am relatively lost here, I don't understand what you mean by the "application layer", sorry to be dumb about it.

If you describe the structure of your application it might be easier for us to put things in context.

thank you. 2 Situations:

  1. a web app which allows users to search rental properties in London. I will put the data in so the only thing that users need to do is search. It is of course unrealistic to think I can put in all the data by myself, this is just an example. Users do not need to create an account or login. They just search like they would with Wikipedia.

  2. a real time contract bridge scoring system. Multiple users enter data for about 6 fields into the cluster via mobile web site or iOS app. So, for example user enters data through a form on a mobile web app.
    These are two use cases.

It is of course unrealistic to think I can put in all the data by myself.

That's the point... You have an application somewhere which is managing those data.
This application can be the one that calls Elasticsearch when a user searches for anything. So the user connects to the application, which asks Elasticsearch.

USER -> APPLICATION -> ELASTICSEARCH

They just search like they would with Wikipedia.

When you search in Wikipedia, you are not directly searching in elasticsearch.

Instead you are calling api.php, which then calls Elasticsearch, and the application (the API here) sends the results back to the browser:

a real time contract bridge scoring system. Multiple users enter data for about 6 fields into the cluster via mobile web site or iOS app. So, for example user enters data through a form on a mobile web app.

This is typically what I would solve with app-search. But again, you don't want your users to access Elasticsearch directly; you want them to connect to your application, which will then call Elasticsearch. That is exactly what app-search does, BTW.
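
For the write side, the same idea might look like the sketch below: the application checks the submitted form before anything reaches the cluster. The six field names, the index name, and the helper names are all hypothetical, since the thread does not say what the fields are. `doc_type="_doc"` is passed because, as far as I know, the 6.x Python client requires a document type.

```python
# Hypothetical field names for one bridge score entry.
REQUIRED_FIELDS = ("board", "ns_pair", "ew_pair", "contract", "declarer", "tricks")


def validate_score(form):
    """Return a clean document built from the form, or raise ValueError."""
    missing = [name for name in REQUIRED_FIELDS if name not in form]
    if missing:
        raise ValueError("missing fields: %s" % ", ".join(missing))
    # Form values are expected to be strings; int() raises ValueError
    # if the text is not a whole number.
    tricks = int(form["tricks"])
    if not 0 <= tricks <= 13:
        raise ValueError("tricks must be between 0 and 13")
    # Copy only the known fields, so unexpected input is dropped.
    doc = {name: str(form[name]).strip() for name in REQUIRED_FIELDS}
    doc["tricks"] = tricks
    return doc


def store_score(es_client, form, index="bridge-scores"):
    """Validate the form, then index it through the application's client."""
    doc = validate_score(form)
    return es_client.index(index=index, doc_type="_doc", body=doc)
```

Users submit the form to the application, and only a document that passes `validate_score` is ever written to Elasticsearch.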

I am sorry, I just do not get the layering distinction you are making. I understand it in the abstract, but I have no idea what it means in practice. In my mind (in the second case) the application would be a Python Flask app which handles the user input, puts it into the ES cluster, does some stuff, and then reports the results back to the browser. Is this what you mean?

If that is correct then my original question still exists, how can my flask app connect, through the Python wrapper, to the ES cluster? That is where I am stuck.

Yes.

how can my flask app connect, through the Python wrapper, to the ES cluster?

I'd use the Python client: Python Elasticsearch Client — Elasticsearch 8.0.0 documentation
But maybe that's the wrong answer. I don't know Flask and Python.

the point of my difficulty is that I could not get a remote connection to my cluster, using the Python wrapper, without opening port 9200. That is the entire purpose of my original question. We have been speaking at enormous cross purposes. I have read the documentation you sent me but it does not answer my question.