I thought that the max_score returned with the results should always be in the 0..1 range, yet I see results with a max_score greater than 3. What does that mean?
There is no bound on the score that a document can have. It is not a sliding scale from 0 (not relevant) to 1 (completely relevant), mainly because there is no definition of what a "completely relevant" document would be. If I were searching for "quick fox", what would a completely relevant document look like? Maybe one that just contained the text "quick fox"? But then what happens when I come across a document with the text "the quick fox is the only quick fox"? This should score higher because it mentions "quick fox" more times.
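To make the "quick fox" argument concrete, here is a minimal sketch of a toy score. It is an assumption for illustration only, not the formula Elasticsearch or Lucene actually uses: the hypothetical toy_score function adds up the square root of each query term's frequency and ignores things like field length and how rare a term is across the index.

```python
import math
import re


def toy_score(query: str, document: str) -> float:
    """Toy relevance score (hypothetical, NOT Lucene's real formula):
    sum over query terms of sqrt(term frequency in the document)."""
    tokens = re.findall(r"\w+", document.lower())
    return sum(math.sqrt(tokens.count(term)) for term in query.lower().split())


short_doc = "quick fox"
long_doc = "the quick fox is the only quick fox"

# Each query term appears once in short_doc and twice in long_doc,
# so long_doc gets the higher score, and both scores are above 1.
print(toy_score("quick fox", short_doc))
print(toy_score("quick fox", long_doc))
```

Even in this simplified model nothing caps the result at 1: more query terms or more occurrences simply keep raising the number.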
The score is not on a fixed scale at all. Furthermore, scores from different queries should not be compared with each other. If document A gets a score of 2 when I query for "quick fox" and document B gets a score of 4 when I query for "brown horse", it does not mean that document B is more relevant to "brown horse" than document A is to "quick fox".
The only function of the score is to tell you how relevant a document is to a query relative to the other documents returned by that same query in that same search.
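If you want numbers in the 0..1 range for display purposes, one option is to divide each hit's _score by the max_score of the same response. The sketch below assumes the usual search response shape (hits.max_score and hits.hits[]._score); the relative_scores helper and the sample data are hypothetical. The result only says how each hit compares with the top hit of that one query. It is still not an absolute relevance measure and still not comparable across queries.

```python
from typing import List, Tuple


def relative_scores(response: dict) -> List[Tuple[str, float]]:
    """Return (_id, _score / max_score) pairs for one search response.

    Hypothetical helper: the values are only meaningful relative to the
    other hits of this same query."""
    hits = response["hits"]
    max_score = hits.get("max_score")
    if not max_score:  # max_score can be missing, null, or 0
        return []
    return [(hit["_id"], hit["_score"] / max_score) for hit in hits["hits"]]


# Made-up response fragment, trimmed to the fields used above.
sample_response = {
    "hits": {
        "max_score": 3.2,
        "hits": [
            {"_id": "a", "_score": 3.2},
            {"_id": "b", "_score": 1.6},
        ],
    }
}

print(relative_scores(sample_response))
```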
You can read this chapter of the book 'Elasticsearch: The Definitive Guide' to get a better idea of how relevancy works in Elasticsearch (and Lucene): https://www.elastic.co/guide/en/elasticsearch/guide/current/controlling-relevance.html