Elasticsearch query for documents containing OR along with AND

I'm new to Elasticsearch and am working through query basics.

I have to retrieve the count of documents that either contain netflow.src_port in ['10','11','12'] OR netflow.dst_port in ['20','21','22'], AND have a timestamp within the last 15 minutes.

I have a basic query as below:

body: {
    query: {
      bool:{
        should: [
          {terms: { 'netflow.src_port': ['10','11','12'] }},
          {terms: { 'netflow.dst_port': ['20','21','22'] }},
        ],
        must: [
          {range : {
            "@timestamp" : {
              "gt" : "now-15m"
            }
          }}
        ]
      }
    },
  }

Going by the official documentation, should means that a document need not contain a src_port or dst_port from those lists, but if it does, its score is calculated accordingly.

What I actually need is an or condition for them.

If my explanation is not clear, the below might convey what I'm trying to achieve:

if ( (netflow.src_port contains [10 or 11 or 12] || netflow.dst_port contains [20 or 21 or 22]) && timestamp is within the last 15m )

What exactly do I need to do to get the desired result?

You have two top-level clauses that both need to be satisfied:

  • MUST be the right time
  • MUST be the right ports

So these are two clause objects in the must array of a bool query.
The first clause is a range query.
The second clause has lists of ports to choose from, so it needs wrapping in a bool query that lists the choices in a should clause:

{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "@timestamp": {
              "gt": "now-15m"
            }
          }
        },
        {
          "bool": {
            "should": [
              {
                "terms": {
                  "netflow.src_port": ["10", "11", "12"]
                }
              },
              {
                "terms": {
                  "netflow.dst_port": ["20", "21", "22"]
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Thanks for the quick reply, Mark. I appreciate the help, but something in the documentation is tripping me up.

According to the documentation:

The difference comes in with the two should clauses, which say that: a document is not required to contain either brown or dog, but if it does, then it should be considered more relevant

Applying the same logic here, does it not mean that:

The document need not contain any of the listed port values, but if it does, then boost its score. That would mean the query also picks up docs that contain none of the listed values.

I'm sure I'm mistaken here, but could you please explain?

Thanks.

That page from the guide isn't telling the whole story there. Looking at the reference docs for bool query there's this added wrinkle:

If the bool query ... has neither must or filter then at least one of the should queries must match a document for it to match the bool query

My inner bool query only has a should parameter which means that at least one of the listed should clauses has to match.
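
To make that rule concrete, here is a toy Python model of how bool clauses combine. It is only a sketch for illustration, not Elasticsearch code: the matches function, the flat document dicts, and the recent field (a stand-in for the timestamp range check) are all made up for this example, and it ignores scoring and any explicit minimum_should_match setting.

```python
def matches(query, doc):
    """Toy matcher for a tiny subset of the query DSL (terms and bool only)."""
    if "terms" in query:
        field, values = next(iter(query["terms"].items()))
        return doc.get(field) in values
    if "bool" in query:
        b = query["bool"]
        required_clauses = b.get("must", []) + b.get("filter", [])
        should = b.get("should", [])
        if not all(matches(q, doc) for q in required_clauses):
            return False
        # With no must/filter, at least one should clause has to match;
        # next to a must/filter, should clauses are optional.
        needed = 1 if should and not required_clauses else 0
        return sum(matches(q, doc) for q in should) >= needed
    raise ValueError("unsupported query type in this toy model")


port_clauses = [
    {"terms": {"netflow.src_port": ["10", "11", "12"]}},
    {"terms": {"netflow.dst_port": ["20", "21", "22"]}},
]
recent = {"terms": {"recent": ["yes"]}}  # hypothetical stand-in for the range clause

# should next to must: the port clauses are optional
flat = {"bool": {"must": [recent], "should": port_clauses}}
# should wrapped in its own bool inside must: one port clause is required
nested = {"bool": {"must": [recent, {"bool": {"should": port_clauses}}]}}

wrong_ports = {"netflow.src_port": "99", "netflow.dst_port": "99", "recent": "yes"}
right_port = {"netflow.src_port": "10", "netflow.dst_port": "99", "recent": "yes"}
```

In this model, wrong_ports matches the flat query but not the nested one, while right_port matches both, which is the difference between the original query and the nested version.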

Oh! Understood.

So it's safe to assume that:
must <--> AND
should <--> OR

Thanks for the solution!

For efficiency, also consider filter instead of must, if you don't need the scores.
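
As a rough sketch of that suggestion, the same nested structure can be built with filter in place of must. The helper name build_count_body and its window parameter are made up for illustration. The minimum_should_match setting is spelled out only for clarity, since a bool with nothing but should clauses already requires one of them to match.

```python
def build_count_body(src_ports, dst_ports, window="15m"):
    """Build a request body: (src_port OR dst_port) AND recent timestamp, unscored."""
    return {
        "query": {
            "bool": {
                # filter clauses must match, like must, but do not affect scores
                "filter": [
                    {"range": {"@timestamp": {"gt": f"now-{window}"}}},
                    {
                        "bool": {
                            "should": [
                                {"terms": {"netflow.src_port": src_ports}},
                                {"terms": {"netflow.dst_port": dst_ports}},
                            ],
                            "minimum_should_match": 1,
                        }
                    },
                ]
            }
        }
    }


body = build_count_body(["10", "11", "12"], ["20", "21", "22"])
```

Since only a count is needed here, a body of this shape could be sent to the _count endpoint rather than _search.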
