Concepts suggester?

Hello,

I would like to implement a Google-like search with Elasticsearch, as in the example below.

Suppose I have this collection of documents, where the properties are stored in a "tags" field (an array):

[
   {
      "title":"Some Example 1",
      "tags":[
         "tag1",
         "tag2",
         "tag3"
      ],
      "difficulty":"easy"
   },
   {
      "title":"Some Example 2",
      "tags":[
         "tag3"
      ],
      "difficulty":"normal"
   },
   {
      "title":"Some Example 3",
      "tags":[
         "tag3",
         "tag1"
      ],
      "difficulty":"normal"
   }
]

In formal concept analysis, this gives a table like:

| Document       | tag1 | tag2 | tag3 |
|----------------|------|------|------|
| Some Example 1 | X    | X    | X    |
| Some Example 2 |      |      | X    |
| Some Example 3 | X    |      | X    |

I would like to have a suggester that performs a formal concept analysis on the "tags" field of each document, in addition to applying filters on other fields (such as "difficulty").

After reading part of the Elasticsearch docs, the Adjacency Matrix Aggregation doesn't seem to handle this kind of analysis. Are there any alternatives?

Thanks for your help :slight_smile:

I'm not too familiar with "formal concept analysis", but based on a quick skim of this description it relies on studying set intersections.
I think you should be able to use Elasticsearch to discover much of the required stats efficiently.

The adjacency matrix can be used to describe intersections of arbitrary sets. You use filters to define the elements you want to intersect, and each filter can be a single term or a more complex bool expression that combines multiple tags.
To get stats outside of intersections, the significant_terms aggregation might be of use: it returns counts of terms intersecting with your query and also the "background" stats of their use outside of your query set.
If you can say more about what your starting query is and what you hope to discover, that might guide the discussion further.
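
As a rough sketch of the adjacency matrix idea applied to the sample documents above, the request body below defines one named filter per tag and restricts the counts to "normal" difficulty. It assumes `tags` and `difficulty` are mapped as `keyword` fields and would be sent to the `_search` endpoint of whichever index holds the documents. The filter names (`tag1`, `tag2`, `tag3`) and the aggregation name `tag_intersections` are placeholders chosen for illustration.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "difficulty": "normal" } }
      ]
    }
  },
  "aggs": {
    "tag_intersections": {
      "adjacency_matrix": {
        "filters": {
          "tag1": { "term": { "tags": "tag1" } },
          "tag2": { "term": { "tags": "tag2" } },
          "tag3": { "term": { "tags": "tag3" } }
        }
      }
    }
  }
}
```

Each bucket in the response is keyed either by a single filter name or by a pair such as `tag1&tag3`, with a `doc_count` for the documents matching both. Only pairwise intersections are reported, so larger tag combinations would need follow-up queries.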

What I hope to discover is documents that are related to the same "concepts". To take an example from the programming world: if I search with the tags "for" and "while", the concept is "loops".

Concretely, my ideal query should have all of these properties:

  1. have basic filters for fields with a known set of values (like the "difficulty" field)
  2. have a dynamic filter / suggester for the "tags" field (whose domain is open):

For example, at the beginning, I should get all the tags (or the 20 most used) in descending order of use (here, "tag3", "tag1", "tag2").

Then, if I choose "tag3", I should get the tags commonly used with it, in descending order (with my example: "tag1" and "tag2"), so I can narrow the search.

  3. sort the documents according to how well they match what is described in point 2 (most relevant documents first)
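
One possible shape for points 1 and 2, sketched as a `_search` request body: the `bool` filter carries the fixed-domain filter on `difficulty` plus the tag chosen so far, and a plain `terms` aggregation (rather than the `significant_terms` aggregation mentioned earlier) lists the tags that co-occur with it, most frequent first. It assumes `tags` and `difficulty` are `keyword` fields; the aggregation name `tag_suggestions` is made up for the example.

```json
{
  "size": 10,
  "query": {
    "bool": {
      "filter": [
        { "term": { "difficulty": "normal" } },
        { "term": { "tags": "tag3" } }
      ]
    }
  },
  "aggs": {
    "tag_suggestions": {
      "terms": {
        "field": "tags",
        "size": 20,
        "exclude": ["tag3"]
      }
    }
  }
}
```

Removing the `tags` filter and the `exclude` entry gives the initial list of the 20 most used tags. Against the three sample documents, with only the `tag3` filter applied (no `difficulty` filter), the buckets would be `tag1` (2 documents) followed by `tag2` (1 document). Filters do not contribute to the score, so the ranking in point 3 would need a scoring clause such as `must` or `should`.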

I think you just want to use the significant_terms aggregation on the tags field.
Here are the significant tags for a search on StackOverflow questions about loops/looping:

[Screenshot: Kibana-35]

If your query is fuzzy in any way (things match to varying degrees) then you probably want to use significant terms in conjunction with the sampler aggregation.
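
A hedged sketch of that combination, as a `_search` request body: the `sampler` aggregation limits the analysis to the top-scoring documents on each shard, and `significant_terms` runs only on that sample. The `shard_size` of 200, the aggregation names, and the `tags` field (assumed to be a `keyword` field) are illustrative choices, and the `match` query on `title` simply stands in for whatever loosely matching query is being used.

```json
{
  "size": 0,
  "query": {
    "match": { "title": "loop loops looping" }
  },
  "aggs": {
    "best_matches": {
      "sampler": { "shard_size": 200 },
      "aggs": {
        "tag_suggestions": {
          "significant_terms": { "field": "tags" }
        }
      }
    }
  }
}
```

The suggestions then come back nested under `best_matches`, so weak matches in the long tail of the result set have less influence on which tags are reported as significant.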


Thanks for the clarification.
Could you illustrate that with a query for my example?

Something like this:

GET stackoverflow/_search
{
  "query": {
    "match": {
      "title": "loop loops looping"
    }
  },
  "aggs": {
    "my_refinement_suggestions": {
      "significant_terms": {
        "field": "tag"
      }
    }
  }
}