Question about Elasticsearch schema and query

I am setting up an Elasticsearch cluster to search vectors associated with an ID.

For example,

Given data:
Parent id / Object id / vectors
P1 / BD / 123, 125, 235 ... 10304, 50305 
P1 / DF / 125, 235, 240 ... 10305, 10306
P1 / ED / 123, 235, 350 ... 10010, 10344
... 
P2 / AB / 125, 535, 740 ... 9315, 10306
P2 / VC / 133, 435, 350 ... 3010, 20344
P2 / RF / 113, 353, 390 ... 10110, 30344
...
There are millions of parents, hundreds of objects per parent, and 1000 vectors per object.

So basically I want to

  1. index all of the vectors
  2. given an input parent such as P999, find the parents in the cluster that have the largest number of similar objects (similar objects: at least 50 vector matches)

Here's a sample result I expect

Input:
P999 / HH / xxx, xxx ...
P999 / YH / xxx, xxx ...
P999 / GJ / xxx, xxx ...
...
Output:
[result sorted desc] 
P20 has 60 similar objects
P4 has 45 similar objects
P501 has 41 similar objects
...

similar objects: at least 50 vector matches
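
To pin down what I mean, here is a brute-force sketch in Python of the result I am after, outside of Elasticsearch. The function name `rank_parents` and the data layout are only for illustration. I am also assuming that an indexed object counts once for its parent if it shares at least 50 vectors with any one of the input objects.

```python
from collections import Counter

MIN_SHARED = 50  # "similar objects": at least 50 vector matches


def rank_parents(query_objects, indexed_objects, min_shared=MIN_SHARED):
    """Count, per parent, how many indexed objects are similar to the input.

    query_objects:   list of sets of vector values (objects of the input parent)
    indexed_objects: iterable of (parent_id, object_id, set_of_vector_values)
    """
    counts = Counter()
    for parent_id, _object_id, vectors in indexed_objects:
        # Assumption: an indexed object counts once if it shares at least
        # min_shared values with any single input object.
        if any(len(vectors & q) >= min_shared for q in query_objects):
            counts[parent_id] += 1
    # Parents with the most similar objects come first.
    return counts.most_common()
```

This obviously does not scale to millions of parents; it is only meant to describe the ranking I want the cluster to produce.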

To achieve this,
I need

  1. Good schema
  2. A query that stores vectors
  3. A query that searches a list of similar objects in desc order

I need some help with all three.

  1. Schema
curl -XPUT url/vectors -H 'Content-Type: application/json' -d '{
  "mappings" : {
    "properties": {
      "object_id":{"type":"text"},
      "parent_id":{"type":"text"},
      "vectors":{"type":"text"}
    }
  }
}'
  2. Insert query
curl -XPOST url/vectors/_doc -H 'Content-Type: application/json' -d '{
  "parent_id":"P1",
  "object_id":"BD",
  "vectors":"123, 125, 235 ... 10304, 50305"
}'
  3. Search query
curl -XGET url/vectors/_search -H 'Content-Type: application/json' -d '{
  "size":10000,
  "query": {
    "function_score": {
      "functions": [
        {
          ???
        }
      ],
      "query": {
        "bool": {
          "should": [
            { "terms": { "vectors": ["111"] } },
            { "terms": { "vectors": ["222"] } },
            ...
            { "terms": { "vectors": ["333"] } },
            { "terms": { "vectors": ["444"] } }
          ],
          "minimum_should_match": 50
        }
      }
    }
  },
  "from": 0,
  "sort": 
  [
    {
      "_score": {
        "order": "desc"
      }
    }
  ]
}'
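
Since each input object has about 1000 vectors, I would generate the `should` clauses rather than write them by hand. Below is a minimal Python sketch of how I would build the body for one input object. `build_object_query` is a hypothetical helper of mine, not part of any client library. It assumes every value in `vectors` ends up as its own term in the index, and it places `minimum_should_match` inside the `bool` clause. It leaves out the `function_score` part, since that is the piece I do not know how to write.

```python
def build_object_query(vectors, min_matches=50, size=10000):
    """Build a search body for one input object (hypothetical helper).

    vectors:     iterable of vector values belonging to the input object
    min_matches: how many values must match for an object to be "similar"
    """
    # One exact-match clause per vector value of the input object.
    should = [{"term": {"vectors": str(v)}} for v in vectors]
    return {
        "size": size,
        "query": {
            "bool": {
                "should": should,
                # Require at least min_matches of the clauses above to match.
                "minimum_should_match": min_matches,
            }
        },
    }
```

The returned dict can be serialized with `json.dumps` and sent as the body of the search request. I would still need to run it once per input object and then group the hits by `parent_id`, which is the part I am unsure about.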

And my questions are

  1. In my schema mapping, is this the right way to store vectors?
  2. In my search query, I need help with the [???] part to get the expected results. I am not even sure I am on the right track. Could you correct my query if it is wrong?

Thanks
