Project advice (mapping, analysis, basic architecture)

Good Afternoon,

So I've been dealing with a parsing issue that I was originally trying to resolve by removing text with regex. On further consideration, I've decided to reach out for advice that might make my life easier moving forward. My intent in this post is to ask some very specific questions on mapping, and then a few broader questions on techniques, tips, and tricks.

For a little background, I'm working on a content recommendation engine which takes user history, user input, and historical interactions into account to bring users the most relevant information possible. I'm pretty green with Elasticsearch, so allow me to apologize in advance for any incorrect use of terminology, features, or basic concepts. With that said, your suggestions and advice are greatly appreciated.

index/type1, index/type2, index/type3

I've had minor success with this portion; however, I've had to manually copy and paste the collection of key/value pairs into Kibana, and I have not been able to have the system parse the documents on its own. I've attempted to clean up the documents with regex and have had limited success with that, so I've determined I have to do something about the mapping that will allow me to input said documents in array format. The articles/courses are separated by the },{ marking the difference between the key pairs and the key1 pairs.

[
 {
   "key": "value",
   "key": "value",
   "Key": "value"
},
{
  "key1": "value1", 
  "key1": "value1",
  "key1": "value"
 }
]
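Rather than pasting documents into Kibana by hand, I'm guessing the bulk API is the accepted route here. A minimal sketch of what I have in mind, assuming the documents arrive as a JSON array like the one above (the index name `content` is just a placeholder, and I've let Elasticsearch assign ids):

```python
import json

def array_to_bulk(raw_json, index_name):
    """Convert a JSON array of documents into the NDJSON body
    expected by Elasticsearch's _bulk endpoint: one action line
    followed by one source line per document."""
    docs = json.loads(raw_json)
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"

# Sample input in the array-of-objects shape described above
raw = '[{"title": "a", "tags": "x"}, {"title": "b", "tags": "y"}]'
body = array_to_bulk(raw, "content")
print(body)
```

The resulting string would then be POSTed to `/_bulk` with the `application/x-ndjson` content type, which should remove the copy/paste step entirely.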

So I'm assuming at this point that my mapping needs to support something along these lines, but I'm having difficulty supporting it dynamically. What is the generally accepted method of indexing a document in this basic format?

{
  "type1": [
            {
             "key": "value",
             "key": "value",
             "Key": "value"
            },
            {
             "key1": "value1", 
             "key1": "value1",
             "key1": "value1"
            }
          ]
 }

This pattern basically repeats itself with the other types of documents that need to be parsed. I'm hoping that by organizing the different types under the same index I'll be able to easily search all three types of documents with something like index/_search?q='.....'. Furthermore, I need to be able to do some analysis on the individual documents' fields and add the results to the user profile. I'm hoping to accomplish this using a tokenizer with the help of some stop words to remove unhelpful terms.
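For the tokenize-plus-stop-words part, my understanding is that this is configured as a custom analyzer in the index settings rather than done at query time. A sketch of the settings body I imagine sending when the index is created (the analyzer name `tag_analyzer` and the `tags` field are made up; the built-in `stop` filter defaults to English stop words, if I've read correctly):

```python
import json

# Hypothetical index-creation body: a custom analyzer that tokenizes
# with the standard tokenizer, lowercases, and strips stop words.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "tag_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "stop"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            # Fields analyzed this way feed the tag extraction step.
            "tags": {"type": "text", "analyzer": "tag_analyzer"}
        }
    },
}

# This JSON would be the body of a PUT request creating the index.
print(json.dumps(index_body, indent=2))
```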

So the proposed complete process would be something along these lines:

"user opens article" -> "query for and return tags (current document)" -> 
"add tags to user profile (current document)" -> 
"query document in collection off user profile tag collection (suggested content)" -> 
"return top docs of different types (suggested content)"  
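In code, I picture the profile-update and suggestion steps as little more than a tag counter feeding a terms query; a placeholder sketch (all names, including the `tags` field, are assumptions):

```python
from collections import Counter

def add_tags_to_profile(profile, tags):
    """Step 2: fold the opened article's tags into the user profile,
    keeping a count of how often each tag has been seen."""
    profile.update(tags)
    return profile

def build_suggestion_query(profile, top_n=5):
    """Steps 3-4: turn the most frequent profile tags into a
    terms query for suggested content."""
    top_tags = [tag for tag, _ in profile.most_common(top_n)]
    return {"query": {"terms": {"tags": top_tags}}}

# Simulate a user opening two articles
profile = Counter()
add_tags_to_profile(profile, ["python", "search"])
add_tags_to_profile(profile, ["python", "ranking"])
query = build_suggestion_query(profile)
print(query)
```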

This is the minimum of what I would like to accomplish; some of the other features I would like to include:

    • basic text-based search query (search bar)
    • the autocomplete feature
    • filter methods which will include or remove specific subjects based on user input
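For the search bar plus include/exclude filters, I'm assuming a bool query is the way to combine them: a match clause for the typed text, with filter and must_not clauses for the user's subject toggles. A sketch (the `body` and `subject` field names are guesses about my own mapping):

```python
import json

def build_search_query(text, include_subjects=None, exclude_subjects=None):
    """Combine free-text search with user-controlled subject filters
    in a single bool query."""
    bool_query = {"must": [{"match": {"body": text}}]}
    if include_subjects:
        # Non-scoring restriction to the chosen subjects
        bool_query["filter"] = [{"terms": {"subject": include_subjects}}]
    if exclude_subjects:
        # Drop subjects the user has opted out of
        bool_query["must_not"] = [{"terms": {"subject": exclude_subjects}}]
    return {"query": {"bool": bool_query}}

q = build_search_query(
    "elasticsearch mapping",
    include_subjects=["search"],
    exclude_subjects=["sports"],
)
print(json.dumps(q, indent=2))
```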

Now please correct me if I'm wrong here, but for the filtered document types I think I need to do some kind of bucket aggregation? I suppose I will also need to include some control variables that will allow me to manually control the score of specific documents, to either boost or lower their respective scores. Which feature would be best suited to accomplish that? And is there something I can do at this point that would also give me the ability to easily and rapidly add other documents into the analysis portion of the recommendation engine?
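For the manual score control, my current guess is a function_score query wrapping the normal query; a sketch of what I mean, where hand-picked document ids get their score multiplied by a weight (a weight below 1 would lower them instead):

```python
def with_manual_boost(base_query, boosted_ids, weight=2.0):
    """Wrap a query in function_score so hand-picked documents
    have their relevance score multiplied by `weight`."""
    return {
        "query": {
            "function_score": {
                "query": base_query["query"],
                "functions": [
                    # Only documents matching the ids filter get the weight
                    {"filter": {"ids": {"values": boosted_ids}}, "weight": weight}
                ],
                "boost_mode": "multiply",
            }
        }
    }

base = {"query": {"match": {"body": "python"}}}
boosted = with_manual_boost(base, ["doc-42"], weight=3.0)
print(boosted)
```

Whether this or simple per-field boosts is the better fit for the control variables is part of what I'm asking.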
