We actually cover this somewhat in Relevant Search (DM me if you'd like a discount). We discuss the basics of concept search and demonstrate solutions for when you have a taxonomy of ideas.
From what you're describing, it sounds like you want to figure out how often, say, "tata" co-occurs with "jaguar" and discern some relationship automatically. You want to learn that jaguar has a high affinity to tata. Elasticsearch doesn't do this out of the box. But it sounds like what you're looking for is some kind of concept search. Specifically, you may want to look into the general field of topic modeling.
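To make "affinity" a bit more concrete, here is a minimal Python sketch (not something Elasticsearch provides) that scores how strongly two terms co-occur across documents using pointwise mutual information (PMI). The toy documents, the whitespace tokenization and the choice of PMI are all assumptions for illustration; real topic models are considerably more involved.

```python
import math
from collections import Counter
from itertools import combinations

def cooccurrence_affinity(docs):
    """Map each co-occurring term pair (a, b), with a < b, to its PMI.

    PMI = log(P(a, b) / (P(a) * P(b))), with probabilities estimated as the
    fraction of documents containing the term(s). Positive means the terms
    appear together more often than chance would suggest."""
    n_docs = len(docs)
    term_df = Counter()  # number of documents containing each term
    pair_df = Counter()  # number of documents containing both terms of a pair
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))
    return {
        (a, b): math.log((both / n_docs) / ((term_df[a] / n_docs) * (term_df[b] / n_docs)))
        for (a, b), both in pair_df.items()
    }

# Toy corpus (hypothetical)
docs = [
    "tata jaguar xf review",
    "tata motors annual results",
    "jaguar xf tata engine",
    "apple iphone review",
    "apple macbook review",
]
affinity = cooccurrence_affinity(docs)
print(round(affinity[("jaguar", "tata")], 3))  # positive: above-chance co-occurrence
print(round(affinity[("review", "tata")], 3))  # negative: below chance
```

Counting pairs per document rather than per sentence or window is a simplification; it is enough to show "jaguar" ending up with a positive affinity to "tata".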
You want to group related concepts together. I've written about Latent Semantic Analysis, but it's a rather old technique; the two other big techniques are word2vec and latent Dirichlet allocation. In my experience, these give better general-purpose results.
There are a couple of ways of viewing the process. The output of such a process can be a set of term affinities for a document. In a sense, they help "fill out" the semantic space of the document. You can stuff these terms into another field if you like.
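As a rough sketch of that "stuff it into another field" idea: given per-term affinity scores (here a hypothetical hand-written dict standing in for the output of a topic model), add the strongly related terms to an extra field before indexing. The field names `body` and `concepts` and the 0.5 threshold are assumptions for illustration.

```python
def add_concept_field(doc, affinities, min_score=0.5, field="concepts"):
    """Return a copy of `doc` with an extra field of related terms.

    `affinities` maps a term to {related_term: score}. A related term is added
    when its score is at least `min_score` and it is not already in the text."""
    text_terms = set(doc["body"].lower().split())
    related = set()
    for term in text_terms:
        for other, score in affinities.get(term, {}).items():
            if score >= min_score and other not in text_terms:
                related.add(other)
    expanded = dict(doc)  # shallow copy; the original doc is left untouched
    expanded[field] = sorted(related)
    return expanded

# Hypothetical affinity scores, e.g. produced offline by a topic model
affinities = {
    "jaguar": {"tata": 0.9, "cat": 0.2},
    "xf": {"jaguar": 0.8, "sedan": 0.6},
}
doc = {"body": "Jaguar XF review"}
print(add_concept_field(doc, affinities))
# {'body': 'Jaguar XF review', 'concepts': ['sedan', 'tata']}
```

Keeping the expansions in a separate field lets you weight them below the original text at query time, which limits the damage when the affinities are wrong.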
There are a lot of considerations for good topic modeling/concept search. If you're serious about it, I'd seriously consider talking to an NLP expert, picking up a good book, and doing lots of research. In my experience, bad concept search can actually make things worse, so proceed carefully.
The book does look very interesting; I'd indeed be keen to purchase it.
As for this issue, we have the relationships in the database that link a parent brand (e.g. tata) to its child brands (e.g. jaguar, land rover). I suppose one way is to stuff these child names into another field in the index and include that field in the search alongside the existing company name.
I wasn't really sure how to cater for this within Elasticsearch itself.
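The extra-field approach described above could look roughly like this, sketched in Python as plain request bodies and assuming Elasticsearch 7.x or later mapping syntax. The field names `company_name` and `child_brands` and the `^2` boost are hypothetical choices, not something established in this thread.

```python
# Mapping with a hypothetical extra field for the child brand names
MAPPING = {
    "mappings": {
        "properties": {
            "company_name": {"type": "text"},
            "child_brands": {"type": "text"},
        }
    }
}

def build_company_doc(name, children):
    """Document to index: the parent's name plus its child brand names."""
    return {"company_name": name, "child_brands": list(children)}

def build_search_body(user_query):
    """Search both fields. Boosting company_name (^2) is an assumed choice to
    keep a direct name match ahead of a match that only hits a child brand."""
    return {
        "query": {
            "multi_match": {
                "query": user_query,
                "fields": ["company_name^2", "child_brands"],
            }
        }
    }

doc = build_company_doc("Tata", ["Jaguar", "Land Rover"])
body = build_search_body("jaguar")  # would match the Tata doc via child_brands
```

The boost is only a starting point; how much weight a child-brand match deserves relative to a direct match is a relevance decision to tune against real queries.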
That's great then, @evvo. That's the best-case scenario.
So in the book, we recommend a couple of strategies for modeling concept search using taxonomies (chapter 4).
The most straightforward is to expand each child concept to include its parent concept using a synonym filter. For example:
jaguar => jaguar, tata
land rover => land rover, tata
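A sketch of how such rules could be wired into an index, written in Python as a plain request body (e.g. for `PUT /<index>`) and assuming Elasticsearch 7.x or later syntax. The filter, analyzer and field names are hypothetical, and generating the rules from a parent-to-children mapping is just one option given the relationships you already keep in the database.

```python
def synonym_rules(parent_to_children):
    """Build explicit rules like "jaguar => jaguar, tata" from a
    parent -> children mapping."""
    rules = []
    for parent, children in parent_to_children.items():
        p = parent.lower()
        for child in children:
            c = child.lower()
            rules.append(f"{c} => {c}, {p}")
    return rules

def build_index_body(rules):
    """Index settings/mappings that expand brands at index time only."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "brand_parent_synonyms": {"type": "synonym", "synonyms": rules}
                },
                "analyzer": {
                    "brand_expanding": {
                        "tokenizer": "standard",
                        # lowercase first so the lowercase rules match
                        "filter": ["lowercase", "brand_parent_synonyms"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "company_name": {
                    "type": "text",
                    "analyzer": "brand_expanding",  # adds "tata" to child docs
                    "search_analyzer": "standard",  # queries are not expanded
                }
            }
        },
    }

rules = synonym_rules({"tata": ["jaguar", "land rover"]})
print(rules)  # ['jaguar => jaguar, tata', 'land rover => land rover, tata']
index_body = build_index_body(rules)
```

Expanding only at index time means a search for "tata" also matches Jaguar and Land Rover documents, while a search for "jaguar" does not match documents that only mention Tata.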
Synonym expansion like this works well with TF*IDF scoring. A search for "tata" will get a lower TF*IDF score because it's less specific, while a search for the more specific "jaguar" will get a higher TF*IDF score. This is due to how IDF works: IDF is lower for common terms (here "tata", which after expansion appears in every Jaguar and Land Rover document as well) and higher for less common terms ("jaguar").
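To put rough numbers on the IDF argument, here is a Python sketch of the IDF formulas used in recent Lucene versions: ClassicSimilarity (TF*IDF) and BM25Similarity (Elasticsearch's default since 5.0). The document counts are made up for illustration.

```python
import math

def classic_idf(doc_freq, doc_count):
    """IDF as in Lucene's ClassicSimilarity (TF*IDF): 1 + ln((N + 1) / (df + 1))."""
    return 1.0 + math.log((doc_count + 1) / (doc_freq + 1))

def bm25_idf(doc_freq, doc_count):
    """IDF as in Lucene's BM25Similarity: ln(1 + (N - df + 0.5) / (df + 0.5))."""
    return math.log(1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Hypothetical index of 10,000 documents. After synonym expansion "tata" is
# also in every Jaguar and Land Rover document, so it is the commoner term.
N = 10_000
doc_freq = {"tata": 500, "jaguar": 200}

for term, df in doc_freq.items():
    print(f"{term}: classic={classic_idf(df, N):.2f} bm25={bm25_idf(df, N):.2f}")
# tata: classic=3.99 bm25=2.99
# jaguar: classic=4.91 bm25=3.91
```

Under either formula the rarer, more specific "jaguar" gets the higher IDF, so matches on it contribute more to the score.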