The click log contains several fields, but the two of interest are the customer's query string and the code of the product they clicked on. The product code is a simple long field and the query string is, of course, a string. To keep a query string such as "dj mixer" from being broken into the tokens "dj" and "mixer", we index it as not_analyzed. I used the "raw" naming convention in a multi-field mapping [1] to hold this untokenized form of the query string.
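As a rough illustration of the mapping described above, the sketch below builds it as a Python dictionary and prints it as JSON. The field names `query` and `product_code`, the document type `click`, and the helper `build_click_log_mapping` are assumptions made for this example and are not taken from the original index. The syntax shown is the older `string` / `not_analyzed` style that the text refers to.

```python
import json


def build_click_log_mapping(doc_type="click"):
    """Return a mapping with an analyzed query field plus an untokenized "raw" sub-field.

    All names here (doc_type, "query", "product_code") are hypothetical.
    """
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    # Analyzed form: "dj mixer" is tokenized into "dj" and "mixer".
                    "query": {
                        "type": "string",
                        "fields": {
                            # Untokenized form: "dj mixer" is kept as a single term.
                            "raw": {"type": "string", "index": "not_analyzed"}
                        },
                    },
                    # The clicked product's code, stored as a plain long.
                    "product_code": {"type": "long"},
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_click_log_mapping(), indent=2))
```

With a mapping along these lines, the whole query string would be addressed as `query.raw`, while `query` remains available for ordinary full-text matching. In Elasticsearch 5.0 and later, the same effect is obtained with a sub-field of type `keyword` rather than a `not_analyzed` string.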