I'm making an application that shows you all the possible values for any field. If you're searching for "type", it'll show you "cake" and "pie". If you're searching for "name", it'll show you "tiramisu", "black forest", etc.
Note that there's a lot of duplicate data here. "cake" is repeated multiple times. All I really need to store is a set of strings for each field.
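As a minimal sketch of the "set of strings per field" idea in plain Python, with no Elasticsearch involved: the function name `collect_picklists` and the sample records (including the "apple" pie) are hypothetical, chosen only to mirror the example above.

```python
from collections import defaultdict


def collect_picklists(records):
    """Map each field name to the set of distinct values seen for it."""
    picklists = defaultdict(set)
    for record in records:
        for field, value in record.items():
            # A set drops duplicates, so "cake" is kept only once.
            picklists[field].add(value)
    return dict(picklists)


# Hypothetical sample records modelled on the example in the question.
records = [
    {"type": "cake", "name": "tiramisu"},
    {"type": "cake", "name": "black forest"},
    {"type": "pie", "name": "apple"},
]

picklists = collect_picklists(records)
```

Looking up `picklists["type"]` then gives the distinct values for that field, however many records repeat them.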
I'm not sure if I'm trying to force Elasticsearch onto my problem or I'm not seeing how to use Elasticsearch correctly.
It depends on what the problem is. Elasticsearch will do this with no problems. It also compresses fields (but does not deduplicate them), so it'll store things efficiently.
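One common way to get the distinct values of a field out of Elasticsearch is a terms aggregation. The sketch below only builds the request body and reads a response of the documented shape; it does not talk to a cluster. The helper names, the aggregation name `distinct_values`, the `max_values` default, and the sample response are assumptions made for illustration, and the field is assumed to be indexed as an exact, non-analyzed value.

```python
def build_picklist_query(field, max_values=1000):
    """Build a search body that returns the distinct values of `field`."""
    return {
        "size": 0,  # skip the matching documents, return only the aggregation
        "aggs": {
            "distinct_values": {
                "terms": {"field": field, "size": max_values}
            }
        },
    }


def extract_values(response):
    """Pull the distinct values out of a terms aggregation response."""
    buckets = response["aggregations"]["distinct_values"]["buckets"]
    return [bucket["key"] for bucket in buckets]


query = build_picklist_query("type")

# Hypothetical response, shaped like a terms aggregation result.
sample_response = {
    "aggregations": {
        "distinct_values": {
            "buckets": [
                {"key": "cake", "doc_count": 2},
                {"key": "pie", "doc_count": 1},
            ]
        }
    }
}

values = extract_values(sample_response)
```

Each bucket carries one distinct value plus a document count, so duplicates in the data collapse into a single entry in the picklist.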
Thanks for your reply. The problem is implementing picklists with Elasticsearch without storing all the data. The example above shows only a few records, but the real data will have tens of millions of entries, with many duplicate values for any given field and dozens of fields per record, so I don't want to store the source for each individual record. Does that make sense?
However, even with tens of millions of records, I don't think there's any problem here. If you are so resource-restricted that you cannot store this data, then look at setting the field mappings to "store": false.
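As a rough sketch of what such a mapping could look like, the snippet below builds an index body with "store": false on each field. The helper name `build_mapping` and the field list are hypothetical. The `keyword` type and the typeless `mappings` layout are assumptions that fit recent Elasticsearch versions; older versions use a different layout, so check the documentation for the version you run.

```python
import json


def build_mapping(field_names):
    """Build an index body whose fields are exact-match and not stored."""
    properties = {
        # "store": False keeps the field out of the separately stored fields;
        # the field is still indexed, so it can be searched and aggregated.
        name: {"type": "keyword", "store": False}
        for name in field_names
    }
    return {"mappings": {"properties": properties}}


# Hypothetical field list taken from the example in the question.
mapping = build_mapping(["type", "name"])
print(json.dumps(mapping, indent=2))
```

Note that "store" is a per-field setting and is separate from the `_source` field, which holds the original document.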