Indexing DNA sequences


(Youdong Zhang) #1

Hi all,

The data I am working with includes DNA_sequence as one of its JSON fields.
Currently I cannot search for subsequences of the DNA sequence, and I suspect Elasticsearch turned off analysis for this field because the value is so long and has no spaces.
How can I enable subsequence searches on this field?

The subsequences I need to search for will be no longer than roughly 2000-5000 characters.
Thank you!
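A common way to make subsequence search possible is to stop treating the sequence as one giant token. Instead, you break it into overlapping fixed-length substrings (k-mers, i.e. n-grams) and index those. The toy Python sketch below shows the idea outside Elasticsearch. A query is split into the same k-mers, any document that contains all of them becomes a candidate, and each candidate is then confirmed with an exact substring check. The k-mer length of 8, the class name KmerIndex and the sample sequences are illustrative assumptions, not anything from this thread.

```python
from collections import defaultdict

K = 8  # assumed k-mer length; a tuning choice, not taken from the thread


def kmers(seq: str, k: int = K) -> set[str]:
    """Return the set of overlapping length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class KmerIndex:
    """Hypothetical toy inverted index: k-mer -> ids of sequences containing it."""

    def __init__(self, k: int = K):
        self.k = k
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.docs: dict[int, str] = {}

    def add(self, doc_id: int, seq: str) -> None:
        self.docs[doc_id] = seq.upper()
        for kmer in kmers(seq, self.k):
            self.postings[kmer].add(doc_id)

    def search(self, subseq: str) -> list[int]:
        subseq = subseq.upper()
        if len(subseq) < self.k:
            # Too short to split into k-mers: fall back to scanning every doc.
            return sorted(d for d, s in self.docs.items() if subseq in s)
        # A matching doc must contain every k-mer of the query...
        candidates = None
        for kmer in kmers(subseq, self.k):
            hits = self.postings.get(kmer, set())
            candidates = hits if candidates is None else candidates & hits
            if not candidates:
                return []
        # ...but sharing k-mers does not prove they are contiguous,
        # so confirm each candidate with an exact substring check.
        return sorted(d for d in candidates if subseq in self.docs[d])


idx = KmerIndex()
idx.add(1, "ACGTACGTTTGACCA")
idx.add(2, "TTGACCAACGTACGT")
print(idx.search("ACGTTTGA"))  # [1]
```

The verification step matters because "contains all the query's k-mers" is a necessary but not sufficient condition for containing the subsequence. The index only serves to cut the candidate set down cheaply.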


(David Pilato) #2

@polyfractal did a demo / presentation about this. You can find the slides here: http://fr.slideshare.net/ZacharyTong/boston-meetupgoingorganic

I'm sure he can help here if needed 😛
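In Elasticsearch terms, one common approach (not necessarily the exact one in the slides) is a custom analyzer built on the ngram tokenizer with min_gram equal to max_gram. Every token is then a k-mer of the same length, and a match query with "operator": "and" returns documents containing all of the query's k-mers. The Python sketch below only builds the JSON request bodies. The k-mer length of 8 and the names kmer_tokenizer, kmer_analyzer, dna_index_body and subsequence_query are hypothetical. It also assumes a 7.x-or-later style typeless mapping.

```python
import json

KMER = 8  # assumed k-mer length: shorter k = more candidate hits, larger index


def dna_index_body(k: int = KMER) -> dict:
    """Index-creation body: DNA_sequence is analyzed into overlapping k-mers."""
    return {
        "settings": {
            "analysis": {
                "tokenizer": {
                    # min_gram == max_gram, so every token is exactly k bases long
                    "kmer_tokenizer": {"type": "ngram", "min_gram": k, "max_gram": k}
                },
                "analyzer": {
                    "kmer_analyzer": {
                        "type": "custom",
                        "tokenizer": "kmer_tokenizer",
                        "filter": ["lowercase"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "DNA_sequence": {"type": "text", "analyzer": "kmer_analyzer"}
            }
        },
    }


def subsequence_query(subseq: str) -> dict:
    """Search body: require every k-mer of subseq (gives candidates, not exact hits)."""
    return {
        "query": {
            "match": {"DNA_sequence": {"query": subseq, "operator": "and"}}
        }
    }


print(json.dumps(dna_index_body(), indent=2))
```

The query side has some caveats. A match query only checks that all k-mers are present, not that they are adjacent, so hits may still need an exact check client-side. A subsequence shorter than k yields no tokens at all. A 2000-5000 character subsequence expands into thousands of terms, which may run into the cluster's boolean clause limit (indices.query.bool.max_clause_count).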

