Indexing Jupyter Notebooks

I am trying to index Jupyter notebooks in raw format. Jupyter notebooks are pretty much newline-delimited JSON. I need help ignoring some of the fields that only indicate the type of markdown.

PUT _template/ispel
{
  "aliases": {
    "my_alias": {}
  },
  "mappings": {
    "cells": {
      "properties": {
        "source": {
          "ignore_above": 256
        }
      }
    }
  },
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 2
  }
}

Above is my code for indexing, and below is a sample of the ndjson:
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[//]: # (Long-Title: Subsets; Universal Sets, Proper Subsets, Improper Subsets, and Set Equality)\n",
"\n",
"[//]: # (Short-Title: Subsets)\n",
"\n",
"[//]: # (Keyword: Subsets)\n",
"\n",
"[//]: # (Keyword: Subsets of Universal Set)\n",
"\n",
"[//]: # (Keyword: Proper Subsets)\n",
"\n",
"[//]: # (Keyword: Strict Subsets)\n",
"\n",
"[//]: # (Keyword: Improper Subsets)\n",
"\n",
"[//]: # (Keyword: Set Equality)\n",
"\n",
"[//]: # (Keyword: Subset notation)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Subsets\n",
"\n",
"Consider the following two sets:\n",
"\n",
"Alphabet={a,b,c,.....z};vowels={a,e,i,o,u}\n",
"\n",
"The set vowels contains a subset of the elements of the set Alphabet.\n",
"\n",
"vowels is a subset of Alphabet since all the elements of vowels are also in Alphabet.\n",
"\n",
"The mathematical notation for indicating that Vowels in a subset of Alphabet is: \n",
"\n",
"Vowels \\subset Alphabet\n",
"\n",
"Example:\n",
"\n",
"D is the set of decimal digits; D={0,1,2,3,4,5,6,7,8,9}\n",
"\n",
"B is the set of binary digits ; B={0,1}\n",
"\n",
"Since all the elements in B are also in D, B is a subset of D B$\\subset$D\n",
"\n",
"Example:\n",
"\n",
"H is the set of hexadecimal digits.\n",
"\n",
"H={,1,2,...9,A,B,C,D,E,F}\n",
"\n",
"O is the set of Octal digits\n",
"\n",
"O={0,1,2,3,4,5,6,7}\n",
"\n",
"Is H$\\subset$O?\n",
"\n",
"Since not all elements of H are in O, H is not a subset of O. The corresponding mathematical notation is\n",
"\n",
"H$\\not\\subset$ O\n",
"\n",
"On the other hand,O is a subset of H:\n",
"\n",
"O$\\subset$H\n",
"\n",
"The relational operator \\subset indicates proper subset strict subset relationship.\n",
"\n",
"When we write O$\\subset$H, it means that O is contained in H, but O is not equal to H.\n",
"\n",
"On the other hand, when we write A$\\subseteq$B, it means that A is a subset of B but may also be equal to B.\n",
"\n",
"Example \n",
"\n",
"Let A={1,2,3,4}. Since all the elements in A are also in B. It is also true that B$\\subseteq$A.\n",
"\n",
"The \\subseteq notation gives us a way to define set equality.\n",
"\n",
"Two sets A and B are equal if all the elements of A are in B, and all elements of B are also in A.\n",
"\n",
"In other words, A=B since A$\\subseteq$B and B$\\subseteq$A.\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"celltoolbar": "Edit Metadata",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
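
One approach I am considering, in case it helps frame the question, is to strip the unwanted fields on the client side before sending anything to Elasticsearch. The following is only a rough sketch: the function names `notebook_to_docs` and `to_bulk_ndjson` and the index name `notebooks` are made up for illustration, and I am assuming that only the joined `source` text of markdown cells should be kept, with `cell_type`, `metadata`, `outputs` and `execution_count` dropped. The output follows the action-line/document-line layout that the bulk API expects; older Elasticsearch versions may also want a `_type` in the action line.

```python
import json


def notebook_to_docs(notebook, keep_types=("markdown",)):
    """Return one small document per non-empty cell, keeping only its text.

    Fields such as cell_type, metadata, outputs and execution_count are
    dropped (an assumption about which fields should be ignored).
    """
    docs = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") not in keep_types:
            continue
        source = cell.get("source", "")
        # A notebook stores source either as a list of lines or as one string.
        text = "".join(source) if isinstance(source, list) else source
        if text.strip():
            docs.append({"source": text})
    return docs


def to_bulk_ndjson(docs, index_name):
    """Format documents as a bulk request body: action line, then document."""
    if not docs:
        return ""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"
```

Loading the notebook with `json.load`, passing it through `notebook_to_docs`, and posting the result of `to_bulk_ndjson(docs, "notebooks")` to `_bulk` would then index one document per markdown cell, so the mapping would only need a single `source` field.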
