I'm just getting familiar with App Search and was hoping to use it to create quick and easy search forms over our backend of substantial records: about 3 billion across 700+ indices for our various data products and releases.
I noticed right off that there isn't really anything documented for loading vast quantities of data into App Search indices beyond various ways of loading small JSON files. To be honest, I was a little surprised it doesn't let you attach to existing indices.
That seemed very inefficient and a bit short-sighted. I did figure out a way around it, albeit with a little 'processing'.
Does anyone see any landmines in the following approach? Assume all 'source' indexes are fully mapped and solid.
1. Create the 'Engine'.
2. 'GET' one record from the source index in JSON format.
3. Remove any system fields present, such as @version and @timestamp. I also set a value in one field to uniquely identify this record for later deletion (otherwise you'd have a duplicate).
4. Paste it into the 'Paste JSON' option under 'Index documents' -- this creates the target index in Elasticsearch.
5. Clean up the field types in 'Schema'.
6. Verify the target index name using: GET _cat/indices/.ent-search-engine-documents-{engine name}
7. Validate using: GET .ent-search-engine-documents-{engine name from _cat}
8. Use the _reindex API to ingest the records:
POST _reindex?slices=auto&refresh
{
  "source": {
    "index": "{source index}"
  },
  "dest": {
    "index": "{engine name from _cat}"
  }
}
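Since the concealed index name follows a predictable pattern, the verification steps above can be scripted as well. A minimal sketch, assuming a hypothetical local cluster URL and engine name (adjust for your deployment):

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"   # assumption: local cluster; adjust for your deployment
ENGINE_NAME = "my-engine"          # assumption: hypothetical engine name

def hidden_index_name(engine_name: str) -> str:
    """Build the concealed index name App Search uses for an engine's documents."""
    return f".ent-search-engine-documents-{engine_name}"

def doc_count(index: str) -> int:
    """Return the document count for an index via the _count API."""
    with urllib.request.urlopen(f"{ES_URL}/{index}/_count") as resp:
        return json.load(resp)["count"]

# After the _reindex completes, compare counts. Remember the one pasted
# placeholder record still needs deleting from the destination, so expect
# source count + 1 until you remove it:
# assert doc_count(hidden_index_name(ENGINE_NAME)) == doc_count("my-source-index") + 1
```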
Take my input here with a grain of salt, as I am very new to Elasticsearch. I was faced with the same issue: moving 1 million-plus records into an App Search index. My source is GCP BigQuery, and I selected Elasticsearch because of the GCP Marketplace integration, which runs DataFlow jobs to export in batches. You can connect to App Search indexes directly, but it's not officially supported, and I ran into trouble with the schema at first. While records were appearing, I shied away from this approach, since App Search indexes are only officially updated through the App Search API.
I had asked Support about importing into Elasticsearch and then migrating from the ES indexes to an App Search index, and they said it wasn't supported. I ended up writing a Python Cloud Run process that imported via the App Search API in 100-record segments, but did the processing in chunks of 50,000. I had trouble getting the counts I needed, so I created an import batch group for each set of 10,000 and assigned a GUID to each, along with the natural key of the data set, to give myself as many ways as possible to query records in case of failure.
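The chunking arithmetic above can be sketched roughly as follows. This is a minimal illustration, not my actual Cloud Run code: the endpoint URL, API key, and engine name are placeholders, and fetch_chunk is an imagined source reader.

```python
import json
import urllib.request
import uuid

APP_SEARCH_URL = "https://my-deployment.ent.example.com"  # assumption: placeholder endpoint
API_KEY = "private-xxxx"                                  # assumption: placeholder key
ENGINE = "my-engine"                                      # assumption: hypothetical engine

API_BATCH = 100       # the App Search indexing API accepts at most 100 documents per call
GROUP_SIZE = 10_000   # one batch-group GUID per 10,000 records, for failure tracing

def segments(docs, size=API_BATCH):
    """Split a list of documents into API-sized segments."""
    return [docs[i:i + size] for i in range(0, len(docs), size)]

def tag_batch_groups(docs, group_size=GROUP_SIZE):
    """Assign a shared batch-group GUID to each run of group_size records."""
    tagged = []
    for i in range(0, len(docs), group_size):
        group_id = str(uuid.uuid4())
        for doc in docs[i:i + group_size]:
            tagged.append({**doc, "import_batch_group": group_id})
    return tagged

def index_segment(segment):
    """POST one segment (<= 100 docs) to the App Search documents endpoint."""
    req = urllib.request.Request(
        f"{APP_SEARCH_URL}/api/as/v1/engines/{ENGINE}/documents",
        data=json.dumps(segment).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {API_KEY}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage: fetch a 50,000-record chunk from the source, tag it, then index
# it 100 documents at a time:
# chunk = fetch_chunk()                 # hypothetical source reader
# for seg in segments(tag_batch_groups(chunk)):
#     index_segment(seg)
```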
I wish I could comment on the reindex idea; it sounds like it could work. I wish I could add more, other than the caveat that some operations, while they may work, are not supported. I am curious as to the outcome.
I chose App Search as I needed some shortcuts, but as always with racing ahead, sometimes you get tripped up later on.
It actually worked great with the 7-million-record test index (I've got about 3.5B records across about 700 indices to think about). Something to keep in mind.
Most awesome! Hmmm ... it's got me thinking. I'm curious as to why Support doesn't offer that as an approach. It would have saved me a few days of tinkering, since the Elasticsearch API is geared toward that kind of high-throughput streaming. App Search, with its 100-document batches, is a real weak spot there.