How do I get Enterprise Search to use my existing indexes?

neergttocsdivad · November 8, 2021, 4:18am

Elasticsearch + Kibana + Enterprise Search (7.15.1) all running in a docker container on a Synology NAS.

/volume2/docker

I used FSCrawler some months ago to generate indexes, which are stored in /volume4/Elasticsearch/lib & log

In the hope of something magical, I included the following lines in my docker-compose.yml (see below):

environment:
  - "path.data:/volume4/elasticsearch/lib"
  - "path.log:/volume4/elasticsearch/log"

Sadly, I have not found any evidence of the indexes in App Search or Workplace Search.

Is there a way, or must I re-index the lot?

I have been unable to find an example of this situation anywhere.

Suggestions, please.

# based on "https://www.elastic.co/guide/en/enterprise-search/current/docker.html"
# elasticsearch/environment
# - added "path.data:/volume4/elasticsearch/lib"
# - added "path.log:/volume4/elasticsearch/log"
# - increased "ES_JAVA_OPTS=-Xms512m -Xmx512m" to 2g
# ent-search/environment
# - increased "JAVA_OPTS=-Xms512m -Xmx512m" to 2g
# - added crawler.security.ssl.certificate_authorities:none
# - added crawler.security.dns.allow_loopback_access:false

---
version: "2"

networks:
  elastic:
    driver: bridge

volumes:
  elasticsearch:
    driver: local

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.15.1
    restart: unless-stopped
    environment:
      - "discovery.type=single-node"
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
      - "xpack.security.enabled=true"
      - "xpack.security.authc.api_key.enabled=true"
      - "path.data:/volume4/elasticsearch/lib"
      - "path.log:/volume4/elasticsearch/log"
      - "ELASTIC_PASSWORD=changeme"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - elasticsearch:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - elastic

  ent-search:
    image: docker.elastic.co/enterprise-search/enterprise-search:7.15.1
    restart: unless-stopped
    depends_on:
      - "elasticsearch"
    environment:
      - "JAVA_OPTS=-Xms2g -Xmx2g"
      - "ENT_SEARCH_DEFAULT_PASSWORD=changeme"
      - "elasticsearch.username=elastic"
      - "elasticsearch.password=changeme"
      - "elasticsearch.host=http://elasticsearch:9200"
      - "allow_es_settings_modification=true"
      - "secret_management.encryption_keys=[4a2cd3f81d39bf28738c10db0ca782095ffac07279561809eecc722e0c20eb09]"
      - "elasticsearch.startup_retry.interval=15"
      - "crawler.security.ssl.certificate_authorities:none"
      - "crawler.security.dns.allow_loopback_access:false"
    ports:
      - 3002:3002
    networks:
      - elastic

  kibana:
    image: docker.elastic.co/kibana/kibana:7.15.1
    restart: unless-stopped
    depends_on:
      - "elasticsearch"
      - "ent-search"
    ports:
      - 5601:5601
    environment:
      ELASTICSEARCH_HOSTS: http://elasticsearch:9200
      ENTERPRISESEARCH_HOST: http://ent-search:3002
      ELASTICSEARCH_USERNAME: elastic
      ELASTICSEARCH_PASSWORD: changeme
    networks:
      - elastic
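
A note on the compose file above: in the list form of `environment`, each entry is written as KEY=VALUE, so the two `path.*` lines that use a colon are most likely never passed to Elasticsearch as settings. The logs setting is also named `path.logs` rather than `path.log`, and a host path such as /volume4/elasticsearch/lib is only visible inside the container if it is mounted. A minimal sketch of mounting the old data directory instead, assuming the host path from the post and the image's default data location (whether the node can open the old data also depends on the Elasticsearch version that wrote it):

```yaml
version: "2"

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.15.1
    environment:
      # List-form entries use KEY=VALUE, not KEY:VALUE
      - "discovery.type=single-node"
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
      - "xpack.security.enabled=true"
      - "ELASTIC_PASSWORD=changeme"
    volumes:
      # Assumption: this host directory holds the data written by the
      # earlier Elasticsearch node. It is bind-mounted onto the image's
      # default data path, so no path.data override is needed.
      - /volume4/elasticsearch/lib:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
```

The mounted directory has to be readable and writable by the user the container runs as. Even with the data mounted this way, the indices would be visible to Elasticsearch and Kibana only; as the replies below explain, App Search and Workplace Search need the data ingested through their own APIs.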

warkolm · November 8, 2021, 7:09am

While Enterprise Search uses Elasticsearch under the hood, you can't just import indices created via other approaches like that. You will need to ingest that data via the Enterprise Search APIs.

dadoonet · November 8, 2021, 7:32am

Yes.

You need to use the FSCrawler Workplace Search output and reindex everything.

neergttocsdivad · November 8, 2021, 9:39am

OK. During and after the reindexing process, is there a location that can be backed up, so that the data can be moved efficiently to another location or reconstructed in case of corruption?

Scrilling · November 8, 2021, 7:08pm

Hi David,

Here's a quick link to the FSCrawler docs on indexing directly into Workplace search.

I recommend checking out the Elasticsearch Snapshot/Restore APIs to back up data when performing unfamiliar actions.

neergttocsdivad · November 8, 2021, 9:21pm

Thank you. I am grateful for the links.

If snapshots are primarily insurance when performing unfamiliar actions . . . yet it can take days, weeks or even months for fscrawler to index large corpora . . . . then, if data is lost or corrupted, does Elastic.co suggest or offer no solution other than re-indexing?

Mr. Pilato may perhaps wish to offer another succinct and devastating answer. Yes or no? (Originally written in French.)

warkolm · November 8, 2021, 10:48pm

Snapshots, i.e. backups, are the recommended approach.
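
For the Docker setup in this thread, a shared-filesystem snapshot repository needs a directory that is mounted into the container and listed in `path.repo`. A minimal sketch of the compose changes, assuming a hypothetical host directory /volume4/elasticsearch/backup and a hypothetical container path /usr/share/elasticsearch/backup (neither appears in the original post):

```yaml
version: "2"

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.15.1
    environment:
      - "discovery.type=single-node"
      # path.repo lists the directories in which "fs" snapshot
      # repositories are allowed to be registered.
      - "path.repo=/usr/share/elasticsearch/backup"
    volumes:
      - elasticsearch:/usr/share/elasticsearch/data
      # Hypothetical host directory for snapshots; it must be writable
      # by the user the container runs as.
      - /volume4/elasticsearch/backup:/usr/share/elasticsearch/backup

volumes:
  elasticsearch:
    driver: local
```

With that in place, a repository can be registered with PUT _snapshot/<repository> using a body of type "fs" whose "location" setting points at the container path, and a snapshot taken with PUT _snapshot/<repository>/<snapshot>. Snapshots are incremental: each one copies only the segment files that are not already in the repository, so after the first run the cost is roughly proportional to what has changed rather than to the total index size.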

neergttocsdivad · November 8, 2021, 11:03pm

No, they are not!

Scrilling wrote "I recommend checking out the Elasticsearch Snapshot/Restore APIs to back up data WHEN PERFORMING UNFAMILIAR ACTIONS."

OMG . . I just spotted this at the bottom of the Snapshot/Restore page recommended by Scrilling and Warkolm.

"WARNING - The only reliable and supported way to back up a cluster is by taking a snapshot. You cannot back up an Elasticsearch cluster by making copies of the data directories of its nodes. There are no supported methods to restore any data from a filesystem-level backup. If you try to restore a cluster from such a backup, it may fail with reports of corruption or missing files or other data inconsistencies, or it may appear to have succeeded having silently lost some of your data."

The entire Snapshot/Restore page is a loud caveat emptor (French AND Latin).

Would snapshots be practicable with a multi-terabyte index?
How long would it take to back up a multi-terabyte index?
How much data would potentially be lost before the next backup could start?
One could use many concurrent, small indices to increase the speed of a backup, but this wouldn't solve the total required backup capacity!
Or, would staggered, concurrent backups be the solution?
What prompted the warning to be on the page?

A new index is required for every Lucene upgrade.
As this appears to be a Lucene issue, does it also affect Solr?

So M. Pilato's answer would be, non.

warkolm · November 8, 2021, 11:30pm

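Would snapshots be practicable with a multi-terabyte index?

Yes.

How long would it take to back up a multi-terabyte index?

It depends on the size of the cluster and what repository you are using.

How much data would potentially be lost before the next backup could start?

Lost in what sense?

Or, would staggered, concurrent backups be the solution?

Snapshots are not done in parallel.

What prompted the warning to be on the page?

People were copying the underlying Elasticsearch data directories, at a filesystem level, then assuming it'd work.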

neergttocsdivad · November 8, 2021, 11:58pm

It's actually not so bad when one is aware of the issue.
Users with small indices can upgrade fairly painlessly, possibly using parallel systems which can be switched.
Users with large indices need to be more circumspect and upgrade only when the benefits of the upgrade outweigh the cost and disruption.

From your business perspective, it may be worth putting this information - together with the considerations and solutions - front and centre, rather than as frightening warnings on a peripheral page.

I will now stop being mean to Mr. Pilato and I'll stick to using English.

FYI. It took eleven weeks (24/7) to generate my 5TB indices. They are now useless. Time to start again.

dadoonet · November 9, 2021, 6:38am

Hey @neergttocsdivad

You called out my name several times:

Mr. Pilato may perhaps wish to offer another succinct and devastating answer. Yes or no? (Originally written in French.)

I'm unsure what you are expecting from me. BTW, if you want to speak in French with me, you can do it in Discussions en français.
Or if you just want to clarify privately, you can DM me.

So M. Pilato's answer would be, non.

I don't get it.

I will now stop being mean to Mr. Pilato and I'll stick to using English.

Same.

Thanks.
