How do I get Enterprise search to use my existing indexes?

Elasticsearch + Kibana + Enterprise Search (7.15.1), all running in Docker containers on a Synology NAS.

/volume2/docker

I used FSCrawler some months ago to generate indexes, which are stored in /volume4/Elasticsearch/lib & log.

In the hope of something magical, I included the following lines in my docker-compose.yml (see below):

environment:
  - "path.data=/volume4/elasticsearch/lib"
  - "path.logs=/volume4/elasticsearch/log"

Sadly, I have not found evidence of the indexes in App Search or Workplace Search.

Is there a way, or must I re-index the lot?

I have been unable to find an example of this situation anywhere.

Suggestions, please.

# based on "https://www.elastic.co/guide/en/enterprise-search/current/docker.html"
# elasticsearch/environment
# - added "path.data=/volume4/elasticsearch/lib"
# - added "path.logs=/volume4/elasticsearch/log"
# - increased "ES_JAVA_OPTS=-Xms512m -Xmx512m" to 2g
# ent-search/environment
# - increased "JAVA_OPTS=-Xms512m -Xmx512m" to 2g
# - added crawler.security.ssl.certificate_authorities=none
# - added crawler.security.dns.allow_loopback_access=false

---
version: "2"

networks:
  elastic:
    driver: bridge

volumes:
  elasticsearch:
    driver: local

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.15.1
    restart: unless-stopped
    environment:
      - "discovery.type=single-node"
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
      - "xpack.security.enabled=true"
      - "xpack.security.authc.api_key.enabled=true"
      - "path.data=/volume4/elasticsearch/lib"
      - "path.logs=/volume4/elasticsearch/log"
      - "ELASTIC_PASSWORD=changeme"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - elasticsearch:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - elastic

  ent-search:
    image: docker.elastic.co/enterprise-search/enterprise-search:7.15.1
    restart: unless-stopped
    depends_on:
      - "elasticsearch"
    environment:
      - "JAVA_OPTS=-Xms2g -Xmx2g"
      - "ENT_SEARCH_DEFAULT_PASSWORD=changeme"
      - "elasticsearch.username=elastic"
      - "elasticsearch.password=changeme"
      - "elasticsearch.host=http://elasticsearch:9200"
      - "allow_es_settings_modification=true"
      - "secret_management.encryption_keys=[4a2cd3f81d39bf28738c10db0ca782095ffac07279561809eecc722e0c20eb09]"
      - "elasticsearch.startup_retry.interval=15"
      - "crawler.security.ssl.certificate_authorities=none"
      - "crawler.security.dns.allow_loopback_access=false"
    ports:
      - 3002:3002
    networks:
      - elastic

  kibana:
    image: docker.elastic.co/kibana/kibana:7.15.1
    restart: unless-stopped
    depends_on:
      - "elasticsearch"
      - "ent-search"
    ports:
      - 5601:5601
    environment:
      ELASTICSEARCH_HOSTS: http://elasticsearch:9200
      ENTERPRISESEARCH_HOST: http://ent-search:3002
      ELASTICSEARCH_USERNAME: elastic
      ELASTICSEARCH_PASSWORD: changeme
    networks:
      - elastic

While Enterprise Search uses Elasticsearch under the hood, you can't simply point it at indices created by other tools like that. You will need to ingest that data via the Enterprise Search APIs.
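For reference, ingesting into App Search goes through its documents endpoint rather than a raw Elasticsearch index. A minimal sketch, assuming a hypothetical engine named `my-docs` and a placeholder private API key (yours is listed under App Search → Credentials):

```shell
#!/bin/sh
# Index two documents into an App Search engine via its documents API.
# ENGINE and API_KEY are placeholders -- substitute your own values.
ENT_SEARCH="http://localhost:3002"
ENGINE="my-docs"
API_KEY="private-xxxxxxxxxxxxxxxx"

curl -s -X POST "$ENT_SEARCH/api/as/v1/engines/$ENGINE/documents" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '[
        {"id": "doc-1", "title": "First document", "body": "Hello"},
        {"id": "doc-2", "title": "Second document", "body": "World"}
      ]'
```

The response lists one entry per document with any indexing errors, so a bulk loader can check each item before moving on.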

Yes.

You need to use the FSCrawler Workplace Search output and reindex everything.


OK. During and after the reindexing process, is there a location that can be backed up, to allow the data to be moved efficiently, or reconstructed in case of corruption?

Hi David,

Here's a quick link to the FSCrawler docs on indexing directly into Workplace Search.

I recommend checking out the Elasticsearch Snapshot/Restore APIs to back up data when performing unfamiliar actions.
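For what it's worth, the basic snapshot workflow is two API calls once a repository location is registered. A sketch, assuming `path.repo` points at a mounted backup folder; the path, repository name, and credentials here are illustrative:

```shell
#!/bin/sh
# 1) Register a shared-filesystem snapshot repository.
#    Requires path.repo: /mnt/backups in elasticsearch.yml (or the env equivalent).
curl -s -u elastic:changeme -X PUT "http://localhost:9200/_snapshot/my_backup" \
  -H "Content-Type: application/json" \
  -d '{"type": "fs", "settings": {"location": "/mnt/backups"}}'

# 2) Take a snapshot of every index and wait for it to finish.
curl -s -u elastic:changeme -X PUT \
  "http://localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true"

# 3) Restore later with:
# curl -s -u elastic:changeme -X POST \
#   "http://localhost:9200/_snapshot/my_backup/snapshot_1/_restore"
```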

Thank you. I am grateful for the links.

If snapshots are primarily insurance when performing unfamiliar actions, yet it can take days, weeks or even months for FSCrawler to index large corpora, then, if data is lost or corrupted, does Elastic.co suggest or offer no solution other than re-indexing?

M. Pilato may wish to offer another succinct and devastating answer. Yes or no?

Snapshots, i.e. backups, are the recommended approach.

No, they are not!

Scrilling wrote " I recommend checking out the Elasticsearch Snapshot/Restore APIs to back up data WHEN PERFORMING UNFAMILIAR ACTIONS."

OMG... I just spotted this at the bottom of the Snapshot/Restore page recommended by Scrilling and Warkolm.

"WARNING - The only reliable and supported way to back up a cluster is by taking a snapshot. You cannot back up an Elasticsearch cluster by making copies of the data directories of its nodes. There are no supported methods to restore any data from a filesystem-level backup. If you try to restore a cluster from such a backup, it may fail with reports of corruption or missing files or other data inconsistencies, or it may appear to have succeeded having silently lost some of your data."

The entire Snapshot/Restore page is a loud caveat emptor (French AND Latin).

Would snapshots be practicable with a multi-terabyte index?
How long would it take to back up a multi-terabyte index?
How much data would potentially be lost before the next backup could start?
One could use many concurrent, small indices to increase the speed of a backup, but this wouldn't reduce the total required backup capacity!
Or would staggered, concurrent backups be the solution?
What prompted the warning to be on the page?

A new index is required for every Lucene upgrade.
As this appears to be a Lucene issue, does it also affect Solr?

So M. Pilato's answer would be: no.

Yes.

It depends on the size of the cluster and what repository you are using.

Lost in what sense?

Snapshots are not done in parallel.

People were copying the underlying Elasticsearch data directories, at a filesystem level, then assuming it'd work.
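On the "how much data would be lost" question: Elasticsearch snapshots are incremental, so after the first full snapshot each subsequent one copies only new or changed segments, which makes regular scheduled snapshots of large indices more feasible than the raw sizes suggest. A sketch of a nightly schedule using the snapshot lifecycle management API; the policy name, repository name, credentials, and retention values are illustrative:

```shell
#!/bin/sh
# Create an SLM policy that snapshots all indices nightly at 01:30,
# keeping at least 5 and at most 50 snapshots, expiring them after 30 days.
# Assumes a repository named "my_backup" is already registered.
curl -s -u elastic:changeme -X PUT "http://localhost:9200/_slm/policy/nightly-snapshots" \
  -H "Content-Type: application/json" \
  -d '{
        "schedule": "0 30 1 * * ?",
        "name": "<nightly-snap-{now/d}>",
        "repository": "my_backup",
        "config": { "indices": ["*"] },
        "retention": { "expire_after": "30d", "min_count": 5, "max_count": 50 }
      }'
```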

It's actually not so bad once one is aware of the issue.
Users with small indices can upgrade fairly painlessly, possibly using parallel systems that can be switched over.
Users with large indices need to be more circumspect and upgrade only when the benefits of the upgrade outweigh the cost and disruption.

From your business perspective, it may be worth putting this information - together with the considerations and solutions - front and centre, rather than as frightening warnings on a peripheral page.

I will now stop being mean to Mr. Pilato and I'll stick to using English.

FYI. It took eleven weeks (24/7) to generate my 5TB indices. They are now useless. Time to start again.
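For completeness: when an index merely needs to be rebuilt in a newer on-disk format, as opposed to re-crawling the source files, the Reindex API copies documents from one index to another inside the cluster, which is usually far faster than weeks of crawling. A sketch with made-up index names and credentials; this only works while the old index is still readable by the running version:

```shell
#!/bin/sh
# Copy all documents from old-index into new-index without re-crawling.
# Runs asynchronously; poll the returned task ID to track progress.
curl -s -u elastic:changeme -X POST \
  "http://localhost:9200/_reindex?wait_for_completion=false" \
  -H "Content-Type: application/json" \
  -d '{
        "source": { "index": "old-index" },
        "dest":   { "index": "new-index" }
      }'
```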

Hey @neergttocsdivad

You called out my name several times:

M. Pilato may wish to offer another succinct and devastating answer. Yes or no?

I'm unsure what you are expecting from me. BTW, if you want to speak French with me, you can do it in Discussions en français.
Or if you just want to clarify privately, you can DM me.

So M. Pilato's answer would be: no.

I don't get it.

I will now stop being mean to Mr. Pilato and I'll stick to using English.

Same.

Thanks.