Beginning with FileCloud version 20.3, Solr OCR is available to Enterprise users and to users with OCR licenses.

When you enable OCR:

  • FileCloud's content search engine searches image files and PDF files for your search string. 
  • FileCloud's content classification engine (CCE) scans image files and PDFs for pattern-matching text.

Install and enable Solr OCR on Windows

Follow these instructions on Windows when performing a fresh installation of FileCloud or when performing an upgrade to the OCR component license.

  1. Upgrade to FileCloud 20.3.
  2. Open cloudconfig.php, located at XAMPP DIRECTORY/htdocs/config/cloudconfig.php
  3. Add the following:

    define("TESSERACTOCR_BIN_DIR", "C:\\xampp\\tesseractocr");
    define("TESSERACTOCR_TESSDATA_DIR", "C:\\xampp\\tesseractocr\\tessdata");

    Note:
    TESSERACTOCR_BIN_DIR is the path to the TesseractOCR installation directory, which contains the tesseract binary. On Windows, this is typically C:\xampp\tesseractocr\
    TESSERACTOCR_TESSDATA_DIR is the path to the TesseractOCR training data. On Windows, this is typically C:\xampp\tesseractocr\tessdata

  4. In the Admin portal, click Settings in the navigation pane, and then click the Content Search tab.

  5. If you are performing an upgrade, click Reset.

     If you are performing a fresh installation, click Configure.
  6. Beside Enable Solr OCR, click the Enable button. 

    A confirmation box warns you that enabling OCR will require you to restart Solr.

  7. Click OK.
    A dialog box confirms that OCR is enabled and prompts you to restart Solr.

  8. Restart the Solr (Content Search) service from the FileCloud control panel.

  9. In the Admin portal, go to Settings, and click the Content Search tab again.
  10. Confirm that:
    • The Enable button is disabled
    • The message below the button says OCR has been successfully setup.
  11. To build or rebuild the search index with OCR for PDFs and images that contain text, go to Managed Storage Index Status and do one of the following:
    • If you are performing a fresh installation, click Index.
    • If you are performing an upgrade, click Reindex.

Install and enable Solr OCR on Linux

Follow these instructions on Linux when performing a fresh installation of FileCloud or when performing an upgrade to the OCR component license.

  1. Upgrade to FileCloud 20.3.
  2. Run filecloudcp -t
  3. In the Admin portal, click Settings in the navigation pane, and then click the Content Search tab.

  4. If you are performing an upgrade, click Reset, and then delete the current fccore if it exists (run the command: rm -rf /opt/solrfcdata/var/solr/data/fccore/).
  5. Open the file solrconfig.xml in /var/www/html/thirdparty/solarium/fcskel/conf and uncomment the line that references parseContext.xml.
  6. In /var/www/html/thirdparty/solarium, copy the folder fcskel into /opt/solrfcdata/var/solr/data (on the Solr server) and rename the copy fccore.
  7. In the Admin portal, go to Settings, and click the Content Search tab again.
  8. Click Configure.

  9. Confirm that the Enable button is disabled and the message below the button is OCR has been successfully setup.
  10. To build or rebuild the search index with OCR for images and PDFs with text, click Index.
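
Steps 4 to 6 above can be scripted. The following is a minimal sketch, not an official FileCloud tool: the function name rebuild_fccore and its two arguments are hypothetical, the default paths are the ones quoted in the steps, and it assumes you run it on the Solr server with sufficient permissions. It does not edit solrconfig.xml; it only prints the lines that mention parseContext.xml so that you can confirm you uncommented them before the copy.

```shell
#!/bin/sh
# Hypothetical helper: recreate fccore from the fcskel skeleton.
# Arguments (both optional; defaults are the paths from the steps above):
#   $1  directory that contains fcskel
#   $2  Solr data directory that should contain fccore
rebuild_fccore() {
    skel_parent="${1:-/var/www/html/thirdparty/solarium}"
    solr_data="${2:-/opt/solrfcdata/var/solr/data}"
    skel="$skel_parent/fcskel"

    # Stop early if the skeleton or the target directory is missing.
    [ -d "$skel" ] || { echo "missing skeleton: $skel" >&2; return 1; }
    [ -d "$solr_data" ] || { echo "missing Solr data dir: $solr_data" >&2; return 1; }

    # Print the parseContext.xml lines for manual inspection (step 5).
    grep -n 'parseContext.xml' "$skel/conf/solrconfig.xml" || \
        echo "no parseContext.xml reference found in solrconfig.xml" >&2

    # Remove the existing core, if any (step 4), then copy fcskel in as fccore (step 6).
    rm -rf "$solr_data/fccore"
    cp -R "$skel" "$solr_data/fccore"
}
```

Calling rebuild_fccore with no arguments uses the default paths; afterward, continue with step 7. Whether the copied folder also needs its ownership adjusted for the Solr service user depends on your installation.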

Enable OCR manually

If your system is unable to configure OCR automatically, use the following instructions to enable it manually when performing a fresh installation of FileCloud or when performing an upgrade to the OCR component license.

  1. Upgrade to FileCloud 20.3.
  2. Set the Tesseract environment variables:
    • For Windows, add the following to solr.in.cmd:

      SET PATH=%PATH%;C:\xampp\tesseractocr
      SET TESSDATA_PREFIX=C:\xampp\tesseractocr\tessdata
    • For Linux, add the following to solr.in.sh (or define the environment variables globally):

      PATH="/path/to/tesseractocr:$PATH"
      TESSDATA_PREFIX=/path/to/tesseractocr/tessdata
  3. In the Admin portal, click Settings in the navigation pane, and then click the Content Search tab.
  4. If you are performing an upgrade, click Reset.
    If you are performing a fresh installation, you do not need to click Reset.
  5. In C:\xampp\htdocs\thirdparty\solarium, copy the folder fcskel and rename the copy fccore.
    Then move fccore into C:\xampp\solr\server\solr.
  6. Restart the Solr (Content Search) service from the FileCloud control panel.
  7. In the Admin portal, go to Settings, and click the Content Search tab again.
  8. Confirm that the label beneath the Enable Solr OCR button says OCR has been successfully setup.
  9. To build or rebuild the search index with OCR for PDFs and images that contain text:
    • If you are performing a fresh installation, click Index.
    • If you are performing an upgrade, click Reindex.
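
On Linux, the values added to solr.in.sh in step 2 can be sanity-checked before you restart Solr. The sketch below is an illustration only: check_tesseract_env is a hypothetical function name, and it assumes that the first directory should hold an executable named tesseract and that the tessdata directory should hold at least one Tesseract language file with the .traineddata extension.

```shell
#!/bin/sh
# Hypothetical helper: sanity-check the two values set in solr.in.sh.
#   $1  directory expected to contain the tesseract executable
#   $2  tessdata directory (the value of TESSDATA_PREFIX)
check_tesseract_env() {
    bin_dir="$1"
    tessdata_dir="$2"
    status=0

    # The tesseract binary must exist and be executable.
    if [ ! -x "$bin_dir/tesseract" ]; then
        echo "tesseract executable not found in: $bin_dir" >&2
        status=1
    fi

    # Tesseract language models are stored as <lang>.traineddata files.
    if ! ls "$tessdata_dir"/*.traineddata >/dev/null 2>&1; then
        echo "no .traineddata files found in: $tessdata_dir" >&2
        status=1
    fi

    return "$status"
}
```

For example, check_tesseract_env /path/to/tesseractocr /path/to/tesseractocr/tessdata returns 0 when both checks pass and 1 otherwise, where the two paths are the same placeholders used in step 2.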
