
Introduction

FileCloud supports previewing PDF files directly in the web browser, without the need to download them to a computer.

In some cases, viewing these files can take longer than expected. This depends on how the PDF files were generated:

  1. A document is scanned on a printer or scanner and saved as a PDF. Such files are normally a set of images joined together into one PDF.
  2. The PDF file is created from a JPG or another image format.
  3. The PDF file is created from a screenshot.

These are just some of the cases where a PDF file is built from images, whether they come from a scanner, a screen capture, or a similar source.

PDF Types

In general, PDF files can be categorized into two main types:

  1. Native
  2. Scanned 

Native PDF files are generated from a computer-based source: for example, when an MS Word, Excel, or PowerPoint document is saved as a PDF, when a browser page is printed to PDF, or when a file is saved directly from PDF software such as Nitro PDF or Adobe PDF. In these cases the content is stored as text, so text-based operations such as searching and copying work directly on the PDF.

When a PDF is created by scanning, it carries no information about its content, because the PDF file serves only as a container for images. This is useful when the objective is to showcase graphic material.

If the PDF to be previewed in FileCloud is a scanned PDF, rendering it can take a long time. This is because the file needs to be checked fully for the presence of embedded text, to allow search, copy, or any other text-based operations. This text processing is done on the fly on the client side, which can make the preview slow to load for general use.
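To find out in advance which files are image-only, and are therefore worth converting, the check for embedded text can be approximated offline. The Python sketch below is a rough heuristic, not FileCloud's own logic. It assumes standard, unencrypted PDFs whose streams are either uncompressed or FlateDecode-compressed. It treats a file as having a text layer if any non-image stream uses the PDF text operators `BT` (begin text) and `Tj`/`TJ` (show text), and as image-only if it only finds image XObjects. The function names `classify_pdf_bytes` and `classify_pdf` are hypothetical.

```python
import re
import zlib
from pathlib import Path

# Each stream body; the dictionary in front of it is looked up separately.
STREAM_RE = re.compile(rb"\bstream\r?\n(.*?)\bendstream", re.S)
IMAGE_RE = re.compile(rb"/Subtype\s*/Image")  # image XObject dictionary
TEXT_BLOCK_RE = re.compile(rb"\bBT\b")        # begin-text operator
SHOW_TEXT_RE = re.compile(rb"\bT[jJ]\b")      # show-text operators Tj / TJ


def classify_pdf_bytes(data: bytes) -> str:
    """Heuristically label a PDF as 'text', 'image-only' or 'unknown'.

    'text'       - some stream draws text (native or OCR-converted PDF)
    'image-only' - image XObjects found but no text drawing (likely scanned)
    'unknown'    - neither detected (e.g. encrypted file or other filters)
    """
    has_image = False
    for match in STREAM_RE.finditer(data):
        # The stream dictionary sits between the last "obj" keyword and "stream".
        obj_pos = data.rfind(b"obj", 0, match.start())
        stream_dict = data[max(obj_pos, 0):match.start()]
        if IMAGE_RE.search(stream_dict):
            has_image = True
            continue  # pixel data is binary; never search it for operators
        body = match.group(1)
        if b"/Filter" in stream_dict:
            if b"/FlateDecode" not in stream_dict:
                continue  # other filters are not handled in this sketch
            try:
                # decompressobj tolerates the EOL bytes left before "endstream"
                body = zlib.decompressobj().decompress(body)
            except zlib.error:
                continue  # e.g. a filter chain this sketch cannot decode
        if TEXT_BLOCK_RE.search(body) and SHOW_TEXT_RE.search(body):
            return "text"
    return "image-only" if has_image else "unknown"


def classify_pdf(path: str) -> str:
    """Read a PDF from disk and classify it with classify_pdf_bytes."""
    return classify_pdf_bytes(Path(path).read_bytes())
```

For example, `classify_pdf("Scanned_PDF.pdf")` would be expected to return `"image-only"` for a scan without a text layer, and `"text"` after OCR conversion, since OCR tools typically add an invisible text layer drawn with the same operators. Because other filters and encryption are not handled, treat the result as a hint rather than a verdict.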

Optimization

If the scanned PDF files are created from:

  1. Legal documents
  2. Insurance patient documents
  3. Blueprints and manuals

the best option is to convert these PDF files to native PDF before sharing them on the web. When files are converted to native PDF format, the preview process is more efficient, and opening a file for review takes less time than opening a scanned document.

There are several tools on the market that convert scanned PDF files to native PDF files using OCR (optical character recognition).

For test purposes we used ABBYY FineReader. More information about this software can be found here: https://www.abbyy.com/en-eu/finereader/

The source file is the following image:


This file was scanned and saved as a PDF. The resulting PDF is the following: Scanned_PDF.pdf

If we use this file as it is, the FileCloud preview will take longer than expected to render for web view.

Using the ABBYY FineReader software, we read this PDF file and converted it to a native PDF file:

Screenshot:

The PDF created by this tool is the following: Native_PDF.pdf

As a result, the file now has text that can be searched and copied, and it can be rendered for web view without waiting for its conversion on the fly.

There are other software tools available on the web. ABBYY FineReader was used here only for explanation purposes; any other tool can be used to optimize PDF files for web view.
