Overview

    Glasswall Conform is a command-line tool designed to reconstruct malformed or corrupt PDF files that cannot be processed by the Glasswall Embedded Engine. This executable utility extracts visual content such as text, graphics, and images from input PDFs, generating a newly reconstructed document that adheres to PDF standards.

    The tool is particularly useful for PDF files that do not conform to the PDF specification. By restoring structural integrity, Glasswall Conform makes these files suitable for further processing by the Glasswall Embedded Engine for Content Disarm and Reconstruction (CDR).

    The reconstruction process accepts some loss of visual fidelity in exchange for a conforming, processable file.

    Private Preview Status

    Glasswall Conform is currently in Private Preview. It addresses a wide range of common PDF issues, but it may not fully reconstruct highly complex, severely malformed, or non-standard PDFs, and in some cases it may not be able to process a document at all.

    The Private Preview version provides a foundational solution for handling problematic PDFs, but please note that reconstruction may not always be complete or entirely accurate. For more information on current features, constraints, and known limitations, please refer to Features, Constraints, and Limitations.

    User Guide

    For instructions on installation, configuration, and usage, including advanced options, please refer to the Glasswall Conform User Guide. The User Guide includes examples of command-line usage and describes the various processing modes and arguments available.

    Platform Support

    Glasswall Conform is currently supported only on Linux (amd64). It is distributed as an executable, making it suitable for deployment in restricted or isolated environments.

    Release Notes

    Current Release: 0.8.12

    Features

    Glasswall Conform is a command-line tool designed for pre-processing PDF documents. It extracts and reconstructs visual content to ensure documents meet PDF standards, preparing them for further processing by the Glasswall Embedded Engine, which provides comprehensive Content Disarm and Reconstruction (CDR) protection.

    Key Features:

    • Text, Graphic, and Image Extraction: Extracts and reconstructs text, graphics, and images from PDFs, producing a clean, standards-compliant output document.
    • Handling Rate Threshold: Allows setting a minimum handling rate for graphics, images, or text. Files that fail to meet this threshold are classified as failures and will not be saved.
    • Custom Watermarking: Supports adding custom watermark text on each page of the reconstructed PDF, enabling personalised branding or messaging.
    • Character Identifier (CID) and Glyph Suppression: Suppresses unsupported glyphs and character identifiers (CIDs), replacing them with the default black square character (■).
    • Font Replacement: Converts custom embedded fonts to known-good Microsoft fonts or defaults to Cambria Math when necessary. This process aims to provide the best possible text display, even when custom fonts are not supported.
    • Standards Compliance: Produces a reconstructed PDF that adheres to PDF standards, so that the Glasswall Embedded Engine can then apply full Content Disarm and Reconstruction (CDR) protection.
    • Fast Mode: The newly enabled default mode, which processes files more quickly than Standard Mode and delivers better visual fidelity. Standard Mode handles PDFs more strictly and replaces embedded fonts, but may take longer to process. Standard Mode is recommended where there is zero tolerance for embedded fonts, because of the potential risks associated with third-party font libraries.
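
    The handling rate threshold and glyph suppression features above can be illustrated with a short Python sketch. It is not Glasswall Conform's implementation or interface. It assumes that a "handling rate" is the fraction of items of one content type that were successfully reconstructed, which this article does not define. The names ContentCounts, meets_threshold, and suppress_unsupported are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

SUPPRESSION_CHAR = "\u25A0"  # black square, the default replacement named above


@dataclass
class ContentCounts:
    """Hypothetical tally for one content type: items found vs. items reconstructed."""
    found: int
    handled: int

    @property
    def handling_rate(self) -> float:
        # Assumption: a content type with nothing to handle counts as fully handled.
        return 1.0 if self.found == 0 else self.handled / self.found


def meets_threshold(counts: dict[str, ContentCounts], minimum: dict[str, float]) -> bool:
    """Return True only if every content type with a minimum reaches that rate.

    A file for which this returns False would be classified as a failure
    and not saved, mirroring the behaviour described in the feature list.
    """
    empty = ContentCounts(found=0, handled=0)
    return all(counts.get(kind, empty).handling_rate >= rate
               for kind, rate in minimum.items())


def suppress_unsupported(text: str, supported: set[str]) -> str:
    """Replace every character outside `supported` with the black square."""
    return "".join(ch if ch in supported else SUPPRESSION_CHAR for ch in text)
```

    In this sketch, a document with 190 of 200 text items handled has a text handling rate of 0.95, so it passes a minimum of 0.9 for text. If the same document has only 6 of 10 images handled, it fails as soon as a minimum of 0.8 is also set for images.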

    Constraints and Limitations

    While Glasswall Conform is a powerful tool, certain constraints and limitations should be considered:

    • Image Handling: Some image colour spaces are unsupported and may be ignored. Additionally, image processing may convert compressed images to a lossless format, which can increase file size.
    • Font Handling: Glasswall Conform supports Base 14 and many Microsoft fonts, but unsupported custom fonts are replaced to mitigate potential risks.
    • PDF Structure: PDFs missing essential structural elements (e.g., root catalog, cross-reference tables) may not be recoverable.
    • Memory Usage: PDFs with many images may consume significant memory. While the tool has been tested with files up to 50 MB, larger files may experience performance issues.
    • Colour Spaces: The CalRGB colour space is not supported.
    • Graphics Handling: Support for complex graphics, such as shapes, charts, and graphs, is limited. This version prioritises text integrity.
    • Document Recovery: Severely corrupted PDFs or those with missing structural elements may be unrecoverable.
    • Platform Support: Glasswall Conform is currently only available for Linux amd64 architectures.
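
    Some of the constraints above can be checked before a file is submitted. The sketch below is a hypothetical pre-flight helper and is not part of Glasswall Conform. It assumes that "50 MB" means 50 × 1024 × 1024 bytes. Its structural checks are coarse byte-level heuristics that only look for the %PDF- header, a /Root reference, and a cross-reference marker. The function name preflight_warnings and the warning texts are illustrative.

```python
from __future__ import annotations

# "Tested with files up to 50 MB"; treating MB as 1024 * 1024 bytes is an assumption.
TESTED_MAX_BYTES = 50 * 1024 * 1024


def preflight_warnings(data: bytes, system: str, machine: str) -> list[str]:
    """Return warnings for known constraints; an empty list means none were hit.

    `system` and `machine` are expected to come from platform.system() and
    platform.machine(). The structural checks are heuristics only: passing
    them does not mean the file is recoverable, and failing them does not
    prove that it is not.
    """
    warnings = []
    if system != "Linux" or machine.lower() not in {"x86_64", "amd64"}:
        warnings.append("platform is not Linux amd64")
    if len(data) > TESTED_MAX_BYTES:
        warnings.append("file is larger than the tested 50 MB")
    if b"%PDF-" not in data[:1024]:
        warnings.append("no %PDF- header near the start of the file")
    if b"/Root" not in data:
        warnings.append("no /Root (document catalog) reference found")
    # Note: b"xref" also matches the "startxref" keyword, so this check is lenient.
    if b"xref" not in data and b"/XRef" not in data:
        warnings.append("no cross-reference table or stream found")
    return warnings


# Example use (the file name is a placeholder):
#   import platform
#   from pathlib import Path
#   data = Path("input.pdf").read_bytes()
#   for warning in preflight_warnings(data, platform.system(), platform.machine()):
#       print(warning)
```

    Returning a list of warnings rather than a single pass/fail value keeps the decision with the caller, since a file that trips a heuristic may still be worth submitting.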

    Licensing

    Glasswall Conform includes PyMuPDF, which Artifex makes available under both the open-source AGPL and commercial license agreements. Glasswall holds a commercial distribution license agreement covering its use in Glasswall Conform.

