Archive Support

    Overview

    Glasswall Halo protects archive files. It uses the Glasswall Engine to process every file within an archive, then recompresses the entire archive into a supported format.

    Supported Input Types

    Unprotected

    • Zip (supports the BZip2 compression type)
    • 7-Zip
    • GZip
    • Rar
    • Tar

    Protected

    • Zip

    How does it work

    On receiving an archive, Halo first decompresses it, descending through up to five levels of nested archives. At each level, the platform individually processes every non-archive file it encounters.

    Each file is processed by the Glasswall engine. Once all files within the archive have been processed, the platform rebuilds the archive, maintaining the original structure. The expected outcomes of this process are as follows:

    1. Protection and sanitisation: The Glasswall engine ensures that each file is thoroughly protected and sanitised, eliminating potential threats and vulnerabilities.

    2. Compliance with policies: Glasswall Halo adheres to the specified policies, ensuring that the resulting archive aligns with the designated security and content management configurations.

    3. Format preservation: The archive is recompiled in a manner that preserves the original structure, maintaining the hierarchical organization and integrity of the files.

    By following this workflow, Halo protects every file in the archive, applies the configured policy, and preserves the archive's structure.
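
    The workflow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Halo's implementation: it handles Zip archives only, detects nested archives by their .zip extension, counts the top-level archive as level 1, and uses a hypothetical passthrough_engine function in place of the Glasswall engine. The placeholder text is also invented for the example.

```python
import io
import zipfile
from typing import Callable

MAX_NESTING = 5  # nesting depth described in this article

# Hypothetical placeholder text; the real wording is not given in this article.
TOO_DEEP_MESSAGE = b"Archive nesting limit reached"

Engine = Callable[[str, bytes], bytes]


def passthrough_engine(name: str, data: bytes) -> bytes:
    """Hypothetical stand-in for the Glasswall engine: returns the data unchanged."""
    return data


def process_archive(data: bytes, engine: Engine = passthrough_engine, level: int = 1) -> bytes:
    """Unpack a Zip archive, process each entry, and rebuild it with the same layout."""
    rebuilt = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(rebuilt, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.is_dir():
                dst.writestr(info.filename, b"")  # keep folders in place
                continue
            payload = src.read(info.filename)
            if info.filename.lower().endswith(".zip"):
                # Nested archive: recurse until the nesting limit is reached.
                if level < MAX_NESTING:
                    payload = process_archive(payload, engine, level + 1)
                else:
                    payload = TOO_DEEP_MESSAGE
            else:
                # Non-archive file: hand it to the engine individually.
                payload = engine(info.filename, payload)
            dst.writestr(info.filename, payload)
    return rebuilt.getvalue()
```

    Entries are written back under their original names and in their original order, which is what keeps the folder hierarchy intact in the rebuilt archive.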

    Expected outcomes

    File Types

    Some archive types cannot be recompressed in their original format due to licensing restrictions. In such cases, the platform recompresses the archive into the Zip format instead. During this process, all file types and folder structures within the resulting Zip archive remain unchanged.

    This keeps the original files and folder organization intact and accessible, even when the input archive type cannot be reproduced.

    Input File | Input File Example | Output File | Output File Example
    Zip        | file.zip           | Zip         | file.zip
    Tar        | file.tar           | Tar         | file.tar
    GZip       | file.gzip          | GZip        | file.gzip
    7Zip       | file.7zip          | Zip         | file.7zip.zip
    Rar        | file.rar           | Zip         | file.rar.zip
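
    The naming pattern in the table can be expressed as a small helper. This is a sketch based only on the rows shown above; the function name output_name and the extension set are assumptions for illustration, and other extensions for the same formats (for example .7z) are not covered by the table.

```python
# Extensions whose archives are repacked as Zip, taken from the table rows above.
REPACKED_AS_ZIP = (".7zip", ".rar")


def output_name(filename: str) -> str:
    """Return the expected output file name for a processed archive."""
    if filename.lower().endswith(REPACKED_AS_ZIP):
        # Unsupported output format: the original name is kept and .zip is appended.
        return filename + ".zip"
    # Zip, Tar and GZip inputs keep their format and their name.
    return filename
```

    For example, output_name("file.rar") gives "file.rar.zip", while output_name("file.tar") is returned unchanged.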

    File Contents

    In most scenarios, when processing an archive, Halo replaces a file within the archive with a clean version before returning it via the API. However, there are situations where the platform may encounter difficulties processing a specific file within the archive. In such cases, if some files are successfully processed while others are not, the unprocessable file will be substituted with a .txt file. The contents of this replacement file will offer an explanation as to why the file cannot be processed.

    With this approach, files that can be processed are returned in a clean state, and every file that cannot be processed comes with a clear explanation of why.

    The following scenarios are to be expected:

    Rebuild

    • Archive is being rebuilt and Entry is Allowed - Uses the Original File
    • Archive is being rebuilt and Entry is Disallowed - Replace with a text file (same name) saying "File Disallowed"
    • Archive is being rebuilt and Entry is Unsupported File type - Replace with a text file (same name) saying "Unsupported File Type"
    • Archive is being rebuilt and Engine Fails While Rebuilding Entry - Replace with a text file (same name) saying "Unable to rebuild file"
    • Archive is being rebuilt and Entry is Successfully Rebuilt - Use Rebuilt File

    Analysis

    • Archive is being analysed and Entry is Allowed - Replace with a text file (same name) saying "File allowed by policy, no analyse needed"
    • Archive is being analysed and Entry is Disallowed - Replace with a text file (same name) saying "File Disallowed"
    • Archive is being analysed and Entry is Unsupported File type - Replace with a text file (same name) saying "Unsupported File Type"
    • Archive is being analysed and Engine Fails While Analysing Entry - Replace with a text file (same name) saying "Unable to analyse file"
    • Archive is being analysed and Entry is Successfully Rebuilt - Use Analysis of file
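
    The scenarios in the two lists above amount to a lookup from processing mode and entry result to a placeholder message. The sketch below encodes that lookup; the Mode and EntryResult names and the placeholder_text function are hypothetical and chosen for illustration, while the message strings are copied from the lists.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    REBUILD = "rebuild"
    ANALYSIS = "analysis"


class EntryResult(Enum):
    ALLOWED = "allowed"
    DISALLOWED = "disallowed"
    UNSUPPORTED = "unsupported"
    ENGINE_FAILED = "engine_failed"
    SUCCESS = "success"


# Placeholder messages copied from the scenario lists above.
PLACEHOLDERS = {
    (Mode.REBUILD, EntryResult.DISALLOWED): "File Disallowed",
    (Mode.REBUILD, EntryResult.UNSUPPORTED): "Unsupported File Type",
    (Mode.REBUILD, EntryResult.ENGINE_FAILED): "Unable to rebuild file",
    (Mode.ANALYSIS, EntryResult.ALLOWED): "File allowed by policy, no analyse needed",
    (Mode.ANALYSIS, EntryResult.DISALLOWED): "File Disallowed",
    (Mode.ANALYSIS, EntryResult.UNSUPPORTED): "Unsupported File Type",
    (Mode.ANALYSIS, EntryResult.ENGINE_FAILED): "Unable to analyse file",
}


def placeholder_text(mode: Mode, result: EntryResult) -> Optional[str]:
    """Return the placeholder message, or None when real content is kept.

    None covers the cases where the entry holds the original file, the
    rebuilt file, or the analysis of the file.
    """
    return PLACEHOLDERS.get((mode, result))
```

    Note the asymmetry for allowed entries: in rebuild mode the original file is kept, whereas in analysis mode it is replaced with a message.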

    Whenever an archive is processed through the V3 endpoint, a manifest.json is also output. It details the processing result of every file inside the archive, so the archive can be understood without unpacking it.

    Compression Types

    Halo treats files compressed with BZip2 or GZip as individual files rather than archives, so no manifest.json is generated for them. The processing of these compressed files still follows the rules of archive processing. If a compressed file contains an archive, a manifest.json is created and output at the highest level of the archive structure.

    The file type response header for a compressed file is set to compressed_file, indicating its compressed nature.

    When an analysis report is generated for a compressed file, the report is renamed so that the inner file decompresses correctly. For instance, if the original file is named test.pdf.bz2, the report file is named test.pdf.report.xml.bz2. This naming convention keeps a clear association between the report and the corresponding compressed file.
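
    The renaming rule can be illustrated with a short helper. This is a sketch inferred from the single test.pdf.bz2 example above; the function name report_name is an assumption, and the behaviour for names without an extension is a choice made for the example.

```python
def report_name(compressed_name: str) -> str:
    """Insert .report.xml before the final (compression) extension."""
    stem, dot, extension = compressed_name.rpartition(".")
    if not dot or not stem:
        # No compression extension to preserve; rejected in this sketch.
        raise ValueError(f"no compression extension in {compressed_name!r}")
    return f"{stem}.report.xml.{extension}"
```

    Keeping the compression extension last means that decompressing the output yields a file named test.pdf.report.xml.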

    By implementing these measures, Halo maintains consistent handling of compressed files, produces accurate reports, and preserves the integrity of the processed files throughout the analysis workflow.

    Policy Config

    Glasswall API endpoints provide wide support for Content Management Flags, enabling granular control over individual files within an archive. This configuration lets you define a specific action for each file type within the archive. The default value for each file type is allow (0). You can also choose one of the other supported actions: sanitise (1) or disallow (2).

    Using the Content Management Flags, you can precisely control the handling of different file types within archives, so that each file receives the level of management and security your configuration calls for when processed through the Glasswall Halo API.

    The following file types are supported under archive configuration:

    "ArchiveConfig": {
        "bmp": 1,
        "doc": 1,
        "docx": 1,
        "emf": 1,
        "gif": 1,
        "jpg": 1,
        "wav": 1,
        "elf": 1,
        "pe": 1,
        "mp4": 1,
        "mpg": 1,
        "pdf": 1,
        "png": 1,
        "ppt": 1,
        "pptx": 1,
        "tif": 1,
        "wmf": 1,
        "xls": 1,
        "xlsx": 1,
        "mp3": 1,
        "rtf": 1,
        "coff": 1,
        "macho": 1,
        "unknown": 1
    },
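
    One way to produce this section programmatically is sketched below. It assumes only what the article states: the file type keys listed above, the action values 0, 1 and 2, and a default of allow. The function name build_archive_config is hypothetical, and how the resulting object is submitted to an endpoint is outside the scope of the example.

```python
import json

ALLOW, SANITISE, DISALLOW = 0, 1, 2

# File type keys copied from the ArchiveConfig listing above.
SUPPORTED_TYPES = (
    "bmp", "doc", "docx", "emf", "gif", "jpg", "wav", "elf", "pe", "mp4",
    "mpg", "pdf", "png", "ppt", "pptx", "tif", "wmf", "xls", "xlsx", "mp3",
    "rtf", "coff", "macho", "unknown",
)


def build_archive_config(overrides=None, default=ALLOW):
    """Build an ArchiveConfig section, applying per-type overrides to a default."""
    overrides = overrides or {}
    unrecognised = sorted(set(overrides) - set(SUPPORTED_TYPES))
    if unrecognised:
        raise ValueError(f"unsupported file types: {unrecognised}")
    for file_type, action in overrides.items():
        if action not in (ALLOW, SANITISE, DISALLOW):
            raise ValueError(f"invalid action {action!r} for {file_type!r}")
    config = {t: overrides.get(t, default) for t in SUPPORTED_TYPES}
    return {"ArchiveConfig": config}


if __name__ == "__main__":
    # Sanitise PDFs, disallow PE executables, allow everything else.
    print(json.dumps(build_archive_config({"pdf": SANITISE, "pe": DISALLOW}), indent=4))
```

    Validating keys and values up front catches typos such as an unlisted file type before a request is ever built.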
    

    How is the policy applied within archives

    Archives processed through Glasswall Halo adhere to the provided policy in a hierarchical manner. The fundamental principle guiding the application of policies is to select the most stringent outcome based on the given configuration whilst trying to fulfil the initial request. This means that in some cases the ContentManagementFlags specified in the request take precedence.

    Additionally, when the return-executable-file parameter is set to false, it serves as a blanket restriction for all executables within archives, irrespective of the archive configuration supplied.

    The ArchiveConfig section is also considered. If its values represent the strictest outcome, they determine the outcome for the files in the archive.

    By following this hierarchical approach and incorporating the provided parameters, Halo ensures that policies are enforced effectively, providing maximum security and control over the archive processing.

    Outcome Matrix

    This example is based on a Word document which contains an internal hyperlink:

    CMF = WordContentManagement:InternalHyperlinks value

    AC = ArchiveConfig:doc value

           | CMF = 0            | CMF = 1            | CMF = 2
    AC = 0 | File Allowed       | File Allowed       | File Allowed
    AC = 1 | File is Sanitised  | File is Sanitised  | File is Disallowed
    AC = 2 | File is Disallowed | File is Disallowed | File is Disallowed

    This example is for a pe file with return-executable-file set:

    REF = return-executable-file value

           | REF = true         | REF = false
    AC = 0 | File Allowed       | File is Disallowed
    AC = 1 | File is Sanitised  | File is Disallowed
    AC = 2 | File is Disallowed | File is Disallowed
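
    The two matrices above can be reproduced by a small decision function. The sketch below is inferred purely from the table cells, so it should be read as one rule set that fits these examples rather than as Halo's actual policy logic; the function and parameter names are assumptions.

```python
ALLOW, SANITISE, DISALLOW = 0, 1, 2


def resolve_outcome(archive_config: int,
                    content_flag: int = ALLOW,
                    is_executable: bool = False,
                    return_executable_file: bool = True) -> str:
    """Return the outcome for one archive entry, matching the matrices above."""
    # return-executable-file = false is a blanket restriction on executables.
    if is_executable and not return_executable_file:
        return "File is Disallowed"
    if archive_config == ALLOW:
        # Allowed entries are kept as they are, so content flags do not apply.
        return "File Allowed"
    if archive_config == DISALLOW:
        return "File is Disallowed"
    # archive_config == SANITISE: the entry is processed, so a disallow
    # content flag for content found in the file blocks the whole file.
    if content_flag == DISALLOW:
        return "File is Disallowed"
    return "File is Sanitised"
```

    For the Word document example, resolve_outcome(SANITISE, DISALLOW) gives "File is Disallowed", and for the pe example, resolve_outcome(ALLOW, is_executable=True, return_executable_file=False) gives the same result.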

    Limits

    To ensure the system operates smoothly and avoids overloading, specific limits have been set for processing archives. These limitations are as follows:

    • A maximum file count of 500. Any archive that contains more than 500 files when unpacked will not be processed.
    • A maximum file size of 500 MB. Any archive that contains more than 500 MB of files will fail to process.
    • A maximum archive count of 50. This counts nested archives (archives inside other archives). If more than 50 nested archives are found, the overall file will fail.
    • A maximum nesting count of 5. This is the number of layers of nesting allowed in a file. Exceeding this limit does not cause the entire file to fail; instead, any files wrapped in more than 5 layers of archives are not processed, and a placeholder text file is placed at the 5th level. Only archives inside archives count towards nesting; folders do not.

    These limits are all configurable via service configuration.
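
    A pre-flight check against the three limits that fail an archive might look like the sketch below. The class and function names are hypothetical, the defaults mirror the values listed above, and 500 MB is assumed to mean 500 * 1024 * 1024 bytes. The nesting limit is left out because, as described above, it does not fail the archive.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ArchiveLimits:
    """Default limits from this article; all are configurable in the service."""
    max_file_count: int = 500
    max_total_size_bytes: int = 500 * 1024 * 1024  # assumes 1 MB = 1024 * 1024 bytes
    max_archive_count: int = 50


@dataclass(frozen=True)
class ArchiveStats:
    """Totals gathered while unpacking an archive (hypothetical structure)."""
    file_count: int
    total_size_bytes: int
    nested_archive_count: int


def limit_violations(stats: ArchiveStats,
                     limits: ArchiveLimits = ArchiveLimits()) -> List[str]:
    """Return the names of every limit the archive exceeds (empty if none)."""
    violations = []
    if stats.file_count > limits.max_file_count:
        violations.append("max_file_count")
    if stats.total_size_bytes > limits.max_total_size_bytes:
        violations.append("max_total_size_bytes")
    if stats.nested_archive_count > limits.max_archive_count:
        violations.append("max_archive_count")
    return violations
```

    An archive with exactly 500 files passes, while one with 501 files is reported as exceeding max_file_count.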

