Process archive files

    Article summary

    Glasswall Halo protects archive files. It uses the Glasswall Embedded Engine to process every file within an archive, then recompresses the archive into a supported format, so the archive you receive is sanitised and ready to use.

    Why CDR?

    Content Disarm and Reconstruction (CDR) is a security measure taken to protect against potential threats and cyber-attacks that may be embedded in files, particularly in documents, images, and other attachments. There are several reasons why you would want to use CDR to sanitise files before allowing other users or information systems to access them.

    Learn more about using CDR to remove threats from files.

    Scanning an archive, which is a file format that contains multiple files and directories, can offer several advantages over scanning individual files one by one. Archives are specifically designed to bundle multiple files together, often with the intention of reducing storage space or organising related content.

    However, it's important to note that not all compressed file types are archives, as some compression formats focus solely on reducing file sizes without necessarily grouping multiple files.

    The following reasons are why you might want to scan an archive rather than a single file:

    1. Efficiency: Scanning a single archive file is more efficient than scanning multiple individual files, especially when dealing with a large number of files. It saves time and reduces the manual effort required to scan each file separately.

    2. Comprehensive Protection: Archives often contain multiple files that are interconnected or dependent on each other. Scanning the entire archive ensures that all files within it are checked for potential threats, ensuring comprehensive protection against malware or other security risks.

    3. Streamlined Management: When dealing with a collection of related files, such as documents, images, or code, keeping them in an archive maintains organisational structure. Scanning the archive helps to keep this structure intact, making it easier to manage and share the entire collection.

    4. Reduced False Positives: Some security software may trigger false positive detections for individual files due to the way they are packed or encrypted. Scanning an archive can help reduce the likelihood of such false positives by analysing the files within their intended context.

    5. Ease of Distribution: When sharing a collection of files, packaging them in an archive can simplify distribution. Scanning the archive before sharing ensures that the recipient gets a clean and safe bundle of files.

    6. Simplified User Experience: From a user's perspective, scanning a single archive is more straightforward and convenient than initiating scans for each individual file. It simplifies the scanning process and reduces user interaction.

    In summary, scanning an archive offers efficiency, comprehensive protection, and streamlined management when dealing with multiple files. It can help users maintain security, organisational structure, and an overall smoother experience when working with collections of files.

    API Documentation

    By utilising Glasswall Halo, you can safely process archives and ensure they are free from hidden threats and malicious content.

    You can use the following APIs to create an archive of sanitised files:

    POST api/v3/cdr-file
    POST api/v3/cdr
    

    For more information please refer to our API Documentation.

    API Authentication

    Learn how to authenticate Glasswall Halo

    Glasswall Halo Events

    When you make a request to Glasswall Halo the following events take place:

    1. You send an archive to the Synchronous API for processing.
    2. The archive is decompressed in line with Glasswall's Archive Support rules.
    3. Each file within the archive is stored in Glasswall Halo whilst processing occurs.
    4. The Glasswall Embedded Engine is notified to process all the files found inside the archive.
    5. The Glasswall Embedded Engine retrieves each of the files and begins its CDR process.
    6. Glasswall Halo then rebuilds the archive with all the sanitised files in the original structure.
    7. The clean archive is then returned to the user via the API response.
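The steps above can be pictured with a small local sketch. It is not Glasswall's implementation: the `sanitise` function is a hypothetical stand-in for the CDR step, and the sketch only handles ZIP archives using Python's standard library. It shows the general shape of decompressing an archive, processing each file, and rebuilding the archive with the original structure.

```python
import io
import zipfile


def sanitise(data: bytes) -> bytes:
    """Hypothetical placeholder for the per-file CDR step (not the Glasswall engine)."""
    return data  # a real engine would return a rebuilt copy of the file


def process_archive(archive_bytes: bytes) -> bytes:
    """Decompress a ZIP, process each file, and rebuild it with the same layout."""
    rebuilt = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as src:
        with zipfile.ZipFile(rebuilt, "w", zipfile.ZIP_DEFLATED) as dst:
            for info in src.infolist():
                # Directory entries carry no data; files go through sanitise().
                data = b"" if info.is_dir() else sanitise(src.read(info))
                dst.writestr(info.filename, data)
    return rebuilt.getvalue()
```

Because entries are written back under their original names and in their original order, the rebuilt archive keeps the folder layout of the input.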

    Request Construction

    Glasswall Halo accepts both binary and Base64 encoded archives, and both endpoints also support password-encrypted ZIP files.

    Please refer to our Supported File Types for a complete list of supported archive file types.

    Additionally, you are able to use our Policy Management API to guide how each file within an archive is processed.

    When utilising Glasswall Halo, you can submit archives in either binary or Base64 format, offering the flexibility to choose the most appropriate file representation for your specific use case and application requirements. If you only require archives with just clean files and no analysis reports, you can make this specific request using the response-content query parameter with the value set to noAnalysisReport. In this instance a request will be made for both the clean file and analysis reports to be produced for the submitted archive.
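As a minimal sketch, the response-content query parameter can be appended to the endpoint URL as shown here. The helper name `build_cdr_file_url` and the example base URL are illustrative assumptions; only the endpoint path, the parameter name, and the value noAnalysisReport come from this article.

```python
from typing import Optional
from urllib.parse import urlencode


def build_cdr_file_url(base_url: str, response_content: Optional[str] = None) -> str:
    """Build the binary endpoint URL, optionally adding the response-content parameter."""
    url = f"{base_url.rstrip('/')}/api/v3/cdr-file"
    if response_content:
        url += "?" + urlencode({"response-content": response_content})
    return url


# Example with a placeholder base URL (replace it with your own Halo base URL).
url = build_cdr_file_url("https://halo.example.com", "noAnalysisReport")
```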

    This flexibility lets you tailor file processing to your preferences and achieve your objectives efficiently.

    Binary File Processing

    POST {baseUrl}/api/v3/cdr-file
    

    Base64 Encoded File Processing

    Submit the Base64 encoded string in the Request body to the following endpoint:

    POST {baseUrl}/api/v3/cdr
    

    Request body Format

    The body of the request should be in JSON format and include the Base64 field containing the Base64 encoded string of the file, and the fileName field specifying the original filename (including the appropriate file extension).

    {
      "Base64": "string",
      "fileName": "filename.zip"
    }
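A request body of this shape could be produced as follows. This is a sketch using only Python's standard library; the function name `build_base64_body` is an illustrative assumption, while the Base64 and fileName field names are taken from the format above.

```python
import base64
import json


def build_base64_body(file_name: str, data: bytes) -> str:
    """Return the JSON request body for the Base64 endpoint."""
    payload = {
        "Base64": base64.b64encode(data).decode("ascii"),
        "fileName": file_name,  # include the file extension, e.g. ".zip"
    }
    return json.dumps(payload)
```

In practice, `data` would be the raw bytes of the archive, for example read with `open(path, "rb").read()`.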
    

    Note

    • Replace {baseUrl} with the actual base URL of the Glasswall Halo API.
    • Provide the correct authentication header with each request.
    • For binary file processing, use a multipart form POST. For Base64 encoded file processing, provide the file content in the JSON request body with the appropriate filename.
    • To provide a password for an encrypted ZIP file, supply it in the header of the binary request with the key password, or in the body of the Base64 request.
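The notes above can be combined into a sketch of a binary request. Several details here are assumptions and not confirmed by this article: the multipart field name `file`, the use of an `Authorization` header, and the helper name `build_binary_request`. Check the API Documentation and the authentication guide for the actual values. The request is only constructed, not sent.

```python
import urllib.request
import uuid
from typing import Optional


def build_binary_request(base_url: str, file_name: str, data: bytes,
                         auth_header: str, password: Optional[str] = None):
    """Build (but do not send) a multipart form POST for the binary endpoint."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        # Assumption: the form field is named "file".
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Authorization": auth_header,  # assumption: header name depends on your auth setup
    }
    if password is not None:
        headers["password"] = password  # header key named in the note above
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v3/cdr-file",
        data=head + data + tail,
        headers=headers,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; any HTTP client that supports multipart form posts can be used instead.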

    Response Handling

    When an archive is successfully processed by Glasswall Halo, you will receive a 201 HTTP status code, indicating that a new archive has been created, and this file is returned in the response. The format of the response depends on whether you used the binary or Base64 endpoint.

    For the binary endpoint, the archive is returned with the content type application/octet-stream. You can read all the bytes from the response body to obtain the CDR'd archive. The response also includes the content-disposition header, which contains the filename supplied in the multipart form, if available. If no filename was provided, a generated GUID is returned as the filename, so you do not need to track the filename while processing occurs.
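One way to read the filename from the content-disposition header is sketched below, using the header parser in Python's standard library. The function name and the sample header value are illustrative assumptions; the exact header value returned by Glasswall Halo may differ.

```python
from email.message import Message
from typing import Optional


def filename_from_content_disposition(header_value: str) -> Optional[str]:
    """Extract the filename parameter from a Content-Disposition header value."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_filename()  # None when no filename parameter is present


# Example with an assumed header value.
name = filename_from_content_disposition('attachment; filename="archive.zip"')
```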

    If you requested a Base64 encoded file to be processed, the response will be in JSON format with a status code of 201. The JSON body will contain the Base64 encoded string representing the clean file produced by Glasswall Halo. The response will look like this:

    {
      "errorReason": null,
      "processingId": "de30c22d-fcef-467c-9ed9-16296318615b",
      "processingStatus": "rebuilt",
      "fileType": "archive",
      "analysisReport": {
        "content": "UEsDBBQAAAgIACU/EFeWuyNVWwAAAL0AAAARAAAAbWFua...",
        "contentType": "application/octet-stream",
        "contentEncoding": "Base64"
      },
      "rebuiltFile": {
        "content": "/9j/2wBDAAMCAgM...",
        "contentType": "application/octet-stream",
        "contentEncoding": "Base64"
      }
    }
    

    To access the Base64 encoded clean archive, retrieve it from rebuiltFile.content within the response. The analysis files can be found at analysisReport.content.
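A short sketch of decoding these fields follows. The function name `extract_outputs` is an illustrative assumption; the field names are those shown in the sample response above.

```python
import base64
import json


def extract_outputs(response_text: str) -> dict:
    """Decode rebuiltFile.content and analysisReport.content when they are present."""
    body = json.loads(response_text)
    outputs = {}
    for key in ("rebuiltFile", "analysisReport"):
        section = body.get(key)
        if section and section.get("content"):
            outputs[key] = base64.b64decode(section["content"])
    return outputs
```

The decoded bytes for rebuiltFile can be written straight to disk as the clean archive.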

    Now that you have the output, you can use it in a variety of ways. If you requested clean files only, the resulting archive mirrors the structure of the input archive, and you can download and unpack it to review each file. If any individual file cannot be processed, it is replaced with a .txt file detailing the reasons for the processing failure. When an analysis report is requested, the report folder within the returned archive contains a file called manifest.cdr-json. This JSON output details the result for each file and can be used to quickly understand and make decisions about each file within an archive.

    On the other hand, if your aim is to analyse all files within an archive, you'll receive an archive that upholds the original structure. However, in this instance, the previously clean files will be substituted with .json or .xml files. These files embody the analysis reports corresponding to each file's analysis.

    If your objective is to obtain both the clean file and the analysis report from the binary endpoint, the result will be a ZIP archive encompassing two distinct folders: clean and report. This output ZIP archive will bear a globally unique identifier (GUID) as its file name, directly linked to the transaction ID within Glasswall Halo.

    Contained within each of these folders will be an archive mirroring the name of the initial input archive. The clean folder comprises the clean archive, while the report folder holds the archive containing all the analysis reports alongside the manifest.cdr-json file.
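The layout described above can be unpacked with a sketch like the following. The folder names clean and report come from this article; the function name `split_combined_output` is an illustrative assumption.

```python
import io
import zipfile


def split_combined_output(output_zip: bytes) -> dict:
    """Group the files of the returned ZIP by their top-level folder (clean or report)."""
    result = {"clean": {}, "report": {}}
    with zipfile.ZipFile(io.BytesIO(output_zip)) as zf:
        for info in zf.infolist():
            folder, _, name = info.filename.partition("/")
            if folder in result and name and not info.is_dir():
                result[folder][name] = zf.read(info)
    return result
```

Each value is the raw bytes of an inner archive, which can itself be opened with `zipfile.ZipFile(io.BytesIO(...))` when it is a ZIP.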

    The manifest.cdr-json file has the following structure:

    {
       "rebuilt":[
          {
             "filename":"/Sample.docx"
          }
       ],
       "failed":[
          {
             "filename":"/Sample.pdf",
             "reason": "This is a reason"
          }
       ],
       "errored":[
          {
             "filename":"/Sample.png",
             "reason": "This is a reason"
          }
          
       ],
       "allowed":[
          {
             "filename":"/Sample.xlsx",
             "reason": "This is a reason"
          }
       ]
    }
    

    Within the manifest.cdr-json file, each section represents an overall result. Each result category contains a list of file names, showing which files within the archive had that outcome. Where the result is not 'rebuilt', a reason is provided to explain the cause of that outcome.
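As a sketch, the manifest can be summarised to decide which files need attention. The function name `summarise_manifest` is an illustrative assumption, and the code relies only on the structure shown in the sample above.

```python
import json


def summarise_manifest(manifest_text: str):
    """Return per-category counts and a list of files that were not rebuilt."""
    manifest = json.loads(manifest_text)
    counts = {category: len(entries) for category, entries in manifest.items()}
    not_rebuilt = [
        (category, entry["filename"], entry.get("reason"))
        for category, entries in manifest.items()
        if category != "rebuilt"
        for entry in entries
    ]
    return counts, not_rebuilt
```

A caller might, for example, reject the whole archive when the failed or errored lists are not empty.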

    Summary

    • You have achieved streamlined scanning, saving time when dealing with numerous files.

    • You have achieved thorough checking of all files within an archive, ensuring security against threats.

    • You have achieved the maintenance of file organisation while simplifying management and sharing.

    • You have achieved a reduction in false positive detections caused by packed or encrypted files.

    • You have achieved simplified distribution of files while ensuring their safety through packaging in an archive.

    • You have achieved a user-friendly scanning experience by minimising interactions and complexities.

    By using CDR you have significantly enhanced your organisation's cybersecurity posture, ensuring the safety of sensitive data and mitigating file-based threats effectively.

    Quick Start

    To try Glasswall Halo yourself, please refer to our Quick Start Guide.

