What is Glasswall CDR?

What is Glasswall CDR and how does it work?

Glasswall CDR (Content Disarm and Reconstruction) employs our patented 4-step approach to protect organizations and individuals against file-based threats. Unlike most conventional cyber-security solutions, Glasswall CDR does not rely on detection capabilities. Instead, we follow a ‘Zero-Trust’ approach: every file is treated as untrusted until it has been Glasswalled and had its threats removed. We don’t try to identify malicious code; we simply remove the ability for it to exist in the document.

All files processed by the Glasswall Embedded Engine are assumed to be malicious. The engine analyses the file and rebuilds it to its known-good manufacturer's specification, removing any potential threats lurking in the file’s structure. Correcting deeper-rooted structural content is referred to as remediation, whereas removing content that is configurable through policy management (e.g. hyperlinks in Office documents) is referred to as sanitization.



Why is it better than regular anti-virus and sandboxing techniques?

Next-generation antivirus software and sandboxes require an understanding of a threat in order to defend against it. Glasswall CDR rebuilds every file to the known-good manufacturer's specification without needing specific threat knowledge, eliminating the risk that malware can be hidden within the file’s structure.

Organizations that deploy CDR protection do not have to rely on next-generation antivirus or threat intelligence databases, which on average have a protection gap of 18 days for new zero-day threats. Sandbox technology can go beyond hashes and file signatures, and therefore helps to identify novel malware, but usability is the first thing that suffers: business productivity is sacrificed for security.

Sandboxes are really just instrumented virtual machines. Their effectiveness relies on two factors. First, the detection of malicious processes needs to be correct. Second, the attacker needs to be impatient and always launch a suspicious software process whilst the file is in the sandbox. That’s quite a big assumption, and it is the main reason why sandbox detection can be avoided. Understandably, business users are generally impatient to receive their mail or business files. The attacker knows this.

In Q1 2021 alone, the McAfee Labs Threats Report detailed 87.6 million newly discovered pieces of malware. Protecting an organization’s IT infrastructure with an approach that can take on average 18 days to identify new zero-day threats leaves the door open to unacceptable business risks.

Glasswall CDR Process - how it works

The Glasswall CDR engine receives an input file from either a CLI or direct API request to the library. The input file then enters our patented 4-step process to rebuild files back to their manufacturer’s known-good specification.

In each of these phases, analysis of the file occurs. The output of each phase becomes the input for the next, maintaining a level of separation between the processes. Once the four phases have completed, the Glasswall Embedded Engine generates a pristine file that is free from threats, accompanied by an analysis report which explains what risks were identified and how they were eliminated.
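The chaining described above, where each phase consumes the previous phase's output, can be pictured with a minimal sketch. This is not the Glasswall Embedded Engine's API: `PhaseResult`, `make_phase` and `run_pipeline` are hypothetical names, and the phases are placeholders that only record that they ran.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PhaseResult:
    """Hypothetical container handed from one phase to the next."""
    data: bytes
    report: List[str] = field(default_factory=list)


Phase = Callable[[PhaseResult], PhaseResult]


def make_phase(name: str) -> Phase:
    """Build a placeholder phase that only records that it ran."""
    def phase(previous: PhaseResult) -> PhaseResult:
        # A real phase would analyse or transform previous.data here.
        # Returning a new object keeps each phase separate from the last.
        return PhaseResult(previous.data, previous.report + [f"{name}: completed"])
    return phase


# The four phase names come from the article; their bodies are stand-ins.
PHASES: List[Phase] = [make_phase(n) for n in ("inspect", "rebuild", "clean", "deliver")]


def run_pipeline(input_bytes: bytes) -> PhaseResult:
    """Feed the output of each phase into the next, in order."""
    result = PhaseResult(input_bytes)
    for phase in PHASES:
        result = phase(result)
    return result
```

Calling `run_pipeline(b"...")` returns the final result together with a report that has one entry per phase, which mirrors the idea of a processed file accompanied by an analysis report.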

Phase 1. Inspect 

Inspection is the first processing phase the Glasswall Embedded Engine takes the file through. Its purpose is to determine the document's structure, as well as the size of its constituent components.

It begins by processing the root structure of the document, and much like a document object model (DOM) in HTML, a tree of the file structures is built. Any compressed elements within the document are expanded automatically and added to the tree. 

If the CDR engine discovers another file type within the document’s structure, that file is treated as a separate entity. The newly discovered file is analysed and an additional tree of structures is created. For example, the original file being processed may be an MS Word document with a JPEG image embedded in it. The Glasswall Embedded Engine would process the MS Word and JPEG files independently.
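As a rough illustration of the Inspect phase, the sketch below builds one tree per file found. It assumes a ZIP-based container (as used by modern Office formats) and recognizes an embedded JPEG by its leading bytes; `Node`, `inspect` and `make_sample` are hypothetical names, and the real engine parses far more structure than this.

```python
import io
import zipfile
from dataclasses import dataclass, field
from typing import List

JPEG_MAGIC = b"\xff\xd8\xff"  # leading bytes of a JPEG stream


@dataclass
class Node:
    name: str
    size: int  # size in bytes after expansion
    children: List["Node"] = field(default_factory=list)


def inspect(name: str, data: bytes) -> List[Node]:
    """Return one tree per file found: the input first, then embedded files."""
    root = Node(name, len(data))
    trees = [root]
    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for info in archive.infolist():
                member = archive.read(info)  # compressed elements are expanded
                root.children.append(Node(info.filename, len(member)))
                if member.startswith(JPEG_MAGIC):
                    # A different file type: analyse it as a separate entity.
                    trees.extend(inspect(info.filename, member))
    return trees


def make_sample() -> bytes:
    """Build a tiny stand-in container with one embedded 'JPEG' (assumption)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("word/document.xml", "<doc/>")
        archive.writestr("word/media/image1.jpg", JPEG_MAGIC + b"\x00" * 17)
    return buffer.getvalue()
```

Running `inspect("sample.docx", make_sample())` yields two trees: one for the container, with a child node per member, and a separate one for the embedded image.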

At the end of this process flow, the Glasswall CDR engine is able to generate an analysis report, giving unmatched insight into the structures of a file.

Phase 2. Rebuild

The next phase iterates over the tree generated in the Inspect phase. Starting from the root structures, an iterator calls a validation method on each structure that it discovers.

Once called, the validation method checks that the data within the structure is consistent with the manufacturer's known-good specification. If the data does not match the specification, the file will not pass validation. An example of a file specification is ISO 32000, which defines what constitutes a valid PDF document.

When a file fails validation, the Glasswall CDR engine seeks to repair the file so that it conforms to the manufacturer’s specification. If this is not possible, an issue is recorded in the analysis report, as a new file cannot safely be regenerated.

There are thousands of structures within a document:

The SttbTtmbd byte-level structure contains a list of embedded fonts within the MS Word ‘.doc’ file.

The SttbW6 structure specifies what the embedded font is, where it is located and how it is stored.

As an example of Glasswall’s validation, if these structures do not adhere to the specification, the defined default values are inserted so that the embedded font passes validation.

Next, remediation fixes structures that deviate from the published specification. During this process, any hidden structures, or structures that are present in the file but not defined in the manufacturer’s known-good specification, are not tagged for regeneration in the final version of the file. These could include, but are not limited to, caches, part-saves and blocks of unreferenced data.
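The validate, repair and tag logic of this phase can be sketched as follows. The `SPEC` table, its structure names and its value ranges are invented for illustration and do not come from any real file format specification; `Structure`, `walk` and `rebuild` are likewise hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical specification: structure name -> (minimum, maximum, default).
# A default of None means there is no safe value to insert on failure.
SPEC: Dict[str, Tuple[int, int, Optional[int]]] = {
    "document": (0, 0, 0),
    "font_count": (0, 255, 0),
    "page_width": (1, 10000, 612),
    "header_version": (1, 2, None),
}


@dataclass
class Structure:
    name: str
    value: int
    regenerate: bool = False  # tagged only once the structure is known-good
    children: List["Structure"] = field(default_factory=list)


def walk(node: Structure) -> Iterator[Structure]:
    """Iterate over the tree from the root, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def rebuild(root: Structure, report: List[str]) -> bool:
    """Validate every structure; return False if the file cannot be regenerated."""
    can_regenerate = True
    for structure in walk(root):
        rule = SPEC.get(structure.name)
        if rule is None:
            # Not in the specification (e.g. a cache): leave it untagged.
            report.append(f"{structure.name}: not in specification, not regenerated")
            continue
        low, high, default = rule
        if low <= structure.value <= high:
            structure.regenerate = True
        elif default is not None:
            report.append(f"{structure.name}: invalid value, default {default} inserted")
            structure.value = default
            structure.regenerate = True
        else:
            report.append(f"{structure.name}: cannot be repaired")
            can_regenerate = False
    return can_regenerate
```

In this sketch an out-of-range `font_count` is repaired with its default, an unknown `cache_blob` is simply never tagged, and an invalid `header_version` causes the whole rebuild to be reported as failed.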


Phase 3. Clean 

The next processing phase sanitizes the document using configurable Content Management Policies (CMPs). Inspection has already uncovered all the structures within the document, so when a structure is encountered, the CMPs apply one of the following actions:

Allow

The structure remains in the processed document and is logged in the analysis report as an allowed item.

Disallow

The structure is logged in the analysis report as an issue item. The document is marked non-conforming and is not regenerated.

Sanitize

The structure is surgically removed from the managed document by not tagging it for regeneration. The removal is logged in the analysis report.

The Glasswall Embedded Engine applies configurable content management policies to PDF, MS Office files, SVG, WebP and GeoTIFF. The sanitization options available are detailed below:
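Separately from the product's own option list, the three policy actions described in this phase (Allow, Disallow, Sanitize) can be illustrated with a small sketch. It does not reproduce Glasswall's policy format: the `Action` enum, the content kinds and the choice to allow unlisted kinds by default are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class Action(Enum):
    ALLOW = "allow"
    DISALLOW = "disallow"
    SANITIZE = "sanitize"


@dataclass
class Structure:
    kind: str                # hypothetical content kind, e.g. "hyperlink"
    regenerate: bool = True  # tagged for regeneration by the earlier phase


def clean(structures: List[Structure],
          policy: Dict[str, Action]) -> Tuple[bool, List[str]]:
    """Apply the policy; return (document conforms, report lines)."""
    conforming = True
    report: List[str] = []
    for structure in structures:
        # Assumption: kinds that the policy does not mention are allowed.
        action = policy.get(structure.kind, Action.ALLOW)
        if action is Action.ALLOW:
            report.append(f"{structure.kind}: allowed")
        elif action is Action.SANITIZE:
            # Removal is achieved by not tagging the structure for regeneration.
            structure.regenerate = False
            report.append(f"{structure.kind}: sanitized")
        else:
            # Disallow: log an issue and mark the document non-conforming.
            conforming = False
            report.append(f"{structure.kind}: disallowed (issue)")
    return conforming, report
```

With a policy such as `{"hyperlink": Action.SANITIZE, "macro": Action.DISALLOW}`, a document containing a hyperlink loses that structure but can still be regenerated, whereas a document containing a macro is reported as non-conforming.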

Phase 4. Deliver 

The final phase conducts semantic checks on the document to ensure that its visual integrity is maintained, maximizing the usability of the final, processed document.

The checks navigate through the document’s structures, cross-referencing the manufacturer's published known-good specification to validate how the document’s structures relate to each other. Any references that were broken by remediation or sanitization are repaired during this process.


Now that the expected data structures have been semantically validated, the stream is written out to the new version of the file. The file is regenerated one data structure at a time: the engine walks the tree and writes out the file as it goes.
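A minimal sketch of this last step, assuming a toy line-based output format rather than any real file format: only structures tagged for regeneration are written, and references to structures that were removed are dropped first. `Structure`, `walk` and `deliver` are hypothetical names.

```python
import io
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Structure:
    ident: str
    payload: bytes
    regenerate: bool = True
    refs: List[str] = field(default_factory=list)  # identifiers this structure points to
    children: List["Structure"] = field(default_factory=list)


def walk(node: Structure) -> Iterator[Structure]:
    """Yield tagged structures depth first; untagged subtrees are skipped."""
    if not node.regenerate:
        return
    yield node
    for child in node.children:
        yield from walk(child)


def deliver(root: Structure) -> bytes:
    """Repair broken references, then write the file structure by structure."""
    kept = {structure.ident for structure in walk(root)}
    out = io.BytesIO()
    for structure in walk(root):
        # Drop references to structures that will not exist in the new file.
        structure.refs = [ref for ref in structure.refs if ref in kept]
        header = f"{structure.ident}|{','.join(structure.refs)}|".encode()
        out.write(header + structure.payload + b"\n")
    return out.getvalue()
```

For a root that references both a kept `body` structure and a sanitized `link` structure, the output contains the root with a single reference to `body`, followed by `body` itself; `link` is never written.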

The safe, ready-to-use file is then delivered to the end user. Alongside the file, an in-depth report is provided detailing what changes have been made to ensure the file conforms to the known-good manufacturer's specification. If the file cannot be processed, the reason is detailed in the report.

All of this happens within the Glasswall Embedded Engine, typically in less than a second.



