Version: 16.8.2

Content management

A content management policy is a set of content management switches that can be applied to a particular file type.

The content management switch is used to identify a file element type and associated action.

The content management setting specifies the action to be carried out by Glasswall for a particular content management switch. Each content management switch can be set to one of three settings:

  • Allow - The Glasswall Embedded Engine processes any associated file element types and they remain in the regenerated file. The associated structure is logged in the Analysis report as an Allowed Item.

  • Disallow - If any of the associated file element types are identified in the file, the Glasswall Embedded Engine identifies the file as being non-conforming and the file will not be regenerated. The associated structure is logged in the Analysis report as an Issue Item.

  • Sanitise - If any of the associated file element types are identified in the file, the Glasswall Embedded Engine removes them from the regenerated document. The associated structure is logged in the Analysis report as a Sanitisation Item.
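
The three settings can be summarised in a few lines of Python. This is only an illustrative sketch of the behaviour described in the list above, not the engine's API: the `Setting` enum, the `REPORT_ITEM` mapping and the `outcome` function are hypothetical names introduced for this example.

```python
from enum import Enum


class Setting(Enum):
    ALLOW = "allow"
    DISALLOW = "disallow"
    SANITISE = "sanitise"


# Analysis report item logged for each setting, as described in the list above.
REPORT_ITEM = {
    Setting.ALLOW: "Allowed Item",
    Setting.DISALLOW: "Issue Item",
    Setting.SANITISE: "Sanitisation Item",
}


def outcome(setting: Setting, element_found: bool) -> dict:
    """Summarise the effect of one switch on one file (hypothetical helper)."""
    if not element_found:
        # Nothing to act on: this switch alone does not block regeneration.
        return {"regenerated": True, "element_kept": False, "report_item": None}
    return {
        # Only 'disallow' marks the file as non-conforming.
        "regenerated": setting is not Setting.DISALLOW,
        # Only 'allow' keeps the element in the regenerated file.
        "element_kept": setting is Setting.ALLOW,
        "report_item": REPORT_ITEM[setting],
    }
```

For example, `outcome(Setting.SANITISE, True)` reports a regenerated file, with the element removed and a Sanitisation Item logged.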

Content management reporting

The following sections show how content that is under the control of a content management switch is presented in the XML Analysis report, depending on the content switch setting.

Allow

This is an excerpt from the XML report for a Word (.doc) Binary file, which contains metadata. The content management switch metadata has been set to allow.

    <gw:Camera cameraName="wordConfig">
      <gw:ContentSwitch>
        <gw:ContentName>metadata</gw:ContentName>
        <gw:ContentValue>allow</gw:ContentValue>
      </gw:ContentSwitch>
      ...
      <gw:AllowedItems itemCount="1">
        <gw:AllowedItem>
          <gw:TechnicalDescription>Metadata detected in #05SummaryInformation</gw:TechnicalDescription>
          <gw:InstanceCount>1</gw:InstanceCount>
          <gw:TotalSizeInBytes>4096</gw:TotalSizeInBytes>
        </gw:AllowedItem>
      </gw:AllowedItems>

Disallow

This is an excerpt from the XML report for a Word (.doc) Binary file which contains metadata. The content management switch metadata has been set to disallow. In Protect Mode, this causes the file to be marked as non-conforming.

    <gw:Camera cameraName="wordConfig">
      <gw:ContentSwitch>
        <gw:ContentName>metadata</gw:ContentName>
        <gw:ContentValue>disallow</gw:ContentValue>
      </gw:ContentSwitch>
      ...
      <gw:IssueItem>
        <gw:TechnicalDescription>Metadata detected in #05SummaryInformation</gw:TechnicalDescription>
        <gw:IssueId>96</gw:IssueId>
        <gw:InstanceCount>1</gw:InstanceCount>
        <gw:RiskLevel>Medium</gw:RiskLevel>
      </gw:IssueItem>

Sanitise

This is an excerpt from the XML report for a Word (.doc) Binary file which contains metadata. The content management switch metadata has been set to sanitise. In Protect Mode, this results in the metadata being removed from the regenerated file.

    <gw:Camera cameraName="wordConfig">
      <gw:ContentSwitch>
        <gw:ContentName>metadata</gw:ContentName>
        <gw:ContentValue>sanitise</gw:ContentValue>
      </gw:ContentSwitch>
      ...
      <gw:SanitisationItem>
        <gw:TechnicalDescription>Metadata detected in #05SummaryInformation</gw:TechnicalDescription>
        <gw:InstanceCount>1</gw:InstanceCount>
        <gw:TotalSizeInBytes>4096</gw:TotalSizeInBytes>
      </gw:SanitisationItem>
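
An integrator may want to pull the switch settings and reported items out of a report like the excerpts above. The sketch below uses Python's standard `xml.etree.ElementTree`. It makes two assumptions that should be checked against a real report: the namespace URI bound to the `gw` prefix is a placeholder (the excerpts do not show the real one), and the sample is a closed, simplified version of the sanitise excerpt. `summarise_report` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Placeholder URI: the excerpts above do not show the real gw namespace.
GW_URI = "urn:example:glasswall-report"
GW_NS = {"gw": GW_URI}
ITEM_KINDS = ("AllowedItem", "IssueItem", "SanitisationItem")

# Simplified, well-formed version of the sanitise excerpt.
SAMPLE_REPORT = """
<gw:Camera xmlns:gw="urn:example:glasswall-report" cameraName="wordConfig">
  <gw:ContentSwitch>
    <gw:ContentName>metadata</gw:ContentName>
    <gw:ContentValue>sanitise</gw:ContentValue>
  </gw:ContentSwitch>
  <gw:SanitisationItem>
    <gw:TechnicalDescription>Metadata detected in #05SummaryInformation</gw:TechnicalDescription>
    <gw:InstanceCount>1</gw:InstanceCount>
    <gw:TotalSizeInBytes>4096</gw:TotalSizeInBytes>
  </gw:SanitisationItem>
</gw:Camera>
"""


def summarise_report(report_xml: str) -> dict:
    """Collect switch settings and reported items from a report fragment."""
    root = ET.fromstring(report_xml.strip())
    switches = {}
    for switch in root.iter(f"{{{GW_URI}}}ContentSwitch"):
        name = switch.findtext("gw:ContentName", namespaces=GW_NS)
        value = switch.findtext("gw:ContentValue", namespaces=GW_NS)
        switches[name] = value
    items = []
    for kind in ITEM_KINDS:
        for item in root.iter(f"{{{GW_URI}}}{kind}"):
            description = item.findtext(
                "gw:TechnicalDescription", default="", namespaces=GW_NS
            )
            items.append((kind, description.strip()))
    return {"switches": switches, "items": items}
```

Calling `summarise_report(SAMPLE_REPORT)` returns the `metadata` switch with its `sanitise` setting and a single `SanitisationItem` entry.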

Content management policies

These are the available content management policies:

| Content Management Switch | Description |
|---|---|
| pdfConfig | Content management switch for PDF file type |
| wordConfig | Content management switch for Word file type |
| pptConfig | Content management switch for PowerPoint file type |
| xlsConfig | Content management switch for Excel file type |
| tiffConfig | Content management switch for TIFF file type |
| svgConfig | Content management switch for SVG file type |
| webpConfig | Content management switch for WebP file type |
| jpegConfig | Content management switch for JPEG file type |
| sysConfig | Content management switch to control different Engine settings |

Note: The xlsConfig, pptConfig and wordConfig content management policies cover both Office Open XML and Office Binary file types.

The available content management switches and applicable file types are shown in the table below:

| Content Management Switch | Description |
|---|---|
| acroform | Controls Interactive form (AcroForm) content |
| javascript | Controls JavaScript code embedded in files |
| external_hyperlinks | Controls hyperlinks to locations outside the file |
| embedded_files | Controls Embedded file content |
| metadata | Controls file metadata |
| actions_all | Controls PDF Actions such as Rendition, Sound, Movie, Hide, SetOCGState, GoTo3DView |
| internal_hyperlinks | Controls hyperlinks to locations within the file |
| value_outside_reasonable_limits | Controls Glasswall defined restrictions such as values exceeding a reasonable range e.g. object sizes |
| digital_signatures | Controls digital signature content for signed files or signed objects within files. NOTE: the 'allow' setting cannot be used for the digital_signatures content management switch. |
| macros | Controls VBA Macros which use Visual Basic code to create custom user-generated functions |
| review_comments | Controls document review comments within a file |
| embedded_images | Controls embedded image content for the Glasswall supported image formats |
| dynamic_data_exchange | Controls DDE commands and DDE content in documents |
| tracked_changes | Controls tracked changes in documents |
| hidden_data | Controls hidden data in documents |
| in_text_comments | Controls in text comments in documents |
| slide_notes | Controls slide notes in documents |
| connections | Controls connections to external data sources and information for constructs such as OLAP formulas, QueryTables or PivotTables |
| scripts | Controls XML Scripts that allow for the creation, storage and manipulation of variables and data during processing |
| foreign_objects | Controls embedded objects in XML based formats such as SVG |
| hyperlinks | Controls external and internal hyperlinks |
| geotiff | Controls georeferencing information embedded within a TIFF file |
| jfif | Controls JFIF marker segments within a JPEG image file |
| undefined_type | Controls TIFF IFD segments of undefined type |
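
Before handing a policy to the engine, an integrator could sanity-check it against the names listed above. The sketch below is a hypothetical pre-flight check, not part of the Glasswall API. It assumes a policy is held in memory as a nested mapping of `{policy: {switch: setting}}`, and that `sysConfig` values are engine settings rather than allow/disallow/sanitise values, so they are not checked.

```python
VALID_SETTINGS = {"allow", "disallow", "sanitise"}

# Policy names taken from the first table above.
KNOWN_POLICIES = {
    "pdfConfig", "wordConfig", "pptConfig", "xlsConfig", "tiffConfig",
    "svgConfig", "webpConfig", "jpegConfig", "sysConfig",
}


def validate_policy(policy: dict) -> list:
    """Return a list of problems found in a {policy: {switch: setting}} mapping."""
    problems = []
    for policy_name, switches in policy.items():
        if policy_name not in KNOWN_POLICIES:
            problems.append(f"unknown policy: {policy_name}")
            continue
        if policy_name == "sysConfig":
            # Assumption: sysConfig holds engine settings, not switch settings.
            continue
        for switch, setting in switches.items():
            if setting not in VALID_SETTINGS:
                problems.append(f"{policy_name}.{switch}: invalid setting {setting!r}")
            elif switch == "digital_signatures" and setting == "allow":
                # Per the note in the switch table above.
                problems.append(f"{policy_name}.{switch}: 'allow' cannot be used")
    return problems
```

An empty list means no problems were found; the check does not verify that a switch is applicable to the file type of its policy.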

The switches currently available for each format are depicted in the table below:

Switch | PDF | DOC | DOCX | PPT | PPTX | XLS | XLSX | GIF | JPEG | SVG | WEBP | TIFF
acroform
actions_all
connections
digital_signatures *
dynamic_data_exchange
embedded_files
embedded_images
external_hyperlinks
foreign_objects
geotiff
hidden_data
hyperlinks
internal_hyperlinks
in_text_comment
javascript
jfif
macros
metadata
retain_exported_streams *
review_comments
slide_notes *
scripts
tracked_changes
value_outside_reasonable_limits
undefined_type

[*]: Content management switch available in Editor's "enablerebuild" (default) mode or Rebuild only

[†]: Content management switch available in Editor's "editoronly" mode, which can only be used with the Export/Import feature

All content types not represented by a content management type for a specific file format will be automatically remediated by the Glasswall engine if identified as malicious.

Embedded files

The "Embedded Files" content management type applies to non-image file formats which are located within a distinct container file. For MS-Office formats, the policy for embedded files is applied differently depending on whether the file considered is supported and accessible to the engine:

Action applied to embedded file according to content management policy for Microsoft Office files:

| | Allow | Sanitise | Disallow |
|---|---|---|---|
| Supported | Treated as standalone file. If file is non-conforming, containing file is rejected and reason for non-conformance reported as an Issue Item. | Treated as standalone file. If file is non-conforming, containing file is rejected and reason for non-conformance reported as an Issue Item. | Containing file is rejected, with the embedded file described in an Issue Item. |
| Unsupported | Regenerated without alteration and reported as an Allowed Item. | Removed from containing file, alongside all references to it, and reported as a Sanitisation Item. | Containing file is rejected, with the embedded file described in an Issue Item. |
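
The table above can be read as a small decision function. The following Python sketch encodes it for illustration only; `embedded_file_action` and its return strings are hypothetical and simply paraphrase the table cells.

```python
def embedded_file_action(supported: bool, setting: str, conforming: bool = True) -> str:
    """Paraphrase of the embedded files table for Microsoft Office containers."""
    if setting not in {"allow", "sanitise", "disallow"}:
        raise ValueError(f"unknown setting: {setting!r}")
    if setting == "disallow":
        # Same outcome whether or not the embedded format is supported.
        return "containing file rejected (Issue Item)"
    if supported:
        # 'allow' and 'sanitise' behave identically for supported formats:
        # the embedded file is processed as a standalone file.
        if conforming:
            return "processed as standalone file"
        return "containing file rejected (Issue Item)"
    if setting == "allow":
        return "regenerated without alteration (Allowed Item)"
    return "removed with all references (Sanitisation Item)"
```

Note that for supported formats a non-conforming embedded file causes the container to be rejected even when the switch is set to allow.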

The table below outlines which embedded file formats are supported (✓) within each container file type, and which are not (✗).

Embedded File Format ↓ / Container Format → | DOCX/XLSX/PPTX | DOC/XLS/PPT | PDF
Office 2007
Office 2003
Office 1997
PDF
MP3n/a
MP4n/a
MPEGn/a
WAV
Formats unsupported by Glasswall

[†]: Disallowed by container format

[‡]: Not removed by Embedded Files switch but may be removed by All Actions switch. Embedded file is regenerated without being processed.

⚠️ Note: To preserve visual integrity between the original and sanitised versions of files, associated visual elements (such as thumbnails and blip references) of unsupported embedded files are not removed during sanitisation. This ensures that post-processed files remain visually consistent with their original versions.

Embedding depth support

The Embedded Engine supports up to nine levels of nested embedded content within OfficeXML files. If any embedded files are found beyond this depth, the container file will be rejected, and an Issue Item will be raised indicating that the maximum recursion limit has been exceeded. This limit applies only to the depth of nesting, and multiple embedded files at the same level do not count against it.
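
To make the distinction between depth and breadth concrete, the sketch below computes the nesting depth of a container modelled as nested dictionaries. The data model and function names are hypothetical, and the sketch assumes that files embedded directly in the top-level container count as level one.

```python
MAX_EMBEDDING_DEPTH = 9  # limit stated above for OfficeXML files


def embedding_depth(node: dict) -> int:
    """Depth of the deepest embedded file below this node (0 if none)."""
    children = node.get("embedded", [])
    if not children:
        return 0
    return 1 + max(embedding_depth(child) for child in children)


def exceeds_limit(node: dict, limit: int = MAX_EMBEDDING_DEPTH) -> bool:
    """True if the container would be rejected for exceeding the depth limit."""
    return embedding_depth(node) > limit


def nest(levels: int) -> dict:
    """Build a container with `levels` levels of single-file nesting."""
    node = {"name": "leaf", "embedded": []}
    for i in range(levels):
        node = {"name": f"level-{i}", "embedded": [node]}
    return node
```

With this model, `nest(9)` is within the limit and `nest(10)` exceeds it, while a container holding fifty files side by side has a depth of only one.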

Embedded images

For image file formats, the "Embedded Images" content management switch should be used. This has the following behaviour depending on switch setting:

Action applied to embedded image according to content management policy:

| | Allow | Sanitise | Disallow |
|---|---|---|---|
| Supported | Treated as standalone file. If file is non-conforming, containing file is rejected and reason for non-conformance reported as an Issue Item. | Treated as standalone file. If file is non-conforming, containing file is rejected and reason for non-conformance reported as an Issue Item. | Containing file is rejected, with the embedded image described in an Issue Item |
| Unsupported* | Regenerated without alteration and reported as an Allowed Item | Removed from containing file, alongside all references to it, and reported as a Sanitisation Item | Containing file is rejected, with the embedded image described in an Issue Item |

[ * ] : Unsupported embedded images may be instead handled by the "embedded_files" switch if the engine does not recognise the filetype

The table below shows which image formats the engine attempts to regenerate when "Embedded Images" is set to sanitise, and which are removed:

Embedded Image Format | DOCX/XLSX/PPTX | DOC/XLS/PPT | PDF
BMP, JPEG, GIF, PNG, EMF, SVG, TIFF
WMF, EMF
WebP
Formats unsupported by Glasswall

[⸸]: Will be converted to a different format by container file

Please note that when "Embedded Images" is set to "Disallow", any image encountered results in the rejection of the containing file. This includes thumbnails of the containing or embedded documents, so this setting may supersede the "Embedded Files" content management switch.

Macros

The macros content switch for MS Office files applies to both Microsoft Visual Basic for Applications (VBA) and Excel 4.0 macros.

Microsoft Visual Basic for Applications

VBA macros are written in the Visual Basic programming language and can be included in any MS Office file format. The handling of VBA macros can be configured as follows:

  • Sanitise - VBA macros are removed from files.
  • Disallow - VBA macros are forbidden. If one is found, the containing file is rejected.
  • Allow - VBA macros are processed and regenerated as part of the containing file providing they conform to specification.

Export mode behaviour

In Export mode, VBA Project Binaries count toward the recursion limit. This means the maximum nesting depth is reduced to eight if the deepest embedded file contains a VBA macro.

Excel 4.0 macros

Excel 4.0 macros are a legacy feature included in XLSX and XLS files. XLSX files containing Excel 4.0 macros will be saved using the ".xlsm" file extension and will produce an error if this extension is modified. The handling of Excel 4.0 macros can be configured as follows:

  • Sanitise - In XLS files, the file will be blocked and "Excel 4.0 Macro found: Not supported" reported as an Issue Item. In XLSX/XLSM files, sheets containing macros will be removed from the document and reported as a Sanitisation Item. If this causes the file to be malformed (i.e. reducing the number of visible sheets to zero), the file will be rejected and an appropriate Issue Item reported.
  • Disallow - Excel 4.0 macros are forbidden. If one is found, the containing file is rejected.
  • Allow - In XLS files, the file will be blocked and "Excel 4.0 Macro found: Not supported" reported as an Issue Item. In XLSX/XLSM files, the file will be regenerated with macros intact.
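
Because XLS files are blocked under both the sanitise and allow settings, the outcome depends on the file format as well as the setting. The sketch below encodes the three bullets above; `excel4_macro_action` is a hypothetical name and the return strings paraphrase the text.

```python
def excel4_macro_action(file_format: str, setting: str) -> str:
    """Outcome for a file that contains an Excel 4.0 macro."""
    fmt = file_format.lower()
    if fmt not in {"xls", "xlsx", "xlsm"}:
        raise ValueError(f"not an Excel format: {file_format!r}")
    if setting not in {"allow", "sanitise", "disallow"}:
        raise ValueError(f"unknown setting: {setting!r}")
    if setting == "disallow":
        return "file rejected"
    if fmt == "xls":
        # Binary XLS is blocked under both 'sanitise' and 'allow'.
        return "file blocked: Excel 4.0 Macro found: Not supported"
    if setting == "sanitise":
        # May still end in rejection if no visible sheets remain.
        return "macro sheets removed (Sanitisation Item)"
    return "regenerated with macros intact"
```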

Metadata

In OOXML, metadata refers to information that describes the content, structure, and properties of a document but is not part of the document's main content. Metadata in OOXML documents is primarily stored in XML files located within the docProps directory:

  1. core.xml: Contains core properties based on the Dublin Core Metadata Element Set.
  2. app.xml: Contains extended properties specific to Microsoft Office applications.
  3. custom.xml: Contains custom properties.

The handling of OOXML metadata can be configured as follows:

  • Sanitise - The file is regenerated with metadata removed (see below for all the properties currently sanitised)
  • Disallow - Metadata is forbidden. If any metadata (properties listed below) is found, the containing file is rejected.
  • Allow - The file is processed, and the metadata is regenerated.

As part of the 'metadata' content management switch, we currently sanitise the following in:

  • core.xml: title, subject, creator, keywords, description, lastModifiedBy, revision, lastPrinted, created, modified, category, contentStatus, language, and version.
  • app.xml: manager, company, and hyperlinkBase
  • custom.xml: any custom properties added to the OOXML document.
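
To see which of these core properties a document actually carries, the core.xml part can be read directly from the OOXML package with Python's standard library. This is an inspection sketch only and says nothing about how the engine itself works; the helper names and the sample document are made up for illustration, and the property list is copied from the core.xml bullet above.

```python
import xml.etree.ElementTree as ET
import zipfile

# core.xml properties listed above as sanitised by the metadata switch.
CORE_SANITISED = {
    "title", "subject", "creator", "keywords", "description",
    "lastModifiedBy", "revision", "lastPrinted", "created", "modified",
    "category", "contentStatus", "language", "version",
}

# Made-up sample using the standard core-properties namespaces.
SAMPLE_CORE_XML = """<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Quarterly report</dc:title>
  <dc:creator>A. Author</dc:creator>
  <cp:lastModifiedBy>B. Editor</cp:lastModifiedBy>
</cp:coreProperties>"""


def read_core_xml(path: str) -> bytes:
    """Read docProps/core.xml from a DOCX, XLSX or PPTX package."""
    with zipfile.ZipFile(path) as package:
        return package.read("docProps/core.xml")


def core_properties(core_xml) -> dict:
    """Map each property's local name (namespace stripped) to its text."""
    root = ET.fromstring(core_xml)
    return {
        child.tag.rsplit("}", 1)[-1]: (child.text or "").strip()
        for child in root
    }


def sanitisable(props: dict) -> list:
    """Names of present properties that appear in the sanitised list."""
    return sorted(name for name in props if name in CORE_SANITISED)
```

For a real file, `sanitisable(core_properties(read_core_xml("report.docx")))` lists the properties that a sanitise setting would be expected to remove, where `report.docx` is a placeholder path.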

OfficeXML (DOCX, XLSX, PPTX) exclusive switches

Hidden data

Office file formats offer multiple different ways of legitimately "hiding" text or data, including whole Excel sheets, PowerPoint slides or lines of text in a Word document. The Glasswall engine deals with hidden data in the following ways, depending on the content management switch setting:

  • Sanitise - The file is regenerated with all hidden data "unhidden", so it is completely visible to the user.
  • Disallow - Hidden data is forbidden. If any hidden data is found, the containing file is rejected.
  • Allow - Any hidden data is regenerated and remains hidden.

Note: For the purposes of this content management setting, “Hidden Data” does not refer to the varied ways of obfuscating or burying data in Office 2007 files. Rather, it is specific to the methods of hiding data that are readily available in the Office 2007 GUI. Obfuscated or concealed data is managed by the policy setting corresponding to the method used. For example, the metadata switch removes data concealed within free-text fields in the document's metadata.

Tracked changes

The tracked_changes content management switch refers to content added by the "Track Changes" functionality in DOCX and XLSX files, also known as "revisions". These can contain historic data related to previous versions of the document, including names of contributors and records of content that has since been removed or obfuscated. The handling of tracked changes can be configured as follows:

  • Sanitise - All historic data is removed and "Track Changes" disabled. The regenerated document will be equivalent to the final state of the original document.
  • Disallow - Tracked changes are forbidden. If there is any evidence of previous revisions or tracked changes still present in the file, the file will be rejected.
  • Allow - The file is regenerated with all historic changes, revisions and tracked changes intact.
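
As a rough illustration of what "evidence of tracked changes" can look like in a DOCX file, the sketch below counts the WordprocessingML insertion (`w:ins`) and deletion (`w:del`) elements in a document.xml part. It is deliberately incomplete (other revision markup such as moves and formatting changes is ignored) and is not a description of the engine's own detection logic. The sample document and the `count_revisions` name are made up for this example.

```python
import xml.etree.ElementTree as ET

W_URI = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
# Only insertions and deletions are counted in this sketch.
REVISION_TAGS = (f"{{{W_URI}}}ins", f"{{{W_URI}}}del")

SAMPLE_DOCUMENT_XML = """<w:document
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r><w:t>Kept text</w:t></w:r>
      <w:ins w:id="1" w:author="A. Author">
        <w:r><w:t>added text</w:t></w:r>
      </w:ins>
      <w:del w:id="2" w:author="A. Author">
        <w:r><w:delText>removed text</w:delText></w:r>
      </w:del>
    </w:p>
  </w:body>
</w:document>"""


def count_revisions(document_xml) -> int:
    """Count w:ins and w:del elements anywhere in the document part."""
    root = ET.fromstring(document_xml)
    return sum(1 for element in root.iter() if element.tag in REVISION_TAGS)
```

A non-zero count indicates that the document still carries revision history of the kind this switch governs.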

Slide notes

The slide_notes content management switch refers to content added by the "Notes" functionality in PPTX files, also known as "slide notes" (and/or "speaker notes"). The Glasswall engine deals with these slide notes in the following ways, depending on the configuration of the content management switch setting:

  • Sanitise - The file is regenerated with all slide notes removed.
  • Disallow - Slide notes are forbidden. If any slide notes are found, the containing file is rejected.
  • Allow - Any slide notes are regenerated and remain in the file.

In-Text comments

The in_text_comment switch refers to content added by the "In-Text Comments" functionality in DOCX files. The handling of the switch can be configured as follows:

  • Sanitise - In-Text Comment is removed alongside the corresponding document metadata found in core.xml.
  • Disallow - In-Text Comment is forbidden. Any DOCX file containing an in-text comment is rejected and not regenerated.
  • Allow - The file is regenerated with the In-Text Comment present in the regenerated DOCX file.

Note: When the in_text_comment switch is set to allow and the metadata switch is set to sanitise, the regenerated file retains the in-text comment but without any data, because the metadata switch sanitises the corresponding description from the core.xml file.

PDF exclusive switches

Digital signatures

Overview

PDF files may contain digital signatures and AcroForms, and certain types of AcroForm can contain digital signatures. While digital signatures are used to verify the authenticity and integrity of a document, AcroForms provide the structural foundation for interactive form fields. When a digital signature is present in a PDF, the AcroForm holds the visible representation of the signature itself.

When processing PDF files that include digital signatures, the Glasswall CDR engine applies a sanitisation process designed to preserve visual integrity while removing active and/or potentially risky content.

How the CDR Engine handles Digital Signatures

To ensure both document safety and consistency, the Glasswall CDR engine performs the following actions during sanitisation:

  • Removes the cryptographic signature data, including any embedded certificates, validation logic, or scripts.
  • Strips signature-related metadata and interactive behavior to eliminate execution pathways or potential exploits.
  • Preserves the visual appearance of the signature widget, such as the signature image, signer name, and date/time text. This is achieved by flattening it into the static content layer of the PDF.

| AcroForm | Digital Signature | Expected AcroForm behavior | Expected Digital Signature behavior | Behavior of AcroForm section containing Digital Signature | Is File Regenerated? |
|---|---|---|---|---|---|
| Allow | Allow | Regenerated without sanitisation | Regenerated without sanitisation | Entire section (including interactive form and digital signature) is preserved as-is | Yes |
| Sanitise | Allow | Sanitised (removed or flattened) | Regenerated without sanitisation | Visual digital signature is preserved; AcroForm field it resides in is sanitised or removed | Yes |
| Allow | Sanitise | Regenerated without sanitisation | Sanitised (cryptographic elements removed) | Visual part of digital signature is preserved as part of the AcroForm; signature becomes non-functional | Yes |
| Sanitise | Sanitise | Sanitised | Sanitised | Entire digital signature section, including AcroForm fields, is removed or flattened visually | Yes |
| Disallow | * | Not applicable | Not applicable | File is not regenerated due to disallowed AcroForm presence | No |
| * | Disallow | Not applicable | Not applicable | File is not regenerated due to disallowed Digital Signature presence | No |
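
The last column of the table reduces to a simple rule: the file is regenerated unless either switch is set to disallow. The sketch below expresses that rule for a PDF that contains a signed AcroForm. The function name is hypothetical, and checking the AcroForm setting first when both are disallowed is an assumption, since the table does not state which reason is reported in that case.

```python
def signed_acroform_outcome(acroform: str, digital_signatures: str) -> dict:
    """Whether a PDF containing a signed AcroForm is regenerated."""
    for setting in (acroform, digital_signatures):
        if setting not in {"allow", "sanitise", "disallow"}:
            raise ValueError(f"unknown setting: {setting!r}")
    # Assumption: AcroForm is checked first when both are disallowed.
    if acroform == "disallow":
        return {"regenerated": False, "reason": "disallowed AcroForm presence"}
    if digital_signatures == "disallow":
        return {"regenerated": False, "reason": "disallowed Digital Signature presence"}
    return {"regenerated": True, "reason": None}
```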

Auditability and chain of custody

To support traceability and accountability in secure environments, the Glasswall CDR engine records the cryptographic hashes of both the input and output files. This enables a system integrator:

  • To verify file provenance through hash comparison.
  • To provide assurance that, where a digital signature is no longer valid, the chain of custody is maintained and can be proven.