Overview

    A content management policy is a set of content management switches that can be applied to a particular file type.

    A content management switch identifies a file element type and its associated action.

    The content management setting specifies the action to be carried out by Glasswall for a particular content management switch. Each content management switch can be set to one of three settings:

    • Allow - The Glasswall Embedded Engine processes any associated file element types and they remain in the regenerated file. The associated structure is logged in the Analysis report as an Allowed Item.

    • Disallow - If any of the associated file element types are identified in the file, the Glasswall Embedded Engine identifies the file as being non-conforming and the file will not be regenerated. The associated structure is logged in the Analysis report as an Issue Item.

    • Sanitise - If any of the associated file element types are identified in the file, the Glasswall Embedded Engine removes them from the regenerated document. The associated structure is logged in the Analysis report as a Sanitisation Item.
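    The three settings can be read as a simple mapping from setting to outcome. The Python sketch below is a hypothetical model for illustration only, not part of any Glasswall API: the `Setting`, `Outcome` and `outcome_for` names are invented here, and per-switch exceptions described later (for example Embedded Files and Excel 4.0 macros) are not modelled.

```python
from enum import Enum
from typing import NamedTuple


class Setting(Enum):
    ALLOW = "allow"
    DISALLOW = "disallow"
    SANITISE = "sanitise"


class Outcome(NamedTuple):
    file_regenerated: bool   # is a regenerated file produced at all?
    element_retained: bool   # does the file element survive in that file?
    report_item: str         # how the structure is logged in the Analysis report


def outcome_for(setting: Setting) -> Outcome:
    """Hypothetical model of what happens when a switch's file element is found."""
    if setting is Setting.ALLOW:
        return Outcome(True, True, "AllowedItem")
    if setting is Setting.DISALLOW:
        # The file is non-conforming, so nothing is regenerated.
        return Outcome(False, False, "IssueItem")
    # SANITISE: the element is stripped, the rest of the file is regenerated.
    return Outcome(True, False, "SanitisationItem")


for s in Setting:
    print(s.value, outcome_for(s))
```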

    Content Management Reporting

    The following sections show how content that is under the control of a content management switch is presented in the XML Analysis report, depending on the switch setting.

    Allow

    This is an excerpt from the XML Analysis report for a Word Binary (.doc) file that contains metadata. The metadata content management switch has been set to allow.

        <gw:Camera cameraName="wordConfig">
        <gw:ContentSwitch>
            <gw:ContentName>metadata</gw:ContentName>
            <gw:ContentValue>allow</gw:ContentValue>
        </gw:ContentSwitch>
        ...
        <gw:AllowedItems itemCount="1">
            <gw:AllowedItem>
                <gw:TechnicalDescription>Metadata detected in #05SummaryInformation</gw:TechnicalDescription>
                <gw:InstanceCount>1</gw:InstanceCount>
                <gw:TotalSizeInBytes>4096</gw:TotalSizeInBytes>
            </gw:AllowedItem>
        </gw:AllowedItems>
    

    Disallow

    This is an excerpt from the XML Analysis report for a Word Binary (.doc) file that contains metadata. The metadata content management switch has been set to disallow. In Protect Mode, this would cause the file to be marked as non-conforming.

        <gw:Camera cameraName="wordConfig">
        <gw:ContentSwitch>
            <gw:ContentName>metadata</gw:ContentName>
            <gw:ContentValue>disallow</gw:ContentValue>
        </gw:ContentSwitch>
        ...
        <gw:IssueItem>
            <gw:TechnicalDescription>Metadata detected in #05SummaryInformation</gw:TechnicalDescription>
            <gw:IssueId>96</gw:IssueId>
            <gw:InstanceCount>1</gw:InstanceCount>
            <gw:RiskLevel>Medium</gw:RiskLevel>
        </gw:IssueItem>
    

    Sanitise

    This is an excerpt from the XML Analysis report for a Word Binary (.doc) file that contains metadata. The metadata content management switch has been set to sanitise. In Protect Mode, this would result in the metadata being removed from the regenerated file.

        <gw:Camera cameraName="wordConfig">
        <gw:ContentSwitch>
            <gw:ContentName>metadata</gw:ContentName>
            <gw:ContentValue>sanitise</gw:ContentValue>
        </gw:ContentSwitch>
        ...
        <gw:SanitisationItem>
            <gw:TechnicalDescription>Metadata detected in #05SummaryInformation</gw:TechnicalDescription>
            <gw:InstanceCount>1</gw:InstanceCount>
            <gw:TotalSizeInBytes>4096</gw:TotalSizeInBytes>
        </gw:SanitisationItem>
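    The three excerpts differ mainly in which item element records the metadata: AllowedItem, IssueItem or SanitisationItem. As a rough sketch of reading such a report programmatically, the Python below collects those items with the standard-library ElementTree parser. The `gw:Report` root element and the `urn:example:placeholder` namespace URI are assumptions, since the excerpts show neither the report root nor the URI bound to `gw:`; elements are therefore matched by local name only.

```python
import xml.etree.ElementTree as ET

# Placeholder root and namespace: the real values are not shown in the excerpts.
SAMPLE = """<gw:Report xmlns:gw="urn:example:placeholder">
  <gw:SanitisationItem>
    <gw:TechnicalDescription>Metadata detected in #05SummaryInformation</gw:TechnicalDescription>
    <gw:InstanceCount>1</gw:InstanceCount>
    <gw:TotalSizeInBytes>4096</gw:TotalSizeInBytes>
  </gw:SanitisationItem>
</gw:Report>"""

ITEM_TAGS = {"AllowedItem", "IssueItem", "SanitisationItem"}


def local_name(tag: str) -> str:
    # ElementTree expands "gw:Foo" to "{uri}Foo"; keep only "Foo".
    return tag.rsplit("}", 1)[-1]


def report_items(xml_text: str) -> list[tuple[str, str, int]]:
    """Return (item kind, technical description, instance count) for each item."""
    root = ET.fromstring(xml_text)
    items = []
    for elem in root.iter():
        kind = local_name(elem.tag)
        if kind not in ITEM_TAGS:  # skips wrappers such as AllowedItems
            continue
        fields = {local_name(c.tag): (c.text or "").strip() for c in elem}
        items.append((kind, fields.get("TechnicalDescription", ""),
                      int(fields.get("InstanceCount", "0"))))
    return items


print(report_items(SAMPLE))
```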
    

    Content Management Policies

    These are the available content management policies:

    Content Management Policy    Description
    pdfConfig                    Content management policy for the PDF file type
    wordConfig                   Content management policy for the Word file type
    pptConfig                    Content management policy for the PowerPoint file type
    xlsConfig                    Content management policy for the Excel file type
    tiffConfig                   Content management policy for the TIFF file type
    svgConfig                    Content management policy for the SVG file type
    webpConfig                   Content management policy for the WebP file type
    jpegConfig                   Content management policy for the JPEG file type
    sysConfig                    Content management policy for controlling different Engine settings

    Note: The xlsConfig, pptConfig and wordConfig content management policies cover both Office Open XML and Office Binary file types.

    The available content management switches are described in the table below:

    Content Management Switch          Description
    acroform                           Controls interactive form (AcroForm) content
    javascript                         Controls JavaScript code embedded in files
    external_hyperlinks                Controls hyperlinks to locations outside the file
    embedded_files                     Controls embedded file content
    metadata                           Controls file metadata
    actions_all                        Controls PDF Actions such as Rendition, Sound, Movie, Hide, SetOCGState and GoTo3DView
    internal_hyperlinks                Controls hyperlinks to locations within the file
    value_outside_reasonable_limits    Controls Glasswall-defined restrictions, such as values exceeding a reasonable range (e.g. object sizes)
    digital_signatures                 Controls digital signature content for signed files or signed objects within files. NOTE: the 'allow' setting cannot be used for the digital_signatures content management switch.
    macros                             Controls VBA macros, which use Visual Basic code to create custom user-generated functions
    review_comments                    Controls document review comments within a file
    embedded_images                    Controls embedded image content for the Glasswall-supported image formats
    dynamic_data_exchange              Controls DDE commands and DDE content in documents
    tracked_changes                    Controls tracked changes in documents
    hidden_data                        Controls hidden data in documents
    in_text_comments                   Controls in-text comments in documents
    slide_notes                        Controls slide notes in documents
    connections                        Controls connections to external data sources and information for constructs such as OLAP formulas, QueryTables or PivotTables
    scripts                            Controls XML scripts that allow for the creation, storage and manipulation of variables and data during processing
    foreign_objects                    Controls embedded objects in XML-based formats such as SVG
    hyperlinks                         Controls external and internal hyperlinks
    geotiff                            Controls georeferencing information embedded within a TIFF file
    jfif                               Controls JFIF marker segments within a JPEG image file
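    A policy can be thought of as a mapping from switch names to settings. The sketch below assumes a simple XML layout in which each policy (for example `wordConfig`) is an element holding one child element per switch, under a hypothetical `config` root; the Engine's own configuration reference is authoritative for the exact schema. It also enforces the rule stated above that `digital_signatures` cannot be set to allow. `sysConfig` is left out because, per the policy table, it controls Engine settings rather than file element types.

```python
import xml.etree.ElementTree as ET

POLICIES = {"pdfConfig", "wordConfig", "pptConfig", "xlsConfig", "tiffConfig",
            "svgConfig", "webpConfig", "jpegConfig"}
SETTINGS = {"allow", "disallow", "sanitise"}


def build_policy_xml(policy: dict[str, dict[str, str]]) -> str:
    """Serialise {policy name: {switch: setting}} into an ASSUMED XML layout."""
    root = ET.Element("config")  # hypothetical root element name
    for policy_name, switches in policy.items():
        if policy_name not in POLICIES:
            raise ValueError(f"unknown policy: {policy_name}")
        policy_elem = ET.SubElement(root, policy_name)
        for switch, setting in switches.items():
            if setting not in SETTINGS:
                raise ValueError(f"{switch}: invalid setting {setting!r}")
            # Documented restriction: signatures never survive regeneration.
            if switch == "digital_signatures" and setting == "allow":
                raise ValueError("digital_signatures cannot be set to allow")
            ET.SubElement(policy_elem, switch).text = setting
    return ET.tostring(root, encoding="unicode")


print(build_policy_xml({"wordConfig": {"metadata": "sanitise", "macros": "disallow"}}))
```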

    The switches currently available for each format are depicted in the table below:

    Switch  PDF  DOC  DOCX  PPT  PPTX  XLS  XLSX  GIF  JPEG  SVG  WEBP  TIFF
    acroform
    actions_all
    connections
    digital_signatures *
    dynamic_data_exchange
    embedded_files
    embedded_images
    external_hyperlinks
    foreign_objects
    geotiff
    hidden_data
    hyperlinks
    internal_hyperlinks
    in_text_comment
    javascript
    jfif
    macros
    metadata
    retain_exported_streams *
    review_comments
    slide_notes *
    scripts
    tracked_changes
    value_outside_reasonable_limits

    [*]: Content management switch available in Editor's "enablerebuild" (default) mode or Rebuild only
    [†]: Content management switch available in Editor's "editoronly" mode, which can only be used with the Export/Import feature

    All content types not covered by a content management switch for a specific file format will be automatically remediated by the Glasswall engine if identified as malicious.

    Embedded Files

    The "Embedded Files" content management switch applies only to non-image file formats that are either unsupported by the Glasswall engine or obfuscated by the containing file. For MS Office formats, embedded files in supported formats are processed as standalone files: if the embedded file is conforming, it will be regenerated regardless of content management settings; otherwise, the containing file will be rejected.

    The Embedded Engine currently supports the processing of OOXML files with up to nine levels of embedding. This means that the Engine can traverse and process the content of files nested within files, down to the ninth level of embedding. OOXML files with embedded files nested deeper than this are not supported and will not be processed.
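    The depth limit amounts to a recursive walk that gives up once a file is nested too deeply. The sketch below models a file and its embedded files as a hypothetical `Node` tree and assumes the top-level file sits at depth 0, so the ninth level of embedding is depth 9; how the Engine counts levels internally is not specified here.

```python
from dataclasses import dataclass, field

MAX_EMBEDDING_DEPTH = 9  # assumed: levels of embedding below the top-level file


@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)


def within_depth_limit(node: Node, depth: int = 0) -> bool:
    """True if no file is nested more than MAX_EMBEDDING_DEPTH levels deep."""
    if depth > MAX_EMBEDDING_DEPTH:
        return False
    return all(within_depth_limit(child, depth + 1) for child in node.children)


def nested_chain(levels: int) -> Node:
    """Build a top-level file containing a single chain of `levels` embedded files."""
    root = Node("level0.docx")
    current = root
    for i in range(1, levels + 1):
        child = Node(f"level{i}.docx")
        current.children.append(child)
        current = child
    return root


print(within_depth_limit(nested_chain(9)))   # True
print(within_depth_limit(nested_chain(10)))  # False
```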

    • Sanitise - For MS Office formats, embedded MS Office files are processed as standalone files. If the embedded file is conforming, the embedded file will be regenerated; otherwise, both the containing and embedded file will be rejected. For all other container or embedded formats, embedded files are removed without being processed.
    • Disallow - Embedded files are forbidden. If one is found, both the embedded and the containing file are rejected.
    • Allow - For MS Office formats, embedded MS Office files are processed as standalone files. If one is non-conforming, both the embedded and the containing file are rejected. For all other container or embedded formats, embedded files are regenerated without being processed.
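    The three settings above can be condensed into a small decision function. This is a hypothetical sketch that follows the bullets literally; the function name, its parameters and the returned strings are illustrative, and the per-format exceptions in the table below are not modelled.

```python
SETTINGS = {"allow", "disallow", "sanitise"}


def embedded_file_action(setting: str, office_container: bool,
                         office_embedded: bool, embedded_conforms: bool = True) -> str:
    """Hypothetical decision table for the Embedded Files switch."""
    if setting not in SETTINGS:
        raise ValueError(f"invalid setting: {setting!r}")
    if setting == "disallow":
        # Any embedded file makes both files non-conforming.
        return "reject both"
    if office_container and office_embedded:
        # Sanitise and Allow both process embedded Office files as standalone files.
        return "regenerate embedded file" if embedded_conforms else "reject both"
    if setting == "sanitise":
        # Other container/embedded formats: removed without being processed.
        return "remove embedded file"
    # Allow, other formats: regenerated without being processed.
    return "keep embedded file unprocessed"


print(embedded_file_action("sanitise", True, True, embedded_conforms=False))
```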

    The table below shows which embedded file formats we attempt to regenerate when "Embedded Files" is set to "Sanitise", versus those that are removed:

    Embedded File Format  DOCX/XLSX/PPTX  DOC/XLS/PPT  PDF
    Office 2003
    Office 1997
    PDF
    MP3n/a
    MP4n/a
    MPEGn/a
    WAV
    Formats unsupported by Glasswall

    [†]: Disallowed by container format

    [‡]: Not removed by Embedded Files switch, but may be removed by All Actions switch. Embedded file is regenerated without being processed.

    Embedded Images

    For image file formats, the "Embedded Images" content management switch should be used. Its behaviour depends on the switch setting:

    • Sanitise - For MS Office formats, embedded images in supported formats are processed as standalone files. If the embedded image is conforming, it will be regenerated; otherwise, both the containing and embedded file will be rejected. Unsupported image formats are removed. In PDFs, embedded images are not processed and will always be regenerated if the entry is structurally correct.
    • Disallow - Embedded images are forbidden. If one is found, the containing file is rejected.
    • Allow - Embedded images are not processed and are always regenerated as long as they are a supported file format.
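    As with Embedded Files, these rules can be sketched as a decision function. This is hypothetical and illustrative only: the parameters and returned strings are invented here, and where the text does not state an outcome (for example an unsupported image under Allow) the sketch returns "unspecified" rather than guessing.

```python
def embedded_image_action(setting: str, container: str, supported: bool,
                          conforms: bool = True, structurally_ok: bool = True) -> str:
    """Hypothetical decision table for the Embedded Images switch.

    container is "office" or "pdf".
    """
    if setting not in {"allow", "disallow", "sanitise"}:
        raise ValueError(f"invalid setting: {setting!r}")
    if setting == "disallow":
        return "reject container"
    if setting == "allow":
        # Not processed; kept as long as the image format is supported.
        return "keep image unprocessed" if supported else "unspecified"
    if container == "pdf":
        # Sanitise in PDFs: images are not processed, only the entry is checked.
        return "keep image unprocessed" if structurally_ok else "unspecified"
    if not supported:
        return "remove image"
    # Sanitise in MS Office: supported images are processed as standalone files.
    return "regenerate image" if conforms else "reject both"


print(embedded_image_action("sanitise", "office", True, conforms=False))
```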

    The table below shows which image formats we attempt to regenerate when "Embedded Images" is set to "Sanitise", versus those that are removed:

    Embedded Image Format  DOCX/XLSX/PPTX  DOC/XLS/PPT  PDF
    BMP, JPEG, GIF, PNG, EMF, SVG, TIFF
    WMF, EMF
    WebP
    Formats unsupported by Glasswall

    [⸸]: Will be converted to a different format by container file

    Note that when "Embedded Images" is set to "Disallow", any image encountered will result in the rejection of the containing file. This includes thumbnails of the containing or embedded documents, so this setting may supersede the "Embedded Files" content management switch.

    Digital Signatures

    Allow is not a valid content management setting for the 'Digital Signatures' content in PDF documents. Due to the cryptographic nature of digital signatures, they will always become invalid when a PDF is reconstructed by the engine.

    Macros

    The macros content switch for MS Office files applies to both Microsoft Visual Basic for Applications (VBA) and Excel 4.0 macros.

    Microsoft Visual Basic for Applications

    VBA macros are written in the Visual Basic programming language and can be included in any MS Office file format. The handling of VBA macros can be configured as follows:

    • Sanitise - VBA macros are removed from files.
    • Disallow - VBA macros are forbidden. If one is found, the containing file is rejected.
    • Allow - VBA macros are processed and regenerated as part of the containing file, provided they conform to the specification.

    Excel 4.0 Macros

    Excel 4.0 macros are a legacy feature included in XLSX and XLS files. XLSX files containing Excel 4.0 macros will be saved using the ".xlsm" file extension and will produce an error if this extension is modified. The handling of Excel 4.0 macros can be configured as follows:

    • Sanitise - In XLS files, the file will be blocked and "Excel 4.0 Macro found: Not supported" reported as an issue item. In XLSX/XLSM files, sheets containing macros will be removed from the document and reported as a sanitisation item. If this causes the file to be malformed (i.e. reducing the number of visible sheets to zero), the file will be rejected and an appropriate issue item reported.
    • Disallow - Excel 4.0 macros are forbidden. If one is found, the containing file is rejected.
    • Allow - In XLS files, the file will be blocked and "Excel 4.0 Macro found: Not supported" reported as an issue item. In XLSX/XLSM files, the file will be regenerated with macros intact.
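    The Excel 4.0 rules depend on both the setting and the file format. The hypothetical sketch below reduces a workbook to a list of `Sheet` records (name, visibility, and whether the sheet contains Excel 4.0 macros); the names and returned strings are illustrative only and do not reflect how the engine represents workbooks.

```python
from typing import NamedTuple


class Sheet(NamedTuple):
    name: str
    visible: bool
    has_excel4_macros: bool


def excel4_macro_outcome(setting: str, fmt: str, sheets: list[Sheet]) -> str:
    """Hypothetical model of the Excel 4.0 macro rules; fmt is "xls", "xlsx" or "xlsm"."""
    if setting not in {"allow", "disallow", "sanitise"}:
        raise ValueError(f"invalid setting: {setting!r}")
    if not any(s.has_excel4_macros for s in sheets):
        return "no Excel 4.0 macros: switch not triggered"
    if setting == "disallow":
        return "reject: issue item"
    if fmt == "xls":
        # Sanitise and Allow both block legacy binary workbooks.
        return "reject: issue item 'Excel 4.0 Macro found: Not supported'"
    if setting == "allow":
        return "regenerate with macros intact"
    # Sanitise: drop macro sheets, but reject if no visible sheet would remain.
    remaining_visible = [s for s in sheets if s.visible and not s.has_excel4_macros]
    if not remaining_visible:
        return "reject: file would have no visible sheets"
    removed = [s.name for s in sheets if s.has_excel4_macros]
    return f"regenerate without sheets {removed}: sanitisation item"


book = [Sheet("Data", True, False), Sheet("Macro1", False, True)]
print(excel4_macro_outcome("sanitise", "xlsm", book))
```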

    Metadata

    In OOXML, metadata refers to information that describes the content, structure, and properties of a document but is not part of the document's main content. Metadata in OOXML documents is primarily stored in XML files located within the docProps directory:

    1. core.xml: Contains core properties based on the Dublin Core Metadata Element Set.
    2. app.xml: Contains extended properties specific to Microsoft Office applications.
    3. custom.xml: Contains custom properties.

    The handling of OOXML metadata can be configured as follows:

    • Sanitise - The file is regenerated with metadata removed (see below for all the properties currently sanitised)
    • Disallow - Metadata is forbidden. If any metadata (properties listed below) is found, the containing file is rejected.
    • Allow - The file is processed and the metadata is regenerated.

    As part of the 'metadata' content management switch, we currently sanitise the following properties:

    • core.xml: title, subject, creator, keywords, description, lastModifiedBy, revision, lastPrinted, created, modified, category, contentStatus, language, and version.
    • app.xml: manager, company, and hyperlinkBase
    • custom.xml: any custom properties added to the OOXML document.
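    Because an OOXML file is a ZIP package, it is possible to check which of these properties a given file actually carries by reading the docProps parts directly. The Python sketch below only reports populated properties from the lists above, plus any custom property names; it does not sanitise anything and is not how the Glasswall engine is implemented. `listed_metadata` is a hypothetical helper.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Properties listed above for the 'metadata' switch, per docProps part.
LISTED = {
    "docProps/core.xml": ["title", "subject", "creator", "keywords", "description",
                          "lastModifiedBy", "revision", "lastPrinted", "created",
                          "modified", "category", "contentStatus", "language", "version"],
    "docProps/app.xml": ["manager", "company", "hyperlinkBase"],
}


def local_name(tag: str) -> str:
    # Drop the "{namespace-uri}" prefix that ElementTree adds to qualified names.
    return tag.rsplit("}", 1)[-1]


def listed_metadata(ooxml_bytes: bytes) -> dict[str, list[str]]:
    """Report which listed properties are populated; does not modify the file."""
    found: dict[str, list[str]] = {}
    with zipfile.ZipFile(io.BytesIO(ooxml_bytes)) as zf:
        parts = set(zf.namelist())
        for part, props in LISTED.items():
            if part not in parts:
                continue
            # app.xml uses capitalised element names (e.g. Company), so compare
            # case-insensitively.
            wanted = {p.lower() for p in props}
            root = ET.fromstring(zf.read(part))
            hits = [local_name(e.tag) for e in root
                    if local_name(e.tag).lower() in wanted and (e.text or "").strip()]
            if hits:
                found[part] = hits
        if "docProps/custom.xml" in parts:
            root = ET.fromstring(zf.read("docProps/custom.xml"))
            names = [p.get("name", "") for p in root if local_name(p.tag) == "property"]
            if names:
                found["docProps/custom.xml"] = names
    return found


# Minimal in-memory package containing only a core.xml part.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "docProps/core.xml",
        '<cp:coreProperties'
        ' xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"'
        ' xmlns:dc="http://purl.org/dc/elements/1.1/">'
        '<dc:title>Quarterly report</dc:title><dc:creator>A. Author</dc:creator>'
        '<cp:revision>3</cp:revision></cp:coreProperties>',
    )
print(listed_metadata(buf.getvalue()))
```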

    OfficeXML (DOCX, XLSX, PPTX) Exclusive Switches

    Hidden Data

    Office file formats offer multiple ways of legitimately "hiding" text or data, including whole Excel sheets, PowerPoint slides or lines of text in a Word document. The Glasswall engine deals with hidden data in the following ways, depending on the content management switch setting:

    • Sanitise - The file is regenerated with all hidden data "unhidden" so it is completely visible to the user.
    • Disallow - Hidden data is forbidden. If any hidden data is found, the containing file is rejected.
    • Allow - Any hidden data is regenerated and remains hidden.
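    One common form of hidden data is a hidden worksheet. In SpreadsheetML, each `<sheet>` entry in `xl/workbook.xml` can carry a `state` attribute of `hidden` or `veryHidden`. The sketch below lists such sheets; it is detection only, covers just this one hiding method, and `hidden_sheets` is a hypothetical helper rather than part of the Glasswall engine.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def local_name(tag: str) -> str:
    return tag.rsplit("}", 1)[-1]


def hidden_sheets(xlsx_bytes: bytes) -> list[str]:
    """Names of worksheets marked "hidden" or "veryHidden" in xl/workbook.xml."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        root = ET.fromstring(zf.read("xl/workbook.xml"))
    return [s.get("name", "") for s in root.iter()
            if local_name(s.tag) == "sheet" and s.get("state") in {"hidden", "veryHidden"}]


# Minimal workbook part only; a real XLSX has many more parts and attributes.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "xl/workbook.xml",
        '<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
        '<sheets><sheet name="Summary" sheetId="1"/>'
        '<sheet name="Workings" sheetId="2" state="hidden"/></sheets></workbook>',
    )
print(hidden_sheets(buf.getvalue()))
```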

    Note: For the purposes of this content management setting, “Hidden Data” does not refer to the many ways of obfuscating or burying data in Office 2007 files. Rather, it is specific to the methods of hiding data that are readily available in the Office 2007 GUI. Obfuscated or concealed data is managed by the policy setting corresponding to the method used; for example, the metadata switch will remove data concealed within free-text fields in the document's metadata.

    Tracked Changes

    The tracked_changes content management switch refers to content added by the "Track Changes" functionality in DOCX and XLSX files, also known as "revisions". These can contain historic data related to previous versions of the document, including names of contributors and records of content that has since been removed or obfuscated. The handling of tracked changes can be configured as follows:

    • Sanitise - All historic data is removed and "Track Changes" disabled. The regenerated document will be equivalent to the final state of the original document.
    • Disallow - Tracked changes are forbidden. If there is any evidence of previous revisions or tracked changes still present in the file, the file will be rejected.
    • Allow - The file is regenerated with all historic changes, revisions and tracked changes intact.
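    In DOCX files, tracked insertions and deletions are stored as `w:ins` and `w:del` elements, each of which can record an author; this is why tracked changes can leak contributor names. The sketch below lists those authors from `word/document.xml` only. Revisions in other parts (such as headers and footnotes) and XLSX revision data are not covered, and `tracked_changes` is a hypothetical helper.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def tracked_changes(docx_bytes: bytes) -> dict[str, list[str]]:
    """Authors of insertions (w:ins) and deletions (w:del) in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    changes: dict[str, list[str]] = {"ins": [], "del": []}
    for kind, authors in changes.items():
        for elem in root.iter(f"{{{W_NS}}}{kind}"):
            authors.append(elem.get(f"{{{W_NS}}}author", ""))
    return changes


# Minimal document part with one insertion and one deletion.
body = (
    f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
    '<w:ins w:id="1" w:author="Alice"><w:r><w:t>new text</w:t></w:r></w:ins>'
    '<w:del w:id="2" w:author="Bob"><w:r><w:delText>old text</w:delText></w:r></w:del>'
    '</w:p></w:body></w:document>'
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", body)
print(tracked_changes(buf.getvalue()))
```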

    Slide Notes

    The slide_notes content management switch refers to content added by the "Notes" functionality in PPTX files, also known as "slide notes" or "speaker notes". The Glasswall engine deals with slide notes in the following ways, depending on the content management switch setting:

    • Sanitise - The file is regenerated with all slide notes removed.
    • Disallow - Slide notes are forbidden. If any slide notes are found, the containing file is rejected.
    • Allow - Any slide notes are regenerated and remain in the file.
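    By convention, PowerPoint stores each slide's notes in a separate `ppt/notesSlides/notesSlideN.xml` part of the PPTX package. The sketch below lists those parts as a quick indication of whether a presentation carries slide notes. It is a heuristic under that naming assumption: the authoritative link is the slide's relationships, a notes part may exist with no text in it, and `notes_slide_parts` is a hypothetical helper.

```python
import io
import re
import zipfile

NOTES_PART = re.compile(r"^ppt/notesSlides/notesSlide\d+\.xml$")


def notes_slide_parts(pptx_bytes: bytes) -> list[str]:
    """Package parts that follow the conventional notes-slide naming."""
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as zf:
        return sorted(n for n in zf.namelist() if NOTES_PART.match(n))


# Minimal package with placeholder part contents (not parsed by this sketch).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("ppt/slides/slide1.xml", "<placeholder/>")
    zf.writestr("ppt/notesSlides/notesSlide1.xml", "<placeholder/>")
print(notes_slide_parts(buf.getvalue()))
```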

    In-Text Comments

    The in_text_comment switch refers to content added by the "In-Text Comments" functionality in DOCX files. The handling of the switch can be configured as follows:

    • Sanitise - In-Text Comment is removed alongside the corresponding document metadata found in core.xml.
    • Disallow - In-Text Comments are forbidden. If a DOCX file contains an in-text comment, it will not be regenerated.
    • Allow - The file is regenerated with the In-Text Comment present in the regenerated DOCX file.

    Note: When the in_text_comment switch is set to allow and the metadata switch is set to sanitise, the regenerated file will contain the in-text comment without any data, because the metadata switch sanitises the corresponding description from the core.xml file.

