Configuration Management

    A content management policy is a set of content management switches that apply to a particular file type.

    Each content management switch identifies a type of file element and the action to take when that element is found.

    The content management setting specifies the action to be carried out by Glasswall for a particular content management switch. Each content management switch can be set to one of three settings:

    • Allow - The Glasswall Embedded Engine processes any associated file element types, which remain in the regenerated file. The associated structure is logged in the Analysis report as an Allowed Item.

    • Disallow - If any of the associated file element types are identified in the file, the Glasswall Embedded Engine identifies the file as non-conforming and does not regenerate it. The associated structure is logged in the Analysis report as an Issue Item.

    • Sanitise - If any of the associated file element types are identified in the file, the Glasswall Embedded Engine removes them from the regenerated file. The associated structure is logged in the Analysis report as a Sanitisation Item.
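
    The three settings can be combined within one policy. The following hypothetical wordConfig fragment is a sketch, not a recommended policy. It assumes the config file accepts the same lowercase values (allow, disallow, sanitise) that the Analysis report shows in gw:ContentValue, and uses switch names from the tables later in this article: metadata is kept, any file containing macros is rejected, and review comments are removed. Switches not listed in the fragment are outside the scope of this example.

        <?xml version="1.0" encoding="UTF-8"?>
        <config>
            <wordConfig>
                <!-- Allow: metadata is kept and logged as an Allowed Item -->
                <metadata>allow</metadata>
                <!-- Disallow: macros make the file non-conforming and are logged as an Issue Item -->
                <macros>disallow</macros>
                <!-- Sanitise: review comments are removed and logged as a Sanitisation Item -->
                <review_comments>sanitise</review_comments>
            </wordConfig>
        </config>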

    Content Management Reporting

    The following sections show how content controlled by a content management switch is presented in the XML Analysis report for each switch setting.

    Allow

    This is an excerpt from the XML report for a Word Binary (.doc) file that contains metadata. The metadata content management switch has been set to allow.

        <gw:Camera cameraName="wordConfig">
        <gw:ContentSwitch>
            <gw:ContentName>metadata</gw:ContentName>
            <gw:ContentValue>allow</gw:ContentValue>
        </gw:ContentSwitch>
        ...
        <gw:AllowedItems itemCount="1">
            <gw:AllowedItem>
                <gw:TechnicalDescription>Metadata detected in #05SummaryInformation</gw:TechnicalDescription>
                <gw:InstanceCount>1</gw:InstanceCount>
                <gw:TotalSizeInBytes>4096</gw:TotalSizeInBytes>
            </gw:AllowedItem>
        </gw:AllowedItems>
    

    Disallow

    This is an excerpt from the XML report for a Word Binary (.doc) file that contains metadata. The metadata content management switch has been set to disallow. In Protect Mode, this would cause the file to be marked as non-conforming.

        <gw:Camera cameraName="wordConfig">
        <gw:ContentSwitch>
            <gw:ContentName>metadata</gw:ContentName>
            <gw:ContentValue>disallow</gw:ContentValue>
        </gw:ContentSwitch>
        ...
        <gw:IssueItem>
            <gw:TechnicalDescription>Metadata detected in #05SummaryInformation</gw:TechnicalDescription>
            <gw:IssueId>96</gw:IssueId>
            <gw:InstanceCount>1</gw:InstanceCount>
            <gw:RiskLevel>Medium</gw:RiskLevel>
        </gw:IssueItem>
    

    Sanitise

    This is an excerpt from the XML report for a Word Binary (.doc) file that contains metadata. The metadata content management switch has been set to sanitise. In Protect Mode, this would result in the metadata being removed from the regenerated file.

        <gw:Camera cameraName="wordConfig">
        <gw:ContentSwitch>
            <gw:ContentName>metadata</gw:ContentName>
            <gw:ContentValue>sanitise</gw:ContentValue>
        </gw:ContentSwitch>
        ...
        <gw:SanitisationItem>
            <gw:TechnicalDescription>Metadata detected in #05SummaryInformation</gw:TechnicalDescription>
            <gw:InstanceCount>1</gw:InstanceCount>
            <gw:TotalSizeInBytes>4096</gw:TotalSizeInBytes>
        </gw:SanitisationItem>
    

    Content Management Policies

    These are the available content management policies:

    Content Management Policy | Description
    ------------------------- | ------------------------------------------------------
    pdfConfig                 | Content management policy for the PDF file type
    wordConfig                | Content management policy for the Word file type
    pptConfig                 | Content management policy for the PowerPoint file type
    xlsConfig                 | Content management policy for the Excel file type
    tiffConfig                | Content management policy for the TIFF file type
    svgConfig                 | Content management policy for the SVG file type
    webpConfig                | Content management policy for the WebP file type
    jpegConfig                | Content management policy for the JPEG file type
    sysConfig                 | Switches that control general Embedded Engine settings

    Note: The xlsConfig, pptConfig and wordConfig content management policies cover both Office Open XML and Office Binary file types.

    The available content management switches and applicable file types are shown in the table below:

    Content Management Switch | Description | Applicable File Type(s)
    --- | --- | ---
    acroform | Controls interactive form (AcroForm) content | PDF
    javascript | Controls JavaScript code embedded in files | PDF
    external_hyperlinks | Controls hyperlinks to locations outside the file | PDF, Word, Excel, PowerPoint
    embedded_files | Controls embedded file content | PDF, Word, Excel, PowerPoint
    metadata | Controls file metadata | PDF, Word, Excel, PowerPoint, WebP
    actions_all | Controls PDF Actions such as Rendition, Sound, Movie, Hide, SetOCGState and GoTo3DView | PDF
    internal_hyperlinks | Controls hyperlinks to locations within the file | PDF, Word, Excel, PowerPoint
    value_outside_reasonable_limits | Controls Glasswall-defined restrictions, such as values outside a reasonable range (e.g. object sizes) | PDF, Word, Excel, PowerPoint
    digital_signatures | Controls digital signature content in signed files or signed objects within files | PDF
    macros | Controls VBA macros, which use Visual Basic for Applications code to create custom user-defined functions | Word, Excel, PowerPoint
    review_comments | Controls document review comments within a file | Word, Excel, PowerPoint
    embedded_images | Controls embedded image content in Glasswall-supported image formats | PDF, Word, Excel, PowerPoint
    dynamic_data_exchange | Controls DDE commands and DDE content in documents | Word, Excel
    tracked_changes | Controls tracked changes in documents | Word, Excel
    hidden_data | Controls hidden data in Word (vanish and webHidden attributes), Excel (hidden, width, customWidth, customHeight and ht attributes) and PowerPoint (show and bwMode attributes) | Word, Excel, PowerPoint
    in_text_comments | Controls in-text comments in documents | Word
    slide_notes | Controls slide notes in documents | PowerPoint
    connections | Controls connections to external data sources, and information for constructs such as OLAP formulas, QueryTables or PivotTables | Excel
    scripts | Controls scripts in XML-based formats, which can create, store and manipulate variables and data during processing | SVG
    foreign_objects | Controls embedded objects in XML-based formats such as SVG | SVG
    hyperlinks | Controls external and internal hyperlinks | SVG
    geotiff | Controls georeferencing information embedded in a TIFF file | TIFF
    jfif | Controls JFIF marker segments in a JPEG image file | JPEG

    File Types With Content Management Switches (Example)

    <?xml version="1.0" encoding="UTF-8"?>
    <config>

        <pdfConfig>
            <acroform>sanitise</acroform>
            <metadata>sanitise</metadata>
            <javascript>sanitise</javascript>
            <actions_all>sanitise</actions_all>
            <embedded_files>sanitise</embedded_files>
            <embedded_images>sanitise</embedded_images>
            <internal_hyperlinks>sanitise</internal_hyperlinks>
            <external_hyperlinks>sanitise</external_hyperlinks>
            <digital_signatures>sanitise</digital_signatures>
            <value_outside_reasonable_limits>sanitise</value_outside_reasonable_limits>
        </pdfConfig>

        <wordConfig>
            <macros>sanitise</macros>
            <metadata>sanitise</metadata>
            <embedded_files>sanitise</embedded_files>
            <embedded_images>sanitise</embedded_images>
            <review_comments>sanitise</review_comments>
            <internal_hyperlinks>sanitise</internal_hyperlinks>
            <external_hyperlinks>sanitise</external_hyperlinks>
            <dynamic_data_exchange>sanitise</dynamic_data_exchange>
            <tracked_changes>sanitise</tracked_changes>
            <hidden_data>sanitise</hidden_data>
        </wordConfig>

        <pptConfig>
            <macros>sanitise</macros>
            <metadata>sanitise</metadata>
            <embedded_files>sanitise</embedded_files>
            <embedded_images>sanitise</embedded_images>
            <review_comments>sanitise</review_comments>
            <internal_hyperlinks>sanitise</internal_hyperlinks>
            <external_hyperlinks>sanitise</external_hyperlinks>
            <hidden_data>sanitise</hidden_data>
        </pptConfig>

        <xlsConfig>
            <macros>sanitise</macros>
            <metadata>sanitise</metadata>
            <embedded_files>sanitise</embedded_files>
            <embedded_images>sanitise</embedded_images>
            <review_comments>sanitise</review_comments>
            <internal_hyperlinks>sanitise</internal_hyperlinks>
            <external_hyperlinks>sanitise</external_hyperlinks>
            <dynamic_data_exchange>sanitise</dynamic_data_exchange>
            <connections>sanitise</connections>
            <tracked_changes>sanitise</tracked_changes>
            <hidden_data>sanitise</hidden_data>
        </xlsConfig>

        <tiffConfig>
            <geotiff>sanitise</geotiff>
        </tiffConfig>

        <webpConfig>
            <metadata>sanitise</metadata>
        </webpConfig>

        <svgConfig>
            <scripts>sanitise</scripts>
            <foreign_objects>sanitise</foreign_objects>
            <hyperlinks>sanitise</hyperlinks>
        </svgConfig>

        <jpegConfig>
            <jfif>sanitise</jfif>
        </jpegConfig>

    </config>
    

    File Types Without Content Management Policies

    There are a number of supported file types that do not currently have content management policies. For these file types, Glasswall will automatically remove content deemed risky, and report the action as a remedy item. This content includes unrecognised embedded data and metadata.

    Commonly used file types in this category include:

    • Image types
      • GIF
      • PNG
      • EMF
      • WMF
    • Audio and Video types

    Content Management "sysConfig" Switches

    The sysConfig switches control the overall behaviour of the Embedded Engine in ways that fall outside content management.

    Switch Name | Switch Setting | Default | Description
    --- | --- | --- | ---
    enable_hash_sha256 | true/false | true | Calculates SHA-256 hashes of the input and output files and adds them to the analysis report. Enabling this will increase processing time.
    enable_text_support | true/false | false | (Beta feature) Word Search only. When enabled, UTF-8 or ASCII encoded text files are processed when at least one "Require" action is specified. When disabled, such files are rejected as an unsupported file type.
    export_embedded_images | true/false | true | When 'true', embedded images are exported to SISL or XML; when 'false', they are saved as raw images.
    interchange_best_compression | true/false | false | Compresses the Export archive package at the maximum compression level. Enabling this will increase processing time.
    interchange_pretty | true/false | false | Formats the intermediate SISL or XML data structure to be more human-readable. Enabling this will slightly increase the intermediate file size.
    interchange_type | sisl/xml | sisl | The intermediate file format for the exported document object model.
    linux_memory_limit | 0 to 256 (integer only) | 0 | Limits the memory usage of the process, in GiB, while a session is being processed. Reaching the limit terminates the process. '0' disables the limit. Has no effect on non-Linux platforms. See below for details.
    session_timeout | 0 to 60 (integer only) | 60 | Limits the time a session can run, in minutes. Reaching the limit terminates the process. '0' disables the timeout. See below for details.
    enable_xml_streaming | true/false | true | Enables streamed processing during XML export and import, which uses less memory than DOM-style processing.
    enable_sisl_streaming | true/false | true | Enables streamed processing during SISL export and import, which uses less memory than DOM-style processing.
    enable_ooxml_tracked_changes | true/false | false | Enables content management options for "Tracked Changes" content in DOCX and XLSX files.
    enable_ooxml_hidden_data | true/false | false | Enables content management options for "Hidden Data" content in DOCX, XLSX and PPTX files.
    enable_ooxml_intext_comments | true/false | false | Enables content management options for "In Text Comments" content in DOCX files.
    enable_ooxml_slide_notes | true/false | false | Enables content management options for "Slide Notes" content in PPTX files.
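
    The enable_ooxml_* switches appear to act as prerequisites for the matching Office Open XML content management switches. Under that reading, which is an assumption since the interaction is not spelled out here, removing tracked changes and hidden data from DOCX files would involve setting both the sysConfig switch and the wordConfig switch. The sketch below also assumes that sysConfig and wordConfig can share one config document:

        <?xml version="1.0" encoding="UTF-8"?>
        <config>
            <sysConfig>
                <!-- Both default to false; turn on content management for these OOXML content types -->
                <enable_ooxml_tracked_changes>true</enable_ooxml_tracked_changes>
                <enable_ooxml_hidden_data>true</enable_ooxml_hidden_data>
            </sysConfig>
            <wordConfig>
                <!-- The wordConfig switches then decide what happens to that content -->
                <tracked_changes>sanitise</tracked_changes>
                <hidden_data>sanitise</hidden_data>
            </wordConfig>
        </config>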

    linux_memory_limit - Technical Details

    This option controls a limit on the memory used by the process. This limit is enforced for the duration of a call to GW2RunSession, and does not apply outside of this API call.

    The value is a whole number of GiB. The value '0' means 'no limit', which is the default.

    It applies only to Linux-based platforms. On other platforms, this option has no effect.

    When the limit is reached, the process will print an error message to stderr and terminate. The exit code seen after termination is platform dependent, but will be consistent with application termination by SIGABRT.

    The limit is imposed on the peak Resident Set Size (RSS), which is the amount of physical memory the process consumes.
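
    As an illustration, the following sysConfig fragment caps memory at 4 GiB. The value is arbitrary and would need to be sized to the expected files and the host machine.

        <?xml version="1.0" encoding="UTF-8"?>
        <config>
            <sysConfig>
                <!-- Linux only: terminate the process if peak RSS exceeds 4 GiB during GW2RunSession -->
                <linux_memory_limit>4</linux_memory_limit>
            </sysConfig>
        </config>

    Because reaching the limit terminates the whole process rather than failing a single call, a host application may prefer to run sessions in a separate worker process that it can restart. This is a suggested design, not part of the documented behaviour.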

    session_timeout - Technical Details

    This option controls a limit on the amount of time a call to GW2RunSession can last. It does not apply outside of this API call.

    The value is specified in minutes, integers only. The value '0' for this option means 'no timeout', meaning GW2RunSession will continue for as long as required. The default value is '60', giving a failsafe timeout of one hour.

    When the timeout is reached before GW2RunSession completes, the process will print an error message to stderr and terminate. The exit code seen after termination is platform dependent, but will be consistent with application termination by SIGABRT.

    Subsequent calls to GW2RunSession begin their own timer.
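
    For example, a deployment that only expects small files might shorten the failsafe to 10 minutes (an illustrative value, not a recommendation). Because each call starts its own timer, the limit applies to every GW2RunSession call individually. Setting the value to 0 instead would remove the timeout entirely.

        <?xml version="1.0" encoding="UTF-8"?>
        <config>
            <sysConfig>
                <!-- Terminate the process if a single GW2RunSession call runs for more than 10 minutes -->
                <session_timeout>10</session_timeout>
            </sysConfig>
        </config>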

    Content Management "sysConfig" Switches (Example)

    <?xml version="1.0" encoding="UTF-8"?>
    <config>
      <sysConfig>
        <interchange_type>sisl</interchange_type>
        <interchange_pretty>false</interchange_pretty>
        <interchange_best_compression>false</interchange_best_compression>
        <export_embedded_images>true</export_embedded_images>
        <enable_hash_sha256>true</enable_hash_sha256>
        <linux_memory_limit>0</linux_memory_limit>
        <session_timeout>60</session_timeout>
        <enable_xml_streaming>false</enable_xml_streaming>
        <enable_sisl_streaming>false</enable_sisl_streaming>
        <enable_ooxml_tracked_changes>false</enable_ooxml_tracked_changes>
        <enable_ooxml_hidden_data>false</enable_ooxml_hidden_data>
      </sysConfig>    
    </config>
    
