Protect & Analyse a File

    In Protect Mode, content management policies allow control of various file content types such as executable code, interactive form content and a number of actions (e.g., external links or the execution of JavaScript). These file elements are known to be common attack vectors when they are encountered within a file. The content management policy will define how the Glasswall Embedded Engine should process these structures. In Analysis Mode, these are reported as SanitisationItems. Content management policy differs across supported file types.

    Automatic corrections back to the file specification are also performed upon file regeneration. This enables the Glasswall Embedded Engine to remove threats that are hidden within the file structure, as well as preventing exploits from being activated through the misuse of structural components in the file. In Analysis Mode, these are reported as RemedyItems.

    Files can be protected individually from a file path or in memory using the protect_file method, or all files from a directory can be protected using the protect_directory method. Alternatively, protected files and analysis reports can be generated within a single session using the protect_and_analyse_file or protect_and_analyse_directory methods.

    Examples

    Protect

    Protect from file path to file path

    import glasswall
    
    
    # Load the Glasswall Editor library
    editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")
    
    # Use the default policy to sanitise a file, writing the sanitised file to a new path
    editor.protect_file(
        input_file=r"C:\gwpw\input\TestFile_11.doc",
        output_file=r"C:\gwpw\output\editor\protect_f2f\TestFile_11.doc",
    )
    
    

    Protect from file path to memory

    protect_file returns the protected file's bytes. The example below assigns them to the variable file_bytes. After sanitisation, the first 8 bytes of file_bytes match the file signature for the Microsoft Compound File Binary (CFB) format, D0 CF 11 E0 A1 B1 1A E1.

    import glasswall
    
    
    # Load the Glasswall Editor library
    editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")
    
    # Use the default policy to sanitise a file in memory, returning the file bytes in memory
    file_bytes = editor.protect_file(
        input_file=r"C:\gwpw\input\TestFile_11.doc"
    )
    
    assert file_bytes[:8] == b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'
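
The signature check above can be wrapped in a small reusable helper. The following is a standard-library sketch only; the name has_cfb_signature is hypothetical and is not part of the Glasswall Python wrapper, and the sample byte strings stand in for the bytes that protect_file would return.

```python
# CFB (Compound File Binary) signature, as quoted in the text above.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def has_cfb_signature(data: bytes) -> bool:
    """Return True if data starts with the 8-byte CFB file signature."""
    return data[:8] == CFB_SIGNATURE


# Hypothetical stand-ins for bytes returned by protect_file
doc_like = CFB_SIGNATURE + b"\x00" * 16
not_doc = b"%PDF-1.7\n"
print(has_cfb_signature(doc_like), has_cfb_signature(not_doc))
```

A check like this only confirms the container format; it says nothing about whether the file content is valid.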
    
    

    Protect from memory

    import glasswall
    
    
    # Load the Glasswall Editor library
    editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")
    
    # Read file from disk to memory
    with open(r"C:\gwpw\input\TestFile_11.doc", "rb") as f:
        input_bytes = f.read()
    
    # Use the default policy to sanitise a file
    file_bytes = editor.protect_file(
        input_file=input_bytes,
    )
    
    assert file_bytes[:8] == b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'
    
    

    Protect files in a directory

    import glasswall
    
    
    # Load the Glasswall Editor library
    editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")
    
    # Use the default policy to protect a directory of files, writing the sanitised files to a new directory.
    editor.protect_directory(
        input_directory=r"C:\gwpw\input",
        output_directory=r"C:\gwpw\output\editor\protect_directory"
    )
    
    

    Protect files in a directory that may contain unsupported file types

    The default behaviour of the Glasswall Python wrapper is to raise the relevant exception (see: glasswall.libraries.editor.errors) if processing fails. Passing raise_unsupported=False prevents the exception from being raised. This is useful when a directory contains a mixture of supported and unsupported file types and you want to process as many files as possible instead of terminating on the first failure.

    import glasswall
    
    
    # Load the Glasswall Editor library
    editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")
    
    # Use the default policy to protect a directory of files, writing the sanitised files to a new directory.
    editor.protect_directory(
        input_directory=r"C:\gwpw\input_with_unsupported_file_types",
        output_directory=r"C:\gwpw\output\editor\protect_directory_unsupported",
        raise_unsupported=False
    )
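
If you also want a record of which files failed, one option is to loop over the files yourself and collect the exceptions. The sketch below shows that general pattern with the standard library only. The names process_all and fake_protect are hypothetical; in real use the callable would wrap a call such as editor.protect_file, and you would catch the specific exceptions from glasswall.libraries.editor.errors rather than Exception.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def process_all(
    paths: Iterable[str],
    process_one: Callable[[str], object],
) -> Tuple[List[str], Dict[str, Exception]]:
    """Run process_one on every path, collecting failures instead of stopping."""
    succeeded: List[str] = []
    failed: Dict[str, Exception] = {}
    for path in paths:
        try:
            process_one(path)
        except Exception as exc:  # in real use, catch the wrapper's specific errors
            failed[path] = exc
        else:
            succeeded.append(path)
    return succeeded, failed


# Hypothetical processor standing in for a call such as editor.protect_file
def fake_protect(path: str) -> None:
    if not path.endswith(".doc"):
        raise ValueError(f"unsupported file type: {path}")


ok, errors = process_all(["a.doc", "b.xyz", "c.doc"], fake_protect)
print(ok)
print(sorted(errors))
```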
    
    

    Protect files in a directory using a custom content management policy

    Using glasswall.content_management.policies.Editor:

    import glasswall
    
    
    # Load the Glasswall Editor library
    editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")
    
    # Use a custom Editor policy to sanitise all files in the input directory
    # and write them to the protect_directory_custom directory. If macros are
    # present in ppt or word files, the file will be marked as non-conforming
    # and blocked. If internal or external hyperlinks are present in word files
    # they will not be sanitised, and will remain in the regenerated document.
    editor.protect_directory(
        input_directory=r"C:\gwpw\input",
        output_directory=r"C:\gwpw\output\editor\protect_directory_custom",
        content_management_policy=glasswall.content_management.policies.Editor(
            default="sanitise",
            config={
                "pptConfig": {
                    "macros": "disallow",
                },
                "wordConfig": {
                    "internal_hyperlinks": "allow",
                    "external_hyperlinks": "allow",
                    "macros": "disallow",
                }
            }
        )
    )
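
When the same switch applies to several file types, the config dictionary can be built programmatically before it is passed to the policy. This is a plain-Python sketch; build_config is a hypothetical helper, and only the section names and switch values that appear in the example above are used.

```python
from typing import Dict, Iterable


def build_config(
    file_type_keys: Iterable[str],
    switches: Dict[str, str],
) -> Dict[str, Dict[str, str]]:
    """Apply the same content management switches to several file-type sections."""
    # Copy the switches per section so later edits to one do not affect another
    return {key: dict(switches) for key in file_type_keys}


config = build_config(["pptConfig", "wordConfig"], {"macros": "disallow"})
# Word-specific switches from the example above
config["wordConfig"].update(
    {"internal_hyperlinks": "allow", "external_hyperlinks": "allow"}
)
print(config)
```

The resulting dictionary has the same shape as the config argument passed to glasswall.content_management.policies.Editor in the example above.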
    
    

    Protect files in a directory conditionally based on file format

    The example below demonstrates processing only .doc and .docx files from a nested directory containing multiple file formats.

    import os
    
    import glasswall
    
    
    # Load the Glasswall Editor library
    editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")
    
    input_directory = r"C:\gwpw\input"
    output_directory = r"C:\gwpw\output\editor\protect_directory_file_format"
    
    # Iterate relative file paths from input_directory
    for relative_file in glasswall.utils.list_file_paths(input_directory, absolute=False):
        # Construct absolute paths
        input_file = os.path.join(input_directory, relative_file)
        output_file = os.path.join(output_directory, relative_file)
    
        # Get the file type of the file
        file_type = editor.determine_file_type(
            input_file=input_file,
            as_string=True,
            raise_unsupported=False
        )
    
        # Protect only doc and docx files
        if file_type in ["doc", "docx"]:
            editor.protect_file(input_file, output_file)
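
The path handling in the loop above pairs each input file with an output path that preserves the nested directory layout. The sketch below reproduces that pairing with the standard library only, so it can be tried without the Glasswall library. It is a stand-in, not the implementation of glasswall.utils.list_file_paths, and iter_io_paths is a hypothetical name.

```python
import os
import tempfile
from typing import Iterator, Tuple


def iter_io_paths(
    input_directory: str,
    output_directory: str,
    output_suffix: str = "",
) -> Iterator[Tuple[str, str]]:
    """Yield (input_file, output_file) pairs, preserving nested subdirectories."""
    for root, _dirs, files in os.walk(input_directory):
        for name in sorted(files):
            input_file = os.path.join(root, name)
            relative_file = os.path.relpath(input_file, input_directory)
            output_file = os.path.join(output_directory, relative_file + output_suffix)
            yield input_file, output_file


# Demonstrate on a throwaway directory tree
with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, "in", "sub")
    os.makedirs(nested)
    open(os.path.join(nested, "a.doc"), "wb").close()
    pairs = list(iter_io_paths(os.path.join(tmp, "in"), os.path.join(tmp, "out")))
    relative_outputs = [os.path.relpath(out, tmp) for _, out in pairs]

print(relative_outputs)
```

Passing output_suffix=".xml" gives the report naming used in the analysis examples, where the report for TestFile_11.doc is written to TestFile_11.doc.xml.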
    
    

    Analysis

    An Embedded Engine report provides a detailed, file-type agnostic description of data and is logged in an XML format. The structure of this report follows an Analysis Report XSD, which is designed to simplify parsing and processing, ensuring easier integration and analysis of the data. See Engine Reporting.

    Files can be analysed individually from a file path or in memory using the analyse_file method, or all files from a directory can be analysed using the analyse_directory method.
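
Because the report is XML, it can be inspected with a standard XML parser. The sketch below counts item elements in a report. It assumes the individual items appear as elements named SanitisationItem, RemedyItem and IssueItem; check those names against the Analysis Report XSD. The sample report is hand-written for illustration and is not real engine output.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ASSUMPTION: item elements are named SanitisationItem, RemedyItem and IssueItem
ITEM_TAGS = ("SanitisationItem", "RemedyItem", "IssueItem")


def count_report_items(report: bytes) -> Counter:
    """Count item elements in an analysis report, ignoring XML namespaces."""
    counts = Counter()
    for element in ET.fromstring(report).iter():
        # Tags look like "{namespace}LocalName"; keep only the local name
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name in ITEM_TAGS:
            counts[local_name] += 1
    return counts


# Hand-written sample, not real engine output
sample_report = b"""<?xml version="1.0" encoding="utf-8"?>
<gw:GWallInfo xmlns:gw="http://glasswall.com/namespace">
  <gw:SanitisationItem/>
  <gw:SanitisationItem/>
  <gw:RemedyItem/>
</gw:GWallInfo>"""

counts = count_report_items(sample_report)
print(dict(counts))
```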

    Analyse from file path to file path

    import glasswall
    
    
    # Load the Glasswall Editor library
    editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")
    
    # Use the default policy to analyse a file, writing the analysis report to a new path
    editor.analyse_file(
        input_file=r"C:\gwpw\input\TestFile_11.doc",
        output_file=r"C:\gwpw\output\editor\analyse_f2f\TestFile_11.doc.xml",
    )
    
    

    Analyse from file path to memory

    analyse_file returns the XML analysis report as bytes. The example below assigns the report to the variable analysis_report and checks the beginning of an Editor analysis report.

    import glasswall
    
    
    # Load the Glasswall Editor library
    editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")
    
    # Use the default policy to analyse a file
    analysis_report = editor.analyse_file(
        input_file=r"C:\gwpw\input\TestFile_11.doc",
    )
    
    assert analysis_report[:500] == b'<?xml version="1.0" encoding="utf-8"?>\n<gw:GWallInfo xsi:schemaLocation="http://glasswall.com/namespace/gwallInfo.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:gw="http://glasswall.com/namespace">\n\t<gw:DocumentStatistics>\n\t\t<gw:DocumentSummary>\n\t\t\t<gw:TotalSizeInBytes>35840</gw:TotalSizeInBytes>\n\t\t\t<gw:FileType>doc</gw:FileType>\n\t\t\t<gw:Version>Not Applicable</gw:Version>\n\t\t\t<gw:InputSHA256>9FDE85B8800C1019D2865FA298A7F75873E09870B71F9825827E354B865686A6</gw:InputSHA256>\n\t\t\t<gw'
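
Comparing a fixed byte prefix is brittle, since values such as the file size and hash change from file to file. A parser-based check may be easier to maintain. The sketch below reads the gw:DocumentSummary fields using the namespace URI and element names visible in the report prefix above. read_document_summary is a hypothetical helper, and the sample report is a trimmed, hand-completed version of that prefix.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Namespace URI taken from the report header shown above
GW_NS = {"gw": "http://glasswall.com/namespace"}


def read_document_summary(report: bytes) -> Dict[str, str]:
    """Return the child elements of gw:DocumentSummary as a name -> text dict."""
    root = ET.fromstring(report)
    summary = root.find("gw:DocumentStatistics/gw:DocumentSummary", GW_NS)
    if summary is None:
        return {}
    return {child.tag.rsplit("}", 1)[-1]: (child.text or "") for child in summary}


# Trimmed, hand-completed sample based on the report prefix above
sample_report = b"""<?xml version="1.0" encoding="utf-8"?>
<gw:GWallInfo xmlns:gw="http://glasswall.com/namespace">
  <gw:DocumentStatistics>
    <gw:DocumentSummary>
      <gw:TotalSizeInBytes>35840</gw:TotalSizeInBytes>
      <gw:FileType>doc</gw:FileType>
      <gw:Version>Not Applicable</gw:Version>
    </gw:DocumentSummary>
  </gw:DocumentStatistics>
</gw:GWallInfo>"""

summary = read_document_summary(sample_report)
print(summary["FileType"], summary["TotalSizeInBytes"])
```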
    
    

    Analyse from memory

    import glasswall
    
    
    # Load the Glasswall Editor library
    editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")
    
    # Read file from disk to memory
    with open(r"C:\gwpw\input\TestFile_11.doc", "rb") as f:
        input_bytes = f.read()
    
    # Use the default policy to analyse a file
    analysis_report = editor.analyse_file(
        input_file=input_bytes,
    )
    
    assert analysis_report[:500] == b'<?xml version="1.0" encoding="utf-8"?>\n<gw:GWallInfo xsi:schemaLocation="http://glasswall.com/namespace/gwallInfo.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:gw="http://glasswall.com/namespace">\n\t<gw:DocumentStatistics>\n\t\t<gw:DocumentSummary>\n\t\t\t<gw:TotalSizeInBytes>35840</gw:TotalSizeInBytes>\n\t\t\t<gw:FileType>doc</gw:FileType>\n\t\t\t<gw:Version>Not Applicable</gw:Version>\n\t\t\t<gw:InputSHA256>9FDE85B8800C1019D2865FA298A7F75873E09870B71F9825827E354B865686A6</gw:InputSHA256>\n\t\t\t<gw'
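
When analysing from memory, the gw:InputSHA256 element in the report offers a way to confirm that a report belongs to the bytes you submitted. The sketch below assumes that this element holds the hexadecimal SHA-256 digest of the original input bytes, which its name suggests but this article does not state. report_matches_input is a hypothetical helper and the sample report is hand-built.

```python
import hashlib
import xml.etree.ElementTree as ET

GW_NS = {"gw": "http://glasswall.com/namespace"}


def report_matches_input(report: bytes, input_bytes: bytes) -> bool:
    """Compare the report's InputSHA256 value with a locally computed digest."""
    # ASSUMPTION: InputSHA256 is the hex SHA-256 digest of the original input bytes
    element = ET.fromstring(report).find(".//gw:InputSHA256", GW_NS)
    if element is None or not element.text:
        return False
    expected = hashlib.sha256(input_bytes).hexdigest()
    # The report shows the digest in upper case, so compare case-insensitively
    return element.text.strip().lower() == expected


# Hypothetical input and a hand-built report carrying its digest
input_bytes = b"example input"
digest = hashlib.sha256(input_bytes).hexdigest().upper()
sample_report = (
    '<gw:GWallInfo xmlns:gw="http://glasswall.com/namespace">'
    f"<gw:InputSHA256>{digest}</gw:InputSHA256>"
    "</gw:GWallInfo>"
).encode("utf-8")

print(report_matches_input(sample_report, input_bytes))
```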
    
    

    Analyse files in a directory

    import glasswall
    
    
    # Load the Glasswall Editor library
    editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")
    
    # Use the default policy to analyse a directory of files, writing the analysis reports to a new directory.
    editor.analyse_directory(
        input_directory=r"C:\gwpw\input",
        output_directory=r"C:\gwpw\output\editor\analyse_directory"
    )
    
    

    Analyse files in a directory that may contain unsupported file types

    The default behaviour of the Glasswall Python wrapper is to raise the relevant exception (see: glasswall.libraries.editor.errors) if processing fails. Passing raise_unsupported=False prevents the exception from being raised. This is useful when a directory contains a mixture of supported and unsupported file types and you want to process as many files as possible instead of terminating on the first failure.

    import glasswall
    
    
    # Load the Glasswall Editor library
    editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")
    
    # Use the default policy to analyse a directory of files, writing the analysis reports to a new directory.
    editor.analyse_directory(
        input_directory=r"C:\gwpw\input_with_unsupported_file_types",
        output_directory=r"C:\gwpw\output\editor\analyse_directory_unsupported",
        raise_unsupported=False
    )
    
    

    Analyse files in a directory using a custom content management policy

    Using glasswall.content_management.policies.Editor:

    import glasswall
    
    
    # Load the Glasswall Editor library
    editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")
    
    # Use a custom Editor policy to analyse all files in the input directory
    # and write the reports to the analyse_directory_custom directory. If macros
    # are present in ppt or word files, a GeneralFail exception would be raised
    # if the raise_unsupported argument were left at its default value of True.
    # Here it is set to False, so no exception is raised; in either case the
    # analysis report is still written to file and will contain IssueItems.
    # If internal or external hyperlinks are present in word files they will not
    # be sanitised, and will remain in the regenerated document.
    editor.analyse_directory(
        input_directory=r"C:\gwpw\input",
        output_directory=r"C:\gwpw\output\editor\analyse_directory_custom",
        content_management_policy=glasswall.content_management.policies.Editor(
            default="sanitise",
            config={
                "pptConfig": {
                    "macros": "disallow",
                },
                "wordConfig": {
                    "internal_hyperlinks": "allow",
                    "external_hyperlinks": "allow",
                    "macros": "disallow",
                }
            }
        ),
        raise_unsupported=False
    )
    
    

    Analyse files in a directory conditionally based on file format

    The example below demonstrates processing only .doc and .docx files from a nested directory containing multiple file formats.

    import os
    
    import glasswall
    
    
    # Load the Glasswall Editor library
    editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")
    
    input_directory = r"C:\gwpw\input"
    output_directory = r"C:\gwpw\output\editor\analyse_directory_file_format"
    
    # Iterate relative file paths from input_directory
    for relative_file in glasswall.utils.list_file_paths(input_directory, absolute=False):
        # Construct absolute paths
        input_file = os.path.join(input_directory, relative_file)
        output_file = os.path.join(output_directory, relative_file + ".xml")
    
        # Get the file type of the file
        file_type = editor.determine_file_type(
            input_file=input_file,
            as_string=True,
            raise_unsupported=False
        )
    
        # Analyse only doc and docx files
        if file_type in ["doc", "docx"]:
            editor.analyse_file(input_file, output_file)
    
    

    Protect and Analyse

    The protect_and_analyse_file and protect_and_analyse_directory methods run Protect and Analysis within a single session, producing both the protected files and their analysis reports. For more information, see the API documentation linked below.
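
As a rough sketch of how such a call might look, the wrapper function below forwards to protect_and_analyse_file on an already-loaded Editor instance. The keyword names output_file and output_analysis_report are assumptions that this article does not confirm, and the return value is not described here, so verify both against the API documentation. StubEditor is a hypothetical stand-in that lets the sketch run without the Glasswall library.

```python
from typing import Any


def protect_and_analyse(editor: Any, input_file: str, output_file: str, report_file: str) -> Any:
    """Call protect_and_analyse_file on an already-loaded Editor instance."""
    # ASSUMPTION: the keyword names output_file and output_analysis_report are
    # not confirmed by this article; check the API documentation before use.
    return editor.protect_and_analyse_file(
        input_file=input_file,
        output_file=output_file,
        output_analysis_report=report_file,
    )


class StubEditor:
    """Hypothetical stand-in so the sketch runs without the Glasswall library."""

    def protect_and_analyse_file(self, **kwargs: Any) -> dict:
        # Echo the arguments back so the call can be inspected
        return kwargs


call = protect_and_analyse(StubEditor(), "in.doc", "out.doc", "out.doc.xml")
print(sorted(call))
```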


    API Documentation

    https://glasswall-python-wrapper-documentation.glasswall.com/

