In Protect Mode, content management policies control how various file content types are handled, such as executable code, interactive form content and a number of actions (e.g., external links or the execution of JavaScript). These file elements are common attack vectors, and the content management policy defines how the Glasswall Embedded Engine should process them. In Analysis Mode, these are reported as SanitisationItems. Content management policy differs across supported file types.
Automatic corrections back to the file specification are also performed when the file is regenerated. This enables the Glasswall Embedded Engine to remove threats hidden within the file structure, as well as preventing exploits from being activated via the misuse of structural components in the file. In Analysis Mode, these corrections are reported as RemedyItems.
Files can be protected individually from a file path or in memory using the protect_file method, or all files from a directory can be protected using the protect_directory method. Alternatively, protected files and analysis reports can be generated within a single session using the protect_and_analyse_file or protect_and_analyse_directory methods.
Examples
- Protect
  - Protect from file path to file path
  - Protect from file path to memory
  - Protect from memory
  - Protect files in a directory
  - Protect files in a directory that may contain unsupported file types
  - Protect files in a directory using a custom content management policy
  - Protect files in a directory conditionally based on file format
- Analysis
  - Analyse from file path to file path
  - Analyse from file path to memory
  - Analyse from memory
  - Analyse files in a directory
  - Analyse files in a directory that may contain unsupported file types
  - Analyse files in a directory using a custom content management policy
  - Analyse files in a directory conditionally based on file format
- Protect and Analyse
Protect
Protect from file path to file path
```python
import glasswall

# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Use the default policy to sanitise a file, writing the sanitised file to a new path
editor.protect_file(
    input_file=r"C:\gwpw\input\TestFile_11.doc",
    output_file=r"C:\gwpw\output\editor\protect_f2f\TestFile_11.doc",
)
```
Protect from file path to memory
protect_file returns the protected file's bytes. The example below assigns them to the variable file_bytes. After sanitisation, the first 8 bytes of file_bytes match the file signature for the Microsoft Compound File Binary (CFB) format, D0 CF 11 E0 A1 B1 1A E1.
```python
import glasswall

# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Use the default policy to sanitise a file, returning the sanitised file's bytes
file_bytes = editor.protect_file(
    input_file=r"C:\gwpw\input\TestFile_11.doc"
)

# The sanitised .doc starts with the CFB file signature
assert file_bytes[:8] == b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'
```
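The signature check above compares a fixed 8-byte prefix. If you repeat it for several files, a small helper keeps the signature in one place. This is an illustrative sketch only: is_cfb and CFB_SIGNATURE are hypothetical names, not part of the glasswall package, and a matching signature only confirms the container format, not that the content is safe. Note that .docx files use a ZIP container, so they start with PK\x03\x04 instead.

```python
# Hypothetical helper, not part of the glasswall package.
# CFB signature as given above: D0 CF 11 E0 A1 B1 1A E1
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def is_cfb(data: bytes) -> bool:
    """Return True if data begins with the 8-byte CFB file signature."""
    return data[:8] == CFB_SIGNATURE


# Usage with the bytes returned by editor.protect_file (not executed here):
# assert is_cfb(file_bytes)
print(is_cfb(CFB_SIGNATURE + b"\x00" * 8))  # True
print(is_cfb(b"PK\x03\x04"))  # False: ZIP-based formats such as .docx
```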
Protect from memory
```python
import glasswall

# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Read file from disk to memory
with open(r"C:\gwpw\input\TestFile_11.doc", "rb") as f:
    input_bytes = f.read()

# Use the default policy to sanitise the in-memory file, returning the sanitised bytes
file_bytes = editor.protect_file(
    input_file=input_bytes,
)

assert file_bytes[:8] == b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'
```
Protect files in a directory
```python
import glasswall

# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Use the default policy to protect a directory of files, writing the sanitised files to a new directory.
editor.protect_directory(
    input_directory=r"C:\gwpw\input",
    output_directory=r"C:\gwpw\output\editor\protect_directory"
)
```
Protect files in a directory that may contain unsupported file types
By default, the Glasswall Python wrapper raises the relevant exception (see: glasswall.libraries.editor.errors) if processing fails. Passing raise_unsupported=False prevents the exception from being raised. This is useful when a directory contains a mixture of supported and unsupported file types and you want to process as many files as possible instead of stopping at the first failure.
```python
import glasswall

# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Use the default policy to protect a directory of files, writing the sanitised files to a new directory.
editor.protect_directory(
    input_directory=r"C:\gwpw\input_with_unsupported_file_types",
    output_directory=r"C:\gwpw\output\editor\protect_directory_unsupported",
    raise_unsupported=False
)
```
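Conceptually, raise_unsupported chooses between stopping at the first failure and recording failures while continuing. The sketch below illustrates that choice with a hypothetical process_all helper and a stand-in fake_process function. It is not how the wrapper implements protect_directory, and it catches all exceptions only to keep the example short.

```python
from typing import Callable, Dict, Iterable, List


def process_all(
    paths: Iterable[str],
    process: Callable[[str], object],
    raise_unsupported: bool = True,
) -> Dict[str, List[str]]:
    """Run process on each path, either stopping at the first failure or recording it."""
    results: Dict[str, List[str]] = {"succeeded": [], "failed": []}
    for path in paths:
        try:
            process(path)
        except Exception:
            if raise_unsupported:
                raise  # mirror the default: terminate on the first failure
            results["failed"].append(path)  # record the failure and carry on
        else:
            results["succeeded"].append(path)
    return results


def fake_process(path: str) -> bytes:
    """Hypothetical stand-in for a per-file call; treats .exe files as unsupported."""
    if path.endswith(".exe"):
        raise ValueError(f"unsupported file type: {path}")
    return b"processed"


summary = process_all(["a.doc", "b.exe", "c.docx"], fake_process, raise_unsupported=False)
print(summary)  # {'succeeded': ['a.doc', 'c.docx'], 'failed': ['b.exe']}
```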
Protect files in a directory using a custom content management policy
Using glasswall.content_management.policies.Editor:
```python
import glasswall

# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Use a custom Editor policy to sanitise all files in the input directory
# and write them to the protect_directory_custom directory. If macros are
# present in ppt or word files, the file will be marked as non-conforming
# and blocked. Internal and external hyperlinks in word files are allowed,
# so they will not be sanitised and will remain in the regenerated document.
editor.protect_directory(
    input_directory=r"C:\gwpw\input",
    output_directory=r"C:\gwpw\output\editor\protect_directory_custom",
    content_management_policy=glasswall.content_management.policies.Editor(
        default="sanitise",
        config={
            "pptConfig": {
                "macros": "disallow",
            },
            "wordConfig": {
                "internal_hyperlinks": "allow",
                "external_hyperlinks": "allow",
                "macros": "disallow",
            }
        }
    )
)
```
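The policy above combines a default action with per-format overrides. One way to reason about the result is as a lookup that falls back to the default when a switch is not listed. The sketch below assumes that fallback rule (an assumption for illustration, not documented library behaviour) and uses a hypothetical resolve_action helper over the same config dictionary; it does not call the glasswall package.

```python
from typing import Dict

# Same shape as the config passed to glasswall.content_management.policies.Editor above
config: Dict[str, Dict[str, str]] = {
    "pptConfig": {
        "macros": "disallow",
    },
    "wordConfig": {
        "internal_hyperlinks": "allow",
        "external_hyperlinks": "allow",
        "macros": "disallow",
    },
}


def resolve_action(
    config: Dict[str, Dict[str, str]], section: str, switch: str, default: str = "sanitise"
) -> str:
    """Return the configured action for a switch, or default if it is not listed.

    Hypothetical helper; the fallback to default is an assumption.
    """
    return config.get(section, {}).get(switch, default)


print(resolve_action(config, "wordConfig", "macros"))  # disallow
print(resolve_action(config, "wordConfig", "external_hyperlinks"))  # allow
print(resolve_action(config, "pptConfig", "external_hyperlinks"))  # sanitise (assumed fallback)
```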
Protect files in a directory conditionally based on file format
The example below processes only .doc and .docx files from a nested directory containing multiple file formats.
```python
import os

import glasswall

# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

input_directory = r"C:\gwpw\input"
output_directory = r"C:\gwpw\output\editor\protect_directory_file_format"

# Iterate relative file paths from input_directory
for relative_file in glasswall.utils.list_file_paths(input_directory, absolute=False):
    # Construct absolute paths
    input_file = os.path.join(input_directory, relative_file)
    output_file = os.path.join(output_directory, relative_file)

    # Get the file type of the file
    file_type = editor.determine_file_type(
        input_file=input_file,
        as_string=True,
        raise_unsupported=False
    )

    # Protect only doc and docx files
    if file_type in ["doc", "docx"]:
        editor.protect_file(input_file, output_file)
```
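The example above relies on glasswall.utils.list_file_paths to walk the directory. If you need the same relative paths without the wrapper, for instance to pre-filter by file name, the standard library can produce them. This is a rough sketch rather than the wrapper's implementation: relative_file_paths and filter_by_extension are hypothetical helpers, and their ordering and symlink handling may differ from list_file_paths. An extension filter only looks at names, whereas determine_file_type inspects the content, so use the former to narrow candidates and the latter to decide.

```python
import os
from typing import List, Sequence


def relative_file_paths(directory: str) -> List[str]:
    """Return every file path under directory, relative to it, in sorted order."""
    paths = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            paths.append(os.path.relpath(os.path.join(root, name), directory))
    return sorted(paths)


def filter_by_extension(paths: Sequence[str], extensions: Sequence[str]) -> List[str]:
    """Keep paths whose extension (compared case-insensitively) is in extensions."""
    wanted = {ext.lower() for ext in extensions}
    return [p for p in paths if os.path.splitext(p)[1].lower() in wanted]


# Usage: narrow the candidates by name, then confirm each one with
# editor.determine_file_type as in the example above.
# candidates = filter_by_extension(relative_file_paths(r"C:\gwpw\input"), [".doc", ".docx"])
```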
Analysis
An Embedded Engine report provides a detailed, file-type-agnostic description of a file's data and is logged in XML format. The report structure follows an Analysis Report XSD, which makes reports straightforward to parse, process and integrate with other tools. See Engine Reporting.
Files can be analysed individually from a file path or in memory using the analyse_file method, or all files from a directory can be analysed using the analyse_directory method.
Analyse from file path to file path
```python
import glasswall

# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Use the default policy to analyse a file, writing the analysis report to a new path
editor.analyse_file(
    input_file=r"C:\gwpw\input\TestFile_11.doc",
    output_file=r"C:\gwpw\output\editor\analyse_f2f\TestFile_11.doc.xml",
)
```
Analyse from file path to memory
analyse_file returns the analysis report's XML as bytes. The example below assigns them to the variable analysis_report and checks the beginning of an Editor analysis report.
```python
import glasswall

# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Use the default policy to analyse a file
analysis_report = editor.analyse_file(
    input_file=r"C:\gwpw\input\TestFile_11.doc",
)

assert analysis_report[:500] == b'<?xml version="1.0" encoding="utf-8"?>\n<gw:GWallInfo xsi:schemaLocation="http://glasswall.com/namespace/gwallInfo.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:gw="http://glasswall.com/namespace">\n\t<gw:DocumentStatistics>\n\t\t<gw:DocumentSummary>\n\t\t\t<gw:TotalSizeInBytes>35840</gw:TotalSizeInBytes>\n\t\t\t<gw:FileType>doc</gw:FileType>\n\t\t\t<gw:Version>Not Applicable</gw:Version>\n\t\t\t<gw:InputSHA256>9FDE85B8800C1019D2865FA298A7F75873E09870B71F9825827E354B865686A6</gw:InputSHA256>\n\t\t\t<gw'
```
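Because the report is namespaced XML, the standard library's xml.etree.ElementTree can read it once the gw prefix is mapped to the namespace URI declared in the report header (http://glasswall.com/namespace). The sketch below reads the children of gw:DocumentSummary from report bytes. document_summary is a hypothetical helper, and the sample is a trimmed, hand-written fragment based on the report prefix shown above rather than a complete report; with a real report you would pass the analysis_report bytes instead.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Namespace URI bound to the gw prefix in the report header shown above
GW_NS = {"gw": "http://glasswall.com/namespace"}


def document_summary(report_bytes: bytes) -> Dict[str, str]:
    """Return the children of gw:DocumentSummary as a {local name: text} dict."""
    root = ET.fromstring(report_bytes)  # root element is gw:GWallInfo
    summary = root.find("gw:DocumentStatistics/gw:DocumentSummary", GW_NS)
    if summary is None:
        return {}
    result = {}
    for child in summary:
        # ElementTree reports tags as "{namespace-uri}LocalName"
        local_name = child.tag.split("}", 1)[-1]
        result[local_name] = (child.text or "").strip()
    return result


# Hand-written, trimmed fragment for illustration (not a complete report)
sample = b"""<?xml version="1.0" encoding="utf-8"?>
<gw:GWallInfo xmlns:gw="http://glasswall.com/namespace">
  <gw:DocumentStatistics>
    <gw:DocumentSummary>
      <gw:TotalSizeInBytes>35840</gw:TotalSizeInBytes>
      <gw:FileType>doc</gw:FileType>
    </gw:DocumentSummary>
  </gw:DocumentStatistics>
</gw:GWallInfo>"""

print(document_summary(sample))  # {'TotalSizeInBytes': '35840', 'FileType': 'doc'}
```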
Analyse from memory
```python
import glasswall

# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Read file from disk to memory
with open(r"C:\gwpw\input\TestFile_11.doc", "rb") as f:
    input_bytes = f.read()

# Use the default policy to analyse the in-memory file
analysis_report = editor.analyse_file(
    input_file=input_bytes,
)

assert analysis_report[:500] == b'<?xml version="1.0" encoding="utf-8"?>\n<gw:GWallInfo xsi:schemaLocation="http://glasswall.com/namespace/gwallInfo.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:gw="http://glasswall.com/namespace">\n\t<gw:DocumentStatistics>\n\t\t<gw:DocumentSummary>\n\t\t\t<gw:TotalSizeInBytes>35840</gw:TotalSizeInBytes>\n\t\t\t<gw:FileType>doc</gw:FileType>\n\t\t\t<gw:Version>Not Applicable</gw:Version>\n\t\t\t<gw:InputSHA256>9FDE85B8800C1019D2865FA298A7F75873E09870B71F9825827E354B865686A6</gw:InputSHA256>\n\t\t\t<gw'
```
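The report's gw:InputSHA256 field appears to record a SHA-256 digest of the input. Assuming it is the digest of the exact bytes passed to analyse_file, you can cross-check a report against the in-memory input with hashlib. input_hash_matches is a hypothetical helper, the report fragment below is hand-written, and the comparison is case-insensitive because the sample report uses upper-case hex while hashlib's hexdigest returns lower-case.

```python
import hashlib
import xml.etree.ElementTree as ET

GW_NS = {"gw": "http://glasswall.com/namespace"}
INPUT_SHA256_PATH = "gw:DocumentStatistics/gw:DocumentSummary/gw:InputSHA256"


def input_hash_matches(input_bytes: bytes, report_bytes: bytes) -> bool:
    """Return True if the report's gw:InputSHA256 equals the SHA-256 of input_bytes."""
    reported = ET.fromstring(report_bytes).findtext(INPUT_SHA256_PATH, namespaces=GW_NS)
    if not reported:
        return False  # element missing or empty
    actual = hashlib.sha256(input_bytes).hexdigest()
    # Compare case-insensitively: the sample report uses upper-case hex
    return actual.lower() == reported.strip().lower()


# Illustration with a hand-written report fragment
data = b"example input"
report = (
    b'<gw:GWallInfo xmlns:gw="http://glasswall.com/namespace">'
    b"<gw:DocumentStatistics><gw:DocumentSummary><gw:InputSHA256>"
    + hashlib.sha256(data).hexdigest().upper().encode("ascii")
    + b"</gw:InputSHA256></gw:DocumentSummary></gw:DocumentStatistics>"
    b"</gw:GWallInfo>"
)
print(input_hash_matches(data, report))  # True
print(input_hash_matches(b"other input", report))  # False
```

With real data, the call would be input_hash_matches(input_bytes, analysis_report) using the variables from the example above.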
Analyse files in a directory
```python
import glasswall

# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Use the default policy to analyse a directory of files, writing the analysis reports to a new directory.
editor.analyse_directory(
    input_directory=r"C:\gwpw\input",
    output_directory=r"C:\gwpw\output\editor\analyse_directory"
)
```
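Once a directory has been analysed, a quick summary of the output can be built by reading each report's gw:FileType. The sketch below assumes each report is an XML document with the gw:DocumentStatistics/gw:DocumentSummary/gw:FileType path seen in the report prefix above. count_file_types is a hypothetical helper, and any file that does not parse as XML is skipped rather than reported.

```python
import collections
import os
import xml.etree.ElementTree as ET

GW_NS = {"gw": "http://glasswall.com/namespace"}
FILE_TYPE_PATH = "gw:DocumentStatistics/gw:DocumentSummary/gw:FileType"


def count_file_types(report_directory: str) -> collections.Counter:
    """Count analysis reports under report_directory by their gw:FileType value."""
    counts: collections.Counter = collections.Counter()
    for root, _dirs, files in os.walk(report_directory):
        for name in files:
            try:
                tree = ET.parse(os.path.join(root, name))
            except ET.ParseError:
                continue  # not an XML report; skip it
            file_type = tree.getroot().findtext(
                FILE_TYPE_PATH, default="unknown", namespaces=GW_NS
            )
            counts[file_type.strip() or "unknown"] += 1
    return counts


# Usage with the output directory from the example above (not executed here):
# print(count_file_types(r"C:\gwpw\output\editor\analyse_directory"))
```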
Analyse files in a directory that may contain unsupported file types
By default, the Glasswall Python wrapper raises the relevant exception (see: glasswall.libraries.editor.errors) if processing fails. Passing raise_unsupported=False prevents the exception from being raised. This is useful when a directory contains a mixture of supported and unsupported file types and you want to process as many files as possible instead of stopping at the first failure.
```python
import glasswall

# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Use the default policy to analyse a directory of files, writing the analysis reports to a new directory.
editor.analyse_directory(
    input_directory=r"C:\gwpw\input_with_unsupported_file_types",
    output_directory=r"C:\gwpw\output\editor\analyse_directory_unsupported",
    raise_unsupported=False
)
```
Analyse files in a directory using a custom content management policy
Using glasswall.content_management.policies.Editor:
```python
import glasswall

# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Use a custom Editor policy to analyse all files in the input directory
# and write the analysis reports to the analyse_directory_custom directory.
# If macros are present in ppt or word files, a GeneralFail exception would
# be raised if raise_unsupported were left at its default value of True.
# Passing raise_unsupported=False suppresses the exception, and the analysis
# report is still written to file and will contain IssueItems.
# Internal and external hyperlinks in word files are allowed by the policy,
# so they will not be sanitised.
editor.analyse_directory(
    input_directory=r"C:\gwpw\input",
    output_directory=r"C:\gwpw\output\editor\analyse_directory_custom",
    content_management_policy=glasswall.content_management.policies.Editor(
        default="sanitise",
        config={
            "pptConfig": {
                "macros": "disallow",
            },
            "wordConfig": {
                "internal_hyperlinks": "allow",
                "external_hyperlinks": "allow",
                "macros": "disallow",
            }
        }
    ),
    raise_unsupported=False
)
```
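To check how many items of each kind a report contains, for example the IssueItems produced when macros are disallowed, you can count elements by their local name. This sketch rests on an assumption: that individual items appear as elements named SanitisationItem, RemedyItem and IssueItem, which should be confirmed against the Analysis Report XSD. count_report_items is a hypothetical helper, and the sample is hand-written, not real engine output.

```python
import collections
import xml.etree.ElementTree as ET

# Assumed local names for individual report items (verify against the XSD)
ITEM_NAMES = ("SanitisationItem", "RemedyItem", "IssueItem")


def count_report_items(report_bytes: bytes) -> collections.Counter:
    """Count elements whose local name is one of ITEM_NAMES, at any depth."""
    counts = collections.Counter({name: 0 for name in ITEM_NAMES})
    for element in ET.fromstring(report_bytes).iter():
        local_name = element.tag.split("}", 1)[-1]
        if local_name in ITEM_NAMES:
            counts[local_name] += 1
    return counts


# Hand-written fragment for illustration only (not real engine output)
sample = (
    b'<gw:GWallInfo xmlns:gw="http://glasswall.com/namespace">'
    b"<gw:IssueItems><gw:IssueItem/><gw:IssueItem/></gw:IssueItems>"
    b"<gw:SanitisationItems><gw:SanitisationItem/></gw:SanitisationItems>"
    b"</gw:GWallInfo>"
)
counts = count_report_items(sample)
print(counts["IssueItem"], counts["SanitisationItem"], counts["RemedyItem"])  # 2 1 0
```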
Analyse files in a directory conditionally based on file format
The example below processes only .doc and .docx files from a nested directory containing multiple file formats.
```python
import os

import glasswall

# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

input_directory = r"C:\gwpw\input"
output_directory = r"C:\gwpw\output\editor\analyse_directory_file_format"

# Iterate relative file paths from input_directory
for relative_file in glasswall.utils.list_file_paths(input_directory, absolute=False):
    # Construct absolute paths; each analysis report gets a .xml suffix
    input_file = os.path.join(input_directory, relative_file)
    output_file = os.path.join(output_directory, relative_file + ".xml")

    # Get the file type of the file
    file_type = editor.determine_file_type(
        input_file=input_file,
        as_string=True,
        raise_unsupported=False
    )

    # Analyse only doc and docx files
    if file_type in ["doc", "docx"]:
        editor.analyse_file(input_file, output_file)
```
Protect and Analyse
The protect_and_analyse_file and protect_and_analyse_directory methods run Protect and Analysis within a single session, generating both protected files and analysis reports. For more information, see the API documentation linked below.
API Documentation
https://glasswall-python-wrapper-documentation.glasswall.com/