In Protect Mode, content management policies control how various kinds of file content are handled, such as executable code, interactive form content, and a number of actions (e.g., external links or the execution of JavaScript). These file elements are known to be common attack vectors when they are encountered within a file. The content management policy defines how the Glasswall Embedded Engine should process these structures. In [Analysis Mode](#analysis), these are reported as `SanitisationItems`. The available content management policy settings differ across supported file types.

Automatic corrections back to the file specification are also performed upon file regeneration. This enables the Glasswall Embedded Engine to remove threats that are hidden within the file structure, as well as preventing exploits from being activated through the misuse of structural components in the file. In [Analysis Mode](#analysis), these are reported as `RemedyItems`.

Files can be protected individually from a file path or in memory using the [protect_file](./8-Autogenerated%20Docs/libraries/editor/editor/editor.md#protect_file) method, or all files from a directory can be protected using the [protect_directory](./8-Autogenerated%20Docs/libraries/editor/editor/editor.md#protect_directory) method.


## Examples

- [Protect](#protect)
    - [Protect from file path to file path](#protect-from-file-path-to-file-path)
    - [Protect from file path to memory](#protect-from-file-path-to-memory)
    - [Protect from memory](#protect-from-memory)
    - [Protect files in a directory](#protect-files-in-a-directory)
    - [Protect files in a directory that may contain unsupported file types](#protect-files-in-a-directory-that-may-contain-unsupported-file-types)
    - [Protect files in a directory using a custom content management policy](#protect-files-in-a-directory-using-a-custom-content-management-policy)
    - [Protect files in a directory conditionally based on file format](#protect-files-in-a-directory-conditionally-based-on-file-format)
- [Analysis](#analysis)
    - [Analyse from file path to file path](#analyse-from-file-path-to-file-path)
    - [Analyse from file path to memory](#analyse-from-file-path-to-memory)
    - [Analyse from memory](#analyse-from-memory)
    - [Analyse files in a directory](#analyse-files-in-a-directory)
    - [Analyse files in a directory that may contain unsupported file types](#analyse-files-in-a-directory-that-may-contain-unsupported-file-types)
    - [Analyse files in a directory using a custom content management policy](#analyse-files-in-a-directory-using-a-custom-content-management-policy)
    - [Analyse files in a directory conditionally based on file format](#analyse-files-in-a-directory-conditionally-based-on-file-format)
- [Protect and Analyse](#protect-and-analyse)


## Protect

### Protect from file path to file path

```py
import glasswall


# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Use the default policy to sanitise a file, writing the sanitised file to a new path
editor.protect_file(
    input_file=r"C:\gwpw\input\TestFile_11.doc",
    output_file=r"C:\gwpw\output\editor\protect_f2f\TestFile_11.doc",
)

```

### Protect from file path to memory

`protect_file` returns the protected file's bytes. The example below assigns them to the variable `file_bytes`. After sanitisation, the first 8 bytes of `file_bytes` match the [file signature](https://en.wikipedia.org/wiki/List_of_file_signatures) for the Microsoft Compound File Binary (CFB) format, `D0 CF 11 E0 A1 B1 1A E1`.

```py
import glasswall


# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Use the default policy to sanitise a file in memory, returning the file bytes in memory
file_bytes = editor.protect_file(
    input_file=r"C:\gwpw\input\TestFile_11.doc"
)

assert file_bytes[:8] == b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'

```
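The signature check above can be wrapped in a small reusable helper. This is a minimal sketch using only the standard library; the names `CFB_SIGNATURE` and `has_cfb_signature` are illustrative and are not part of the Glasswall wrapper, and the sample bytes stand in for the value returned by `protect_file`.

```py
# Magic number for the Microsoft Compound File Binary (CFB) format
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def has_cfb_signature(data: bytes) -> bool:
    """Return True if data begins with the 8-byte CFB file signature."""
    return data[:8] == CFB_SIGNATURE


# Stand-in bytes for illustration; in practice this would be the value
# returned by editor.protect_file(...)
sample = CFB_SIGNATURE + b"\x00" * 16
print(has_cfb_signature(sample))       # True
print(has_cfb_signature(b"%PDF-1.7"))  # False
```

A signature match only confirms the container format; it says nothing about whether the content inside is valid.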

### Protect from memory

```py
import glasswall


# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Read file from disk to memory
with open(r"C:\gwpw\input\TestFile_11.doc", "rb") as f:
    input_bytes = f.read()

# Use the default policy to sanitise a file
file_bytes = editor.protect_file(
    input_file=input_bytes,
)

assert file_bytes[:8] == b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'

```

### Protect files in a directory

```py
import glasswall


# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Use the default policy to protect a directory of files, writing the sanitised files to a new directory.
editor.protect_directory(
    input_directory=r"C:\gwpw\input",
    output_directory=r"C:\gwpw\output\editor\protect_directory"
)

```

### Protect files in a directory that may contain unsupported file types

By default, the Glasswall Python wrapper raises the relevant exception (see: [glasswall.libraries.editor.errors](./8-Autogenerated%20Docs/libraries/editor/errors/errors.md)) if processing fails. Passing `raise_unsupported=False` prevents the exception from being raised. This is useful when a directory contains a mixture of supported and unsupported file types and you want to process as many files as possible instead of terminating on the first failure.

```py
import glasswall


# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Use the default policy to protect a directory of files, writing the sanitised files to a new directory.
editor.protect_directory(
    input_directory=r"C:\gwpw\input_with_unsupported_file_types",
    output_directory=r"C:\gwpw\output\editor\protect_directory_unsupported",
    raise_unsupported=False
)

```
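If you also need a record of which files failed, one alternative is to process files one at a time and collect the exceptions yourself. The sketch below shows that general pattern in plain Python. It does not describe how the wrapper implements `raise_unsupported`; `process_all` and `fake_protect` are hypothetical names, with `fake_protect` standing in for a call such as `editor.protect_file`.

```py
from typing import Callable, Dict, Iterable, List, Tuple


def process_all(
    paths: Iterable[str],
    process: Callable[[str], bytes],
) -> Tuple[Dict[str, bytes], List[Tuple[str, Exception]]]:
    """Apply `process` to every path, collecting failures instead of stopping."""
    results = {}
    failures = []
    for path in paths:
        try:
            results[path] = process(path)
        except Exception as exc:  # record the failure and keep going
            failures.append((path, exc))
    return results, failures


# Hypothetical stand-in for a call such as editor.protect_file
def fake_protect(path: str) -> bytes:
    if not path.endswith(".doc"):
        raise ValueError(f"unsupported file type: {path}")
    return b"sanitised"


results, failures = process_all(["a.doc", "b.xyz", "c.doc"], fake_protect)
print(sorted(results))                  # ['a.doc', 'c.doc']
print([path for path, _ in failures])   # ['b.xyz']
```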

### Protect files in a directory using a custom content management policy

Using `glasswall.content_management.policies.Editor`:

```py
import glasswall


# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Use a custom Editor policy to sanitise all files in the input directory
# and write them to the protect_directory_custom directory. If macros are
# present in ppt or word files, the file will be marked as non-conforming
# and blocked. If internal or external hyperlinks are present in word files
# they will not be sanitised, and will remain in the regenerated document.
editor.protect_directory(
    input_directory=r"C:\gwpw\input",
    output_directory=r"C:\gwpw\output\editor\protect_directory_custom",
    content_management_policy=glasswall.content_management.policies.Editor(
        default="sanitise",
        config={
            "pptConfig": {
                "macros": "disallow",
            },
            "wordConfig": {
                "internal_hyperlinks": "allow",
                "external_hyperlinks": "allow",
                "macros": "disallow",
            }
        }
    )
)

```
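A typo in a policy value is easy to miss in a nested `config` dictionary. The sketch below checks a dictionary before it is passed to the policy. It assumes that `allow`, `disallow`, and `sanitise` (the values that appear in this document) are the only valid actions, which may not hold for every version of the library; `ALLOWED_ACTIONS` and `find_invalid_switches` are hypothetical helper names.

```py
# Assumption: these are the only actions, as they are the ones shown in this document
ALLOWED_ACTIONS = {"allow", "disallow", "sanitise"}


def find_invalid_switches(config: dict) -> list:
    """Return (section, switch, value) tuples whose value is not a known action."""
    invalid = []
    for section, switches in config.items():
        for switch, value in switches.items():
            if value not in ALLOWED_ACTIONS:
                invalid.append((section, switch, value))
    return invalid


config = {
    "pptConfig": {"macros": "dissallow"},  # deliberate typo for illustration
    "wordConfig": {"internal_hyperlinks": "allow", "macros": "disallow"},
}
print(find_invalid_switches(config))
# [('pptConfig', 'macros', 'dissallow')]
```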

### Protect files in a directory conditionally based on file format

The example below demonstrates processing only `.doc` and `.docx` files from a nested directory containing multiple file formats.

```py
import os

import glasswall


# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

input_directory = r"C:\gwpw\input"
output_directory = r"C:\gwpw\output\editor\protect_directory_file_format"

# Iterate relative file paths from input_directory
for relative_file in glasswall.utils.list_file_paths(input_directory, absolute=False):
    # Construct absolute paths
    input_file = os.path.join(input_directory, relative_file)
    output_file = os.path.join(output_directory, relative_file)

    # Get the file type of the file
    file_type = editor.determine_file_type(
        input_file=input_file,
        as_string=True,
        raise_unsupported=False
    )

    # Protect only doc and docx files
    if file_type in ["doc", "docx"]:
        editor.protect_file(input_file, output_file)

```
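The loop above relies on relative paths so that the nested layout of the input directory is reproduced under the output directory. The standard-library sketch below illustrates that mapping. It approximates the idea and is not the implementation of `glasswall.utils.list_file_paths`; `list_relative_files` and `mirror_path` are hypothetical names.

```py
import os
import tempfile


def list_relative_files(directory: str) -> list:
    """Return sorted file paths under `directory`, relative to it."""
    relative_paths = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            absolute = os.path.join(root, name)
            relative_paths.append(os.path.relpath(absolute, directory))
    return sorted(relative_paths)


def mirror_path(relative_file: str, output_directory: str, suffix: str = "") -> str:
    """Map a relative input path to the matching path under output_directory."""
    return os.path.join(output_directory, relative_file + suffix)


# Build a small nested tree in a temporary directory for illustration
with tempfile.TemporaryDirectory() as input_directory:
    os.makedirs(os.path.join(input_directory, "nested"))
    for relative in ["a.doc", os.path.join("nested", "b.docx")]:
        with open(os.path.join(input_directory, relative), "wb") as f:
            f.write(b"placeholder")

    for relative_file in list_relative_files(input_directory):
        # Each input file maps to the same relative location in the output tree
        print(relative_file, "->", mirror_path(relative_file, "output"))
```

Passing `suffix=".xml"` gives the naming used for analysis reports later in this document.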

---

## Analysis

An Embedded Engine report provides a detailed, file-type agnostic description of data and is logged in an XML format. The structure of this report follows an Analysis Report XSD, which is designed to simplify parsing and processing, ensuring easier integration and analysis of the data. See [Engine Reporting](/embedded-engine/embedded-engine-reporting).

Files can be analysed individually from a file path or in memory using the [analyse_file](./8-Autogenerated%20Docs/libraries/editor/editor/editor.md#analyse_file) method, or all files from a directory can be analysed using the [analyse_directory](./8-Autogenerated%20Docs/libraries/editor/editor/editor.md#analyse_directory) method.


### Analyse from file path to file path

```py
import glasswall


# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Use the default policy to analyse a file, writing the analysis report to a new path
editor.analyse_file(
    input_file=r"C:\gwpw\input\TestFile_11.doc",
    output_file=r"C:\gwpw\output\editor\analyse_f2f\TestFile_11.doc.xml",
)

```

### Analyse from file path to memory

`analyse_file` returns the bytes of the XML analysis report. The example below assigns them to the variable `analysis_report` and checks the beginning of an Editor analysis report.

```py
import glasswall


# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Use the default policy to analyse a file
analysis_report = editor.analyse_file(
    input_file=r"C:\gwpw\input\TestFile_11.doc",
)

assert analysis_report[:500] == b'<?xml version="1.0" encoding="utf-8"?>\n<gw:GWallInfo xsi:schemaLocation="http://glasswall.com/namespace/gwallInfo.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:gw="http://glasswall.com/namespace">\n\t<gw:DocumentStatistics>\n\t\t<gw:DocumentSummary>\n\t\t\t<gw:TotalSizeInBytes>35840</gw:TotalSizeInBytes>\n\t\t\t<gw:FileType>doc</gw:FileType>\n\t\t\t<gw:Version>Not Applicable</gw:Version>\n\t\t\t<gw:InputSHA256>9FDE85B8800C1019D2865FA298A7F75873E09870B71F9825827E354B865686A6</gw:InputSHA256>\n\t\t\t<gw'

```
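Comparing raw bytes is brittle, because values such as the file size and hash change from file to file. A report can instead be parsed with the standard library. The sketch below reads two fields from the `DocumentSummary` element shown in the assertion above. The function name `summarise_report` is hypothetical, and the sample report is a shortened, hand-written stand-in rather than real engine output.

```py
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns:gw attribute of the report shown above
GW_NS = {"gw": "http://glasswall.com/namespace"}


def summarise_report(report: bytes) -> dict:
    """Extract the file type and input size from an analysis report."""
    root = ET.fromstring(report)
    summary = root.find("gw:DocumentStatistics/gw:DocumentSummary", GW_NS)
    if summary is None:
        raise ValueError("DocumentSummary element not found in report")
    return {
        "file_type": summary.findtext("gw:FileType", namespaces=GW_NS),
        "total_size_in_bytes": int(
            summary.findtext("gw:TotalSizeInBytes", namespaces=GW_NS)
        ),
    }


# Shortened stand-in report for illustration only
sample_report = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b'<gw:GWallInfo xmlns:gw="http://glasswall.com/namespace">'
    b"<gw:DocumentStatistics><gw:DocumentSummary>"
    b"<gw:TotalSizeInBytes>35840</gw:TotalSizeInBytes>"
    b"<gw:FileType>doc</gw:FileType>"
    b"</gw:DocumentSummary></gw:DocumentStatistics>"
    b"</gw:GWallInfo>"
)
print(summarise_report(sample_report))
# {'file_type': 'doc', 'total_size_in_bytes': 35840}
```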

### Analyse from memory

```py
import glasswall


# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Read file from disk to memory
with open(r"C:\gwpw\input\TestFile_11.doc", "rb") as f:
    input_bytes = f.read()

# Use the default policy to analyse a file
analysis_report = editor.analyse_file(
    input_file=input_bytes,
)

assert analysis_report[:500] == b'<?xml version="1.0" encoding="utf-8"?>\n<gw:GWallInfo xsi:schemaLocation="http://glasswall.com/namespace/gwallInfo.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:gw="http://glasswall.com/namespace">\n\t<gw:DocumentStatistics>\n\t\t<gw:DocumentSummary>\n\t\t\t<gw:TotalSizeInBytes>35840</gw:TotalSizeInBytes>\n\t\t\t<gw:FileType>doc</gw:FileType>\n\t\t\t<gw:Version>Not Applicable</gw:Version>\n\t\t\t<gw:InputSHA256>9FDE85B8800C1019D2865FA298A7F75873E09870B71F9825827E354B865686A6</gw:InputSHA256>\n\t\t\t<gw'

```

### Analyse files in a directory

```py
import glasswall


# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Use the default policy to analyse a directory of files, writing the analysis reports to a new directory.
editor.analyse_directory(
    input_directory=r"C:\gwpw\input",
    output_directory=r"C:\gwpw\output\editor\analyse_directory"
)

```

### Analyse files in a directory that may contain unsupported file types

By default, the Glasswall Python wrapper raises the relevant exception (see: [glasswall.libraries.editor.errors](./8-Autogenerated%20Docs/libraries/editor/errors/errors.md)) if processing fails. Passing `raise_unsupported=False` prevents the exception from being raised. This is useful when a directory contains a mixture of supported and unsupported file types and you want to process as many files as possible instead of terminating on the first failure.

```py
import glasswall


# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Use the default policy to analyse a directory of files, writing the analysis reports to a new directory.
editor.analyse_directory(
    input_directory=r"C:\gwpw\input_with_unsupported_file_types",
    output_directory=r"C:\gwpw\output\editor\analyse_directory_unsupported",
    raise_unsupported=False
)

```

### Analyse files in a directory using a custom content management policy

Using `glasswall.content_management.policies.Editor`:

```py
import glasswall


# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Use a custom Editor policy to analyse all files in the input directory
# and write the analysis reports to the analyse_directory_custom directory.
# If macros are present in ppt or word files, a GeneralFail exception would
# be raised if the raise_unsupported argument were left at its default value
# of True. It is set to False here, so no exception is raised; the analysis
# report will still be written to file and will contain IssueItems.
# Internal and external hyperlinks in word files are allowed by this policy,
# so they are not sanitised.
editor.analyse_directory(
    input_directory=r"C:\gwpw\input",
    output_directory=r"C:\gwpw\output\editor\analyse_directory_custom",
    content_management_policy=glasswall.content_management.policies.Editor(
        default="sanitise",
        config={
            "pptConfig": {
                "macros": "disallow",
            },
            "wordConfig": {
                "internal_hyperlinks": "allow",
                "external_hyperlinks": "allow",
                "macros": "disallow",
            }
        }
    ),
    raise_unsupported=False
)

```
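To check reports for the items mentioned above, elements can be counted by name. This is a minimal sketch that assumes individual issues appear as `gw:IssueItem` elements in the report namespace; that element name is inferred from the term `IssueItems` and should be verified against the Analysis Report XSD. `count_elements` is a hypothetical helper, and the sample report is hand-written for illustration.

```py
import xml.etree.ElementTree as ET

GW_NAMESPACE = "http://glasswall.com/namespace"


def count_elements(report: bytes, local_name: str) -> int:
    """Count elements called `local_name` in the Glasswall namespace."""
    root = ET.fromstring(report)
    # ElementTree addresses namespaced tags as "{namespace}localname"
    return sum(1 for _ in root.iter(f"{{{GW_NAMESPACE}}}{local_name}"))


# Hand-written stand-in; the element names here are assumptions
sample_report = (
    b'<gw:GWallInfo xmlns:gw="http://glasswall.com/namespace">'
    b"<gw:IssueItem>first</gw:IssueItem>"
    b"<gw:IssueItem>second</gw:IssueItem>"
    b"</gw:GWallInfo>"
)
print(count_elements(sample_report, "IssueItem"))   # 2
print(count_elements(sample_report, "RemedyItem"))  # 0
```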

### Analyse files in a directory conditionally based on file format

The example below demonstrates processing only `.doc` and `.docx` files from a nested directory containing multiple file formats.

```py
import os

import glasswall


# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

input_directory = r"C:\gwpw\input"
output_directory = r"C:\gwpw\output\editor\analyse_directory_file_format"

# Iterate relative file paths from input_directory
for relative_file in glasswall.utils.list_file_paths(input_directory, absolute=False):
    # Construct absolute paths
    input_file = os.path.join(input_directory, relative_file)
    output_file = os.path.join(output_directory, relative_file + ".xml")

    # Get the file type of the file
    file_type = editor.determine_file_type(
        input_file=input_file,
        as_string=True,
        raise_unsupported=False
    )

    # Analyse only doc and docx files
    if file_type in ["doc", "docx"]:
        editor.analyse_file(input_file, output_file)

```

---

## Protect and Analyse

These high-level methods run both Protect and Analysis within a single session. For more information, see the documentation links below.

- [protect_and_analyse_file](./8-Autogenerated%20Docs/libraries/editor/editor/editor.md#protect_and_analyse_file)
- [protect_and_analyse_directory](./8-Autogenerated%20Docs/libraries/editor/editor/editor.md#protect_and_analyse_directory)