Glasswall Conform is designed to preprocess PDF files to meet standards for further processing. It extracts and reconstructs visual content and should be used in conjunction with the Glasswall Embedded Engine for complete Content Disarm and Reconstruction (CDR) protection.
This document offers instructions on using Conform for reconstructing PDF documents, along with several examples for invoking the command-line tool.
Installation
Conform is installed as a system-wide command accessible from your terminal.
Once installed, the glasswall_conform command will be available on your system PATH. You may need to restart your terminal session for this change to take effect.
Windows
Conform for Windows is distributed as an .exe installer. It installs to C:\Program Files (x86)\Glasswall Conform and adds this folder to your system PATH.
Install Conform by running the installer and following the instructions:
.\glasswall-conform-1.1.0.exe
Or install it silently for automation or CI environments:
.\glasswall-conform-1.1.0.exe /VERYSILENT
Linux
Conform for Linux is distributed as both .rpm and .deb packages.
Both install files to /opt/glasswall_conform and create a symbolic link in /usr/local/bin to allow running glasswall_conform from the command line.
RPM (e.g. Rocky 9, Rocky 8)
sudo yum -y install ./glasswall_conform-1.1.0-1.x86_64.rpm
DEB (e.g. Ubuntu 24.04, Ubuntu 22.04)
sudo apt-get -y install ./glasswall-conform_1.1.0_amd64.deb
Setup
Before calling glasswall_conform, ensure that your environment is set up correctly.
Linux
For processing modes that utilise the Embedded Engine, LD_LIBRARY_PATH must be set to include the directory containing the Embedded Engine. For example, if the Embedded Engine is at path /home/azureuser/glasswall/Release-16.2.0, you can temporarily modify LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=/home/azureuser/glasswall/Release-16.2.0:$LD_LIBRARY_PATH
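When Conform is launched from another program rather than from an interactive shell, the same change can be applied to the child process only. The following is a minimal Python sketch of that idea, using the example Embedded Engine path above; env_with_engine_dir is a hypothetical helper name and is not part of Conform.

```python
import os


def env_with_engine_dir(engine_dir, base_env=None):
    """Return a copy of the environment with engine_dir prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Prepend so the Embedded Engine directory is searched first
    env["LD_LIBRARY_PATH"] = f"{engine_dir}:{existing}" if existing else engine_dir
    return env


env = env_with_engine_dir(
    "/home/azureuser/glasswall/Release-16.2.0", {"LD_LIBRARY_PATH": "/usr/lib"}
)
print(env["LD_LIBRARY_PATH"])  # /home/azureuser/glasswall/Release-16.2.0:/usr/lib
```

The returned dictionary can be passed as the env argument of subprocess.run, which leaves the parent process environment unchanged.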
Ubuntu
On Ubuntu-based systems, if you encounter the error message libgthread-2.0.so.0: cannot open shared object file: No such file or directory, you can resolve it by installing the necessary package with the following command:
export DEBIAN_FRONTEND=noninteractive && apt update && apt install -y libglib2.0-0
Windows
We recommend installing Windows dependencies using Chocolatey.
For all processing modes, Microsoft Visual C++ Redistributable must be installed.
For processing modes that utilise the Embedded Engine:
- PATH must be set to include the directory containing the Embedded Engine. For example, if the Embedded Engine is at path C:/glasswall/Release-16.2.0, you can temporarily modify PATH:
SET "PATH=%PATH%;C:/glasswall/Release-16.2.0"
- OpenSSL light or OpenSSL must be installed.
Example Windows Docker installation of vcredist140 and openssl.light using Chocolatey:
# escape=`
FROM mcr.microsoft.com/windows/servercore:ltsc2022
USER ContainerAdministrator
WORKDIR C:\temp\
SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]
# Download and install Chocolatey, to install OpenSSL and Visual C++ Redistributable
RUN Invoke-WebRequest -Uri 'https://chocolatey.org/install.ps1' -OutFile 'install.ps1'; `
./install.ps1; `
Remove-Item install.ps1; `
Import-Module "$env:ChocolateyInstall/helpers/chocolateyProfile.psm1"; `
choco install -y --fail-on-unfound --no-progress --stop-on-first-package-failure vcredist140; `
choco install -y --fail-on-unfound --no-progress --stop-on-first-package-failure openssl.light;
Processing modes
Conform is run from the command line and offers several processing modes. When calling glasswall_conform, the first positional argument specifies the processing mode. Available processing modes are:
- engine: Protects files using the Engine. Non-conforming files are reconstructed by Conform and then processed by the Engine.
- conform_only: Reconstructs files using Conform only, without providing CDR protection.
- engine_memory: Accepts a base64-encoded file via standard input. Protects a single file in memory using the Engine. If the file is non-conforming, it is reconstructed using Conform and then processed by the Engine. The processed file is returned via standard output, or an error is returned via standard error.
- conform_only_memory: Accepts a base64-encoded file via standard input. Reconstructs a single file using Conform only, without providing CDR protection. The reconstructed file is returned via standard output, or an error is returned via standard error.
To show available processing modes:
glasswall_conform -h
engine
This processing mode is the intended default and cleans files using Glasswall CDR technology. It requires access to the Embedded Engine and a valid licence.
For an example of invoking this processing mode, see: End to end protection.
Processed files are sorted into one of three output subdirectories:
- 01_engine_success: Files successfully processed by the Embedded Engine without the need for reconstruction by Conform.
- 02_conform_engine_success: PDF files that were initially unable to be processed by the Embedded Engine, but were reconstructed by Conform and then successfully processed by the Embedded Engine.
- 03_failure: Files that failed to be processed using both the Embedded Engine and Conform, or that contain content that a custom content management policy has set to disallow.
To show the command line arguments for the engine processing mode:
glasswall_conform engine -h
conform_only
This processing mode reconstructs files without utilising the Embedded Engine. It does not provide CDR protection.
For an example of invoking this processing mode, see: Reconstructing files without CDR protection
Processed files are sorted into one of two output subdirectories:
- 01_conform_success: Files successfully reconstructed by Conform.
- 02_failure: Files that failed to be reconstructed by Conform.
To show the command line arguments for the conform_only processing mode:
glasswall_conform conform_only -h
engine_memory
This mode accepts a base64-encoded file via standard input and processes it using the Embedded Engine. If the file is non-conforming, it is reconstructed by Conform, then processed by the Engine. The final output is returned via standard output, or an error is returned via standard error. No files are written to disk.
This mode is ideal for integrating with systems that hold files in memory and do not rely on filesystem input or output.
For an example of invoking this processing mode, see: Processing files in memory without reading from or writing to disk.
To show the command line arguments for the engine_memory processing mode:
glasswall_conform engine_memory -h
The optional --file-name argument can be used to specify the name of the in-memory file. This name is used when writing logs and the post processing summary, and defaults to the first 8 characters of the base64-encoded data if not specified.
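To illustrate the default described above, the following Python sketch base64-encodes some bytes and takes the first 8 characters of the result. It only mirrors the documented rule; how Conform derives the name internally is not shown in this document, and default_file_name is a hypothetical helper.

```python
import base64


def default_file_name(file_bytes):
    """First 8 characters of the base64-encoded data, per the documented default."""
    return base64.b64encode(file_bytes).decode("ascii")[:8]


# PDF files begin with a "%PDF-" header, so files of the same version share a prefix
print(default_file_name(b"%PDF-1.7\n..."))  # JVBERi0x
```

Because many PDF files start with the same bytes, the default name may be identical for different files. Passing --file-name keeps log and summary entries distinguishable.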
conform_only_memory
This mode accepts a base64-encoded file via standard input and reconstructs it using Conform only (without CDR protection). The reconstructed file is returned via standard output, or an error is returned via standard error. No files are written to disk.
For an example of invoking this processing mode, see: Processing files in memory without reading from or writing to disk.
To show the command line arguments for the conform_only_memory processing mode:
glasswall_conform conform_only_memory -h
Testing
A dataset of PDF test files for evaluating Conform is available upon request. Please contact us to request access to the test files via Kiteworks.
Examples
- End to end protection
- Reconstructing files without CDR protection
- Processing files in memory without reading from or writing to disk
- Fast mode and cautious mode
- Glasswall Python Wrapper functionality
- Multiprocessing
- Logging
- Customise content handling rates
- Watermarking
- CID suppression
- Font replacement
- File inclusion and exclusion filtering
- Output file structure and categorisation
- Post processing summary
End to end protection
This example demonstrates using the engine processing mode at its most basic level.
glasswall_conform engine -i /home/azureuser/input_files -o /home/azureuser/output_files -l /home/azureuser/glasswall/Release-16.2.0
Example input directory:
/home/azureuser/input_files
conforming_docx.docx
conforming_pdf.pdf
corrupt_docx.docx
nonconforming_pdf.pdf
unsupported_filetype.txt
Example output directory after processing:
/home/azureuser/output_files
├───01_engine_success
│       conforming_docx.docx
│       conforming_pdf.pdf
│
├───02_conform_engine_success
│       nonconforming_pdf.pdf
│
└───03_failure
        corrupt_docx.docx
        unsupported_filetype.txt
Note that the subdirectory names can be customised using the following arguments:
- --engine-success-path: Optional. Output subdirectory name for files that were successfully processed by the Embedded Engine without the need for reconstruction by Conform. Default 01_engine_success
- --conform-success-path: Optional. Output subdirectory name for files that were initially unable to be processed by the Embedded Engine, but were reconstructed by Conform and then successfully processed by the Embedded Engine. Default 02_conform_engine_success
- --failure-path: Optional. Output subdirectory name for files that failed to be processed using both the Embedded Engine and Conform. Default 03_failure
To write all successfully protected files to the same output directory, regardless of whether Conform was used to reconstruct them, give both success paths the same subdirectory name. For example:
glasswall_conform engine -i /home/azureuser/input_files -o /home/azureuser/output_files -l /home/azureuser/glasswall/Release-16.2.0 --engine-success-path success --conform-success-path success --failure-path failure
Example truncated terminal output after processing:
Glasswall Conform processed 3/5 files (60.00%)
Glasswall Conform failed to process 2/5 files. (40.00%)
Exceptions:
PdfExtractionError (Total: 2)
- 1x Unable to extract content from PDF: '/home/azureuser/input_files/corrupt_docx.docx'
- 1x Unable to extract content from PDF: '/home/azureuser/input_files/unsupported_filetype.txt'
2024-11-06 14:28:50.242 glasswall_conform.config.logging INFO engine_mode Total elapsed time: 5.55 seconds
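For automation, the categorised output can be checked after a run by counting the files in each output subdirectory. The following is a minimal Python sketch, assuming the default subdirectory names; tally_output and success_rate are hypothetical helpers, not part of Conform.

```python
from pathlib import Path


def tally_output(output_dir):
    """Count files under each top-level subdirectory of a categorised output directory."""
    counts = {}
    for sub in sorted(Path(output_dir).iterdir()):
        if sub.is_dir():
            # rglob("*") also finds files in nested directories
            counts[sub.name] = sum(1 for p in sub.rglob("*") if p.is_file())
    return counts


def success_rate(counts, failure_name="03_failure"):
    """Percentage of files that ended up outside the failure subdirectory."""
    total = sum(counts.values())
    succeeded = total - counts.get(failure_name, 0)
    return 100.0 * succeeded / total if total else 0.0
```

For the example output directory in this section, tally_output would report 2, 1, and 2 files for the three subdirectories, and success_rate would give 60.0, matching the terminal output.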
Reconstructing files without CDR protection
The conform_only processing mode does not provide CDR protection, and requires only an input directory -i and an output directory -o. See conform_only.
glasswall_conform conform_only -i /home/azureuser/input_files -o /home/azureuser/output_files
Processing files in memory without reading from or writing to disk
The engine_memory and conform_only_memory processing modes can be used to process files in memory, without reading from or writing to disk.
If processing is successful, the base64-encoded output file is returned via standard output. If an error occurred during processing, an error message and the post processing summary will be written to standard error.
Example standard error for a timeout failure:
Error: Processing failed for file: 'hus11976.pdf'. Summary: {'conform_version': '0.11.2', 'operating_system': 'Windows', 'summary_verbosity': 'all', 'processing_rates': {'success': 0.0, 'failure': 100.0}, 'processing_counts': {'success': 0, 'failure': 1, 'total': 1}, 'processing_time': {'elapsed_seconds': 8.25, 'files_per_sec': 0.12, 'secs_per_file': 8.25}, 'processing_arguments': {'mode': 'engine_memory', 'library_directory': 'C:/azure/sdk.editor/2.1394.0/build-sdk-editor-windows-amd64-dev_license', 'cautious_mode': False, 'max_workers': 1, 'timeout_seconds': 5.0, 'memory_limit_gib': 11.96, 'function_name': 'protect_file', 'content_management_policy': None}, 'processing_success': [], 'processing_failure': [{'file_name': 'hus11976.pdf', 'timed_out': True, 'out_of_memory': False, 'max_memory_used_in_gib': 0.32227325439453125, 'elapsed_time': 5.0007593631744385, 'exception': 'TimeoutError()', 'success': False}]}
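The summary in the standard error example above is printed as a Python dictionary literal (single quotes, False, None) rather than as JSON. The following Python sketch extracts it with ast.literal_eval. It assumes the "Summary: " marker and the literal format stay as in that example, which this document does not guarantee; parse_error_summary is a hypothetical helper.

```python
import ast


def parse_error_summary(stderr_text):
    """Extract the summary dictionary from an error message, or None if absent."""
    marker = "Summary: "
    _, found, tail = stderr_text.partition(marker)
    if not found:
        return None
    try:
        # literal_eval only accepts Python literals, so no code is executed
        return ast.literal_eval(tail.strip())
    except (ValueError, SyntaxError):
        return None


# Shortened version of the example error message above
sample = (
    "Error: Processing failed for file: 'hus11976.pdf'. Summary: "
    "{'processing_counts': {'success': 0, 'failure': 1, 'total': 1}, "
    "'processing_failure': [{'file_name': 'hus11976.pdf', 'timed_out': True, "
    "'exception': 'TimeoutError()', 'success': False}]}"
)
summary = parse_error_summary(sample)
print(summary["processing_failure"][0]["timed_out"])  # True
```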
Basic usage examples:
Pipe a base64-encoded string of an in-memory PDF file directly into glasswall_conform:
echo "U29tZUJhc2U2NERhdGE=" | glasswall_conform.exe engine_memory -l "C:/azure/sdk.editor/2.1394.0" --file-name "SomeBase64Data"
Or using Python via subprocess:
import base64
import os
import subprocess

# File in memory, for this example simply loaded from a file path
file_path = r"C:\conform\input\Set-08-016599.pdf"
with open(file_path, "rb") as f:
    file_bytes = f.read()

# Convert to base64-encoded string
encoded_file_bytes = base64.b64encode(file_bytes).decode("utf-8")

file_name = os.path.basename(file_path)
command = " ".join(
    [
        "glasswall_conform",
        "engine_memory",
        '-l "C:/azure/sdk.editor/2.1394.0"',
        f'--file-name "{file_name}"',  # Optional, used for summary and logs
        f'--summary-path "C:/conform/summary_{file_name}.json"',  # Optional
        '--watermark "Processed for security: Visual elements may vary"',  # Optional
    ]
)

# Run Conform with the base64-encoded string as an input
result = subprocess.run(command, input=encoded_file_bytes, text=True, capture_output=True, shell=True)

if result.stderr:
    # Conform failed, handle error gracefully here
    print(result.stderr)
else:
    # Conform succeeded, convert the conformed file from base64 to bytes
    conformed_file_bytes = base64.b64decode(result.stdout)

    # Do something with the conformed file bytes, e.g. write to a file
    with open("conformed_file.pdf", "wb") as f:
        f.write(conformed_file_bytes)
Similarly, conform_only_memory mode can be used by replacing engine_memory in the above examples, and omitting the -l argument as it is not required for this mode.
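The same call can also be made without shell=True by passing the arguments as a list, which avoids quoting problems when file names contain spaces or quotes. The following is a minimal Python sketch that uses only arguments described in this document; build_memory_command and run_in_memory are hypothetical helper names.

```python
import base64
import subprocess


def build_memory_command(mode, library_dir=None, file_name=None):
    """Build the argument list for one of the in-memory processing modes."""
    if mode not in ("engine_memory", "conform_only_memory"):
        raise ValueError(f"not an in-memory mode: {mode}")
    command = ["glasswall_conform", mode]
    if mode == "engine_memory":
        if library_dir is None:
            raise ValueError("engine_memory requires -l")
        command += ["-l", library_dir]
    if file_name:
        command += ["--file-name", file_name]
    return command


def run_in_memory(file_bytes, mode="conform_only_memory", **kwargs):
    """Send file_bytes to Conform via standard input and return the processed bytes."""
    encoded = base64.b64encode(file_bytes).decode("ascii")
    result = subprocess.run(
        build_memory_command(mode, **kwargs), input=encoded, text=True, capture_output=True
    )
    if result.stderr:
        # As in the example above, anything on standard error is treated as failure
        raise RuntimeError(result.stderr)
    return base64.b64decode(result.stdout)


print(build_memory_command("conform_only_memory", file_name="report.pdf"))
```

Only build_memory_command runs when the block is executed; run_in_memory needs glasswall_conform on the system PATH.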
Fast mode and cautious mode
Fast mode is the default processing mode in Conform. It offers the fastest processing speed and the best visual appearance for PDF files, but may not be suitable for scenarios that require very strict compliance with PDF specifications.
If fast mode is disabled or cannot process a file, Conform automatically falls back to cautious mode. This mode prioritises compliance and risk reduction by replacing embedded fonts, which helps mitigate issues associated with custom or unknown fonts. Cautious mode may result in lower visual fidelity, such as degraded or missing images, inconsistent font sizes, or missing text.
- Disabling fast mode is only recommended when very strict compliance with PDF standards is essential, even at the cost of visual fidelity.
- Disabling cautious mode is only recommended when preserving embedded fonts is essential, or when visual appearance is more important than Conform being able to successfully process a wider range of PDFs.
Fast mode can be disabled using the optional --disable-fast-mode command line argument.
Cautious mode can be disabled using the optional --disable-cautious-mode command line argument.
When Conform processes a file successfully using fast mode:
- Fastest processing speed.
- Best visual appearance.
- Custom embedded fonts are not replaced.
- May not be suitable for scenarios requiring very strict compliance with PDF standards.
When Conform uses the cautious mode fallback:
- Slower processing speed.
- In a small number of cases, may result in reduced visual appearance, such as:
  - Degraded or missing images and graphics.
  - Differences in text appearance (e.g. size, font style, or spacing).
  - Missing text when unknown embedded fonts are in use.
- Processes PDFs with stricter compliance to specifications.
- Replaces custom embedded fonts with known-good fonts.
Glasswall Python Wrapper functionality
In the engine processing mode, the protect_file function from the Glasswall Python Wrapper is used by default to process files using the Embedded Engine. This can be changed using the optional -f command line argument.
A default sanitise content management policy is applied if a policy file is not specified using the optional -c command line argument.
The required -l command line argument should point to a directory containing the Embedded Engine.
The following arguments relate to the Glasswall Python Wrapper:
-l LIBRARY_DIRECTORY, --library-directory LIBRARY_DIRECTORY
Required. Path to directory containing the Embedded Engine.
-f FUNCTION_NAME, --function-name FUNCTION_NAME
Optional. Glasswall Python Wrapper function name to call during multiprocessing, such as 'protect_file' or 'export_file'. Default: 'protect_file'.
-c CONTENT_MANAGEMENT_POLICY, --content-management-policy CONTENT_MANAGEMENT_POLICY
Optional. Path to Embedded Engine content management policy file. If not provided, the default 'sanitise' policy is used.
--log-level-console-wrapper {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
Optional. Set logging level for writing Glasswall Python Wrapper logs to console. Default INFO.
Example:
glasswall_conform engine -i /home/azureuser/input_files -o /home/azureuser/output_files -l /home/azureuser/glasswall/Release-16.2.0 -f protect_file -c /home/azureuser/glasswall/config.xml
Multiprocessing
All processing modes use the Glasswall Python Wrapper's GlasswallProcessManager to process files concurrently.
The following arguments relate to multiprocessing:
-w MAX_WORKERS, --max-workers MAX_WORKERS
Optional. Maximum workers for multiprocessing, 0=auto. Default: 0.
-t TIMEOUT_SECONDS, --timeout-seconds TIMEOUT_SECONDS
Optional. Multiprocessing timeout per file in seconds. Default: 180.
-m MEMORY_LIMIT_GIB, --memory-limit-gib MEMORY_LIMIT_GIB
Optional. Multiprocessing memory limit per file in GiB, 0=auto (4GiB min, worker distributed max). Default: 0.
Example:
glasswall_conform engine -i /home/azureuser/input_files -o /home/azureuser/output_files -l /home/azureuser/glasswall/Release-16.2.0 -t 300 -m 12
Logging
The default logging level for Conform and the Glasswall Python Wrapper is INFO. The following arguments relate to logging:
--log-level-console {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
Optional. Set logging level for writing logs to console. Default INFO.
--log-level-file {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
Optional. Set logging level for writing logs to file. If not provided, logs will not be written to file.
--log-path LOG_PATH Optional. Path to output log file. Default is a timestamp-named file located at: '%TEMP%/glasswall_conform/logs'.
--log-level-console-wrapper {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
Optional. Set logging level for writing Glasswall Python Wrapper logs to console. Default INFO.
To suppress most logging:
glasswall_conform engine -i /home/azureuser/input_files -o /home/azureuser/output_files -l /home/azureuser/glasswall/Release-16.2.0 --log-level-console CRITICAL --log-level-console-wrapper CRITICAL
Customise content handling rates
This section is only applicable when fast mode is disabled.
By default, Conform generates an output file whenever possible, even if only a portion of the original document's content has been successfully handled. This behaviour might not always be desirable, and can be customised for different types of content within each document.
Conform uses "best guesses" when handling malformed, corrupt, or unsupported text content to ensure that as much text as possible is transferred from the original document to the conformed document. For example, if the stroke colour of the text is malformed or in an unsupported colour format, the text is retained in the output document, with the stroke colour defaulting to black.
This "best guess" approach may result in text that appears similar to the original, or in some cases, text that is not visible but still present in the output document. As we cannot guarantee that our best guess will handle the text in the same way as in the original document, the handling rate reflects this as content that has not been fully handled. Consequently, a low handling rate for text does not always indicate that the document will look visually different when best guesses are applied.
There are three arguments available to set the minimum success rates when handling content:
--text-min-success-rate TEXT_MIN_SUCCESS_RATE
Optional. The minimum success rate for processing text. Default: 0.0.
--image-min-success-rate IMAGE_MIN_SUCCESS_RATE
Optional. The minimum success rate for processing images. Default: 0.0.
--graphic-min-success-rate GRAPHIC_MIN_SUCCESS_RATE
Optional. The minimum success rate for processing graphics. Default: 0.0.
If the minimum content handling rate value is not met then processing for the given file will be deemed a failure and the output file will not be written.
Watermarking
Watermarking is disabled by default, but can be enabled using the --watermark argument. Text will be added with font size 12 in a semi-transparent dark grey colour. The watermark is usually positioned at the top-right of the document; however, depending on the rotation that has been applied to the page, the orientation may differ. The maximum text length for a watermark is currently 256 characters.
--watermark WATERMARK
Optional. Adds a watermark to each page of the reconstructed document. Default '' (disabled).
Example:
glasswall_conform engine -i /home/azureuser/input_files -o /home/azureuser/output_files -l /home/azureuser/glasswall/Release-16.2.0 --watermark "Glasswall Conform"
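Because of the 256-character limit, a caller that generates watermark text dynamically may want to check its length before invoking Conform. The following is a minimal Python sketch of such a check. How Conform itself reacts to longer text is not stated in this document, and watermark_args is a hypothetical helper.

```python
MAX_WATERMARK_LENGTH = 256  # limit stated in this section


def watermark_args(text):
    """Return the --watermark arguments, or an empty list when text is empty."""
    if not text:
        return []  # watermarking stays disabled, matching the default ''
    if len(text) > MAX_WATERMARK_LENGTH:
        raise ValueError(
            f"watermark is {len(text)} characters; the maximum is {MAX_WATERMARK_LENGTH}"
        )
    return ["--watermark", text]


print(watermark_args("Glasswall Conform"))  # ['--watermark', 'Glasswall Conform']
```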
CID suppression
This section is only applicable when fast mode is disabled.
In PDFs, some fonts use a system called CID (Character Identifier) to manage large sets of characters. When constructing a new PDF, if the tool encounters characters that cannot be processed, it replaces them with a default question mark character (?). You can adjust how unprocessable CIDs are represented in your PDFs using the --suppress-cid argument:
--suppress-cid SUPPRESS_CID
Optional. Replace CID metadata that may be printed to the visual layer due to font array omissions with the supplied string, with placeholder text.
Glasswall Conform restricts the processing of PDFs to only known secure fonts. This is a deliberate security feature to make the PDF conform safely. Default 'โ '.
Example:
glasswall_conform engine -i /home/azureuser/input_files -o /home/azureuser/output_files -l /home/azureuser/glasswall/Release-16.2.0 --suppress-cid "?"
Font replacement
This section is only applicable when fast mode is disabled.
Conform supports bold, italic, and bold italic variants of the base 14 Type1 fonts and the Cambria font. Conform also supports some custom fonts.
The base 14 Type1 fonts are:
- Courier, Courier-Bold, Courier-Oblique, Courier-BoldOblique
- Helvetica, Helvetica-Bold, Helvetica-Oblique, Helvetica-BoldOblique
- Times-Roman, Times-Bold, Times-Italic, Times-BoldItalic
- Symbol
- ZapfDingbats
Embedded fonts that are not supported may be replaced with the Cambria font. If Cambria does not support a glyph from an embedded font, the character is suppressed. For more information on this, see CID suppression.
By default, some commonly embedded sans serif fonts are replaced with Helvetica instead of Cambria for visual similarity. This, and other font replacement features, can be modified using these arguments:
--disable-base-14-fonts
Optional. Disable matching embedded fonts to base 14 fonts.
This will result in more fonts being replaced by the fallback font, Cambria. Default False.
--disable-custom-fonts
Optional. Disable matching embedded fonts to custom fonts.
This will result in lower support for custom embedded fonts, and more fonts being replaced by the fallback font, Cambria. Default False.
--disable-sans-serif-replacement
Optional. Disable replacing some sans serif fonts with Helvetica instead of the fallback font, Cambria.
This will result in some replaced sans serif fonts looking more visually different when compared to the original file. Default False.
Example:
glasswall_conform engine -i /home/azureuser/input_files -o /home/azureuser/output_files -l /home/azureuser/glasswall/Release-16.2.0 --disable-custom-fonts
File inclusion and exclusion filtering
Conform allows additional control over which files in the input directory are processed by using include and exclude filters. These filters let you specify which files to process or ignore using basic Unix shell-style wildcards directly from the command line. If a file matches both an inclusion and an exclusion rule, it will be excluded.
By default, if the --include-files and --exclude-files arguments are omitted, Conform will process all files that are present in the input directory.
The following arguments relate to file inclusion and exclusion:
--include-files INCLUDE_FILES
Optional. Can be either a path to a file containing file paths/patterns or a semicolon-separated list of patterns (e.g. '*.pdf;*/SET_03/*'). Only matching files will be processed.
If None, all files are included. Default: None.
--exclude-files EXCLUDE_FILES
Optional. Can be either a path to a file containing file paths/patterns or a semicolon-separated list of patterns. Any matching files will be excluded from processing. If None, no
files are excluded. Default: None.
The following table demonstrates examples of some patterns that can be used:
Pattern | Meaning | Example | Matches | Does Not Match |
---|---|---|---|---|
* | Matches everything | *.pdf | file.pdf, report.pdf | file.docx |
? | Matches any single character | file_?.pdf | file_1.pdf, file_A.pdf | file_10.pdf |
[seq] | Matches any character in seq | file_[AB].pdf | file_A.pdf, file_B.pdf | file_C.pdf |
[!seq] | Matches any character not in seq | file_[!AB].pdf | file_C.pdf, file_D.pdf | file_A.pdf, file_B.pdf |
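The patterns in the table use the same wildcard syntax as Python's standard fnmatch module. The sketch below reproduces each row of the table with fnmatch.fnmatchcase. It assumes, without confirmation from this document, that Conform's matching behaves the same way.

```python
from fnmatch import fnmatchcase

# (pattern, names expected to match, names expected not to match), taken from the table
rows = [
    ("*.pdf", ["file.pdf", "report.pdf"], ["file.docx"]),
    ("file_?.pdf", ["file_1.pdf", "file_A.pdf"], ["file_10.pdf"]),
    ("file_[AB].pdf", ["file_A.pdf", "file_B.pdf"], ["file_C.pdf"]),
    ("file_[!AB].pdf", ["file_C.pdf", "file_D.pdf"], ["file_A.pdf", "file_B.pdf"]),
]

for pattern, matches, non_matches in rows:
    assert all(fnmatchcase(name, pattern) for name in matches)
    assert not any(fnmatchcase(name, pattern) for name in non_matches)
    print(f"{pattern}: ok")
```

In fnmatch, * also matches path separators, which is consistent with directory patterns such as */SET_03/*. fnmatchcase always compares case-sensitively, whereas fnmatch.fnmatch follows the case rules of the operating system it runs on.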
Case sensitivity considerations
File names are case-sensitive on Linux but case-insensitive on Windows. This affects how file paths or patterns are interpreted across different operating systems.
- On Linux, report.pdf and Report.pdf are treated as different files.
- On Windows, both are considered the same file.
Recommendation:
To ensure consistency across platforms, use consistent casing in file names and patterns. If working across multiple environments, consider using wildcard patterns (*) where appropriate to avoid mismatches.
Handling single file inclusions
If specifying a single file with --include-files, be aware that Conform first checks whether the provided value is a file on disk. If it is not, the value is treated as a pattern.
Potential issue:
If a user specifies:
--include-files "/home/azureuser/input_files/first.pdf"
Conform will see that /home/azureuser/input_files/first.pdf exists as a file, and attempt to read from it as a list file that contains multiple paths or patterns.
Solution:
To explicitly indicate that this is a pattern for a single file, append a trailing semicolon:
--include-files "/home/azureuser/input_files/first.pdf;"
This ensures that Conform treats the path as a pattern rather than a list file.
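The resolution order described in this section can be sketched in Python as follows. This is an illustration of the documented behaviour only, not Conform's actual implementation. resolve_filter is a hypothetical name, and the sketch assumes a list file holds one path or pattern per line.

```python
import os


def resolve_filter(value):
    """Return the list of patterns that an include/exclude value stands for."""
    if os.path.isfile(value):
        # An existing file is read as a list file, one path or pattern per line
        with open(value, encoding="utf-8") as f:
            return [line.strip() for line in f if line.strip()]
    # Otherwise the value is a semicolon-separated list of patterns
    return [part for part in value.split(";") if part]


print(resolve_filter("*.pdf;*/SET_03/*"))  # ['*.pdf', '*/SET_03/*']
print(resolve_filter("/home/azureuser/input_files/first.pdf;"))
```

With the trailing semicolon, the value no longer names an existing file, so it falls through to the pattern branch and yields a single pattern for first.pdf.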
Include specific PDF files
To process only PDFs with "report" in the filename:
glasswall_conform engine -i /home/azureuser/input_files -o /home/azureuser/output_files -l /home/azureuser/glasswall/Release-16.2.0 --include-files "*report*.pdf"
Result: Only files such as annual_report.pdf and summary_report_2023.pdf are processed.
Exclude specific PDF files
To process all PDFs except ones containing "draft" in the name:
glasswall_conform engine -i /home/azureuser/input_files -o /home/azureuser/output_files -l /home/azureuser/glasswall/Release-16.2.0 --exclude-files "*draft*.pdf"
Result: All PDFs are processed, except files like proposal_draft.pdf and internal_draft_v2.pdf.
Exclude an entire directory
To exclude all files inside /home/azureuser/input_files/archive/:
glasswall_conform engine -i /home/azureuser/input_files -o /home/azureuser/output_files -l /home/azureuser/glasswall/Release-16.2.0 --exclude-files "*/archive/*"
Result: Everything inside /home/azureuser/input_files/archive/ is skipped.
Include and exclude together
If a file matches both an inclusion and an exclusion rule, it will be excluded.
To process all files from SET_03, but exclude files containing "error_log":
glasswall_conform engine -i /home/azureuser/input_files -o /home/azureuser/output_files -l /home/azureuser/glasswall/Release-16.2.0 --include-files "*/SET_03/*" --exclude-files "*error_log*"
Result: Only files from SET_03/ are processed, except any containing "error_log" in the filename.
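The precedence rule, where exclusion wins over inclusion, can be sketched in Python with the standard fnmatch module. The sketch assumes that patterns are matched against the full file path in the fnmatch style, which this document does not confirm; should_process is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def should_process(path, include=None, exclude=None):
    """Apply include patterns first, then let any exclude pattern veto the file."""
    included = include is None or any(fnmatchcase(path, p) for p in include)
    excluded = exclude is not None and any(fnmatchcase(path, p) for p in exclude)
    return included and not excluded


include = ["*/SET_03/*"]
exclude = ["*error_log*"]
base = "/home/azureuser/input_files"
print(should_process(f"{base}/SET_03/scan.pdf", include, exclude))       # True
print(should_process(f"{base}/SET_03/error_log.pdf", include, exclude))  # False
print(should_process(f"{base}/SET_02/scan.pdf", include, exclude))       # False
```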
Using a file for large lists
For more complex filtering, you can provide a file containing multiple patterns or absolute file paths instead of specifying them directly.
Example using an inclusion list file:
glasswall_conform engine -i /home/azureuser/input_files -o /home/azureuser/output_files -l /home/azureuser/glasswall/Release-16.2.0 --include-files "include_list.txt"
Example include_list.txt:
*/SET_03/*.pdf
*reports_2023_*.pdf
/home/azureuser/input_files/SET_02/splat.pdf
Result: Processes only PDF files from SET_03/, PDF files whose paths contain reports_2023_, and the specific file /home/azureuser/input_files/SET_02/splat.pdf.
Output file structure and categorisation
The directory structure for output files can be customised for both the engine and conform_only processing modes using the --output-structure command line argument.
--output-structure {categorised,mirrored}
Optional. Defines the directory structure of output files. 'categorised' organises output files into subdirectories based on processing status ('engine_success', 'conform_success', 'failure').
'mirrored' places successfully processed output files directly in the output directory, maintaining the original input directory structure, and failed files will not be copied. Default: categorised.
If omitted, the default categorised structure is used. Additional options are available to customise the category subdirectory names:
- --engine-success-path: Optional. Output subdirectory name for files that were successfully processed by the Embedded Engine without the need for reconstruction by Conform. Default 01_engine_success
- --conform-success-path: Optional. Output subdirectory name for files that were initially unable to be processed by the Embedded Engine, but were reconstructed by Conform and then successfully processed by the Embedded Engine. Default 02_conform_engine_success
- --failure-path: Optional. Output subdirectory name for files that failed to be processed using both the Embedded Engine and Conform. Default 03_failure
Example categorised output structure
glasswall_conform engine -i /home/azureuser/input_files -o /home/azureuser/output_files -l /home/azureuser/glasswall/Release-16.2.0
Example input directory:
/home/azureuser/input_files
conforming_docx.docx
conforming_pdf.pdf
corrupt_docx.docx
nonconforming_pdf.pdf
unsupported_filetype.txt
Example output directory after processing:
/home/azureuser/output_files
├───01_engine_success
│       conforming_docx.docx
│       conforming_pdf.pdf
│
├───02_conform_engine_success
│       nonconforming_pdf.pdf
│
└───03_failure
        corrupt_docx.docx
        unsupported_filetype.txt
When using the categorised output structure, all successfully protected files can be written to the same output directory, regardless of whether Conform was used to reconstruct them, by giving both success paths the same subdirectory name. For example:
glasswall_conform engine -i /home/azureuser/input_files -o /home/azureuser/output_files -l /home/azureuser/glasswall/Release-16.2.0 --engine-success-path success --conform-success-path success --failure-path failure
Example mirrored output structure
glasswall_conform engine -i /home/azureuser/input_files -o /home/azureuser/output_files -l /home/azureuser/glasswall/Release-16.2.0 --output-structure mirrored
Example input directory:
/home/azureuser/input_files
conforming_docx.docx
conforming_pdf.pdf
corrupt_docx.docx
nonconforming_pdf.pdf
unsupported_filetype.txt
Example output directory after processing:
/home/azureuser/output_files
conforming_docx.docx
conforming_pdf.pdf
nonconforming_pdf.pdf
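In the mirrored structure, an output file keeps the position that its input file had relative to the input directory. The following Python sketch shows that mapping for a nested input file. The SET_03 subdirectory and mirrored_output_path are illustrative assumptions, not taken from Conform itself.

```python
from pathlib import PurePosixPath


def mirrored_output_path(input_dir, output_dir, input_file):
    """Map an input file to the same relative location under output_dir."""
    relative = PurePosixPath(input_file).relative_to(input_dir)
    return PurePosixPath(output_dir) / relative


print(
    mirrored_output_path(
        "/home/azureuser/input_files",
        "/home/azureuser/output_files",
        "/home/azureuser/input_files/SET_03/report.pdf",
    )
)  # /home/azureuser/output_files/SET_03/report.pdf
```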
Post processing summary
By default, a summary is written after Conform has finished processing files. The summary provides detailed information such as return statuses, processing time, and memory usage for each file. The --summary-verbosity argument controls which files are included in the summary. This setting is independent of the logging level and does not affect detailed log outputs.
Available Options
- all (default): Includes both successfully processed and failed files.
- failure: Includes only failed files.
- success: Includes only successfully processed files.
- none: Disables the summary output completely.
The --summary-path argument can be used to write the summary to disk as a JSON file instead of only displaying it in the terminal.
Example to include only failed files in the summary output:
glasswall_conform engine -i /home/azureuser/input_files -o /home/azureuser/output_files -l /home/azureuser/glasswall/Release-16.2.0 --summary-verbosity failure --summary-path /home/azureuser/conform_summary.json
Example to disable the summary output entirely:
glasswall_conform engine -i /home/azureuser/input_files -o /home/azureuser/output_files -l /home/azureuser/glasswall/Release-16.2.0 --summary-verbosity none --summary-path /home/azureuser/conform_summary.json
Example summary JSON output (Windows):
{
    "conform_version": "0.10.1",
    "operating_system": "Windows",
    "summary_verbosity": "all",
    "processing_rates": {
        "success": 50.0,
        "failure": 50.0
    },
    "processing_counts": {
        "success": 2,
        "failure": 2,
        "total": 4
    },
    "processing_time": {
        "elapsed_seconds": 43.61,
        "files_per_sec": 0.09,
        "secs_per_file": 10.9
    },
    "processing_arguments": {
        "mode": "engine",
        "input_directory": "C:\\conform\\input",
        "output_directory": "C:\\conform\\output",
        "library_directory": "C:\\azure\\sdk.editor\\2.1394.0",
        "cautious_mode": false,
        "max_workers": 3,
        "timeout_seconds": 180,
        "memory_limit_gib": 4.35,
        "function_name": "protect_file",
        "content_management_policy": null,
        "include_files": null,
        "exclude_files": null,
        "output_structure": "categorised"
    },
    "processing_success": [
        {
            "input_file": "C:\\conform\\input\\pal1.bmp",
            "output_file": "C:\\conform\\output\\01_engine_success\\pal1.bmp",
            "engine_status": "OK(0)",
            "max_memory_used_in_gib": 0.11124420166015625,
            "elapsed_time": 0.9149298667907715,
            "success": true
        },
        {
            "input_file": "C:\\conform\\input\\Set-08-016599.pdf",
            "output_file": "C:\\conform\\output\\02_conform_engine_success\\Set-08-016599.pdf",
            "engine_status": "GeneralFail(-1)",
            "engine_GW2FileErrorMsg": "[FAILURE_LOG_SEM_FONTS_0021897368] Key /FirstChar must be present in a Type 1 Font dictionary other than for standard 14. fonts.",
            "engine_conform_fast_status": "GeneralFail(-1)",
            "engine_conform_fast_GW2FileErrorMsg": "[FAILURE_LOG_SEM_FONTS_0021897368] Key /FirstChar must be present in a Type 1 Font dictionary other than for standard 14. fonts.",
            "engine_conform_cautious_status": "OK(0)",
            "max_memory_used_in_gib": 0.22198104858398438,
            "elapsed_time": 1.8940067291259766,
            "success": true
        }
    ],
    "processing_failure": [
        {
            "input_file": "C:\\conform\\input\\pal1_corrupt.bmp",
            "engine_status": "FileTypeUnknown(-7)",
            "engine_GW2FileErrorMsg": "Unable to determine file type",
            "engine_conform_fast_status": "PdfFastProcessError()",
            "engine_conform_cautious_status": "PdfExtractionError(Unable to extract content from PDF: 'C:\\conform\\input\\pal1_corrupt.bmp')",
            "exit_code": 0,
            "timed_out": false,
            "out_of_memory": false,
            "max_memory_used_in_gib": 0.13513565063476562,
            "elapsed_time": 0.8690056800842285,
            "success": false
        },
        {
            "input_file": "C:\\conform\\input\\Straw120556398.pdf",
            "timed_out": false,
            "out_of_memory": true,
            "max_memory_used_in_gib": 4.3571624755859375,
            "elapsed_time": 41.976775884628296,
            "exception": "MemoryError()",
            "success": false
        }
    ]
}
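A summary written with --summary-path can be inspected with Python's json module. The following sketch groups failed files by cause, using only keys that appear in the example summary above. Those keys are observed in that example rather than documented as a stable schema, and failure_reason and summarise_failures are hypothetical helpers.

```python
import json


def failure_reason(entry):
    """Classify one processing_failure entry using keys seen in the example summary."""
    if entry.get("timed_out"):
        return "timeout"
    if entry.get("out_of_memory"):
        return "out_of_memory"
    # Fall back to whichever status or exception field is present
    return entry.get("exception") or entry.get("engine_status") or "unknown"


def summarise_failures(summary):
    """Map each failed input file to a short failure reason."""
    return {
        entry.get("input_file", entry.get("file_name", "?")): failure_reason(entry)
        for entry in summary.get("processing_failure", [])
    }


# Reduced version of the example summary, with shortened file names
summary = json.loads("""
{"processing_failure": [
  {"input_file": "a.bmp", "engine_status": "FileTypeUnknown(-7)",
   "timed_out": false, "out_of_memory": false, "success": false},
  {"input_file": "b.pdf", "timed_out": false, "out_of_memory": true,
   "exception": "MemoryError()", "success": false}
]}
""")
print(summarise_failures(summary))
```

For a real run, the inline string would be replaced by loading the file passed to --summary-path with json.load.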