Word Search & Redaction
The Glasswall Embedded Engine provides pattern matching for the following file formats:
- Microsoft Binary Office
- Office Open XML
- ASCII and UTF-8 plain text (when `enable_text_support` is set to `true` under `sysConfig`)
The search strings are configured via a policy file, where they can be specified as either a text item or a regex item:
- Text - Match only distinct words or numbers. A word is distinct if the characters immediately before and after the match are not letters, and a number is distinct if the characters immediately before and after the match are not digits. For example, `or` will not produce a match when found in "ore", "word" or "door".
- Regex - Match anywhere the regular expression pattern is found, including within words or numbers. For example, the regular expression `r[aeiou]+` will match the "re" in "regular", "expression" and "anywhere".
  - Word Search does not support regular expression assertions. Regular expressions containing `^` or `$` will return matches found anywhere in the file, and regular expressions containing a lookaround will not return any matches.
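To make the difference between the two item types concrete, here is a minimal Python sketch that approximates the matching rules described above. It is not the engine's implementation or API: `find_text_item` and `find_regex_item` are hypothetical helpers, the regex side uses Python's `re` module rather than the engine's matcher, and the sketch assumes that a search string made only of digits is treated as a number while anything else is treated as a word.

```python
import re


def find_text_item(needle: str, haystack: str) -> list:
    """Return start offsets of *distinct* occurrences of needle (hypothetical helper).

    Assumption: an all-digit needle is a number and must not touch a digit;
    any other needle is a word and must not touch a letter.
    """
    is_boundary_char = str.isdigit if needle.isdigit() else str.isalpha
    hits = []
    start = haystack.find(needle)
    while start != -1:
        end = start + len(needle)
        # Empty string at the edges of the text: "".isalpha() and "".isdigit() are False.
        before = haystack[start - 1] if start > 0 else ""
        after = haystack[end] if end < len(haystack) else ""
        if not is_boundary_char(before) and not is_boundary_char(after):
            hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


def find_regex_item(pattern: str, haystack: str) -> list:
    """Return every match of pattern, including matches inside words (hypothetical helper)."""
    return [m.group(0) for m in re.finditer(pattern, haystack)]


if __name__ == "__main__":
    print(find_text_item("or", "ore word door"))   # no distinct "or" in any of these words
    print(find_text_item("or", "this or that"))    # the standalone "or" is found
    print(find_regex_item(r"r[aeiou]+", "regular expression anywhere"))
```

With these inputs, the text item finds nothing in "ore word door" but finds the standalone word in "this or that", while the regex item finds "re" inside all three words.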
For every pattern matched, the following actions (`textSetting`) can be taken:
- Allow - Produce an XML analysis report specifying the number of matching strings within the file and their locations.
- Disallow - Report all matches and do not regenerate the input file if any are found.
- Redact - Report all matches and regenerate the input file with every match replaced using the character specified by `replacementChar` in the policy file. This action is only available for Microsoft Binary Office and Office Open XML files.
- Require - Report all matches and do not regenerate the input file unless at least one match is found. This action is only available for plain text files, and at least one must be specified.
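The four actions can be summarized as a small decision function. The following Python sketch is illustrative only: `apply_action`, `Outcome`, and the lowercase action names are hypothetical and are not part of the Glasswall API. It uses Python's `re` module instead of the engine's matcher, ignores the file-format restrictions on Redact and Require, and assumes that Allow does not block regeneration and that Redact replaces each matched character with `replacementChar`.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Outcome:
    report: List[Tuple[int, str]]  # (offset, matched text) for every match
    regenerate: bool               # whether a regenerated file would be produced
    output: Optional[str]          # regenerated content, or None if blocked


def apply_action(action: str, pattern: str, text: str, replacement_char: str = "*") -> Outcome:
    """Hypothetical model of the Allow/Disallow/Redact/Require actions."""
    matches = [(m.start(), m.group(0)) for m in re.finditer(pattern, text)]

    if action == "allow":
        # Assumption: matches are only reported; regeneration is not blocked.
        return Outcome(matches, True, text)
    if action == "disallow":
        # Any match blocks regeneration.
        ok = not matches
        return Outcome(matches, ok, text if ok else None)
    if action == "redact":
        # Assumption: each matched character becomes replacement_char.
        redacted = re.sub(pattern, lambda m: replacement_char * len(m.group(0)), text)
        return Outcome(matches, True, redacted)
    if action == "require":
        # Regeneration only happens if at least one match exists.
        ok = bool(matches)
        return Outcome(matches, ok, text if ok else None)
    raise ValueError(f"unknown action: {action!r}")


if __name__ == "__main__":
    print(apply_action("redact", r"secret", "a secret plan"))
    print(apply_action("disallow", r"secret", "a secret plan"))
    print(apply_action("require", r"secret", "nothing here"))
```

Disallow and Require mirror each other in this model: one blocks regeneration when a match exists, the other blocks it when none does.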
The Word Search APIs support string, character-based, and regular expression matching. See the Word Search Library for the Word Search API documentation.