Version: 16.8.2

Word Search & Redaction

The Glasswall Embedded Engine provides pattern matching for the following file formats:

  • Microsoft Binary Office
  • Office Open XML
  • ASCII and UTF-8 plain text (when enable_text_support is set to true under sysConfig)

The search strings are configured via a policy file, where they can be specified as either a text item or a regex item:

  • Text - Matches only distinct words or numbers. A word is distinct if the characters immediately before and after it are not letters; a number is distinct if the characters immediately before and after it are not digits. For example, the text item "or" will not produce a match in "ore", "word" or "door".
  • Regex - Matches anywhere the regular expression pattern is found. This includes matches within distinct words or numbers, e.g. the regular expression r[aeiou]+ will match the "re" in "regular", "expression" and "anywhere".
    • Word Search does not support regular expression assertions. Regular expressions containing ^ or $ will return matches found anywhere in the file, and regular expressions containing a lookaround will not return any matches.
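
The difference between the two item types can be illustrated with a short Python sketch. This is not the engine's implementation or API: it uses Python's re module, and the helper names text_item_matches and regex_item_matches are hypothetical. It assumes the "distinct word" rule means a word may not touch a letter and a number may not touch a digit.

```python
import re


def text_item_matches(term: str, text: str) -> list[int]:
    """Approximate a Text item: return start offsets of distinct matches.

    Assumption: a numeric term is blocked by neighbouring digits and any
    other term is blocked by neighbouring ASCII letters.
    """
    blocker = "0-9" if term.isdigit() else "A-Za-z"
    pattern = rf"(?<![{blocker}]){re.escape(term)}(?![{blocker}])"
    return [m.start() for m in re.finditer(pattern, text)]


def regex_item_matches(pattern: str, text: str) -> list[str]:
    """Approximate a Regex item: return every substring the pattern matches."""
    return [m.group(0) for m in re.finditer(pattern, text)]


# "or" inside "ore", "word" and "door" is ignored; only the standalone "or" counts.
print(text_item_matches("or", "ore word door or"))  # [14]

# A regex matches inside words as well.
print(regex_item_matches(r"r[aeiou]+", "regular expression anywhere"))  # ['re', 're', 're']
```

The sketch uses lookarounds internally only to emulate the Text rule in Python; as noted above, lookarounds in a Regex item itself are not supported by Word Search.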

For every pattern matched, the following actions (textSetting) can be taken:

  • Allow - Produce an XML analysis report specifying the number of matching strings within the file and their location
  • Disallow - Report all matches and do not regenerate the input file if any are found
  • Redact - Report all matches and regenerate the input file with every instance replaced by the character specified by replacementChar in the policy file. This action is only available for Microsoft Binary Office and Office Open XML files.
  • Require - Report all matches and do not regenerate the input file unless at least one match is found. This action is only available for plain text files, and at least one Require item must be specified.
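
The effect of each action on regeneration can be summarised in a small Python sketch. It models a single search item only and is not the engine's API: the TextSetting enum, is_regenerated and redact are hypothetical names. It also assumes that Redact writes one replacement character per matched character, which the description above does not confirm.

```python
from enum import Enum, auto


class TextSetting(Enum):
    """Hypothetical model of the four actions described above."""
    ALLOW = auto()
    DISALLOW = auto()
    REDACT = auto()
    REQUIRE = auto()


def is_regenerated(setting: TextSetting, match_count: int) -> bool:
    """Return True if the input file would be regenerated for one search item."""
    if setting is TextSetting.DISALLOW:
        return match_count == 0   # any match blocks regeneration
    if setting is TextSetting.REQUIRE:
        return match_count > 0    # at least one match is needed
    return True                   # Allow and Redact always regenerate


def redact(text: str, spans: list[tuple[int, int]], replacement_char: str = "*") -> str:
    """Overwrite each (start, end) span with replacement_char.

    Assumption: one replacement character per matched character, so the
    overall length of the text is preserved.
    """
    chars = list(text)
    for start, end in spans:
        chars[start:end] = replacement_char * (end - start)
    return "".join(chars)


print(is_regenerated(TextSetting.DISALLOW, 2))   # False
print(is_regenerated(TextSetting.REQUIRE, 0))    # False
print(redact("call 555 now", [(5, 8)], "#"))     # call ### now
```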

The APIs for Word Search support string, character-based, and regular expression matching. See Word Search Library for the Word Search API documentation.