Word Search & Redaction

    The Glasswall Embedded Engine provides pattern matching for the following file formats:

    • Microsoft Binary Office
    • Office Open XML
    • ASCII and UTF-8 plain text

    The search strings are configured via a policy file, where they can be specified as either a `text` item or a `regex` item:

    • Text - Match only distinct words or numbers. A word or number is considered distinct if the characters immediately preceding and following the match are not letters or digits, respectively. For example, `or` will not produce a match when found in "ore", "word" or "door".
    • Regex - Match anywhere the regular expression pattern is found. This includes matches inside words or numbers, e.g. a regular expression of `r[aeiou]+` will match the "re" in "regular", "expression" and "anywhere".
      • Word Search does not support regular expression assertions. A regular expression containing `^` or `$` will return matches found anywhere in the file, and a regular expression containing a lookaround will not return any matches.
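    The difference between the two item types can be illustrated with a minimal Python sketch. It only emulates the matching rules described above using Python's own `re` module; it is not the engine's implementation. The function names `text_matches` and `regex_matches` are hypothetical, and the sketch assumes that "respectively" means letters bound a word and digits bound a number.

```python
import re


def text_matches(term: str, content: str) -> list[tuple[int, int]]:
    """Emulate a `text` item: match `term` only as a distinct word or number."""
    # Assumption: a numeric term is bounded by digits, any other term by letters.
    boundary = r"\d" if term.isdigit() else r"[^\W\d_]"
    # Python lookarounds stand in for the "distinct" rule here; the engine
    # itself does not accept lookarounds in `regex` items.
    pattern = rf"(?<!{boundary}){re.escape(term)}(?!{boundary})"
    return [m.span() for m in re.finditer(pattern, content)]


def regex_matches(pattern: str, content: str) -> list[tuple[int, int]]:
    """Emulate a `regex` item: match wherever the pattern occurs."""
    return [m.span() for m in re.finditer(pattern, content)]


# Only the standalone "or" is reported; "ore", "word" and "door" are skipped.
print(text_matches("or", "ore, word or door"))

# The "re" inside each of the three words is reported.
print(regex_matches(r"r[aeiou]+", "regular expression anywhere"))
```

    Each function returns the (start, end) offsets of its matches, which makes it easy to compare how the same content behaves under a `text` item and a `regex` item.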

    For every pattern matched, one of the following actions (`textSetting`) can be taken:

    • Allow - Produce an XML analysis report specifying the number of matching strings within the file and their locations
    • Disallow - Report all matches and do not regenerate the input file if any are found
    • Redact - Report matches and regenerate the input file with all instances replaced by the character specified by `replacementChar` in the policy file. *This action is only available for Microsoft Binary Office and Office Open XML files.*
    • Require - Report all matches and do not regenerate the input file unless at least one match is found. *This action is only available for plain text files, and at least one must be specified.*
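    The effect of each action on regeneration can be summarised in a short Python sketch. This is an illustrative model of the rules listed above, not the engine's API: `TextSetting`, `process` and `replacement_char` are hypothetical names, the per-format restrictions on Redact and Require are not modelled, and the sketch assumes that redaction replaces every matched character so that the length of the text is preserved.

```python
import re
from enum import Enum
from typing import Optional


class TextSetting(Enum):
    ALLOW = "allow"
    DISALLOW = "disallow"
    REDACT = "redact"
    REQUIRE = "require"


def process(content: str, pattern: str, setting: TextSetting,
            replacement_char: str = "*") -> tuple[int, Optional[str]]:
    """Return (match count, regenerated content), or None if not regenerated."""
    count = len(list(re.finditer(pattern, content)))

    if setting is TextSetting.DISALLOW and count > 0:
        return count, None  # any match blocks regeneration
    if setting is TextSetting.REQUIRE and count == 0:
        return count, None  # a missing match blocks regeneration
    if setting is TextSetting.REDACT:
        # Assumption: each matched character is replaced, preserving length.
        redacted = re.sub(pattern,
                          lambda m: replacement_char * len(m.group()),
                          content)
        return count, redacted
    # Allow, or a Disallow/Require check that passed: content is unchanged.
    return count, content


print(process("call 555 now", r"\d+", TextSetting.REDACT, "#"))
```

    In every case the match count is returned, mirroring the fact that all four actions report their matches; only the decision to regenerate, and whether the content is altered, differs between them.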

    The Word Search APIs support string, character-based and regular expression matching. A full list of the Word Search API functions can be found in Word Search Library.
