Word Search & Redaction

    The Glasswall Embedded Engine provides pattern matching for the following file formats:

    • Microsoft Binary Office
    • Office Open XML
    • ASCII and UTF-8 plain text *(when `enable_text_support` is set to true under `sysConfig`)*

    The search strings are configured via a policy file, where they can be specified as either a text item or a regex item: 

    • Text - Match only distinct words or numbers. A word is distinct if the characters immediately before and after the match are not letters; a number is distinct if they are not digits. For example, `or` will not produce a match when found in "ore", "word" or "door".
    • Regex - Match anywhere the regular expression pattern is found, including inside words or numbers. For example, the regular expression `r[aeiou]+` will match the "re" in "regular", "expression" and "anywhere".
      • The 'Word Search' feature does not support regular expression assertions.
        • Avoid using ^ or $ in regular expressions, as these anchors may not work as expected for matching the start or end of lines within the file.
        • Regular expressions using lookaround assertions (lookahead/lookbehind) will not return any matches.
      • Regular expression matching is case insensitive by default in the current implementation.
      • The regular expression matching engine uses Boost.Regex (boost/regex), which follows Perl-style regular expression syntax.
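
The difference between the two item types can be sketched in Python. This is only an illustration of the matching rules described above, not the engine's implementation: the engine uses Boost.Regex, whereas this sketch uses Python's `re` module, and the function names `find_text_matches` and `find_regex_matches` are hypothetical. The lookaround in the text matcher is just a convenient way to express the distinct-word rule in Python; Word Search regex items themselves do not support lookaround. Case handling for text items is not stated in this article, so the sketch compares them case-sensitively.

```python
import re


def find_text_matches(term: str, content: str) -> list[tuple[int, int]]:
    """Return (start, end) spans where `term` occurs as a distinct word or number."""
    # Assumption: numbers are bounded by non-digits, words by non-letters.
    boundary = "0-9" if term.isdigit() else "A-Za-z"
    pattern = re.compile(rf"(?<![{boundary}]){re.escape(term)}(?![{boundary}])")
    return [m.span() for m in pattern.finditer(content)]


def find_regex_matches(pattern: str, content: str) -> list[str]:
    """Return every substring matched by `pattern`, ignoring case."""
    # Regex items match anywhere, including inside words.
    return [m.group(0) for m in re.finditer(pattern, content, re.IGNORECASE)]


print(find_text_matches("or", "ore word door or"))
# [(14, 16)]  only the standalone "or" at the end is a distinct word

print(find_regex_matches(r"r[aeiou]+", "regular expression anywhere"))
# ['re', 're', 're']
```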

    Precedence - Text rules are evaluated in the order in which they appear in the policy file, and always before the regex rules, which are likewise evaluated in the order in which they appear.
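
The precedence rule amounts to a stable partition of the rule list: text rules first, regex rules second, each group keeping its policy-file order. A minimal sketch follows, in which the `Rule` type and the `evaluation_order` function are hypothetical names used only for illustration and are not part of the Glasswall API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    kind: str      # "text" or "regex"
    pattern: str   # the search string or regular expression


def evaluation_order(rules: list[Rule]) -> list[Rule]:
    """Order rules as described above: all text rules, then all regex rules."""
    # List comprehensions preserve the original (policy file) order within each group.
    text_rules = [r for r in rules if r.kind == "text"]
    regex_rules = [r for r in rules if r.kind == "regex"]
    return text_rules + regex_rules


policy = [
    Rule("regex", r"r[aeiou]+"),
    Rule("text", "secret"),
    Rule("regex", r"\d{4}"),
    Rule("text", "internal"),
]
print([r.pattern for r in evaluation_order(policy)])
# ['secret', 'internal', 'r[aeiou]+', '\\d{4}']
```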

    For every pattern matched, the following actions (textSetting) can be taken:

    • Allow - Produce an XML analysis report specifying the number of matching strings within the file and their locations.
    • Disallow - Report all matches and do not regenerate the input file if any are found.
    • Redact - Report matches and regenerate the input file with every match replaced by the character specified by `replacementChar` in the policy file. *This action is only available for Microsoft Binary Office and Office Open XML files.*
    • Require - Report all matches and do not regenerate the input file unless at least one match is found. *This action is only available for plain text files, for which at least one Require rule must be specified.*
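
The effect of each action on regeneration can be summarised as a small decision function. This is a sketch of the behaviour described in the list above for a single rule, not the engine's code; the function name `regeneration_outcome` and the returned labels are hypothetical, and how the engine combines the outcomes of several rules is not covered here.

```python
def regeneration_outcome(action: str, match_count: int) -> str:
    """Describe what happens to the input file for one rule and its match count."""
    action = action.lower()
    if action == "allow":
        # Matches are reported only; the file is regenerated as usual.
        return "regenerate"
    if action == "disallow":
        # Any match prevents regeneration.
        return "do_not_regenerate" if match_count > 0 else "regenerate"
    if action == "redact":
        # Matches are replaced with replacementChar in the regenerated file.
        return "regenerate_with_redactions" if match_count > 0 else "regenerate"
    if action == "require":
        # The file is regenerated only when at least one match is found.
        return "regenerate" if match_count > 0 else "do_not_regenerate"
    raise ValueError(f"unknown action: {action!r}")


for action in ("allow", "disallow", "redact", "require"):
    print(action, regeneration_outcome(action, 0), regeneration_outcome(action, 2))
```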

    The Word Search APIs support string, character-based and regular expression matching. A full list of the Word Search API functions can be found in Word Search Library.
