Word Search Library

    Article Summary

    The Glasswall engine provides deep-file inspection, remediation, sanitisation, and reporting. The engine deconstructs a file into its structural components and builds an internal tree-like representation of the file.

    It walks each node of the tree, inspecting, repairing, and sanitising content items before reconstructing a new file.

    The Glasswall engine can also export and import its internal representation of a file's structure in an intermediate format such as XML. This makes a file's internal components available to external programs for additional processing, after which the file is recomposed to include the externally modified components.

    The Glasswall Word Search engine is built on top of the export and import capability, performing text searching in the content and metadata of a file. Search strings, content management, and redaction rules are configured via an XML file.

    A user-configurable character substitution map, defined in JSON, is used to detect obfuscated text in which characters have been replaced with look-alike characters (homoglyphs).

    The engine also comes with built-in regular expression support.

    Word Search Configuration

    The Word Search configuration specifies the text to search for, or the regular expression to apply, and how each match should be treated when it is found within the document. The Word Search configuration is an extension of Glasswall content management.

    Examples

    Example Word Search policy files and the homoglyph dictionary can be found in the 'config files/sdkwordsearch' folder of the release package. The Word Search XSD can be found in the schemas folder of the release package.

    Example Configuration Policy

    The following sections show the different textSetting values that can be defined in a configuration policy. For more information on each setting, refer to Word Search & Redaction.

    Allow

    <textSearchConfig libVersion="core2">
        <textList>
            <textItem>
                <regex>((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}</regex>
                <textSetting>allow</textSetting>
            </textItem>
            <textItem>
                <text>Glasswall</text>
                <textSetting>allow</textSetting>
            </textItem>
        </textList>
    </textSearchConfig>
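    The regex used in these examples matches dotted IPv4 addresses. Before adding a pattern to a policy, it can help to try it against sample text. The sketch below does this with Python's `re` module. It assumes the Word Search engine's regex dialect treats this pattern the same way Python does, which this page does not state. The helper name `find_ipv4_addresses` is hypothetical.

```python
import re

# Pattern copied from the policy examples: four octets (0-255), each optionally
# followed by a dot and ending on a word boundary.
IPV4_PATTERN = re.compile(r"((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}")

def find_ipv4_addresses(text):
    """Return every substring of text that the policy regex matches."""
    return [m.group(0) for m in IPV4_PATTERN.finditer(text)]

print(find_ipv4_addresses("Gateway 192.168.0.1, DNS 8.8.8.8; build 1234"))
# ['192.168.0.1', '8.8.8.8']
```

    The pattern is not anchored, so it can also match the tail of a longer token. For example, it finds `56.1.1.1` inside `256.1.1.1`.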
    

    Disallow

    <textSearchConfig libVersion="core2">
        <textList>
            <textItem>
                <regex>((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}</regex>
                <textSetting>disallow</textSetting>
            </textItem>
            <textItem>
                <text>Glasswall</text>
                <textSetting>disallow</textSetting>
            </textItem>
        </textList>
    </textSearchConfig>
    

    Redact

    <textSearchConfig libVersion="core2">
        <textList>
            <textItem>
                <regex>((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}</regex>
                <textSetting replacementChar="*">redact</textSetting>
            </textItem>
            <textItem>
                <text>Glasswall</text>
                <textSetting replacementChar="*">redact</textSetting>
            </textItem>
        </textList>
    </textSearchConfig>
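    To illustrate what `replacementChar` is for, the following sketch applies a one-for-one redaction of the matches to plain text. This is an illustration only. It assumes each matched character is replaced by `replacementChar`, which this page does not confirm, and it does not model how the engine rewrites the structure of a real document. The `redact` helper is hypothetical.

```python
import re

def redact(text, patterns, replacement_char="*"):
    """Replace every character of every match with replacement_char.

    Assumes one-for-one character replacement; the engine's behaviour
    inside a document may differ.
    """
    for pattern in patterns:
        text = re.sub(pattern, lambda m: replacement_char * len(m.group(0)), text)
    return text

# The two textItems from the Redact policy above: the IPv4 regex and the literal "Glasswall".
policy_patterns = [r"((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}", re.escape("Glasswall")]
print(redact("Glasswall server at 10.0.0.7", policy_patterns))  # ********* server at ********
```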
    

    Require

    <textSearchConfig libVersion="core2">
        <textList>
            <textItem>
                <regex>((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}</regex>
                <textSetting>require</textSetting>
            </textItem>
        </textList>
    </textSearchConfig>
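    Policies like these can also be generated programmatically, for example when the search terms come from another system. The sketch below uses Python's standard `xml.etree.ElementTree`. It emits only the elements and attributes shown in the examples on this page: `textSearchConfig`, `libVersion`, `textList`, `textItem`, `text`/`regex`, `textSetting` and `replacementChar`. The helper name `build_policy` is hypothetical, and the output should still be validated against the Word Search XSD in the schemas folder.

```python
import xml.etree.ElementTree as ET

VALID_SETTINGS = {"allow", "disallow", "redact", "require"}

def build_policy(items):
    """Build a textSearchConfig policy string from (kind, value, setting) tuples.

    kind is "text" or "regex"; setting is one of VALID_SETTINGS.
    Redact items get replacementChar="*", as in the examples above.
    """
    root = ET.Element("textSearchConfig", libVersion="core2")
    text_list = ET.SubElement(root, "textList")
    for kind, value, setting in items:
        if kind not in ("text", "regex") or setting not in VALID_SETTINGS:
            raise ValueError(f"unsupported item: {kind!r}, {setting!r}")
        item = ET.SubElement(text_list, "textItem")
        ET.SubElement(item, kind).text = value
        setting_elem = ET.SubElement(item, "textSetting")
        setting_elem.text = setting
        if setting == "redact":
            setting_elem.set("replacementChar", "*")
    return ET.tostring(root, encoding="unicode")

policy = build_policy([
    ("regex", r"((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}", "redact"),
    ("text", "Glasswall", "redact"),
])
print(policy)
```

    ElementTree escapes characters such as `<` and `&` automatically, so regexes containing them remain well-formed XML.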
    

    Known Limitations

    • Processing both Office and text files at the same time is not possible
    • When processing text files, at least one require policy needs to be defined
    • A configuration policy that assigns any of the following combinations of textSettings to the same text/regex will always process the file:
      • require and redact
      • require and disallow
      • redact and allow
      • allow and disallow
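    These combinations cause the file to always be processed, so it can be worth checking a policy before deploying it. The following sketch uses only Python's standard library. It groups `textItem` entries by their `text`/`regex` value and reports any pair of settings from the list above. It can optionally flag the text-file rule about `require`. The function name and its warning format are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Combinations listed under Known Limitations. If the same text/regex carries
# both settings of any pair, the file will always be processed.
CONFLICTING_PAIRS = [
    {"require", "redact"},
    {"require", "disallow"},
    {"redact", "allow"},
    {"allow", "disallow"},
]

def check_policy(policy_xml, for_text_files=False):
    """Return human-readable warnings for a textSearchConfig policy string."""
    root = ET.fromstring(policy_xml)
    settings_by_term = defaultdict(set)  # (kind, value) -> {textSetting, ...}
    for item in root.iter("textItem"):
        setting = (item.findtext("textSetting") or "").strip()
        for kind in ("text", "regex"):
            value = item.findtext(kind)
            if value is not None:
                settings_by_term[(kind, value)].add(setting)

    warnings = []
    for (kind, value), settings in settings_by_term.items():
        for pair in CONFLICTING_PAIRS:
            if pair <= settings:
                warnings.append(f"{kind} {value!r}: {' and '.join(sorted(pair))}")

    # Known Limitations: text files need at least one require item.
    all_settings = set().union(*settings_by_term.values())
    if for_text_files and "require" not in all_settings:
        warnings.append("text files need at least one require item")
    return warnings

example_policy = """<textSearchConfig libVersion="core2"><textList>
  <textItem><text>Glasswall</text><textSetting>allow</textSetting></textItem>
  <textItem><text>Glasswall</text><textSetting>disallow</textSetting></textItem>
</textList></textSearchConfig>"""
print(check_policy(example_policy, for_text_files=True))
```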

    Example JSON Homoglyph Config

    The JSON file allows the user to create a mapping between characters and corresponding homoglyphs. This allows the engine to consider homoglyphs when generating search expressions, enabling support for homographs (look-alike words) and obfuscated text.

    Example

    {
    	"!": "ǃⵑ",
    	"$": "$",
    	"%": "%",
    	"&": "ꝸ&",
    	"'": "`´ʹʻʼʽʾˈˊˋ˴ʹ΄՚՝י׳ߴߵᑊᛌ᾽᾿`´῾‘’‛′‵ꞌ'`𖽑𖽒",
    	"(": "❨❲〔﴾([",
    	")": "❩❳〕﴿)]",
    	"*": "٭⁎∗*𐌟",
    	"+": "᛭+𐊛",
    	",": "¸؍٫‚ꓹ,",
    	"-": "˗۔‐‑‒–⁃−➖Ⲻ﹘",
    	".": "٠۰܁܂․ꓸ꘎.𐩐𝅭",
    	"/": "᜵⁁⁄∕╱⟋⧸Ⳇ⼃〳ノ㇓丿/𝈺",
    	"0": "OoΟοσОоՕօסه٥ھہە۵߀०০੦૦ଠ୦௦ం౦ಂ೦ംഠ൦ං๐໐ဝ၀ჿዐᴏᴑℴⲞⲟⵔ〇ꓳꬽﮦﮧﮨﮩﮪﮫﮬﮭﻩﻪﻫﻬ0Oo𐊒𐊫𐐄𐐬𐓂𐓪𐔖𑓐𑢵𑣈𑣗𑣠𝐎𝐨𝑂𝑜𝑶𝒐𝒪𝓞𝓸𝔒𝔬𝕆𝕠𝕺𝖔𝖮𝗈𝗢𝗼𝘖𝘰𝙊𝙤𝙾𝚘𝚶𝛐𝛔𝛰𝜊𝜎𝜪𝝄𝝈𝝤𝝾𝞂𝞞𝞸𝞼𝟎𝟘𝟢𝟬𝟶𞸤𞹤𞺄",
    	"1": "Il|ƖǀΙІӀ׀וןا١۱ߊᛁℐℑℓⅠⅼ∣⏽Ⲓⵏꓲﺍﺎ1Il│𐊊𐌉𐌠𖼨𝐈𝐥𝐼𝑙𝑰𝒍𝓁𝓘𝓵𝔩𝕀𝕝𝕴𝖑𝖨𝗅𝗜𝗹𝘐𝘭𝙄𝙡𝙸𝚕𝚰𝛪𝜤𝝞𝞘𝟏𝟙𝟣𝟭𝟷𞣇𞸀𞺀",
    	"2": "ƧϨᒿꙄꛯꝚ2𝟐𝟚𝟤𝟮𝟸",
    	"3": "ƷȜЗӠⳌꝪꞫ3𑣊𖼻𝈆𝟑𝟛𝟥𝟯𝟹",
    	"4": "Ꮞ4𑢯𝟒𝟜𝟦𝟰𝟺",
    	"5": "Ƽ5𑢻𝟓𝟝𝟧𝟱𝟻",
    	"6": "бᏮⳒ6𑣕𝟔𝟞𝟨𝟲𝟼",
    	"7": "7𐓒𑣆𝈒𝟕𝟟𝟩𝟳𝟽",
    	"8": "Ȣȣ৪੪ଃ8𐌚𝟖𝟠𝟪𝟴𝟾𞣋",
    	"9": "৭੧୨൭ⳊꝮ9𑢬𑣌𑣖𝟗𝟡𝟫𝟵𝟿",
    	"A": "4ΑАᎪᗅᴀꓮꭺA𐊠𖽀𝐀𝐴𝑨𝒜𝓐𝔄𝔸𝕬𝖠𝗔𝘈𝘼𝙰𝚨𝛢𝜜𝝖𝞐",
    	"B": "ʙΒВвᏴᏼᗷᛒℬꓐꞴB𐊂𐊡𐌁𝐁𝐵𝑩𝓑𝔅𝔹𝕭𝖡𝗕𝘉𝘽𝙱𝚩𝛣𝜝𝝗𝞑",
    	"C": "ϹСᏟℂℭⅭⲤꓚC𐊢𐌂𐐕𐔜𑣩𑣲𝐂𝐶𝑪𝒞𝓒𝕮𝖢𝗖𝘊𝘾𝙲🝌",
    	"D": "ᎠᗞᗪᴅⅅⅮꓓꭰD𝐃𝐷𝑫𝒟𝓓𝔇𝔻𝕯𝖣𝗗𝘋𝘿𝙳",
    	"E": "ΕЕᎬᴇℰ⋿ⴹꓰꭼE𐊆𑢦𑢮𝐄𝐸𝑬𝓔𝔈𝔼𝕰𝖤𝗘𝘌𝙀𝙴𝚬𝛦𝜠𝝚𝞔",
    	"F": "ϜᖴℱꓝꞘF𐊇𐊥𐔥𑢢𑣂𝈓𝐅𝐹𝑭𝓕𝔉𝔽𝕱𝖥𝗙𝘍𝙁𝙵𝟊",
    	"G": "ɢԌԍᏀᏳᏻꓖꮐG𝐆𝐺𝑮𝒢𝓖𝔊𝔾𝕲𝖦𝗚𝘎𝙂𝙶",
    	"H": "ʜΗНнᎻᕼℋℌℍⲎꓧꮋH𐋏𝐇𝐻𝑯𝓗𝕳𝖧𝗛𝘏𝙃𝙷𝚮𝛨𝜢𝝜𝞖",
    	"I": "",
    	"J": "ͿЈᎫᒍᴊꓙꞲꭻJ𝐉𝐽𝑱𝒥𝓙𝔍𝕁𝕵𝖩𝗝𝘑𝙅𝙹",
    	"K": "ΚКᏦᛕKⲔꓗK𐔘𝐊𝐾𝑲𝒦𝓚𝔎𝕂𝕶𝖪𝗞𝘒𝙆𝙺𝚱𝛫𝜥𝝟𝞙",
    	"L": "ʟᏞᒪℒⅬⳐⳑꓡꮮL𐐛𐑃𐔦𑢣𑢲𖼖𝈪𝐋𝐿𝑳𝓛𝔏𝕃𝕷𝖫𝗟𝘓𝙇𝙻",
    	"M": "ΜϺМᎷᗰᛖℳⅯⲘꓟM𐊰𐌑𝐌𝑀𝑴𝓜𝔐𝕄𝕸𝖬𝗠𝘔𝙈𝙼𝚳𝛭𝜧𝝡𝞛",
    	"N": "ɴΝℕⲚꓠN𐔓𝐍𝑁𝑵𝒩𝓝𝔑𝕹𝖭𝗡𝘕𝙉𝙽𝚴𝛮𝜨𝝢𝞜",
    	"O": "0",
    	"P": "ΡРᏢᑭᴘᴩℙⲢꓑꮲP𐊕𝐏𝑃𝑷𝒫𝓟𝔓𝕻𝖯𝗣𝘗𝙋𝙿𝚸𝛲𝜬𝝦𝞠",
    	"Q": "ℚⵕQ𝐐𝑄𝑸𝒬𝓠𝔔𝕼𝖰𝗤𝘘𝙌𝚀",
    	"R": "ƦʀᎡᏒᖇᚱℛℜℝꓣꭱꮢR𐒴𖼵𝈖𝐑𝑅𝑹𝓡𝕽𝖱𝗥𝘙𝙍𝚁",
    	"S": "$ЅՏᏕᏚꓢS𐊖𐐠𖼺𝐒𝑆𝑺𝒮𝓢𝔖𝕊𝕾𝖲𝗦𝘚𝙎𝚂",
    	"T": "ŤΤτТтᎢᴛ⊤⟙ⲦꓔꭲT𐊗𐊱𐌕𑢼𖼊𝐓𝑇𝑻𝒯𝓣𝔗𝕋𝕿𝖳𝗧𝘛𝙏𝚃𝚻𝛕𝛵𝜏𝜯𝝉𝝩𝞃𝞣𝞽🝨",
    	"U": "Սሀᑌ∪⋃ꓴU𐓎𑢸𖽂𝐔𝑈𝑼𝒰𝓤𝔘𝕌𝖀𝖴𝗨𝘜𝙐𝚄",
    	"V": "Ѵ٧۷ᏙᐯⅤⴸꓦꛟV𐔝𑢠𖼈𝈍𝐕𝑉𝑽𝒱𝓥𝔙𝕍𝖁𝖵𝗩𝘝𝙑𝚅",
    	"W": "ԜᎳᏔꓪW𑣦𑣯𝐖𝑊𝑾𝒲𝓦𝔚𝕎𝖂𝖶𝗪𝘞𝙒𝚆",
    	"X": "ΧХ᙭ᚷⅩ╳ⲬⵝꓫꞳX𐊐𐊴𐌗𐌢𐔧𑣬𝐗𝑋𝑿𝒳𝓧𝔛𝕏𝖃𝖷𝗫𝘟𝙓𝚇𝚾𝛸𝜲𝝬𝞦",
    	"Y": "ΥϒУҮᎩᎽⲨꓬY𐊲𑢤𖽃𝐘𝑌𝒀𝒴𝓨𝔜𝕐𝖄𝖸𝗬𝘠𝙔𝚈𝚼𝛶𝜰𝝪𝞤",
    	"Z": "ΖᏃℤℨꓜZ𐋵𑢩𑣥𝐙𝑍𝒁𝒵𝓩𝖅𝖹𝗭𝘡𝙕𝚉𝚭𝛧𝜡𝝛𝞕",
    	"a": "@ɑαа⍺a𝐚𝑎𝒂𝒶𝓪𝔞𝕒𝖆𝖺𝗮𝘢𝙖𝚊𝛂𝛼𝜶𝝰𝞪",
    	"b": "ƄЬᏏᖯb𝐛𝑏𝒃𝒷𝓫𝔟𝕓𝖇𝖻𝗯𝘣𝙗𝚋",
    	"c": "ϲсᴄⅽⲥꮯc𐐽𝐜𝑐𝒄𝒸𝓬𝔠𝕔𝖈𝖼𝗰𝘤𝙘𝚌",
    	"d": "ԁᏧᑯⅆⅾꓒd𝐝𝑑𝒅𝒹𝓭𝔡𝕕𝖉𝖽𝗱𝘥𝙙𝚍",
    	"e": "еҽ℮ℯⅇꬲe𝐞𝑒𝒆𝓮𝔢𝕖𝖊𝖾𝗲𝘦𝙚𝚎",
    	"f": "ſϝքẝꞙꬵf𝐟𝑓𝒇𝒻𝓯𝔣𝕗𝖋𝖿𝗳𝘧𝙛𝚏𝟋",
    	"g": "ƍɡցᶃℊg𝐠𝑔𝒈𝓰𝔤𝕘𝖌𝗀𝗴𝘨𝙜𝚐",
    	"h": "һհᏂℎh𝐡𝒉𝒽𝓱𝔥𝕙𝖍𝗁𝗵𝘩𝙝𝚑",
    	"i": "ıɩɪ˛ͺιіӏᎥιℹⅈⅰ⍳ꙇꭵi𑣃𝐢𝑖𝒊𝒾𝓲𝔦𝕚𝖎𝗂𝗶𝘪𝙞𝚒𝚤𝛊𝜄𝜾𝝸𝞲",
    	"j": "ϳјⅉj𝐣𝑗𝒋𝒿𝓳𝔧𝕛𝖏𝗃𝗷𝘫𝙟𝚓",
    	"k": "k𝐤𝑘𝒌𝓀𝓴𝔨𝕜𝖐𝗄𝗸𝘬𝙠𝚔",
    	"l": "1",
    	"m": "m",
    	"n": "ոռn𝐧𝑛𝒏𝓃𝓷𝔫𝕟𝖓𝗇𝗻𝘯𝙣𝚗",
    	"o": "",
    	"p": "ρϱр⍴ⲣp𝐩𝑝𝒑𝓅𝓹𝔭𝕡𝖕𝗉𝗽𝘱𝙥𝚙𝛒𝛠𝜌𝜚𝝆𝝔𝞀𝞎𝞺𝟈",
    	"q": "ԛգզq𝐪𝑞𝒒𝓆𝓺𝔮𝕢𝖖𝗊𝗾𝘲𝙦𝚚",
    	"r": "гᴦⲅꭇꭈꮁr𝐫𝑟𝒓𝓇𝓻𝔯𝕣𝖗𝗋𝗿𝘳𝙧𝚛",
    	"s": "$ƽѕꜱꮪs𐑈𑣁𝐬𝑠𝒔𝓈𝓼𝔰𝕤𝖘𝗌𝘀𝘴𝙨𝚜",
    	"t": "t𝐭𝑡𝒕𝓉𝓽𝔱𝕥𝖙𝗍𝘁𝘵𝙩𝚝",
    	"u": "ʋυսᴜꞟꭎꭒu𐓶𑣘𝐮𝑢𝒖𝓊𝓾𝔲𝕦𝖚𝗎𝘂𝘶𝙪𝚞𝛖𝜐𝝊𝞄𝞾",
    	"v": "νѵטᴠⅴ∨⋁ꮩv𑜆𑣀𝐯𝑣𝒗𝓋𝓿𝔳𝕧𝖛𝗏𝘃𝘷𝙫𝚟𝛎𝜈𝝂𝝼𝞶",
    	"w": "ɯѡԝաᴡꮃw𑜊𑜎𑜏𝐰𝑤𝒘𝓌𝔀𝔴𝕨𝖜𝗐𝘄𝘸𝙬𝚠",
    	"x": "×хᕁᕽ᙮ⅹ⤫⤬⨯x𝐱𝑥𝒙𝓍𝔁𝔵𝕩𝖝𝗑𝘅𝘹𝙭𝚡",
    	"y": "ɣʏγуүყᶌỿℽꭚy𑣜𝐲𝑦𝒚𝓎𝔂𝔶𝕪𝖞𝗒𝘆𝘺𝙮𝚢𝛄𝛾𝜸𝝲𝞬",
    	"z": "ᴢꮓz𑣄𝐳𝑧𝒛𝓏𝔃𝔷𝕫𝖟𝗓𝘇𝘻𝙯𝚣",
    	"£": "₤",
    	"©": "Ⓒ",
    	"®": "Ⓡ"
    }
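    This page does not describe exactly how the engine turns this map into search expressions. As a rough mental model only, the sketch below expands each character of a search term into a regex character class. Each class contains the character plus the homoglyphs listed for it, so 'Glasswall' also matches look-alike spellings. The function name is hypothetical, and the real engine may apply the map differently.

```python
import json
import re

def build_homoglyph_pattern(term, homoglyphs):
    """Return a regex matching term with any character swapped for a listed homoglyph."""
    parts = []
    for ch in term:
        # The character itself plus its homoglyphs, de-duplicated in order.
        candidates = dict.fromkeys(ch + homoglyphs.get(ch, ""))
        # re.escape makes characters such as "]", "-" and "\" safe inside a class.
        parts.append("[" + "".join(re.escape(c) for c in candidates) + "]")
    return "".join(parts)

# A small excerpt of the map above.
homoglyph_map = json.loads('{"a": "@ɑαа", "l": "1", "s": "$ƽѕ"}')
pattern = build_homoglyph_pattern("Glasswall", homoglyph_map)
print(bool(re.fullmatch(pattern, "Gl@$$wa11")))  # True
```

    To use the full dictionary, open the file with `encoding="utf-8"`, load it with `json.load`, and pass the resulting dict.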
    

    Example Analysis Report

    The following is an example analysis report generated when the search string is set to 'Glasswall', irrespective of the textSetting used. The report includes an ItemMatchCount for every pattern matched in a given file.

    <gw:WordItem>
        <gw:Name>Glasswall</gw:Name>
        <gw:ItemMatchCount>1</gw:ItemMatchCount>
        <gw:Locations>
            <gw:Location>
                <gw:Offset>463</gw:Offset>
                <gw:Page>0</gw:Page>
                <gw:Paragraph>0</gw:Paragraph>
            </gw:Location>
        </gw:Locations>
    </gw:WordItem>
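    The report is plain XML, so match counts and offsets can be extracted with a standard parser. The fragment above omits the enclosing document and the declaration of the `gw` namespace. The sketch below therefore matches elements by local name and ignores whatever namespace URI the real report uses. The `gw:Report` wrapper element and the `urn:example:gw` URI in the example are placeholders, not the engine's actual values.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree adds to a tag."""
    return tag.rsplit("}", 1)[-1]

def summarise_word_items(report_xml):
    """Map each WordItem Name to (ItemMatchCount, [Offset, ...])."""
    summary = {}
    for item in ET.fromstring(report_xml).iter():
        if local_name(item.tag) != "WordItem":
            continue
        name, count, offsets = None, 0, []
        for child in item.iter():
            tag = local_name(child.tag)
            if tag == "Name":
                name = child.text
            elif tag == "ItemMatchCount":
                count = int(child.text)
            elif tag == "Offset":
                offsets.append(int(child.text))
        summary[name] = (count, offsets)
    return summary

# Placeholder wrapper element and namespace URI so the fragment parses on its own.
report = """<gw:Report xmlns:gw="urn:example:gw">
  <gw:WordItem>
    <gw:Name>Glasswall</gw:Name>
    <gw:ItemMatchCount>1</gw:ItemMatchCount>
    <gw:Locations><gw:Location><gw:Offset>463</gw:Offset></gw:Location></gw:Locations>
  </gw:WordItem>
</gw:Report>"""
print(summarise_word_items(report))  # {'Glasswall': (1, [463])}
```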
    

    API Functions

    Return Types

    C++

    Each of the APIs returns a Status type, which is defined as follows:

    enum Status {
        // Return codes -1 to -1023 reserved for sdk.editor
        ws_disallowedItemFound = -1024,       // Item disallowed by policy found in file
        ws_requiredItemNotFound = -1025,      // Item required by policy not found in file
        ws_illegalActionRedact = -1026,       // Redact action specified but filetype doesn't support redaction
        ws_illegalActionRequire = -1027,      // Require action specified but filetype doesn't support require
        ws_illegalActionNoRequire = -1028,    // Require action not specified but filetype needs one
        ws_filetypeUnsupported = -1029,       // Filetype not supported by Word Search
    
        // General return codes. Used when none of the above apply, or when only bool is needed
        eFail = 0,                            // Differs from gw2ret_generalfail in sdk.editor, but preserved for backwards compatibility.
        eSuccess = 1,                         // Differs from gw2ret_ok in sdk.editor, but preserved for backwards compatibility.
    };
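    When the C API is called directly from another language (for example through a foreign-function interface), it helps to translate the raw integer back into a name. The sketch below mirrors the enum above in Python. The `describe_status` helper is hypothetical. It relies only on the ranges stated in the enum comments: 1 for success, 0 for general failure, -1 to -1023 reserved for sdk.editor, and -1024 to -1029 for Word Search.

```python
from enum import IntEnum

class Status(IntEnum):
    """Python mirror of the C++ Status enum documented above."""
    ws_disallowedItemFound = -1024
    ws_requiredItemNotFound = -1025
    ws_illegalActionRedact = -1026
    ws_illegalActionRequire = -1027
    ws_illegalActionNoRequire = -1028
    ws_filetypeUnsupported = -1029
    eFail = 0
    eSuccess = 1

def describe_status(code):
    """Return a readable name for a raw status code returned by the C API."""
    try:
        return Status(code).name
    except ValueError:
        # Per the enum comments, -1 to -1023 are reserved for sdk.editor.
        if -1023 <= code <= -1:
            return f"sdk.editor return code {code}"
        return f"unknown status {code}"

print(describe_status(-1025))  # ws_requiredItemNotFound
print(describe_status(-5))     # sdk.editor return code -5
```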
    

    C#

    To integrate Glasswall Word Search in C#, the Glasswall Word Search C# wrapper is required.

    Each of the APIs returns a WordSearchStatus type, which is defined as follows:

    /// <summary>
    ///     Indicates whether the Word Search process was successful (WordSearchStatus.Success)
    ///     or not (WordSearchStatus.Fail). Zero or negative values indicate a failure.
    /// </summary>
    public enum WordSearchStatus
    {
        DisallowedItemFound = -1024,       // Item disallowed by policy found in file
        RequiredItemNotFound = -1025,      // Item required by policy not found in file
        IllegalActionRedact = -1026,       // Redact action specified but filetype doesn't support redaction
        IllegalActionRequire = -1027,      // Require action specified but filetype doesn't support require
        IllegalActionNoRequire = -1028,    // Require action not specified but filetype needs one
        FiletypeUnsupported = -1029,       // Filetype not supported by Word Search
    
        Fail = 0,
        Success
    }
    

    Java

    To integrate Glasswall Word Search in Java, the Glasswall Word Search Java wrapper is required.

    Each of the APIs returns a GlasswallWordSearchResult type, which is defined as follows:

    package com.glasswallsolutions;
    
    /**
     * Class used to hold the results from a word search process.
     */
    public class GlasswallWordSearchResult
    {
        /**
         * The XML analysis report
         */
        public String report;
    
        /**
         * The processed document
         */
        public byte[] outputDocument;
    
        /**
         * boolean indicating whether the process was successful (true) or not (false)
         */
        public boolean success;
    
        public GlasswallWordSearchResult()
        {
            report = null;
            outputDocument = null;
            success = false;
        }
    }
    

    Python

    To integrate Glasswall Word Search in Python, the Glasswall Word Search Python wrapper is required.

    The status codes returned by the APIs map to the following success and error classes, which are defined as follows:

    # glasswall\libraries\word_search\successes.py
    
    class Success(WordSearchSuccess):
        """ WordSearch success code 1. """
        pass
    
    
    success_codes = {
        1: Success,
    }
    
    # glasswall\libraries\word_search\errors.py
    
    class UnknownErrorCode(WordSearchError):
        """ Unknown error code. """
        pass
    
    class Fail(WordSearchError):
        """ WordSearch error code 0. """
        pass
    
    
    class DisallowedItemFound(WordSearchError):
        """ WordSearch error code -1024. Item disallowed by policy found in file. """
        pass
    
    
    class RequiredItemNotFound(WordSearchError):
        """ WordSearch error code -1025. Item required by policy not found in file. """
        pass
    
    
    class IllegalActionRedact(WordSearchError):
        """ WordSearch error code -1026. Redact action specified but filetype doesn't support redaction. """
        pass
    
    
    class IllegalActionRequire(WordSearchError):
        """ WordSearch error code -1027. Require action specified but filetype doesn't support require. """
        pass
    
    
    class IllegalActionNoRequire(WordSearchError):
        """ WordSearch error code -1028. Require action not specified but filetype needs one. """
        pass
    
    
    class FiletypeUnsupported(WordSearchError):
        """ WordSearch error code -1029. Filetype supported by Editor but not by Word Search. """
        pass
    
    
    error_codes = {
        0: Fail,
        -1024: DisallowedItemFound,
        -1025: RequiredItemNotFound,
        -1026: IllegalActionRedact,
        -1027: IllegalActionRequire,
        -1028: IllegalActionNoRequire,
        -1029: FiletypeUnsupported,
    }
    

    JavaScript

    To integrate Glasswall Word Search in JavaScript, the Glasswall Word Search JavaScript wrapper is required.

    Each of the APIs returns a word_search_status type, which is defined as follows:

    
    /**
     * Used to indicate whether the word search process was successful (word_search_status.Success)
     * or not (word_search_status.Fail)
     */
    const word_search_status = {
        Fail: "Fail",
        Success: "Success"
    }
    
    

    GwWordSearch

    Calls the Word Search engine to process the specified input file and produce an output file along with a Word Search report.

    Status GwWordSearch(
        void* input_buffer,
        size_t input_buffer_len,
        void** output_buffer,
        size_t* output_buffer_len,
        void** output_report_buffer,
        size_t* output_report_buffer_len,
        const char* homoglyphs,
        const char* xml_config_string
    )
    
    Name                     | Type         | Direction | Description
    input_buffer             | void *       | In        | A pointer to the buffer containing the input file to be processed
    input_buffer_len         | size_t       | In        | The size of the input file buffer
    output_buffer            | void **      | Out       | A pointer to a pointer to a buffer that will be populated with the processed file. This buffer is allocated by the word search engine
    output_buffer_len        | size_t *     | Out       | A pointer to the size of the output file buffer. This will be set by the word search engine
    output_report_buffer     | void **      | Out       | A pointer to a pointer to a buffer that will be populated with the word search report. This buffer is allocated by the word search engine
    output_report_buffer_len | size_t *     | Out       | A pointer to the size of the word search report. This will be set by the word search engine
    homoglyphs               | const char * | In        | A pointer to the buffer containing the homoglyphs file. This buffer must be null-terminated
    xml_config_string        | const char * | In        | A pointer to the buffer containing the content management XML file. This buffer must be null-terminated
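    The homoglyphs and xml_config_string parameters are C strings, so the caller must supply the terminating null byte. This page documents dedicated wrappers rather than calling the library through Python's `ctypes`. If the library were driven that way, `ctypes.create_string_buffer` would append the terminator automatically. The sketch assumes the JSON and XML are passed as UTF-8, which this page does not state explicitly. The helper name `to_c_string` is hypothetical.

```python
import ctypes

def to_c_string(text):
    """Return a null-terminated char buffer holding text encoded as UTF-8 (assumed encoding)."""
    encoded = text.encode("utf-8")
    # Given a bytes object, create_string_buffer allocates len(encoded) + 1 bytes
    # and stores a trailing NUL after the data.
    return ctypes.create_string_buffer(encoded)

homoglyphs_buffer = to_c_string('{"a": "@ɑαа"}')
print(homoglyphs_buffer.raw[-1])  # 0
```

    Keep a reference to the buffer object for as long as the native call runs, so that Python does not free it early.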

    To integrate Glasswall Word Search in C#, the Glasswall Word Search C# wrapper is required.

    public WordSearchStatus GwWordSearch(
        byte[] inputBuffer,
        out byte[] outputFileBuffer,
        out String outputAnalysisReport,
        string homoglyphs,
        string xmlConfigString
    )
    
    
    Name                 | Type       | Direction | Description
    inputBuffer          | byte[]     | In        | The buffer containing the document to be processed
    outputFileBuffer     | out byte[] | Out       | The resulting buffer that will contain the processed document
    outputAnalysisReport | out string | Out       | The output analysis report from the word search process
    homoglyphs           | string     | In        | A JSON document containing the homoglyph mappings
    xmlConfigString      | string     | In        | The XML content management policy

    To integrate Glasswall Word Search in Java, the Glasswall Word Search Java wrapper is required.

    
    public native GlasswallWordSearchResult wordSearch(
        byte[] inputDocument,
        String homoglyphs,
        String xmlConfig
    )
    
    
    Name          | Type   | Direction | Description
    inputDocument | byte[] | In        | The buffer containing the document to be processed
    homoglyphs    | String | In        | A JSON document containing the homoglyph mappings
    xmlConfig     | String | In        | The XML content management policy

    Note: Unlike some other supported languages, all output for Java is returned in the GlasswallWordSearchResult object.

    To integrate Glasswall Word Search in Python, the Glasswall Python wrapper is required. Each of the APIs returns a generic GwReturnObj object, which contains the attributes "status" (int), "output_file" (bytes) and "output_report" (bytes). The word_search function is defined as follows:

    
    def word_search(self,
        input_buffer: bytearray,
        homoglyphs: str,
        config_xml: str):
    
        """
        Calls the GwWordSearch API
    
        Parameters
        ----------
            input_buffer : bytearray
                The buffer containing the file to be processed
            homoglyphs : str
                The JSON string containing the homoglyph mappings
            config_xml : str
                The XML content management configuration
    
        Returns
        -------
            Returns a WordSearchResult that contains the processed document, the analysis report and the process status.
        """
    
    

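    Note: Unlike some other supported languages, all output for Python is returned in the WordSearchResult object.

    To integrate Glasswall Word Search in JavaScript, the Glasswall Word Search JavaScript wrapper is required.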

    
    /**
     *
     * @param {Buffer} input_buffer A buffer containing the contents of the document to be processed.
     * @param {String} homoglyphs A homoglyphs file that will be used as part of the word search process.
     * @param {String} config_xml The content management XML policy.
     * @returns {word_search_result} The result from word search.
     */
    word_search(input_buffer,
        homoglyphs,
        config_xml
    )
    
    

    Note: Unlike some other supported languages, all output is returned in the word_search_result object for JavaScript.

    GwWordSearchDone

    Releases any resources allocated by the Word Search engine. This function must be called after every call to GwWordSearch; otherwise, memory leaks will occur.

    This API call is only required in C++.

    Status GwWordSearchDone(
        void** output_buffer,
        size_t* output_buffer_len,
        void** output_report_buffer,
        size_t* output_report_buffer_len)
    
    Name                     | Type     | Direction | Description
    output_buffer            | void **  | Out       | A pointer to a pointer to the buffer containing the processed file that will be freed by the word search library
    output_buffer_len        | size_t * | Out       | A pointer to the size of the output file buffer
    output_report_buffer     | void **  | Out       | A pointer to a pointer to the buffer containing the word search report that will be freed by the word search library
    output_report_buffer_len | size_t * | Out       | A pointer to the size of the word search report

    For all languages covered by the Glasswall wrappers, GwWordSearchDone is called internally by the wrapper and is not exposed to the user.

    Common Issues

    Word search is not processing files

    If Word Search is not processing files correctly, there are several possible causes.

    When running Word Search, ensure that all the Glasswall libraries are located in the same directory and that this directory is set as the current working directory. Glasswall searches the current working directory for its dependencies; if they are not found, files will not be processed correctly.
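    One way to meet this requirement from a script is to switch into the library directory only while processing, then restore the previous working directory afterwards. The sketch below is a generic Python context manager and is not part of any Glasswall wrapper. `libraries_directory` in the usage comment is a placeholder for whichever folder holds the Glasswall libraries.

```python
import contextlib
import os

@contextlib.contextmanager
def working_directory(path):
    """Temporarily make `path` the current working directory."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield path
    finally:
        # Restore the original directory even if processing raised an exception.
        os.chdir(previous)

# Usage (libraries_directory is a placeholder):
# with working_directory(libraries_directory):
#     ...load the wrapper and process files here...
```

    Resolve any relative input and output paths before entering the block, for example with `os.path.abspath` as the Python example application does. Otherwise they will be interpreted relative to the library directory. On Python 3.11 and later, `contextlib.chdir` provides the same behaviour.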

    Example Usage

    The following example applications process input files using the Glasswall Word Search engine and, for each file, produce an output file alongside a Word Search analysis report.

    The C++ example expects the following command-line parameters (the C#, Java, and Python examples instead process every file in an input directory and print their expected parameters in their usage messages):

    1. Path to the content management configuration XML.
    2. Path to the homoglyphs file.
    3. Path to the input file to be processed.
    4. Path to the output file where the processed file will be stored.

    C++

    #include <iostream>
    #include <fstream>
    #include <string>
    #include <vector>
    #include <cstddef>
    #include <cstdint>
    #include <stdexcept>
    
    #include "api.h"
    
    using namespace std;
    
    // Read the file into a buffer
    vector<uint8_t> readFile(ifstream &fileHandle, const string &filePath, bool nullTerminator)
    {
        fileHandle.exceptions(ifstream::failbit | ifstream::badbit);
        fileHandle.open(filePath.c_str(), ios::binary | ios::ate);
    
        vector<uint8_t> data;
        streamsize size = fileHandle.tellg();
        fileHandle.seekg(0, ios::beg);
    
        data.resize(static_cast<size_t>(size));    // Exactly the file size; the optional null terminator is appended below
        fileHandle.read(reinterpret_cast<char *>(data.data()), size);
    
        if (nullTerminator)
        {
            data.push_back(0);
        }
    
        return data;
    }
    
    int main(int argc, char **argv)
    {
        if (argc != 5)
        {
            cerr << "Usage: <Path to XML Config> <Path to Homoglyphs> <Input file> <Output file>" << endl;
            return -1;
        }
    
        // Read commandline arguments
        string xmlFilePath(argv[1]);
        string homoglyphsFilePath(argv[2]);
        string inputFilePath(argv[3]);
        string outputFilePath(argv[4]);
    
        // Create file handles for input files
        ifstream xmlFileHandle;
        ifstream homoglyphsFileHandle;
        ifstream inputFileHandle;
    
        // Read files into buffers
        vector<uint8_t> xmlBuffer = readFile(xmlFileHandle, xmlFilePath, true);                         // Buffer containing the XML content management settings. This is null terminated
        vector<uint8_t> homoglyphsBuffer = readFile(homoglyphsFileHandle, homoglyphsFilePath, true);    // Buffer containing the homoglyphs. This is null terminated
        vector<uint8_t> inputBuffer = readFile(inputFileHandle, inputFilePath, false);                  // Buffer containing the input file to be processed
    
        // Create variables for output buffers
        void * outputBuffer = nullptr;          // Output buffer for processed file
        size_t outputBufferSize = 0;            // Output buffer size
        void * outputReportBuffer = nullptr;    // Output buffer for report file
        size_t outputReportBufferSize = 0;      // Output report buffer size
    
        // Run word search and redact
        Status status = GwWordSearch(inputBuffer.data(), inputBuffer.size(), &outputBuffer, &outputBufferSize, &outputReportBuffer, &outputReportBufferSize, reinterpret_cast<const char*>(homoglyphsBuffer.data()), reinterpret_cast<const char *>(xmlBuffer.data()));
    
        if (status == Status::eSuccess)
        {
            // Write out the processed output file if the word search and redact was successful
            ofstream outputFileHandle(outputFilePath, ios::binary | ios::trunc);
    
            if (outputFileHandle.is_open())
            {
                outputFileHandle.write(static_cast<const char *>(outputBuffer), outputBufferSize);
            }
    
            outputFileHandle.close();
        }
    
        // Write out the report file (if one was produced)
        if (outputReportBuffer != nullptr)
        {
            ofstream analysisFileHandle(outputFilePath + ".xml", ios::binary | ios::trunc);

            if (analysisFileHandle.is_open())
            {
                analysisFileHandle.write(static_cast<const char *>(outputReportBuffer), outputReportBufferSize);
            }

            analysisFileHandle.close();
        }
    
        // Call done to release any allocated resources
        GwWordSearchDone(&outputBuffer, &outputBufferSize, &outputReportBuffer, &outputReportBufferSize);
    
        return status == Status::eSuccess ? 0 : 1;    // Non-zero exit code if word search failed
    }
    

    C#

    
    using System;
    using System.IO;
    
    namespace glasswall.word.search.csharp.testing
    {
        internal class Program
        {
            static void Main(string[] args)
            {
                Console.WriteLine("Word search test");
                if (args.Length != 4)
                {
                    Console.WriteLine("usage: <Xml Config> <Homoglyphs> <Input Directory> <OutputDirectory>");
                    Console.WriteLine("Parameters specified: \n{0}", string.Join("\n", args));
                    return;
                }
    
                string xmlConfigPath = args[0];
                string homoglyphsPath = args[1];
                string inputDirectory = args[2];
                string outputDirectory = args[3];
    
                if (!File.Exists(xmlConfigPath))
                {
                    Console.Error.WriteLine("Xml config does not exist: {0}", xmlConfigPath);
                    return;
                }
    
                if (!File.Exists(homoglyphsPath))
                {
                    Console.Error.WriteLine("Homoglyphs does not exist: {0}", homoglyphsPath);
                    return;
                }
    
                if (!Directory.Exists(inputDirectory))
                {
                    Console.Error.WriteLine("Input directory does not exist: {0}", inputDirectory);
                    return;
                }
    
                Directory.CreateDirectory(outputDirectory);
    
                using (FileStream fileStream = new FileStream(Path.Combine(outputDirectory, "ProcessLog.txt"), FileMode.Create, FileAccess.Write))
                {
                    using (StreamWriter writer = new StreamWriter(fileStream))
                    {
                        writer.WriteLine("> Word Search Library version: {0}", GlasswallWordSearch.GwWordSearchVersion());
    
                        string xmlConfig = File.ReadAllText(xmlConfigPath);
                        string homoglyphs = File.ReadAllText(homoglyphsPath);
    
                        foreach (string path in Directory.EnumerateFiles(inputDirectory, "*", SearchOption.AllDirectories))
                        {
                            writer.WriteLine("> Processing file: {0}", path);
                            string inputDirectoryPath = path.Substring(inputDirectory.Length + 1);
                            string directory = Path.Combine(outputDirectory, inputDirectoryPath);
                            Directory.CreateDirectory(directory);
                            processFile(path, directory, homoglyphs, xmlConfig);
                        }
                    }
                }
    
                return;
            }
            static void WriteAllBytes(string path, byte[] data)
            {
                if (data == null)
                {
                    File.Create(path).Dispose();    // Create an empty file and release its handle immediately
                }
                else
                {
                    File.WriteAllBytes(path, data);
                }
            }
            public static void processFile(string inputFile, string outputDirectory, string homoglyphs, string xmlConfig)
            {
    
                using (FileStream fileStream = new FileStream(Path.Combine(outputDirectory, Path.GetFileName(inputFile) + ".log"), FileMode.Create, FileAccess.Write))
                {
                    using (StreamWriter writer = new StreamWriter(fileStream))
                    {
                        // Word Search
                        writer.WriteLine(">> Run Word Search");
                        byte[] inputFileBuffer = File.ReadAllBytes(inputFile);
                        byte[] outputBuffer, outputReportBuffer;
                        GlasswallWordSearch.WordSearchStatus status = GlasswallWordSearch.GwWordSearch(inputFileBuffer, out outputBuffer, out outputReportBuffer, homoglyphs, xmlConfig);
                        writer.WriteLine("Status is: {0}", status);
    
                        if (outputBuffer != null)
                        {
                            WriteAllBytes(Path.Combine(outputDirectory, Path.GetFileName(inputFile)), outputBuffer);
                        }
    
                        if (outputReportBuffer != null)
                        {
                            WriteAllBytes(Path.Combine(outputDirectory, Path.GetFileName(inputFile)) + ".xml", outputReportBuffer);
                        }
                    }
                }
            }
        }
    }
    
    

    Java

    
    package com.glasswallsolutions;
    
    import java.lang.System;
    import java.io.*;
    import com.glasswallsolutions.*;
    import java.nio.file.Paths;
    
    public class MainTest {
    
    	public static byte[] readAllBytes(InputStream inputStream) throws IOException
    	{
    		final int bufLen = 4 * 0x400; // 4KB
    		byte[] buf = new byte[bufLen];
    		int readLen;
    
    		try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
    			while ((readLen = inputStream.read(buf, 0, bufLen)) != -1)
    				outputStream.write(buf, 0, readLen);
    
    			return outputStream.toByteArray();
    		}
    	}
    
    	public static void main(String[] args) throws Exception {
    		if (args.length != 4)
    		{
    			System.out.println("Usage: <Input Directory> <Output Directory> <Homoglyphs File> <Config XML>");
    			System.exit(-1);
    		}
    
    		File inputDirectory = new File(args[0]);
    		File outputDirectory = new File(args[1]);
    		outputDirectory.delete();
    		outputDirectory.mkdir();
    		String homoglyphsFile = args[2];
    		String configXmlFile = args[3];
    
    		String homoglyphs = null;
    		String configXML = null;
    
    		GlasswallWordSearch glasswallWordSearch = new GlasswallWordSearch();
    
    		try(FileInputStream homoglyphsInputStream = new FileInputStream(homoglyphsFile))
    		{
    			homoglyphs = new String(readAllBytes(homoglyphsInputStream), java.nio.charset.StandardCharsets.UTF_8);
    		}
    
    		try(FileInputStream configXmlInputStream = new FileInputStream(configXmlFile))
    		{
    			configXML = new String(readAllBytes(configXmlInputStream), java.nio.charset.StandardCharsets.UTF_8);
    		}
    
    		System.out.println("Word search version: " + glasswallWordSearch.version());
    
    		for (File inputFile : inputDirectory.listFiles())
    		{
    			try
    			{
    				System.out.println("Processing file: " + inputFile.getAbsolutePath());
    
    				File fileOutputDirectory = new File(Paths.get(outputDirectory.getAbsolutePath(), inputFile.getName()).toString());
    				fileOutputDirectory.mkdir();
    				String fileOutputPath = Paths.get(fileOutputDirectory.getAbsolutePath(), inputFile.getName()).toString();
    
    				try(FileInputStream inputStream = new FileInputStream(inputFile))
    				{
    					byte[] fileData = readAllBytes(inputStream);
    
    					GlasswallWordSearchResult result = glasswallWordSearch.wordSearch(fileData, homoglyphs, configXML);
    
    					System.out.println("Status: " + result.success);
    
    					if (result.outputDocument != null)
    					{
    						try(FileOutputStream fileOutputStream = new FileOutputStream(fileOutputPath))
    						{
    							fileOutputStream.write(result.outputDocument);
    						}
    					}
    
    					if (result.report != null)
    					{
    						try(FileOutputStream fileOutputStream = new FileOutputStream(fileOutputPath + ".xml"))
    						{
    							fileOutputStream.write(result.report.getBytes(java.nio.charset.StandardCharsets.UTF_8));
    						}
    					}
    				}
    			}
    			catch(Exception ex)
    			{
    				System.err.println("Exception occurred: " + ex.getMessage());
    				ex.printStackTrace(System.err);
    
    			}
    		}
    	}
    }
    

    Python

    
    import glasswall_word_search
    import os
    import sys
    import shutil
    
    if __name__ == "__main__":
        if len(sys.argv) != 6:
            print("Usage: <Libraries Directory> <Input Directory> <Output Directory> <Homoglyphs> <Config XML>")
            sys.exit(-1)
    
        libraries_directory = os.path.abspath(sys.argv[1])
        input_directory = os.path.abspath(sys.argv[2])
        output_directory = os.path.abspath(sys.argv[3])
        homoglyphs_file_path = os.path.abspath(sys.argv[4])
        config_xml_file_path = os.path.abspath(sys.argv[5])
    
        if not os.path.exists(libraries_directory):
            print("Libraries directory does not exist: {}".format(libraries_directory))
            sys.exit(-1)
    
        if not os.path.exists(input_directory):
            print("Input directory does not exist: {}".format(input_directory))
            sys.exit(-1)
    
        if not os.path.exists(homoglyphs_file_path):
            print("Homoglyphs file does not exist: {}".format(homoglyphs_file_path))
            sys.exit(-1)
    
        if not os.path.exists(config_xml_file_path):
            print("Config xml file does not exist: {}".format(config_xml_file_path))
            sys.exit(-1)
    
        shutil.rmtree(output_directory, ignore_errors=True)
        os.makedirs(output_directory, exist_ok=True)
    
        word_search = glasswall_word_search.GlasswallWordSearch(libraries_directory)
        homoglyphs_data = None
        config_xml_data = None
    
        with open(homoglyphs_file_path, encoding="utf-8") as homoglyphs_file:
            homoglyphs_data = homoglyphs_file.read()
    
        with open(config_xml_file_path, encoding="utf-8") as config_xml_file:
            config_xml_data = config_xml_file.read()
    
        for filename in os.listdir(input_directory):
            full_file_path = os.path.abspath(os.path.join(input_directory, filename))
            try:
                if os.path.isdir(full_file_path):
                    continue

                print("Processing file: {}".format(full_file_path))
    
                output_file_directory = os.path.join(output_directory, filename)
                os.mkdir(output_file_directory)
    
                with open(full_file_path, "rb") as input_file:
                    content = input_file.read()
    
                    result = word_search.word_search(content, homoglyphs_data, config_xml_data)
                    print("Status: {}".format(result.status))
    
                    if result.output_buffer is not None and len(result.output_buffer) != 0:
                        with open(os.path.join(output_file_directory, filename), "wb") as output_file:
                            output_file.write(result.output_buffer)
    
                    if result.xml_report is not None and len(result.xml_report) != 0:
                        with open(os.path.join(output_file_directory, filename + ".xml"), "wb") as analysis_output_file:
                            analysis_output_file.write(result.xml_report)
            except Exception as ex:
                print("Exception occurred: {}".format(ex))
    
    

    JavaScript

    
    let fs = require('fs');
    let path = require('path');
    let process = require('process');
    
    let main = function()
    {
        const args = process.argv;
    
        if (args.length === 7)
        {
            let lib_directory = path.resolve(args[2]);
            let input_directory = path.resolve(args[3]);
            let output_directory = path.resolve(args[4]);
            let homoglyphs_path = path.resolve(args[5]);
            let config_xml_path = path.resolve(args[6]);
    
            if (process.platform === "win32")
            {
                process.env.PATH += ";" + lib_directory;
                process.env.QT_PLUGIN_PATH = ";" + lib_directory;
            }
            else
            {
                process.env.QT_PLUGIN_PATH = ":" + lib_directory;
            }
    
            let wrapper = require('./glasswall_word_search.js');
            let glasswall_word_search = new wrapper.glasswall_word_search();
            console.log("Glasswall word search version: " + glasswall_word_search.version());
    
            if (!fs.existsSync(input_directory))
            {
                console.log('Input Directory does not exist: ' + input_directory);
                process.exit(-1);
            }
    
            if (!fs.existsSync(homoglyphs_path))
            {
                console.log('Homoglyphs file does not exist: ' + homoglyphs_path);
                process.exit(-1);
            }
    
            if (!fs.existsSync(config_xml_path))
            {
                console.log('Config XML file does not exist: ' + config_xml_path);
                process.exit(-1);
            }
    
            let homoglyphs = fs.readFileSync(homoglyphs_path, 'utf8');
            let config_xml = fs.readFileSync(config_xml_path, 'utf8');
    
            fs.rmSync(output_directory, {force: true, recursive: true});
            fs.mkdirSync(output_directory, {recursive: true});
    
            fs.readdirSync(input_directory).forEach(file => {
                try
                {
                    let full_file_path = path.join(input_directory, file);
    
                    if (fs.statSync(full_file_path).isFile())
                    {
                        console.log('Processing file: ' + full_file_path);
                        let output_file_directory = path.join(output_directory, file);
                        fs.mkdirSync(output_file_directory);
                        let input_buffer = fs.readFileSync(full_file_path);
                        let word_search_result = glasswall_word_search.word_search(input_buffer, homoglyphs, config_xml);
                        console.log("Status: " + word_search_result.status);
    
                        if (word_search_result.output_buffer != null)
                        {
                            fs.writeFileSync(path.join(output_file_directory, file), word_search_result.output_buffer);
                        }
    
                        if (word_search_result.analysis_xml_report != null)
                        {
                            fs.writeFileSync(path.join(output_file_directory, file + ".xml"), word_search_result.analysis_xml_report);
                        }
                    }
                }
                catch(error)
                {
                    console.log("Exception occurred: " + error);
                    console.trace(error);
                }
    
            });
        }
        else
        {
            console.log("Usage: Application <Library Directory> <Input Directory> <Output Directory> <Homoglyphs File> <Config XML>");
            process.exit(-1);
        }
    }
    
    if (require.main === module){
        main();
    }
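
    The Python and JavaScript samples above write each input file's results to the same layout: a subdirectory named after the input file, holding the processed document under the original file name and the analysis report with an `.xml` suffix. A missing result is skipped (the Python sample also skips an empty one). The following is a minimal sketch, in Python, that isolates that output step so it can be reused or tested without the Glasswall libraries. `write_results` is a hypothetical helper, not part of the Glasswall wrapper, and it assumes the report may be returned as either `bytes` or `str`, since its type depends on the wrapper in use.

```python
import os
import tempfile
from typing import List, Optional, Union


def write_results(
    output_directory: str,
    filename: str,
    output_buffer: Optional[bytes],
    xml_report: Optional[Union[bytes, str]],
) -> List[str]:
    """Hypothetical helper: write one file's results and return the paths written.

    Layout (as in the samples):
      <output_directory>/<filename>/<filename>      processed document
      <output_directory>/<filename>/<filename>.xml  analysis report
    Missing (None) or empty results are skipped.
    """
    file_directory = os.path.join(output_directory, filename)
    os.makedirs(file_directory, exist_ok=True)
    written = []

    if output_buffer:  # false for both None and a zero-length buffer
        document_path = os.path.join(file_directory, filename)
        with open(document_path, "wb") as document_file:
            document_file.write(output_buffer)
        written.append(document_path)

    if xml_report:
        # Assumption: the wrapper may return the report as str or bytes.
        if isinstance(xml_report, str):
            report_bytes = xml_report.encode("utf-8")
        else:
            report_bytes = xml_report
        report_path = os.path.join(file_directory, filename + ".xml")
        with open(report_path, "wb") as report_file:
            report_file.write(report_bytes)
        written.append(report_path)

    return written


if __name__ == "__main__":
    # Stand-in values in place of a real word search result.
    with tempfile.TemporaryDirectory() as out_dir:
        for path in write_results(out_dir, "sample.docx", b"PK\x03\x04", "<report/>"):
            print(os.path.relpath(path, out_dir))
```

    In the Python sample, the two `if` blocks inside the `with open(full_file_path, "rb")` block could be replaced by a call such as `write_results(output_directory, filename, result.output_buffer, result.xml_report)`; because the helper uses `os.makedirs(..., exist_ok=True)`, it does not conflict with the sample's earlier `os.mkdir` call.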
    
    
