Glasswall Word Search redacts text from files and generates an XML report describing the redacted file. See [Word Search & Redaction](/embedded-engine/embedded-engine-word-search-and-redaction). The report includes the file size, the determined file type, the total number of text matches, and the location of each match.

Example report

```xml
13084
docx
8
ipsum
5
120 0 0
267 0 0
691 0 0
973 0 0
1034 0 0
lorem
3
114 0 0
244 0 0
1224 0 0
```

A homoglyphs JSON file can be specified either as a file path or in memory as bytes, bytearray, or io.BytesIO. If this is not specified then the default will be used:

Default homoglyphs.json file

```json { "!": "ǃⵑ", "$": "＄", "%": "％", "&": "ꝸ＆", "'": "`´ʹʻʼʽʾˈˊˋ˴ʹ΄՚՝י׳ߴߵᑊᛌ᾽᾿`´῾‘’‛′‵ꞌ＇｀𖽑𖽒", "(": "❨❲〔﴾（［", ")": "❩❳〕﴿）］", "*": "٭⁎∗＊𐌟", "+": "᛭＋𐊛", ",": "¸؍٫‚ꓹ，", "-": "˗۔‐‑‒–⁃−➖Ⲻ﹘", ".": "٠۰܁܂․ꓸ꘎．𐩐𝅭", "/": "᜵⁁⁄∕╱⟋⧸Ⳇ⼃〳ノ㇓丿／𝈺", "0": "OoΟοσОоՕօסه٥ھہە۵߀०০੦૦ଠ୦௦ం౦ಂ೦ംഠ൦ං๐໐ဝ၀ჿዐᴏᴑℴⲞⲟⵔ〇ꓳꬽﮦﮧﮨﮩﮪﮫﮬﮭﻩﻪﻫﻬ０Ｏｏ𐊒𐊫𐐄𐐬𐓂𐓪𐔖𑓐𑢵𑣈𑣗𑣠𝐎𝐨𝑂𝑜𝑶𝒐𝒪𝓞𝓸𝔒𝔬𝕆𝕠𝕺𝖔𝖮𝗈𝗢𝗼𝘖𝘰𝙊𝙤𝙾𝚘𝚶𝛐𝛔𝛰𝜊𝜎𝜪𝝄𝝈𝝤𝝾𝞂𝞞𝞸𝞼𝟎𝟘𝟢𝟬𝟶𞸤𞹤𞺄", "1": "Il|ƖǀΙІӀ׀וןا١۱ߊᛁℐℑℓⅠⅼ∣⏽Ⲓⵏꓲﺍﺎ１Ｉｌ￨𐊊𐌉𐌠𖼨𝐈𝐥𝐼𝑙𝑰𝒍𝓁𝓘𝓵𝔩𝕀𝕝𝕴𝖑𝖨𝗅𝗜𝗹𝘐𝘭𝙄𝙡𝙸𝚕𝚰𝛪𝜤𝝞𝞘𝟏𝟙𝟣𝟭𝟷𞣇𞸀𞺀", "2": "ƧϨᒿꙄꛯꝚ２𝟐𝟚𝟤𝟮𝟸", "3": "ƷȜЗӠⳌꝪꞫ３𑣊𖼻𝈆𝟑𝟛𝟥𝟯𝟹", "4": "Ꮞ４𑢯𝟒𝟜𝟦𝟰𝟺", "5": "Ƽ５𑢻𝟓𝟝𝟧𝟱𝟻", "6": "бᏮⳒ６𑣕𝟔𝟞𝟨𝟲𝟼", "7": "７𐓒𑣆𝈒𝟕𝟟𝟩𝟳𝟽", "8": "Ȣȣ৪੪ଃ８𐌚𝟖𝟠𝟪𝟴𝟾𞣋", "9": "৭੧୨൭ⳊꝮ９𑢬𑣌𑣖𝟗𝟡𝟫𝟵𝟿", "A": "4ΑАᎪᗅᴀꓮꭺＡ𐊠𖽀𝐀𝐴𝑨𝒜𝓐𝔄𝔸𝕬𝖠𝗔𝘈𝘼𝙰𝚨𝛢𝜜𝝖𝞐", "B": "ʙΒВвᏴᏼᗷᛒℬꓐꞴＢ𐊂𐊡𐌁𝐁𝐵𝑩𝓑𝔅𝔹𝕭𝖡𝗕𝘉𝘽𝙱𝚩𝛣𝜝𝝗𝞑", "C": "ϹСᏟℂℭⅭⲤꓚＣ𐊢𐌂𐐕𐔜𑣩𑣲𝐂𝐶𝑪𝒞𝓒𝕮𝖢𝗖𝘊𝘾𝙲🝌", "D": "ᎠᗞᗪᴅⅅⅮꓓꭰＤ𝐃𝐷𝑫𝒟𝓓𝔇𝔻𝕯𝖣𝗗𝘋𝘿𝙳", "E": "ΕЕᎬᴇℰ⋿ⴹꓰꭼＥ𐊆𑢦𑢮𝐄𝐸𝑬𝓔𝔈𝔼𝕰𝖤𝗘𝘌𝙀𝙴𝚬𝛦𝜠𝝚𝞔", "F": "ϜᖴℱꓝꞘＦ𐊇𐊥𐔥𑢢𑣂𝈓𝐅𝐹𝑭𝓕𝔉𝔽𝕱𝖥𝗙𝘍𝙁𝙵𝟊", "G": "ɢԌԍᏀᏳᏻꓖꮐＧ𝐆𝐺𝑮𝒢𝓖𝔊𝔾𝕲𝖦𝗚𝘎𝙂𝙶", "H": "ʜΗНнᎻᕼℋℌℍⲎꓧꮋＨ𐋏𝐇𝐻𝑯𝓗𝕳𝖧𝗛𝘏𝙃𝙷𝚮𝛨𝜢𝝜𝞖", "I": "", "J": "ͿЈᎫᒍᴊꓙꞲꭻＪ𝐉𝐽𝑱𝒥𝓙𝔍𝕁𝕵𝖩𝗝𝘑𝙅𝙹", "K": "ΚКᏦᛕKⲔꓗＫ𐔘𝐊𝐾𝑲𝒦𝓚𝔎𝕂𝕶𝖪𝗞𝘒𝙆𝙺𝚱𝛫𝜥𝝟𝞙", "L": "ʟᏞᒪℒⅬⳐⳑꓡꮮＬ𐐛𐑃𐔦𑢣𑢲𖼖𝈪𝐋𝐿𝑳𝓛𝔏𝕃𝕷𝖫𝗟𝘓𝙇𝙻", "M": "ΜϺМᎷᗰᛖℳⅯⲘꓟＭ𐊰𐌑𝐌𝑀𝑴𝓜𝔐𝕄𝕸𝖬𝗠𝘔𝙈𝙼𝚳𝛭𝜧𝝡𝞛", "N": "ɴΝℕⲚꓠＮ𐔓𝐍𝑁𝑵𝒩𝓝𝔑𝕹𝖭𝗡𝘕𝙉𝙽𝚴𝛮𝜨𝝢𝞜", "O": "0", "P": "ΡРᏢᑭᴘᴩℙⲢꓑꮲＰ𐊕𝐏𝑃𝑷𝒫𝓟𝔓𝕻𝖯𝗣𝘗𝙋𝙿𝚸𝛲𝜬𝝦𝞠", "Q": "ℚⵕＱ𝐐𝑄𝑸𝒬𝓠𝔔𝕼𝖰𝗤𝘘𝙌𝚀", "R": "ƦʀᎡᏒᖇᚱℛℜℝꓣꭱꮢＲ𐒴𖼵𝈖𝐑𝑅𝑹𝓡𝕽𝖱𝗥𝘙𝙍𝚁", "S": "$ЅՏᏕᏚꓢＳ𐊖𐐠𖼺𝐒𝑆𝑺𝒮𝓢𝔖𝕊𝕾𝖲𝗦𝘚𝙎𝚂", "T": "ŤΤτТтᎢᴛ⊤⟙ⲦꓔꭲＴ𐊗𐊱𐌕𑢼𖼊𝐓𝑇𝑻𝒯𝓣𝔗𝕋𝕿𝖳𝗧𝘛𝙏𝚃𝚻𝛕𝛵𝜏𝜯𝝉𝝩𝞃𝞣𝞽🝨", "U": "Սሀᑌ∪⋃ꓴＵ𐓎𑢸𖽂𝐔𝑈𝑼𝒰𝓤𝔘𝕌𝖀𝖴𝗨𝘜𝙐𝚄", "V": "Ѵ٧۷ᏙᐯⅤⴸꓦꛟＶ𐔝𑢠𖼈𝈍𝐕𝑉𝑽𝒱𝓥𝔙𝕍𝖁𝖵𝗩𝘝𝙑𝚅", "W": "ԜᎳᏔꓪＷ𑣦𑣯𝐖𝑊𝑾𝒲𝓦𝔚𝕎𝖂𝖶𝗪𝘞𝙒𝚆", "X": "ΧХ᙭ᚷⅩ╳ⲬⵝꓫꞳＸ𐊐𐊴𐌗𐌢𐔧𑣬𝐗𝑋𝑿𝒳𝓧𝔛𝕏𝖃𝖷𝗫𝘟𝙓𝚇𝚾𝛸𝜲𝝬𝞦", "Y": "ΥϒУҮᎩᎽⲨꓬＹ𐊲𑢤𖽃𝐘𝑌𝒀𝒴𝓨𝔜𝕐𝖄𝖸𝗬𝘠𝙔𝚈𝚼𝛶𝜰𝝪𝞤", "Z": "ΖᏃℤℨꓜＺ𐋵𑢩𑣥𝐙𝑍𝒁𝒵𝓩𝖅𝖹𝗭𝘡𝙕𝚉𝚭𝛧𝜡𝝛𝞕", "a": "@ɑαа⍺ａ𝐚𝑎𝒂𝒶𝓪𝔞𝕒𝖆𝖺𝗮𝘢𝙖𝚊𝛂𝛼𝜶𝝰𝞪", "b": "ƄЬᏏᖯｂ𝐛𝑏𝒃𝒷𝓫𝔟𝕓𝖇𝖻𝗯𝘣𝙗𝚋", "c": "ϲсᴄⅽⲥꮯｃ𐐽𝐜𝑐𝒄𝒸𝓬𝔠𝕔𝖈𝖼𝗰𝘤𝙘𝚌", "d": "ԁᏧᑯⅆⅾꓒｄ𝐝𝑑𝒅𝒹𝓭𝔡𝕕𝖉𝖽𝗱𝘥𝙙𝚍", "e": "еҽ℮ℯⅇꬲｅ𝐞𝑒𝒆𝓮𝔢𝕖𝖊𝖾𝗲𝘦𝙚𝚎", "f": "ſϝքẝꞙꬵｆ𝐟𝑓𝒇𝒻𝓯𝔣𝕗𝖋𝖿𝗳𝘧𝙛𝚏𝟋", "g": "ƍɡցᶃℊｇ𝐠𝑔𝒈𝓰𝔤𝕘𝖌𝗀𝗴𝘨𝙜𝚐", "h": "һհᏂℎｈ𝐡𝒉𝒽𝓱𝔥𝕙𝖍𝗁𝗵𝘩𝙝𝚑", "i": "ıɩɪ˛ͺιіӏᎥιℹⅈⅰ⍳ꙇꭵｉ𑣃𝐢𝑖𝒊𝒾𝓲𝔦𝕚𝖎𝗂𝗶𝘪𝙞𝚒𝚤𝛊𝜄𝜾𝝸𝞲", "j": "ϳјⅉｊ𝐣𝑗𝒋𝒿𝓳𝔧𝕛𝖏𝗃𝗷𝘫𝙟𝚓", "k": "ｋ𝐤𝑘𝒌𝓀𝓴𝔨𝕜𝖐𝗄𝗸𝘬𝙠𝚔", "l": "1", "m": "ｍ", "n": "ոռｎ𝐧𝑛𝒏𝓃𝓷𝔫𝕟𝖓𝗇𝗻𝘯𝙣𝚗", "o": "", "p": "ρϱр⍴ⲣｐ𝐩𝑝𝒑𝓅𝓹𝔭𝕡𝖕𝗉𝗽𝘱𝙥𝚙𝛒𝛠𝜌𝜚𝝆𝝔𝞀𝞎𝞺𝟈", "q": "ԛգզｑ𝐪𝑞𝒒𝓆𝓺𝔮𝕢𝖖𝗊𝗾𝘲𝙦𝚚", "r": "гᴦⲅꭇꭈꮁｒ𝐫𝑟𝒓𝓇𝓻𝔯𝕣𝖗𝗋𝗿𝘳𝙧𝚛", "s": "$ƽѕꜱꮪｓ𐑈𑣁𝐬𝑠𝒔𝓈𝓼𝔰𝕤𝖘𝗌𝘀𝘴𝙨𝚜", "t": 
"ｔ𝐭𝑡𝒕𝓉𝓽𝔱𝕥𝖙𝗍𝘁𝘵𝙩𝚝", "u": "ʋυսᴜꞟꭎꭒｕ𐓶𑣘𝐮𝑢𝒖𝓊𝓾𝔲𝕦𝖚𝗎𝘂𝘶𝙪𝚞𝛖𝜐𝝊𝞄𝞾", "v": "νѵטᴠⅴ∨⋁ꮩｖ𑜆𑣀𝐯𝑣𝒗𝓋𝓿𝔳𝕧𝖛𝗏𝘃𝘷𝙫𝚟𝛎𝜈𝝂𝝼𝞶", "w": "ɯѡԝաᴡꮃｗ𑜊𑜎𑜏𝐰𝑤𝒘𝓌𝔀𝔴𝕨𝖜𝗐𝘄𝘸𝙬𝚠", "x": "×хᕁᕽ᙮ⅹ⤫⤬⨯ｘ𝐱𝑥𝒙𝓍𝔁𝔵𝕩𝖝𝗑𝘅𝘹𝙭𝚡", "y": "ɣʏγуүყᶌỿℽꭚｙ𑣜𝐲𝑦𝒚𝓎𝔂𝔶𝕪𝖞𝗒𝘆𝘺𝙮𝚢𝛄𝛾𝜸𝝲𝞬", "z": "ᴢꮓｚ𑣄𝐳𝑧𝒛𝓏𝔃𝔷𝕫𝖟𝗓𝘇𝘻𝙯𝚣", "£": "₤", "©": "Ⓒ", "®": "Ⓡ" } ```

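A custom homoglyphs file follows the same shape as the default: each key is a single character and each value is a string of lookalike characters. The sketch below is a minimal illustration of preparing such a mapping in the three in-memory forms listed above. The mapping `custom_homoglyphs` and the helper `validate_homoglyphs` are hypothetical names introduced here, and UTF-8 encoding is an assumption; check the `redact_file` reference for how the value is passed to the library.

```python
import io
import json

# Hypothetical, much smaller mapping in the same shape as the default file:
# each key is a single character, each value is a string of lookalikes.
custom_homoglyphs = {
    "a": "@\u0251\u03b1\u0430",  # @, Latin alpha, Greek alpha, Cyrillic a
    "o": "0\u03bf\u043e",        # digit zero, Greek omicron, Cyrillic o
}


def validate_homoglyphs(mapping):
    """Check that the mapping has the same shape as the default file."""
    for key, lookalikes in mapping.items():
        if not isinstance(key, str) or len(key) != 1:
            raise ValueError(f"key must be a single character: {key!r}")
        if not isinstance(lookalikes, str):
            raise ValueError(f"value for {key!r} must be a string")
    return mapping


validate_homoglyphs(custom_homoglyphs)

# The three in-memory forms, all holding the same JSON (UTF-8 is assumed).
as_bytes = json.dumps(custom_homoglyphs, ensure_ascii=False).encode("utf-8")
as_bytearray = bytearray(as_bytes)
as_bytesio = io.BytesIO(as_bytes)

# Round-trip check: decoding the bytes gives back the original mapping.
assert json.loads(as_bytes.decode("utf-8")) == custom_homoglyphs
```

Validating the mapping before serialising it catches multi-character keys early, which would otherwise only surface when the file is consumed.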
## Examples

- [WordSearch](#wordsearch)
  - [Redact](#redact)
    - [Redact from file path to file path](#redact-from-file-path-to-file-path)
    - [Redact from file path to memory](#redact-from-file-path-to-memory)
    - [Redact from memory](#redact-from-memory)
    - [Redact files in a directory](#redact-files-in-a-directory)
    - [Redact files in a directory that may contain unsupported file types](#redact-files-in-a-directory-that-may-contain-unsupported-file-types)
    - [Redact files in a directory conditionally based on file format](#redact-files-in-a-directory-conditionally-based-on-file-format)

## WordSearch

See [Loading a Glasswall Library](/embedded-engine/embedded-engine-python-loading-a-glasswall-library#wordsearch) for details on how to load the WordSearch library.

### Redact

Files can be redacted individually from a file path or in memory using the [redact_file](./8-Autogenerated%20Docs/libraries/word_search/word_search/word_search.md#redact_file) method, or all files in a directory can be redacted using the [redact_directory](./8-Autogenerated%20Docs/libraries/word_search/word_search/word_search.md#redact_directory) method.
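Every example in this section passes the same policy structure: one `textItem` entry per search term, each with a `text` switch and a `textSetting` switch. As a minimal sketch, the repeated dictionary could be generated from a list of words. The helper name `build_redact_config` and its parameters are hypothetical and not part of the Glasswall wrapper; only the dictionary layout is taken from the examples on this page.

```python
def build_redact_config(words, replacement_char="*"):
    """Return a textSearchConfig dict that redacts every word in `words`."""
    return {
        "textSearchConfig": {
            "@libVersion": "core2",
            "textList": [
                {"name": "textItem", "switches": [
                    {"name": "text", "value": word},
                    {"name": "textSetting", "@replacementChar": replacement_char, "value": "redact"},
                ]}
                for word in words
            ],
        }
    }


# Same configuration as the one written out in full in the examples
config = build_redact_config(["lorem", "ipsum"])
print(len(config["textSearchConfig"]["textList"]))  # 2
```

The resulting dictionary can be passed as `config=config` to `glasswall.content_management.policies.WordSearch`, in place of the literal shown in each example.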
#### Redact from file path to file path

```py
import glasswall


# Load the Glasswall WordSearch library
word_search = glasswall.WordSearch(r"C:\gwpw\libraries\10.0")

# Redact occurrences of the text "lorem" and "ipsum" within the input file, writing the redacted file to a new path
word_search.redact_file(
    input_file=r"C:\gwpw\input_redact\lorem_ipsum.docx",
    output_file=r"C:\gwpw\output\word_search\redact_f2f\lorem_ipsum.docx",
    content_management_policy=glasswall.content_management.policies.WordSearch(
        config={
            "textSearchConfig": {
                "@libVersion": "core2",
                "textList": [
                    {"name": "textItem", "switches": [
                        {"name": "text", "value": "lorem"},
                        {"name": "textSetting", "@replacementChar": "*", "value": "redact"},
                    ]},
                    {"name": "textItem", "switches": [
                        {"name": "text", "value": "ipsum"},
                        {"name": "textSetting", "@replacementChar": "*", "value": "redact"},
                    ]},
                ]
            }
        }
    )
)
```

#### Redact from file path to memory

`redact_file` returns an object with the attributes: "status" (int), "output_file" (bytes), "output_report" (bytes). The below example demonstrates assigning the variable `result` and checking the contents of the beginning of the redacted output_file and the output_report.
```py
import glasswall


# Load the Glasswall WordSearch library
word_search = glasswall.WordSearch(r"C:\gwpw\libraries\10.0")

# Redact occurrences of the text "lorem" and "ipsum" within the input file, returning the redacted file in memory
result = word_search.redact_file(
    input_file=r"C:\gwpw\input_redact\lorem_ipsum.docx",
    output_file=None,
    content_management_policy=glasswall.content_management.policies.WordSearch(
        config={
            "textSearchConfig": {
                "@libVersion": "core2",
                "textList": [
                    {"name": "textItem", "switches": [
                        {"name": "text", "value": "lorem"},
                        {"name": "textSetting", "@replacementChar": "*", "value": "redact"},
                    ]},
                    {"name": "textItem", "switches": [
                        {"name": "text", "value": "ipsum"},
                        {"name": "textSetting", "@replacementChar": "*", "value": "redact"},
                    ]},
                ]
            }
        }
    )
)

assert result.output_file[:6] == b'PK\x03\x04\x14\x00'
assert result.output_report[:500] == b'\n\t\n\t\t14292\n\t\tdocx\n\t\t14\n\t\n\t\n\t\tipsum\n\t\t8\n\t\t\n\t\t\t\n\t\t\t\t120\n\t\t\t\t0\n\t\t\t\t0\n\t\t\t\n\t\t\t\n\t\t\t'
```

#### Redact from memory

```py
import glasswall


# Load the Glasswall WordSearch library
word_search = glasswall.WordSearch(r"C:\gwpw\libraries\10.0")

# Read file from disk to memory
with open(r"C:\gwpw\input_redact\lorem_ipsum.docx", "rb") as f:
    input_bytes = f.read()

# Redact occurrences of the text "lorem" and "ipsum" within the in-memory file, writing the redacted file to a new path
result = word_search.redact_file(
    input_file=input_bytes,
    output_file=r"C:\gwpw\output\word_search\redact_m2f\lorem_ipsum.docx",
    content_management_policy=glasswall.content_management.policies.WordSearch(
        config={
            "textSearchConfig": {
                "@libVersion": "core2",
                "textList": [
                    {"name": "textItem", "switches": [
                        {"name": "text", "value": "lorem"},
                        {"name": "textSetting", "@replacementChar": "*", "value": "redact"},
                    ]},
                    {"name": "textItem", "switches": [
                        {"name": "text", "value": "ipsum"},
                        {"name": "textSetting", "@replacementChar": "*", "value": "redact"},
                    ]},
                ]
            }
        }
    )
)

assert result.output_file[:6] == b'PK\x03\x04\x14\x00'
assert result.output_report[:500] == b'\n\t\n\t\t14292\n\t\tdocx\n\t\t14\n\t\n\t\n\t\tipsum\n\t\t8\n\t\t\n\t\t\t\n\t\t\t\t120\n\t\t\t\t0\n\t\t\t\t0\n\t\t\t\n\t\t\t\n\t\t\t'
```

#### Redact files in a directory

`redact_directory` returns a dictionary that maps each file path, relative to the input_directory, to an object with the attributes: "status" (int), "output_file" (bytes), "output_report" (bytes). The below example demonstrates assigning the variable `results` and checking the keys and values of the `results` dictionary.

```py
import glasswall


# Load the Glasswall WordSearch library
word_search = glasswall.WordSearch(r"C:\gwpw\libraries\10.0")

# Redact occurrences of the text "lorem" and "ipsum" within each file in the input_directory, writing the redacted file
# to a new path in the output_directory
results = word_search.redact_directory(
    input_directory=r"C:\gwpw\input_redact",
    output_directory=r"C:\gwpw\output\word_search\redact_directory",
    content_management_policy=glasswall.content_management.policies.WordSearch(
        config={
            "textSearchConfig": {
                "@libVersion": "core2",
                "textList": [
                    {"name": "textItem", "switches": [
                        {"name": "text", "value": "lorem"},
                        {"name": "textSetting", "@replacementChar": "*", "value": "redact"},
                    ]},
                    {"name": "textItem", "switches": [
                        {"name": "text", "value": "ipsum"},
                        {"name": "textSetting", "@replacementChar": "*", "value": "redact"},
                    ]},
                ]
            }
        }
    )
)

assert list(results.keys()) == ['lorem_ipsum.docx', 'lorem_ipsum.pptx']
assert all(result.status == 1 for result in results.values())
```

#### Redact files in a directory that may contain unsupported file types

The default behaviour of the Glasswall Python wrapper is to raise the relevant exception (see: [glasswall.libraries.word_search.errors](./8-Autogenerated%20Docs/libraries/word_search/errors/errors.md)) if processing fails.
Passing `raise_unsupported=False` prevents the exception from being raised. This is useful when a directory contains a mixture of supported and unsupported file types and it is desirable to process as many of the files as possible instead of terminating on the first failure.

The input directory in the example below contains the same two files as the example above, plus a file with an unsupported file format: `python-package.yml`. Inspecting the key-value pairs in the `results` dictionary shows that the object returned for `python-package.yml` has `status: 0`, a failure. Its `output_file` attribute is empty bytes, and its `output_report` bytes contain a report that includes an `IssueItem` describing the problem encountered while attempting to redact the file: `File contents could not be accessed`.

```py
import glasswall


# Load the Glasswall WordSearch library
word_search = glasswall.WordSearch(r"C:\gwpw\libraries\10.0")

# Redact occurrences of the text "lorem" and "ipsum" within each file in the input_directory, writing the redacted file
# to a new path in the output_directory
results = word_search.redact_directory(
    input_directory=r"C:\gwpw\input_redact_with_unsupported_file_types",
    output_directory=r"C:\gwpw\output\word_search\redact_directory_unsupported",
    content_management_policy=glasswall.content_management.policies.WordSearch(
        config={
            "textSearchConfig": {
                "@libVersion": "core2",
                "textList": [
                    {"name": "textItem", "switches": [
                        {"name": "text", "value": "lorem"},
                        {"name": "textSetting", "@replacementChar": "*", "value": "redact"},
                    ]},
                    {"name": "textItem", "switches": [
                        {"name": "text", "value": "ipsum"},
                        {"name": "textSetting", "@replacementChar": "*", "value": "redact"},
                    ]},
                ]
            }
        }
    ),
    raise_unsupported=False
)

assert list(results.keys()) == ["lorem_ipsum.docx", "lorem_ipsum.pptx", "python-package.yml"]
assert [result.status for result in results.values()] == [1, 1, 0]

print(results["python-package.yml"].__dict__)
# {'status': 0,
#  'output_file': b'',
#  'output_report': b'\n\t\n\t\tFile contents could not be accessed\n\t\n\t\n\t\t1460\n\t\tUnknown\n\t\t0\n\t\n\t\n\t\tipsum\n\t\t0\n\t\t\n\t\n\t\n\t\tlorem\n\t\t0\n\t\t\n\t\n\n\n'}
```

#### Redact files in a directory conditionally based on file format

The example below demonstrates redacting only the docx and pptx files in a directory that also contains unsupported file types.

```py
import os

import glasswall


# Load the Glasswall Editor library
editor = glasswall.Editor(r"C:\gwpw\libraries\10.0")

# Load the Glasswall WordSearch library
word_search = glasswall.WordSearch(r"C:\gwpw\libraries\10.0")

input_directory = r"C:\gwpw\input_redact_with_unsupported_file_types"
output_directory = r"C:\gwpw\output\word_search\redact_directory_file_format"

# Iterate relative file paths from input_directory
for relative_file in glasswall.utils.list_file_paths(input_directory, absolute=False):
    # Construct absolute paths
    input_file = os.path.join(input_directory, relative_file)
    output_file = os.path.join(output_directory, relative_file)

    # Get the file type of the file
    file_type = editor.determine_file_type(
        input_file=input_file,
        as_string=True,
        raise_unsupported=False
    )

    # Redact only docx and pptx files
    if file_type in ["docx", "pptx"]:
        # Redact occurrences of the text "lorem" and "ipsum" within the input file, writing the redacted file to a new path
        word_search.redact_file(
            input_file=input_file,
            output_file=output_file,
            content_management_policy=glasswall.content_management.policies.WordSearch(
                config={
                    "textSearchConfig": {
                        "@libVersion": "core2",
                        "textList": [
                            {"name": "textItem", "switches": [
                                {"name": "text", "value": "lorem"},
                                {"name": "textSetting", "@replacementChar": "*", "value": "redact"},
                            ]},
                            {"name": "textItem", "switches": [
                                {"name": "text", "value": "ipsum"},
                                {"name": "textSetting", "@replacementChar": "*", "value": "redact"},
                            ]},
                        ]
                    }
                }
            )
        )
```
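When `redact_directory` is called with `raise_unsupported=False`, the returned dictionary can contain both successful and failed results, so it is often convenient to separate them by `status`. The following is a minimal sketch of that step. It uses `types.SimpleNamespace` stand-ins that mimic only the three attributes described on this page, rather than real Glasswall result objects, and the helper name `split_results` is hypothetical.

```python
from types import SimpleNamespace


def split_results(results):
    """Separate a results dictionary into successes (status 1) and failures."""
    succeeded = {path: r for path, r in results.items() if r.status == 1}
    failed = {path: r for path, r in results.items() if r.status != 1}
    return succeeded, failed


# Stand-in results shaped like the dictionary described above (assumed data)
results = {
    "lorem_ipsum.docx": SimpleNamespace(status=1, output_file=b"PK\x03\x04", output_report=b""),
    "lorem_ipsum.pptx": SimpleNamespace(status=1, output_file=b"PK\x03\x04", output_report=b""),
    "python-package.yml": SimpleNamespace(status=0, output_file=b"", output_report=b""),
}

succeeded, failed = split_results(results)
print(sorted(succeeded))  # ['lorem_ipsum.docx', 'lorem_ipsum.pptx']
print(sorted(failed))     # ['python-package.yml']
```

The `output_report` of each entry in `failed` can then be inspected to find out why the file could not be redacted.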