Constellations FAQs

    What is the maximum file size that you support?

    1 GB - Constellations relies on Glasswall Halo and shares the same maximum file size.

    Do you only support V3 of Helm?

    Yes

    What information is sent back to Glasswall from my deployment?

    Constellations is designed to run in a secure environment, so none of your data is sent to Glasswall. Glasswall reserves the right to request summary log information to verify license conformance, but data does not leave the environment without intervention by the system owner.

    Why do some of your services use Alpine as the base OS?

    In the future, most of the services will use a hardened Alpine base image. Where possible we implement CIS (Centre for Internet Security) guidelines for hardening. Alpine provides a small Linux distribution with a minimised attack surface which is attractive from both a performance and security perspective.


    What is the recommended node footprint?

    Please see Performance and Scaling.

    What storage types can Constellations connect to?

    Azure Blob Storage.

    How are logs managed?

    All Constellations services write their logs to `stdout` (standard output) of their respective Docker containers.

    Why are there two Node Pools?

    Constellations deployments require two node pools to separate Glasswall Halo resources from the scan management nodes. This separation allows node size and configuration to be customised for each pool.

    For example, Glasswall Halo may require larger nodes with higher memory and CPU limits so that the engine service can process complex files.

    Where in the flow of data is traffic not encrypted during transit, including interservice communication?

    Constellations services communicate via RabbitMQ messages using the AMQP protocol. This data is not encrypted during transit. Messages are only ever sent inside the Kubernetes cluster.

    Requests to the Scan Controller API and Event Projection API are received via TLS.

    Requests from Constellations to Glasswall Halo are sent via TLS.

     What level of security hardening has taken place?

    The hardening strategy for Constellations encompasses two aspects: 

    • Hardening of the development process, which focuses on strengthening security measures during code creation, storage, and access.
    • Hardening of the consumable components used within the product.

    The hardening process for the development lifecycle comprises multiple stages. We incorporate security gates at each stage of the development lifecycle, ensuring that only secure, vulnerability-free code is integrated into our code base. This process involves implementing Static Application Security Testing (SAST), Software Composition Analysis (SCA), Infrastructure as Code scanning (IaC), and Dynamic Application Security Testing (DAST) tools into our Continuous Integration/Continuous Deployment (CI/CD) pipeline.

    Similarly, the hardening process for the consumable components is a crucial part of our overall approach. We diligently maintain hardened Docker images in-house, subjecting them to thorough scrutiny and ongoing monitoring. Consequently, we confidently state that the images used in Constellations undergo a hardening process that aligns with, at the very least, the minimum security standards set by the National Institute of Standards and Technology (NIST). 

     What version of Kubernetes does Constellations support?

    Kubernetes Version: v1.25.6 or above.

     I've noticed that when I cancel a job, file processing does not seem to stop immediately. Why is that?

    Cancellation stops the scan's cataloguing step from finding further files in the specified container. Any files that have already been found and submitted for processing continue to be processed.
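The behaviour can be illustrated with a small simulation. This is a hypothetical model, not the Constellations implementation: `simulate_cancel` and its per-tick rates are invented for illustration. Each tick, cataloguing finds files and queues them while a worker processes one; once the cancel request arrives, cataloguing stops but the queue is still drained.

```python
from collections import deque

def simulate_cancel(files, cancel_at_tick, found_per_tick=2, processed_per_tick=1):
    """Return (processed before cancel, processed after cancel) for a toy scan."""
    pending = deque()  # files found by cataloguing and submitted for processing
    source = iter(files)
    before, after = [], []
    exhausted = False
    tick = 0
    while True:
        cancelled = tick >= cancel_at_tick
        # Cataloguing only runs until the cancel request arrives.
        if not cancelled and not exhausted:
            for _ in range(found_per_tick):
                name = next(source, None)
                if name is None:
                    exhausted = True
                    break
                pending.append(name)
        if not pending and (cancelled or exhausted):
            break  # nothing left to process and nothing more will be found
        # Processing keeps draining the queue, cancelled or not.
        for _ in range(processed_per_tick):
            if pending:
                (after if cancelled else before).append(pending.popleft())
        tick += 1
    return before, after

files = [f"f{i}" for i in range(10)]
before, after = simulate_cancel(files, cancel_at_tick=2)
print(before, after)  # ['f0', 'f1'] ['f2', 'f3']
```

With ten files and cancellation at tick 2, `f2` and `f3` are processed after the cancel request because they were already queued, while `f4` to `f9` are never catalogued.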

     Why don’t you place the quarantined file into the quarantine folder?

    Constellations uses the quarantine folder to store a JSON quarantine report for each file. The report includes metadata about the original file: file size, file type, input and output hashes, and its location in the source container.

    A quarantined file is never copied to an output container; this prevents duplication of potentially malicious files. The quarantine report provides a safe way to learn more about the file so that users can take further action.

     Why does error metadata for ‘supported’ file types appear in the quarantine folder?

    Sometimes Glasswall Halo fails to rebuild or analyse a file, and an error status is sent back to Constellations. When this happens, the `Errors` section of the quarantine report is populated with error codes and descriptions. More information on these errors can be found in the Glasswall Halo API Documentation.

    Non-successful status codes can occur for a number of reasons:

    • The file exceeded the maximum file size limit.
    • Processing took too long and the configurable timeout was reached.
    • The file is masquerading as a supported file type but is actually a different file type.

    Example:

    ```json
    "Errors": [
      {
        "ErrorCode": 5003,
        "ErrorDescription": "Failed to detect file type"
      }
    ]
    ```
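A script that triages quarantine reports could pull out these entries. The sketch below assumes, as the example suggests, that `Errors` is a top-level array in the report JSON; it makes no assumptions about the report's other fields, and `summarise_errors` is a hypothetical helper.

```python
import json

def summarise_errors(report_json: str) -> list:
    """Return one 'code: description' string per entry in the report's Errors array."""
    report = json.loads(report_json)
    # A report without an Errors section yields an empty list.
    return [
        f"{entry['ErrorCode']}: {entry['ErrorDescription']}"
        for entry in report.get("Errors", [])
    ]

example = '{"Errors": [{"ErrorCode": 5003, "ErrorDescription": "Failed to detect file type"}]}'
print(summarise_errors(example))  # ['5003: Failed to detect file type']
```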

     What are the advantages and disadvantages of choosing Express mode for file processing vs Standard mode?

    When Express mode is enabled, only supported file types are sent to Glasswall Halo for processing.

    Advantages

    Unsupported files are not sent for processing, saving the CPU and memory the Glasswall engine would otherwise need to identify them.

    Disadvantages

    File type is determined from the file extension alone. A supported file that is masquerading as another file type, or that has no file extension, is not sent for processing and may be quarantined.
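The extension check can be sketched as follows. `SUPPORTED_EXTENSIONS` is a hypothetical, abbreviated set chosen for illustration (see the Glasswall Halo documentation for the actual supported types), and `sent_to_halo` is not a Constellations function. It shows why a renamed or extensionless PDF is not sent for processing.

```python
from pathlib import PurePosixPath

# Hypothetical, abbreviated set of supported extensions (illustration only).
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".pptx", ".png", ".jpeg", ".jpg"}

def sent_to_halo(blob_name: str) -> bool:
    """Express-mode style decision based only on the file extension."""
    suffix = PurePosixPath(blob_name).suffix.lower()
    return suffix in SUPPORTED_EXTENSIONS

print(sent_to_halo("reports/q1.PDF"))      # True
print(sent_to_halo("reports/q1"))          # False: a PDF saved without an extension
print(sent_to_halo("reports/q1.pdf.bak"))  # False: only the last suffix is checked
```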

     How can I make Constellations process files more quickly at the start of a scan? Throughput seems to increase dramatically as time goes on.

    Constellations services use Kubernetes Event-driven Autoscaling (KEDA) to scale each service between a minimum and a maximum number of replicas. Constellations expects a high volume of throughput and therefore requires the node pools to scale to meet the demands of the various services.

    A cold start has been observed when AKS scales out nodes. This causes the perceived ramp-up of throughput over time.

    In order to mitigate this cold-start, nodes can be pre-warmed:

    • Set the CPU and memory requests and limits to the same value for the `cdrplatform/engine` and `cdrplatform/api` services, guaranteeing each pod the same resources.
    • Manually scale Glasswall Halo and Constellations services to the desired replica count, which will trigger node scaling to meet the CPU and Memory demands.
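Choosing a replica count for pre-warming comes down to simple arithmetic once requests equal limits. The sketch below estimates how many nodes a given replica count will force the cluster to provision. It is a rough lower bound under stated assumptions: the node figures are allocatable capacity (after system reservations and DaemonSets), all pods are identical, and bin-packing waste is ignored. `nodes_needed` and the example numbers are hypothetical, not Glasswall sizing guidance; see Performance and Scaling for recommended footprints.

```python
import math

def nodes_needed(replicas, cpu_request, mem_request_gib, node_cpu, node_mem_gib):
    """Rough lower bound on nodes needed to schedule `replicas` identical pods.

    node_cpu and node_mem_gib are per-node *allocatable* capacity; bin-packing
    inefficiency and other workloads on the nodes are ignored.
    """
    pods_per_node = min(math.floor(node_cpu / cpu_request),
                        math.floor(node_mem_gib / mem_request_gib))
    if pods_per_node < 1:
        raise ValueError("a single pod does not fit on one node")
    return math.ceil(replicas / pods_per_node)

# Hypothetical figures: 20 engine replicas requesting 2 CPU / 4 GiB each,
# on nodes with 7.5 CPU and 28 GiB allocatable.
print(nodes_needed(20, 2, 4, 7.5, 28))  # 7
```

Here CPU is the binding constraint (3 pods per node), so 20 replicas need at least 7 nodes.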

     Does the grade of storage from the cloud provider impact the speed of processing?

    For storage accounts in Azure, we recommend a minimum grade of `StorageV2 (general purpose v2)`. Increasing the grade beyond this should not affect throughput.

    For Cosmos instances, to process 500 GB in 30 minutes, set the total throughput limit to 3000 RU/s. This prevents rate limiting; increasing the limit further has no effect on processing time.
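For sizing, it can help to express that target as a sustained data rate: 500 GB in 30 minutes is roughly 278 MB/s (decimal units). The helper below is a sketch for converting your own volume and time window. `required_rates` is hypothetical, `avg_file_mb` is an assumption you would need to measure for your data, and the 3000 RU/s figure above should not be assumed to scale linearly with these numbers.

```python
def required_rates(total_gb: float, minutes: float, avg_file_mb: float):
    """Return (MB/s, files/s) needed to process total_gb within minutes.

    Uses decimal units (1 GB = 1000 MB). avg_file_mb is an assumed average.
    """
    seconds = minutes * 60
    mb_per_s = total_gb * 1000 / seconds
    return mb_per_s, mb_per_s / avg_file_mb

mb_s, files_s = required_rates(500, 30, avg_file_mb=2.0)
print(f"{mb_s:.1f} MB/s, {files_s:.1f} files/s")  # 277.8 MB/s, 138.9 files/s
```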

     What security recommendations do you have for the deployment and operation of the solution?

    Data Security 

    • Connection strings for both source and destination storage accounts are provided by users on a per-scan basis. As a security best practice, users should share connection strings that expire within a limited time period.
    • No customer files are held within the product once the CDR processing is completed.
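A client that submits scans could enforce the expiry practice above before sending a connection string. The sketch below assumes the connection string carries a shared access signature (a `SharedAccessSignature=` segment whose `se` parameter is the signed expiry, as in Azure Storage SAS connection strings). `sas_expiry`, `expires_within`, and the example values are hypothetical, and the maximum lifetime is a policy choice for you to set.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs

def sas_expiry(connection_string: str) -> datetime:
    """Return the signed expiry (`se`) of the SAS in a storage connection string."""
    for segment in connection_string.split(";"):
        key, _, value = segment.partition("=")
        if key.strip() == "SharedAccessSignature":
            # parse_qs also URL-decodes values such as 16%3A00%3A00Z.
            raw = parse_qs(value)["se"][0].replace("Z", "+00:00")
            expiry = datetime.fromisoformat(raw)
            # SAS times are UTC; treat date-only or naive values as UTC.
            return expiry if expiry.tzinfo else expiry.replace(tzinfo=timezone.utc)
    raise ValueError("no SharedAccessSignature in connection string")

def expires_within(connection_string: str, max_lifetime: timedelta, now=None) -> bool:
    """True if the SAS expires no later than now + max_lifetime."""
    now = now or datetime.now(timezone.utc)
    return sas_expiry(connection_string) <= now + max_lifetime

conn = ("BlobEndpoint=https://example.blob.core.windows.net/;"
        "SharedAccessSignature=sv=2021-08-06&sp=rl&se=2023-05-24T16%3A00%3A00Z&sig=REDACTED")
now = datetime(2023, 5, 24, 12, 0, tzinfo=timezone.utc)
print(expires_within(conn, timedelta(hours=8), now=now))  # True
print(expires_within(conn, timedelta(hours=2), now=now))  # False
```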

    Network Security

    • A private AKS cluster is recommended, with private endpoints on both Cosmos and Blob Storage.
    • TLS certificates should be configured on the Ingress into the Scan Controller and Event Projection APIs.

    Authentication and Authorisation

    • The Scan Controller and Event Projection APIs support integration with Azure Active Directory (AAD). We recommend enabling authentication in both of these services.
    • Kubernetes Secrets, along with Azure Key Vault can be used to store the credentials for Cosmos, RabbitMQ and AAD.

     Is there a possibility that the solution may accidentally execute malware during processing?

    Every precaution is taken to ensure Constellations never executes malware. If a malicious file brings down a pod, the damage is contained: Docker containers are given restricted access to specific folders of the filesystem, and Constellations services run in non-root user groups.

     What options do I have for using a private container registry, rather than Glasswall’s?

    An alternative to using the Glasswall ACR in your production environment is to push the Constellations Helm charts and Docker images to a private ACR.

    This can be achieved by pulling the desired images and charts to an intermediary location (for example, a local machine or a bastion server) and then pushing them into your private registry. When installing the Helm charts, update the `image.repository` values to point to your private registry.
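If you script the transfer, each image reference needs its registry host swapped while the repository path and tag are kept. The sketch below does only that string rewrite; `retarget_image` and both registry hostnames are hypothetical, and the actual pull and push would still be done with your usual container tooling. It follows Docker's convention that a first path component containing `.` or `:`, or equal to `localhost`, is a registry host.

```python
def retarget_image(image_ref: str, private_registry: str) -> str:
    """Point an image reference at another registry, keeping repository path and tag."""
    first, sep, rest = image_ref.partition("/")
    # Docker treats the first component as a registry host if it looks like one.
    has_registry = bool(sep) and ("." in first or ":" in first or first == "localhost")
    path = rest if has_registry else image_ref
    return f"{private_registry.rstrip('/')}/{path}"

print(retarget_image("source-registry.example.com/cdrplatform/engine:1.0.0",
                     "myprivateacr.azurecr.io"))
# myprivateacr.azurecr.io/cdrplatform/engine:1.0.0
```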

    Do you support S3 buckets as well as Blob Storage? 

    Currently, only Azure Blob Storage is supported.

     What will happen if the source or destination data containers are deleted following completion of a job?

    Provided the scan/job has completed, it is safe to delete the source and destination containers. The results of the scan will persist in the Cosmos Database.

    If the source container is deleted, it will no longer be possible to perform a rescan.

    The CDR Enabler service will re-create the quarantine/target containers as long as the shared access signature is still valid.

     What will happen if an archive file with, say, 10 levels of nesting is provided to the system?

    A maximum of 5 levels of nesting in archives is supported by Glasswall Halo and Constellations. Once Glasswall Halo reaches this nesting limit, no further processing takes place.

    When 'basic' archive processing mode is enabled, the entire archive is marked as `Errored` and a quarantine report is stored in the quarantine container.

    When 'detailed' archive processing mode is enabled, the overall file is marked as `Failed`. A clean file with child archives (up to the limit of 5) is stored in the target folder. The first child archive past the nesting limit is replaced with a `.txt` placeholder and marked as `Errored`, with a quarantine report stored in the quarantine container.
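The two modes can be compared with a small model of a chain of nested archives. This is an illustrative sketch, not Glasswall Halo's logic: `over_limit_outcome` is hypothetical, it assumes `chain[0]` is the top-level archive and `chain[i]` is the child archive nested `i` levels inside it, and the placeholder naming is invented.

```python
MAX_NESTING = 5  # nesting limit supported by Glasswall Halo and Constellations

def over_limit_outcome(chain, mode):
    """Sketch how an archive chain nested past MAX_NESTING is reported.

    chain[0] is the top-level archive; chain[i] is nested i levels inside it.
    """
    if len(chain) - 1 <= MAX_NESTING:
        raise ValueError("chain is within the nesting limit")
    first_over = chain[MAX_NESTING + 1]  # first child archive past the limit
    if mode == "basic":
        # Whole archive Errored; nothing goes to the target, one quarantine report.
        return {"overall": "Errored", "target": [], "quarantine": [chain[0]]}
    if mode == "detailed":
        # Top-level file plus children up to the limit are kept; the first
        # over-limit child becomes a .txt placeholder (hypothetical name).
        target = chain[:MAX_NESTING + 1] + [first_over + ".txt"]
        return {"overall": "Failed", "target": target, "quarantine": [first_over]}
    raise ValueError(f"unknown mode: {mode!r}")

chain = [f"level{i}.zip" for i in range(11)]  # top-level archive plus 10 nested levels
print(over_limit_outcome(chain, "basic")["overall"])        # Errored
print(over_limit_outcome(chain, "detailed")["target"][-1])  # level6.zip.txt
```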

     Which cloud environments do you support?

    Azure - AKS

     Where files fail to be processed, and are referenced in the quarantine folder, should I expect to see them in the destination folder?

    Single files (and archives, when `basic` archive processing mode is enabled) that fail to be processed are not stored in the target folder; instead, a quarantine report is generated and stored in the specified quarantine folder.

    In 'detailed' archive processing mode, partially rebuilt archives are stored in the target folder, but the quarantined files are replaced with `.txt` placeholders.

     If a file has been quarantined, what is the next step in removing a threat from a file? What does Glasswall recommend? 

    There is no silver bullet in security. Glasswall’s CDR technology does not attempt to detect malware; therefore, unlike many other security tools, it does not have a false positive or false negative rate. Glasswall provides an important level of certainty to the system owner. If the file has been protected by our comprehensive level of CDR, malware does not have a hiding place within the file.

    Sometimes a file deviates so markedly from the safe specification for its file type that the CDR process is impossible. Whilst a parser might still be able to render the broken file, there is a significantly higher likelihood that a file which cannot undergo CDR has been tampered with to deliberately make it unsafe. Another explanation is that the authoring software does not respect the industry specification for that file type, and may save files in a way that makes other parsers unstable.

    Files that cannot be protected by CDR should be presented to other security filters, such as next-generation anti-virus, deep learning analyzers, or sandboxes. None of these solutions can eradicate the risk completely, but they may be able to detect malicious signals in the file. These alternative solutions can be effective, but they always have false negative and false positive error rates.

    Placing Glasswall CDR technology first in line of protection means less triage is needed to deal with the detection error rates associated with other security filters.

      Why do some larger files seem to get processed faster compared to some smaller files? Why do two files of approximately the same size take different periods of time to be processed?

    The size of the source file is certainly correlated with the processing time of a file. However, the nature of the file format and the data complexity of a specific file are more strongly correlated with the time it might take to analyze a file.

    The CDR process builds a document object model of the underlying file to assist with the analysis. If this graph has many thousands of nodes for the CDR engine to inspect as it traverses the data representation, the computational effort can be higher than for a larger file with a simpler structure.

    This is analogous to how compression rate varies between files, even though they are similar sizes before the compression step. Redundancy, encoding, and entropy are common factors which influence the speed and efficiency of the process. 
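The compression analogy can be demonstrated directly. The sketch uses Python's `zlib` purely as an illustration (it is not part of the CDR engine): two inputs of identical size, one highly repetitive and one pseudo-random, compress to very different sizes because of their different redundancy and entropy. `random.Random.randbytes` requires Python 3.9 or later.

```python
import random
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size as a fraction of the original (lower means more redundancy)."""
    return len(zlib.compress(data)) / len(data)

SIZE = 100_000
repetitive = b"<row><cell>0</cell></row>" * (SIZE // 25)  # 25-byte pattern, low entropy
noisy = random.Random(0).randbytes(SIZE)                   # seeded pseudo-random, high entropy

print(len(repetitive) == len(noisy))  # True: same input size
print(f"repetitive: {compression_ratio(repetitive):.3f}")
print(f"noisy:      {compression_ratio(noisy):.3f}")
```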

     What is the recommended approach for upgrading the Kubernetes version in the managed Kubernetes solution?

    For AKS, the recommended approach is outlined in the Microsoft Documentation.

     What error messages should we actively monitor?

    RabbitMQ Errors

    When an error occurs in one of the Constellations services, the message is retried. Once the message hits a configured retry limit, it is sent to the dead-letter queue.
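The retry behaviour can be modelled in a few lines. This in-memory sketch is not the Constellations or RabbitMQ implementation: `handle_with_retries` is hypothetical, and the retry limit of 3 is an example value (in Constellations the limit is configurable). A message that still fails after its retries is moved to a dead-letter list instead of being retried forever.

```python
from collections import deque

def handle_with_retries(messages, handler, max_retries=3):
    """Process messages, retrying failures up to max_retries times before dead-lettering."""
    queue = deque((msg, 0) for msg in messages)  # (message, failures so far)
    dead_letters = []
    while queue:
        msg, failures = queue.popleft()
        try:
            handler(msg)
        except Exception:
            failures += 1
            if failures > max_retries:
                dead_letters.append(msg)  # retry limit reached: dead-letter it
            else:
                queue.append((msg, failures))  # requeue for another attempt
    return dead_letters

def flaky_handler(msg):
    """Example handler that always fails for messages starting with 'bad'."""
    if msg.startswith("bad"):
        raise RuntimeError("processing failed")

dead = handle_with_retries(["ok-1", "bad-1", "ok-2"], flaky_handler)
print(dead)  # ['bad-1']
```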

    If a message has been sent to the dead-letter queue, the Scan Controller service processes it and fails the scan. For visibility, the service logs a 'Critical' error.

    Example:

    ```json
    {
       "Category":"Constellations.Scan.Controller.Business.Commands.FailScan",
       "EventId":"19",
       "LogLevel":"Critical",
       "Message":"A dead letter has been received when processing a message of type 'com.glasswall.cdr-enabler.process-file-item.v1' triggered at '05/24/2023 15:52:43 +00:00'. The Scan will now be failed"
    }
    ```
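In most deployments this would be configured as an alert in your logging platform. As a platform-neutral sketch, the function below scans JSON log lines and reports `Critical` entries from the `FailScan` command shown above. `dead_letter_alerts` and the sample lines are hypothetical, and matching on the category suffix is an assumption based on this single example.

```python
import json

def dead_letter_alerts(log_lines):
    """Return the Message of each Critical log entry from the FailScan command."""
    alerts = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        if not isinstance(entry, dict):
            continue
        category = str(entry.get("Category", ""))
        if entry.get("LogLevel") == "Critical" and category.endswith(".FailScan"):
            alerts.append(entry.get("Message", ""))
    return alerts

sample = [
    "plain text output",
    json.dumps({"Category": "Constellations.Scan.Controller.Business.Commands.FailScan",
                "EventId": "19", "LogLevel": "Critical",
                "Message": "A dead letter has been received (abbreviated sample)"}),
    json.dumps({"Category": "Example.Other", "LogLevel": "Information", "Message": "ok"}),
]
print(dead_letter_alerts(sample))  # ['A dead letter has been received (abbreviated sample)']
```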

     Do you support ARM64-based machines?

    Constellations uses the Glasswall Embedded Engine, which does not currently support ARM-based machines.

     How do I offload/aggregate logs to my preferred network location? 

    AKS

    In AKS (Azure Kubernetes Service), Container Insights can be configured, which includes Log Analytics.

    After monitoring is enabled for a Kubernetes cluster, metrics and container logs are collected automatically through a containerized version of the Log Analytics agent for Linux.

    Log Aggregation

    A log aggregation tool such as Datadog can be deployed into the cluster.

    We recommend installing the Datadog agent via its Helm chart alongside the Constellations deployments.

     How can I trace all logs for a scan?

    When logs are generated by the system, they have associated data attached to them. This is achieved using .NET logging scopes.

    Generally, a scope is created when a logical operation within the system begins and ends when that operation ends. Constellations defines scope data as key-value pairs.

    Messages sent from service to service carry the data currently in scope in the RabbitMQ header 'trace-data'.

    Every scan that Constellations carries out is identified by a Globally Unique Identifier (GUID). This ID is defined within scopes to ensure that all logs contain a reference to it.

    When a request crosses the domain boundary from Constellations to Glasswall Halo, the Scan ID still needs to be logged. The Synchronous API in Glasswall Halo provides a mechanism to do this.

    Requests to the API include a header 'X-Trace-Data' which includes the scope data currently defined. At this point the Scan ID will be defined as scope data.
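The scope mechanism can be approximated in Python with `contextvars`, which here plays the role that logging scopes play in .NET. This is an analogy, not Constellations code: `log_scope`, `log`, and `trace_headers` are hypothetical, the JSON encoding of the `X-Trace-Data` value is an assumption, and the file GUID is made up. Nested scopes merge their key-value pairs, every log line carries the merged data in a `Scopes` object, and the same data is copied into the outbound header.

```python
import json
from contextlib import contextmanager
from contextvars import ContextVar

# Key-value data for the operations currently in progress.
_scope = ContextVar("scope", default={})

@contextmanager
def log_scope(**values):
    """Add key-value pairs to the scope for the duration of a logical operation."""
    token = _scope.set({**_scope.get(), **values})  # merge, never mutate the parent
    try:
        yield
    finally:
        _scope.reset(token)  # the scope ends with the operation

def log(message):
    """Format a JSON log line that carries the current scope data."""
    return json.dumps({"Message": message, "Scopes": _scope.get()})

def trace_headers():
    """Headers for an outbound call; JSON-encoding the value is an assumption."""
    return {"X-Trace-Data": json.dumps(_scope.get())}

with log_scope(scanId="374786e0-d740-4bc9-b13c-d47bb50ec72b"):
    with log_scope(sessionId="0f8fad5b-d9cb-469f-a165-70867728950e"):
        line = log("Submitting file for processing")
        headers = trace_headers()

print(line)
print(headers)
```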

    In order to trace a scan through the system, logs must be filtered by this scope value. Log statements are JSON formatted and contain a Scopes object which should be inspected.

    Using Datadog as an example, this is achieved by using the following search:

    ```
    @Scopes.scanId:374786e0-d740-4bc9-b13c-d47bb50ec72b
    ```
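Without Datadog, the same filter can be applied to exported log lines. The sketch below assumes each line is a JSON object whose `Scopes` field is a flat object, as the search above implies; `logs_for` is a hypothetical helper, not part of Constellations, and the sample lines are invented.

```python
import json

def logs_for(log_lines, key, value):
    """Return parsed log entries whose Scopes object has `key` equal to `value`."""
    matches = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON output
        scopes = entry.get("Scopes") if isinstance(entry, dict) else None
        if isinstance(scopes, dict) and scopes.get(key) == value:
            matches.append(entry)
    return matches

scan_id = "374786e0-d740-4bc9-b13c-d47bb50ec72b"
lines = [
    json.dumps({"Message": "Scan started", "Scopes": {"scanId": scan_id}}),
    json.dumps({"Message": "Other scan", "Scopes": {"scanId": "another-id"}}),
    "not a JSON line",
]
print([e["Message"] for e in logs_for(lines, "scanId", scan_id)])  # ['Scan started']
```

Filtering on a different scope key, such as `sessionId`, works the same way.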

     How can I trace a single file through the logs?

    A file can be traced through the system in the same way as a scan. A GUID is generated per file and passed through the system as scope data. As with ScanId, FileId is also passed to Glasswall Halo. However, Glasswall Halo also tracks its own internal GUID for each file it processes, so SessionId is used instead to store Constellations' FileId.

    An example using Datadog:

    ```
    @Scopes.sessionId:374786e0-d740-4bc9-b13c-d47bb50ec72b
    ```

     How do I update the services from a security perspective?

    See Upgrading Constellations.

     What are the main open source components included in the overall solution? How are the open source components licensed?

    Please see Third Party Libraries.

