Validating Cloud Solutions for Big Data, AI/ML Implementations

Senior Life Science Consultant


With advances in technology, many organizations are moving towards or adopting technologies such as cloud solutions. These are less expensive, easier to maintain (or require no maintenance at all), secure, reliable and, more importantly, scalable when implementing Big Data and AI/ML solutions. This document describes how KVALITO helped clients with the GxP validation of cloud solutions used in such cases.

What is Cloud Computing?

Cloud computing is defined as “a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” [R1]

There are three main service models under Cloud Service offerings:

  • Software as a Service (SaaS)
  • Platform as a Service (PaaS)
  • Infrastructure as a Service (IaaS)

Software as a Service (SaaS) [R1] is defined as “The capability provided to the consumer is to use the provider’s applications running on a cloud infrastructure”, where the consumer controls only limited user-specific application configuration settings.

Platform as a Service (PaaS) [R1] is defined as “The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider”, where the consumer has control over deployed applications and configuration settings.

Infrastructure as a Service (IaaS) [R1] is defined as “The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications”, where the consumer has control over operating systems, storage and deployed applications, and possibly limited control of select networking components.
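As a rough illustration, the consumer-controlled layers named in the NIST definitions [R1] can be written down as a small lookup table, which is a convenient starting point when scoping what the customer itself has to validate. The sketch below is in Python; the names `CONSUMER_CONTROL` and `consumer_controls` are hypothetical, and the layer labels are a simplified paraphrase of the NIST wording rather than an official taxonomy.

```python
# Consumer-controlled layers per service model, paraphrased from NIST SP 800-145.
# The layer labels are simplified names chosen for this sketch.
CONSUMER_CONTROL = {
    "SaaS": ["limited user-specific application configuration"],
    "PaaS": ["deployed applications", "application-hosting configuration (possibly)"],
    "IaaS": [
        "operating systems",
        "storage",
        "deployed applications",
        "select networking components (limited, possibly)",
    ],
}


def consumer_controls(service_model: str) -> list[str]:
    """Return the layers the consumer controls for a given service model."""
    try:
        return CONSUMER_CONTROL[service_model]
    except KeyError:
        raise ValueError(f"Unknown service model: {service_model!r}") from None


if __name__ == "__main__":
    for model in ("SaaS", "PaaS", "IaaS"):
        print(f"{model}: {', '.join(consumer_controls(model))}")
```

The fewer layers the consumer controls, the more the validation effort has to rely on evidence supplied by the provider.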

There are four deployment models for Cloud technologies:

  • Private Cloud
  • Community Cloud
  • Public Cloud
  • Hybrid Cloud

The organization selects cloud service and deployment models based on its Enterprise Architecture guidelines. Among the three primary XaaS models, SaaS is the one in which the customer subscribes to use the software without owning the infrastructure, the servers or the software itself. This makes SaaS less expensive and significantly less effort to manage, but it also increases the risk for life sciences companies, because the SaaS provider can update software that has already been validated. This increases the focus on the validation of cloud systems and platforms.
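One way to picture this risk is a periodic check that compares the release the vendor currently runs with the release that was last validated. The sketch below is a hypothetical Python illustration, not a description of any specific tool: the `ValidatedSystem` record, the system name and the version strings are assumptions, and in practice the current version would come from the vendor's release notes or notifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidatedSystem:
    name: str
    validated_version: str  # release covered by the last validation effort


def needs_impact_assessment(system: ValidatedSystem, current_version: str) -> bool:
    """Return True when the vendor's current release differs from the validated one."""
    return current_version.strip() != system.validated_version.strip()


if __name__ == "__main__":
    # Hypothetical system and version numbers, for illustration only.
    crm = ValidatedSystem(name="example-saas-crm", validated_version="23.1")
    for version in ("23.1", "23.2"):
        flag = needs_impact_assessment(crm, version)
        print(f"{crm.name} at {version}: impact assessment needed = {flag}")
```

A version mismatch does not by itself mean the system is out of compliance; it only signals that the change should be assessed.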

What is BIG DATA?

There is a lot of confusion around Big Data and terms such as data lake, data warehouse, database and data swamp. What are they?

Essentially, a database is an organized collection of data. Databases are classified by the way they store this data. A data warehouse collects data from various sources, whether internal or external, and optimizes the data for retrieval for business purposes. The data is usually structured.

A data lake is a place to store structured and unstructured data, as well as a method for organizing large volumes of highly diverse data from diverse sources. A data lake tends to ingest data very quickly and prepare it later, on the fly. This makes it the preferred solution for organizations whose business relies on multiple data sources, and it does so at lower cost. A data swamp is an unmanageable data lake.
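The "ingest first, prepare later" behaviour of a data lake is often called schema-on-read. The following is a minimal Python sketch of the idea using only the standard library; the in-memory list standing in for the lake, the function names and the sample records are all hypothetical.

```python
import json

# A list stands in for cheap object storage; records are kept exactly as received.
raw_zone: list[str] = []


def ingest(record: str) -> None:
    """Store the raw record without validating or transforming it."""
    raw_zone.append(record)


def read_complaints() -> list[dict]:
    """Apply structure only at read time, skipping records that do not fit."""
    rows = []
    for raw in raw_zone:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unstructured text stays in the lake for other uses
        if isinstance(doc, dict) and "product" in doc and "complaint" in doc:
            rows.append({"product": doc["product"], "complaint": doc["complaint"]})
    return rows


ingest('{"product": "A", "complaint": "broken seal", "source": "email"}')
ingest("free-text call-centre note with no structure")
ingest('{"product": "B", "complaint": "label unreadable"}')

print(len(raw_zone))      # all three records were ingested
print(read_complaints())  # only the two structured records are returned
```

A data warehouse would typically apply this kind of structure at load time instead, which is why its data is usually structured.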

Big data is the collection of larger, more complex data sets coming from new data sources (or from old ones combined), arriving in larger volumes and at higher velocity than traditional data processing software can manage. These massive volumes of data can, however, be used to address business problems that could not have been tackled before.

With data storage becoming less expensive [R7], larger volumes of data becoming available for organizations to study, and promising Artificial Intelligence (AI) and Machine Learning (ML) solutions such as ProFound AI [R8] emerging, life sciences companies are deploying applications and medical devices that use them. AI/ML solutions depend on cloud solutions because these scale as needed.

Validation of computer systems is done under the FDA guidance “General Principles of Software Validation; Final Guidance for Industry and FDA Staff” [R2] and Good Automated Manufacturing Practice (GAMP) [R10]. These guidelines do not reference specific technologies and so remain relevant in the era of cloud computing. However, because organizations interpret the guidelines in different ways, health authorities such as the FDA, EMA, PIC/S and WHO have issued guidelines [R3][R4][R5][R6] for the latest technologies, such as AI/ML on cloud platforms.

Figure 1: GxP Validation landscape.

Approach for Cloud Validation

  • As the cloud vendor manages the hardware and software, put Master Service Agreements and Quality Agreements in place to cover uptime, backup/recovery practices, system maintenance, vulnerability management and security of the system
  • Infrastructure qualification is done by the vendor, so leverage the documentation produced by the vendor partner after assessing it against the required quality standards
  • Consider third-party certifications such as System and Organization Controls (SOC) and ISO/IEC 27000 as valid evidence
  • Ensure that the Quality Management System or SOP covers the cloud systems validation process
  • Perform a vendor assessment for the cloud vendor and for other vendors involved in the design, implementation and maintenance of the system
  • Draft the validation strategy for GxP systems with the components shown in Figure 2

Figure 2: Validation Strategy – components
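The approach above can be tracked as a simple checklist per system. The following Python sketch is only an illustration: the item keys paraphrase the bullets in this section, and the function name and the sample evidence are hypothetical.

```python
# Checklist items paraphrasing the approach bullets; the keys are names chosen for this sketch.
CHECKLIST = (
    "service_and_quality_agreements",
    "vendor_qualification_docs_assessed",
    "third_party_certifications",
    "qms_covers_cloud_validation",
    "vendor_assessment",
    "validation_strategy",
)


def open_items(evidence: dict[str, bool]) -> list[str]:
    """Return checklist items with no recorded evidence, in checklist order."""
    return [item for item in CHECKLIST if not evidence.get(item, False)]


if __name__ == "__main__":
    # Hypothetical status of one system partway through validation planning.
    evidence = {
        "service_and_quality_agreements": True,
        "third_party_certifications": True,
        "vendor_assessment": True,
    }
    for item in open_items(evidence):
        print("open:", item)
```

Keeping the items in a fixed order makes the output easy to compare between systems or between review dates.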

Advantage KVALITO

  • KVALITO Consultants and Subject Matter Experts Pool
  • Learning Culture – Ongoing Training Programs
  • Service Delivery and Business Relationship Management
  • Business Continuity – Talent Acquisition and Management
  • Market Focus – Life Sciences and Healthcare Industry
  • We Support – Applied Technology and Innovation
  • We Understand Your Business – People, Processes, Tools, and Data
  • Audit and Assessment
  • Strategy and Advice
  • Industry Benchmarking with Peer organizations such as ISPE
  • Engagement in the special interest groups such as ISPE Predictive Risk Management Group


What KVALITO did

  • Responsible for Quality Management of the ATI (Applied Technology Innovation) Portfolio for deploying Artificial Intelligence and Machine Learning (AI/ML) solutions for the Business Services on Cloud platforms.
  • Developing the CSV strategy and CSV SOPs, and leading and supporting IT project and operations teams on CSV compliance in line with quality guidelines for AI/ML solutions
  • Interfacing between the business, IT and business QA organizations for Quality related topics (cross-functional and cross-divisional interactions)
  • Creation of Audit Readiness Package as per the CSV SOPs


Implementation of:

  • Augmented Reality (AR) & Virtual Reality (VR) Platform and the subsequent business use projects, e.g., to build corporate-wide collaboration tools
  • Semantic Search Platform, to provide corporate-wide search capabilities that allow business areas to search multiple systems together through one harmonized interface on a common platform, e.g., for Regulatory Affairs to answer questions from the Health Authorities
  • Classification Engine for Adverse Reactions, Product Complaints and Medical Information Requests
  • Automatic Data Extraction tools
  • Data Verification Processes and Tools for Machine Learning
  • Strategizing the client's vendor selection processes for SaaS and cloud-based solutions, along with creation of PoCs (Proofs of Concept) and implementation of selected solutions
  • Initial GxP classification and guidance on CSV related topics and information
  • Ensure that computerized systems are fit for purpose
  • Audit Support (Internal Global Quality Audit and External Swissmedic and FDA Audit)
  • Audit remediation for critical large programs
  • QA operational support
  • Responsible for IT Operations for Cloud solutions (SFDC, Veeva)
  • Set-up of Operations Definition and Service Description
  • Negotiate Service Contracts with Third-Party Vendors
  • Defined Transition Strategy
  • Operational Model for Cloud Solutions (SFDC, Veeva etc.)
  • Integration of tools like Jira, Atlassian

People Roles:

  • Life Sciences Consultant, Program Quality Management (AI/ML solutions)
  • CSV Quality Assurance
  • Life Sciences Consultant, Process Transition and Organizational Change Manager
  • SME for GxP Determination
  • QA & eCompliance responsible
  • CSV and Data Integrity SME
  • CSV Global Project Management
  • Cloud Solutions SME
  • External Auditor for IT platforms and CSV
  • Validation Lead/SME for Audit remediation Projects (GMP, Factories)


Value Delivered

  • Guiding the organization on drafting and implementing the CSV strategy and SOPs for Cloud solutions
  • Program Quality management for AI/ML solutions
  • Implementation of AR & VR platforms
  • Drafting Data verification processes for Machine Learning
  • Systems are audit-ready with availability of Audit readiness Packages
  • Creation of automated workflows as part of digital validation processes
  • Supplier / Vendor audits on cloud technologies
  • Guidance on Quality agreement with the Cloud service providers



  • Novartis
  • Johnson & Johnson
  • Advanced Accelerator Applications
  • Avexis – Novartis Gene Therapies
  • Actelion
  • Galapagos


R1 – NIST SP 800-145: The NIST Definition of Cloud Computing

R2 – FDA: General Principles of Software Validation

R3 – FDA / MHRA / Health Canada: Good Machine Learning Practice for Medical Device Development


R5 – IMDRF: Machine Learning Enabled Medical Device

R6 – WHO: Guidance on Ethics and Governance of Artificial Intelligence for Health

R7 – ISPE: Getting Ready for Pharma 4.0

