From Informatica PowerCenter to Google BigQuery: How a Top-10 Global E-commerce Platform Modernized 2,400 Pipelines at Scale

MigryX Case Study • April 2026 • E-commerce & Marketplace Platforms

Executive Summary

One of the world's ten largest e-commerce and marketplace platforms chose MigryX to execute a wholesale replacement of its Informatica PowerCenter 10.2 data integration estate. The platform's data engineering function had operated Informatica as its central ETL backbone for over 15 years, accumulating 2,400 PowerCenter mappings and workflows that powered everything from real-time order processing and fraud detection to seller performance analytics and advertising attribution. Housed in an on-premises data center with bespoke server configurations, the estate represented a significant operational liability as the business scaled to process over 12 billion events per day during peak commerce seasons.

Over a focused ten-month engagement, MigryX parsed, analyzed, and converted every PowerCenter mapping XML export into a combination of Dataform SQL models, Cloud Composer DAGs, and Pub/Sub-to-BigQuery streaming pipelines. The result was an 8X improvement in end-to-end pipeline throughput, $3.2 million in projected two-year savings, and a data platform capable of supporting the platform's next generation of real-time personalization, dynamic pricing, and supply chain intelligence workloads.

Client Overview

The client operates a two-sided marketplace connecting millions of buyers with millions of sellers across dozens of countries. Its data platform is the operational nerve center of the business, supporting use cases that span real-time fraud scoring, catalog relevance ranking, seller compliance monitoring, advertising measurement, financial reconciliation, and executive reporting. The data engineering team is a large, globally distributed organization spanning multiple time zones.

PowerCenter had been introduced in the early 2000s as the platform's primary data movement tool, initially handling nightly batch loads from transactional databases into a centralized Oracle data warehouse. Over time, the estate grew to encompass near-real-time CDC feeds, dozens of third-party data source integrations, and complex multi-hop transformation workflows that moved data between Oracle, Netezza, and eventually Snowflake as the company added cloud capabilities. By the time MigryX was engaged, PowerCenter was simultaneously the most critical and most difficult-to-maintain component of the data stack, with a bus factor of fewer than 10 engineers who understood its deepest configuration layers.

Business Challenge

The decision to migrate off PowerCenter was accelerated by a combination of strategic, financial, and operational pressures that had been building for several years: escalating Informatica license and on-premises infrastructure costs, a bus factor of fewer than 10 engineers who understood the estate's deepest configuration layers, and a batch latency ceiling that blocked planned real-time personalization and fraud detection workloads.

The MigryX Approach

The engagement began with MigryX ingesting the client's PowerCenter repository exports: 2,400 mapping XML files, 890 workflow XML files, and the full parameter file library. The MigryX XML parser reconstructed the complete logical structure of each mapping, identifying every source definition, target definition, transformation object, and port-level connection. This structural representation was then analyzed by the complexity classifier, which categorized mappings into three tiers: direct SQL translation, augmented SQL with Dataform macros, and hybrid SQL plus Python Cloud Function for transformations involving proprietary custom logic.
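The structural parse described above can be sketched in a few lines. This is an illustrative approximation, not MigryX's actual parser: the element names (MAPPING, TRANSFORMATION, CONNECTOR) follow the standard PowerCenter repository export schema, but the tier thresholds and the `SQL_FRIENDLY` set are hypothetical simplifications.

```python
# Sketch of a structural parse and tier classification of one
# PowerCenter mapping XML export (illustrative, not MigryX's code).
import xml.etree.ElementTree as ET

# Transformation types that usually translate directly to SQL (assumption).
SQL_FRIENDLY = {"Source Qualifier", "Expression", "Filter", "Joiner",
                "Aggregator", "Router", "Sorter", "Lookup Procedure"}

def classify_mapping(xml_text: str) -> dict:
    """Parse one mapping export and assign a conversion tier."""
    root = ET.fromstring(xml_text)
    mapping = root.find(".//MAPPING")
    transforms = [t.get("TYPE") for t in mapping.findall("TRANSFORMATION")]
    connectors = mapping.findall("CONNECTOR")  # port-level links
    non_sql = [t for t in transforms if t not in SQL_FRIENDLY]
    if not non_sql:
        tier = "direct-sql"            # pure SQLX translation
    elif all(t == "Custom Transformation" for t in non_sql):
        tier = "hybrid-python"         # needs a Cloud Function companion
    else:
        tier = "augmented-sql"         # SQLX plus Dataform macros
    return {"name": mapping.get("NAME"), "transforms": transforms,
            "ports": len(connectors), "tier": tier}

sample = """
<POWERMART>
  <MAPPING NAME="m_orders_daily">
    <TRANSFORMATION NAME="sq_orders" TYPE="Source Qualifier"/>
    <TRANSFORMATION NAME="exp_totals" TYPE="Expression"/>
    <TRANSFORMATION NAME="agg_by_seller" TYPE="Aggregator"/>
    <CONNECTOR FROMINSTANCE="sq_orders" TOINSTANCE="exp_totals"/>
    <CONNECTOR FROMINSTANCE="exp_totals" TOINSTANCE="agg_by_seller"/>
  </MAPPING>
</POWERMART>
"""
print(classify_mapping(sample))
```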

XML-Driven Mapping Conversion to Dataform

For the 1,640 mappings classified as direct or augmented SQL translations, MigryX generated Dataform SQLX models that preserved the exact transformation semantics of the source mapping. Joiner transformations were rendered as BigQuery JOIN clauses with matching join conditions and join types. Router transformations became SQL CASE expressions or multi-table UNION patterns depending on the routing topology. Aggregator transformations mapped to BigQuery GROUP BY aggregations with equivalent window function expressions where incremental aggregation semantics were required.
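The Router rendering choice can be illustrated with a small generator sketch. This is not MigryX's generator; the function name and the single-target/multi-target split are assumptions that show the shape of the CASE-versus-UNION decision described above.

```python
# Illustrative sketch: render a PowerCenter Router as a single BigQuery
# SELECT with a CASE expression when all groups feed one target, or as
# a UNION ALL of filtered branches when groups feed different targets.
def render_router(source: str, groups: dict[str, str],
                  single_target: bool) -> str:
    if single_target:
        branches = "\n    ".join(
            f"WHEN {cond} THEN '{name}'" for name, cond in groups.items())
        return (f"SELECT *,\n  CASE\n    {branches}\n"
                f"  END AS route_group\nFROM {source}")
    selects = [f"SELECT *, '{name}' AS route_group FROM {source} WHERE {cond}"
               for name, cond in groups.items()]
    return "\nUNION ALL\n".join(selects)

sql = render_router("staging.orders",
                    {"domestic": "country_code = 'US'",
                     "export": "country_code != 'US'"},
                    single_target=True)
print(sql)
```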

PowerCenter's Expression transformation language, a significant source of migration risk for any non-parser-based approach, was handled by MigryX's expression translation module, which mapped over 340 proprietary PowerCenter functions to their BigQuery SQL equivalents. Where a one-to-one function mapping did not exist, MigryX generated equivalent BigQuery UDFs and included them in the target Dataform project, maintaining full behavioral equivalence while producing auditable, testable SQL rather than opaque wrapper functions.
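A toy version of that expression translation might look like the following. The function names on the left are real PowerCenter expression functions and the right-hand sides are common BigQuery translations, but the mapping table here is a tiny hand-picked subset, and edge-case semantics (NULL handling, format strings) would need per-function validation in practice.

```python
# Minimal sketch of a PowerCenter-to-BigQuery expression translator.
import re

FUNCTION_MAP = {
    "IIF": "IF",            # IIF(cond, a, b) -> IF(cond, a, b)
    "INSTR": "STRPOS",      # 1-based substring search in both dialects
    "LTRIM": "LTRIM",
    "DECODE": "CASE ...",   # needs a structural rewrite; flagged below
}

def translate_expression(expr: str) -> tuple[str, list[str]]:
    """Rewrite known function names; report ones needing manual review."""
    flagged = []
    def swap(match):
        name = match.group(0)
        target = FUNCTION_MAP.get(name)
        if target is None or "..." in target:
            flagged.append(name)   # unknown or non-1:1: leave for review
            return name
        return target
    out = re.sub(r"\b[A-Z_]+(?=\()", swap, expr)
    return out, flagged

sql, todo = translate_expression("IIF(INSTR(sku, '-') > 0, 'variant', 'base')")
print(sql)   # IF(STRPOS(sku, '-') > 0, 'variant', 'base')
print(todo)  # []
```

Where no one-to-one target exists (as with DECODE here), the expression is flagged rather than silently rewritten, mirroring the UDF-generation path described above.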

Workflow Conversion to Cloud Composer DAGs

PowerCenter workflow XML files define session scheduling, task dependencies, failure handling, and email notification behavior. MigryX parsed each workflow's task dependency graph and emitted a corresponding Cloud Composer DAG, with each PowerCenter session mapped to a Dataform compilation and execution operator or a Dataproc job submission operator depending on the underlying mapping type. Workflow-level pre- and post-session commands were converted to Airflow PythonOperators, preserving custom shell script logic that had been embedded in session properties.
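The dependency-graph step can be sketched with the standard library alone. The WORKFLOWLINK element with FROMTASK/TOTASK attributes follows the repository export schema, though the sample workflow below is illustrative; the resulting topological order is exactly the ordering a generated Cloud Composer DAG would encode as operator dependencies.

```python
# Sketch: parse a PowerCenter workflow export's task links and emit
# sessions in dependency order (the order a generated DAG would encode).
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter

def session_order(workflow_xml: str) -> list[str]:
    root = ET.fromstring(workflow_xml)
    deps: dict[str, set[str]] = {}
    for link in root.iter("WORKFLOWLINK"):
        # graphlib expects node -> set of predecessors
        deps.setdefault(link.get("TOTASK"), set()).add(link.get("FROMTASK"))
        deps.setdefault(link.get("FROMTASK"), set())
    return list(TopologicalSorter(deps).static_order())

wf = """
<WORKFLOW NAME="wf_daily_orders">
  <WORKFLOWLINK FROMTASK="Start" TOTASK="s_load_orders"/>
  <WORKFLOWLINK FROMTASK="s_load_orders" TOTASK="s_agg_sellers"/>
  <WORKFLOWLINK FROMTASK="s_load_orders" TOTASK="s_fraud_features"/>
</WORKFLOW>
"""
print(session_order(wf))
```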

Real-Time CDC to Pub/Sub and BigQuery Streaming

The 247 mappings that powered real-time CDC feeds were redesigned as event-driven architectures rather than ported as near-real-time polling jobs. MigryX worked with the client's platform engineering team to instrument source database CDC streams into Google Pub/Sub topics using Datastream for continuous replication. MigryX-generated Cloud Functions consumed Pub/Sub messages and performed the lightweight transformation logic previously handled by PowerCenter's CDC sessions, writing directly to BigQuery via the Storage Write API for sub-second end-to-end latency. This architectural shift reduced fraud detection pipeline latency from an average of 4 minutes to under 12 seconds.

Migration Architecture

| Component | Legacy (Before) | Modern (After) |
| --- | --- | --- |
| ETL platform | Informatica PowerCenter 10.2 on-premises grid | Dataform SQLX + Google Cloud Dataproc |
| Data warehouse | Netezza + Oracle (on-premises) | Google BigQuery (multi-region US + EU) |
| Workflow orchestration | PowerCenter Workflow Manager + pmcmd | Cloud Composer 2 (Apache Airflow 2.x) |
| Real-time CDC ingestion | PowerCenter CDC connectors + custom Java transformations | Google Datastream → Pub/Sub → BigQuery Storage Write API |
| Custom business logic | PowerCenter Java Transformation + Expression language | BigQuery UDFs + Cloud Functions (Python 3.12) |
| Data quality | Informatica Data Quality (IDQ) rules | Dataform assertions + BigQuery Data Quality rules |
| Lineage & metadata | Informatica Metadata Manager | Google Dataplex + OpenLineage-compatible DAG metadata |
| Monitoring & alerting | PowerCenter Monitor + custom SMTP alerts | Cloud Monitoring dashboards + PagerDuty integration via Airflow |

Key Migration Highlights

[Infographic: MigryX Migration Highlights — Informatica PowerCenter to BigQuery]

Security & Compliance

Operating at the scale of a top-10 global marketplace introduces security and compliance obligations that span PCI DSS (for payment card data), GDPR and similar data privacy regulations across 38 operating countries, and internal data governance standards enforced by a dedicated data stewardship function. The BigQuery target architecture was designed to address each of these dimensions.

Payment-related data flows were isolated within a dedicated Google Cloud project governed by VPC Service Controls, with BigQuery authorized views providing access to downstream analytical consumers without exposing raw payment records. PCI DSS-scoped data was tokenized at ingestion using Cloud DLP before landing in BigQuery, with the token-to-PAN mapping stored separately in a Cloud HSM-backed system, satisfying the client's external PCI QSA requirements.
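The shape of that tokenization flow can be illustrated with a stand-in sketch. Production used Cloud DLP with a Cloud HSM-backed vault; the HMAC, key, and in-memory vault below are hypothetical stand-ins that only show the essential property: the pipeline sees tokens, and the token-to-PAN mapping lives in a separate store.

```python
# Illustrative stand-in for PAN tokenization at ingestion (not Cloud DLP).
import hashlib
import hmac

TOKEN_KEY = b"demo-key-held-outside-the-pipeline"  # stand-in for the HSM key
token_vault: dict[str, str] = {}                   # stand-in for the vault

def tokenize_pan(pan: str) -> str:
    """Replace a PAN with a deterministic token; store the mapping separately."""
    token = hmac.new(TOKEN_KEY, pan.encode(), hashlib.sha256).hexdigest()[:16]
    token_vault[token] = pan      # kept outside BigQuery entirely
    return token

row = {"order_id": 42, "pan": "4111111111111111"}
row["pan"] = tokenize_pan(row["pan"])   # only the token lands in BigQuery
print(row)
```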

GDPR data subject rights (right to erasure, right to access) were operationalized through BigQuery's support for table-level and column-level access policies, together with a purpose-built Cloud Run service that executed row-level deletion jobs against partitioned BigQuery tables on receipt of verified erasure requests. This replaced a manual, error-prone PowerCenter-based erasure workflow that had been flagged by the client's DPO as a compliance risk in the prior annual privacy audit.
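The core step of such an erasure job can be sketched as statement generation over a registry of PII-bearing tables. The table names and key-column registry below are hypothetical, and the real service would run these through the BigQuery client with query parameters rather than string interpolation.

```python
# Sketch: generate row-level DELETE statements for a verified erasure
# request (hypothetical registry; production would use query parameters).
PII_TABLES = {                 # table -> column holding the subject key
    "marketplace.buyer_profiles": "buyer_id",
    "marketplace.order_events": "buyer_id",
}

def erasure_statements(subject_id: str) -> list[str]:
    sid = subject_id.replace("'", "")   # naive guard for the sketch only
    return [f"DELETE FROM `{table}` WHERE {col} = '{sid}'"
            for table, col in PII_TABLES.items()]

for stmt in erasure_statements("buyer-8842"):
    print(stmt)
```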

Seller PII and buyer personal data were classified using Dataplex's automated data discovery and BigQuery's sensitive data protection integration, giving the data governance team a continuously updated inventory of where personal data resided and which pipelines accessed it. This replaced a static spreadsheet-based data map that had been the client's sole privacy inventory mechanism.

Results & Business Impact

The migration delivered measurable improvements across platform performance, operational resilience, and total cost of ownership, validated through six months of production operation following the final cutover.

2,400 PowerCenter mappings & workflows migrated to BigQuery
1.7M lines of target code generated by MigryX across Dataform, DAGs & UDFs
8X end-to-end pipeline throughput improvement vs. the PowerCenter grid
$3.2M in projected savings over 2 years from license and infrastructure elimination
12-second fraud detection pipeline latency (down from 4 minutes)
10-month end-to-end migration duration from discovery to production cutover

The architectural shift to streaming CDC via Pub/Sub and BigQuery's Storage Write API had an immediate impact on the platform's personalization infrastructure. Feature freshness for the real-time recommendation engine improved from a 4-to-6-hour lag to under 15 minutes, enabling the product team to deploy a new class of session-context-aware recommendation models that had been technically blocked for over two years by the PowerCenter latency ceiling. The client's product team reported that improved feature freshness contributed to measurable improvements in recommendation quality during the first 60 days of production operation.

Peak season resilience improved dramatically. The first Q4 commerce season following migration processed 40% higher event volumes than the prior year with zero pipeline SLA breaches, compared to two major degradation events in the prior Q4 under PowerCenter. The operations team estimated that the avoided incidents alone represented $2.8 million in protected revenue and avoided incident response costs.

"PowerCenter was the engine of our data platform for 15 years, and it had become the ceiling on everything we wanted to build. We'd tried twice before to migrate off it and both attempts died on the vine once the team realized what was actually inside those mapping XMLs. MigryX was the first solution that treated the XML as structured code rather than configuration, and that made all the difference. We went from a brittle on-prem grid to a fully serverless BigQuery stack in 10 months, our real-time pipelines are faster than we ever thought possible, and our data engineering team can now focus on building new capabilities rather than maintaining legacy infrastructure."

— Chief Data Officer, Top-10 Global E-commerce & Marketplace Platform (anonymized)

Ready to Modernize Your Informatica PowerCenter Estate?

See how MigryX can accelerate your migration to BigQuery — from XML export to production Dataform pipelines.

Explore BigQuery Migration →