
Public sector organisation

Land the platform in AWS without sacrificing resilience

How the public-sector application in this case study moved onto AWS using Terraform, ECS Fargate, PostgreSQL, Redis-backed sessions, and GitHub Actions while the team validated the new runtime internally.

March 2026 · AWS · ECS Fargate · Terraform · PostgreSQL

By this point the application was ready to leave the purely legacy hosting model behind. The next step was to build the runtime and operational model we actually wanted.

For Atlas Orders that meant:

  • provisioning an ECS Fargate cluster
  • defining infrastructure with Terraform
  • replacing Oracle with PostgreSQL
  • introducing GitHub Actions for build and deployment
  • making session handling resilient across multiple running tasks

[Diagram: simplified AWS target platform, with AWS at the centre; Fargate, S3, RDS, ElastiCache, and Route 53 around the edge; and Spring Boot, Terraform, and GitHub feeding into the platform.]
The target architecture was deliberately simplified: AWS became the centre of the runtime, managed services sat around it, and the application, infrastructure, and delivery inputs fed the platform from outside.

Provisioning the new platform

The infrastructure work was not treated as an afterthought once the application was “done”. It was part of the migration.

We used Terraform to provision the ECS Fargate cluster and the surrounding AWS estate. That gave the team a repeatable, reviewable definition of the target platform instead of a set of manually built environments that would inevitably drift.

The new runtime model immediately improved a few things:

  • services could be scaled horizontally without managing more application server VMs
  • container images became the deployable unit rather than hand-built server state
  • infrastructure changes became visible in pull requests
  • the path to safer repeatable environments became much clearer
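The shape of that Terraform definition can be sketched roughly as below. This is illustrative only, not the programme's actual configuration: the cluster and service names, image reference, sizing, and the `var.*` inputs are all placeholders.

```hcl
# Illustrative sketch: a minimal ECS Fargate cluster, task definition, and service.
resource "aws_ecs_cluster" "app" {
  name = "atlas-orders"
}

resource "aws_ecs_task_definition" "app" {
  family                   = "atlas-orders"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 512
  memory                   = 1024
  container_definitions = jsonencode([{
    name         = "app"
    image        = "ghcr.io/example/atlas-orders:latest" # placeholder image
    essential    = true
    portMappings = [{ containerPort = 8080 }]
  }])
}

resource "aws_ecs_service" "app" {
  name            = "atlas-orders"
  cluster         = aws_ecs_cluster.app.id
  task_definition = aws_ecs_task_definition.app.arn
  launch_type     = "FARGATE"
  desired_count   = 2

  network_configuration {
    subnets         = var.private_subnet_ids
    security_groups = [var.app_security_group_id]
  }
}
```

A real definition would also carry IAM execution roles, logging, and load balancer wiring, but the point stands even in the sketch: the whole runtime shape is a reviewable text file, not a hand-built environment.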

At this stage the architecture was recognisably different from the original estate:

  • GitHub and GitHub Actions replaced the opaque release path
  • container images became the deployable contract
  • ECS Fargate replaced long-lived WebLogic VMs
  • Route 53 and load balancing gave us more controlled traffic management
  • Redis removed the need for fragile sticky-session assumptions
  • Terraform gave the estate a maintainable infrastructure definition

Taking the database opportunity at the right moment

The database decision was specific to this application.

Atlas Orders was not tightly coupled to Oracle in the way some legacy estates are. That gave us a good opportunity to move to PostgreSQL as part of the platform landing. On another programme, that might have been the wrong time to make a database change. Here it was a pragmatic choice because the application’s persistence layer was not so Oracle-specific that it would dominate the migration risk.

The benefit was not just licensing or preference.

Moving to PostgreSQL let the team align the platform more closely with the target AWS operating model, reduce some of the specialist overhead of the previous estate, and simplify the long-term hosting story.

That was also a cost decision. Reducing dependence on Oracle-era infrastructure and specialist administration lowered the ongoing operational burden around the application.

Introducing GitHub Actions at the point it could add real value

Earlier in the migration we had deliberately not rushed the deployment pipeline.

Now it made sense.

Once the codebase was in GitHub, the build was under Gradle, the runtime shape was clearer, and the container target existed, GitHub Actions became the natural place to automate build and deployment.

That gave the programme:

  • consistent container builds
  • traceable deployments from source control
  • less dependence on locked-down jump hosts and manual release choreography
  • a platform for automated checks such as vulnerability alerts and dependency updates
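A workflow along these lines captures the model; everything here is a hedged sketch rather than the programme's pipeline, and the account ID, region, role, and registry names are placeholders.

```yaml
# Illustrative workflow: build a container image and roll it out to ECS.
name: build-and-deploy
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # OIDC federation instead of long-lived AWS keys
      contents: read
    steps:
      - uses: actions/checkout@v4

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/deploy  # placeholder
          aws-region: eu-west-2

      - uses: aws-actions/amazon-ecr-login@v2

      - name: Build and push image
        env:
          ECR_REPO: 123456789012.dkr.ecr.eu-west-2.amazonaws.com/atlas-orders
        run: |
          docker build -t "$ECR_REPO:$GITHUB_SHA" .
          docker push "$ECR_REPO:$GITHUB_SHA"

      - name: Deploy to ECS
        run: |
          aws ecs update-service \
            --cluster atlas-orders \
            --service atlas-orders \
            --force-new-deployment
```

Every release is now a commit, a build log, and an audit trail in one place, which is exactly the property the locked-down jump hosts could never provide.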

This is where the delivery model, not just the code, finally started to feel modern.

For the client, this translated into something highly visible: releases no longer depended on a tightly controlled machine and institutional memory. They became easier to understand, audit, and improve.

Handling sessions properly in a distributed runtime

When you move from a single long-lived application server pattern into multiple container tasks, session behaviour becomes a real operational concern very quickly.

We could not afford a model where every deployment or task restart risked dropping active user sessions across the customer and order flows. Sticky sessions would have been a weak workaround, not a good target architecture.

So we introduced Redis-backed session management.

That gave the platform:

  • continuity across multiple nodes
  • less reliance on routing tricks
  • safer rolling deployments
  • more freedom to scale task counts up and down in response to demand
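In a Spring Boot application this is largely configuration rather than code: with the Spring Session Data Redis dependency on the classpath, HTTP sessions are serialised into Redis instead of held in task-local memory. A sketch of the application config, assuming Spring Boot 3 property names (older versions use `spring.redis.*` rather than `spring.data.redis.*`), with placeholder values:

```yaml
# application.yml sketch; host value is a placeholder injected at runtime.
spring:
  session:
    store-type: redis        # store HTTP sessions in Redis, not in the task
  data:
    redis:
      host: ${REDIS_HOST}    # e.g. the ElastiCache endpoint
      port: 6379
```

Because any task can now load any session, a rolling deployment that replaces every task leaves users logged in and mid-flow.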

This was one of the more important architecture gains in the migration. It was not flashy, but it translated directly into user stability.

Why the AWS move reduced cost as well as risk

The cost story was not simply “cloud is cheaper”.

The savings came from reducing the cost of maintaining the legacy operating model:

  • fewer bespoke VMs to patch and manage
  • less operational ceremony around deployments
  • no need to keep growing a WebLogic-centric estate just to serve the same workload
  • better alignment between actual demand and running capacity
  • a more supportable platform for current engineers

On top of that, uptime improved because the AWS model gave us better primitives:

  • Fargate tasks could be replaced automatically
  • services could run across multiple instances rather than relying on a single app server footprint
  • infrastructure could be expressed and reproduced instead of repaired manually
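The demand-to-capacity alignment in particular becomes a small amount of Terraform rather than a capacity-planning exercise. As a sketch (the cluster and service names, bounds, and CPU target are illustrative, not the programme's values):

```hcl
# Illustrative: track CPU utilisation and scale the ECS service's task count.
resource "aws_appautoscaling_target" "app" {
  service_namespace  = "ecs"
  resource_id        = "service/atlas-orders/atlas-orders" # cluster/service placeholders
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 2
  max_capacity       = 6
}

resource "aws_appautoscaling_policy" "cpu" {
  name               = "cpu-target-tracking"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.app.service_namespace
  resource_id        = aws_appautoscaling_target.app.resource_id
  scalable_dimension = aws_appautoscaling_target.app.scalable_dimension

  target_tracking_scaling_policy_configuration {
    target_value = 60 # aim to keep average service CPU around 60%
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
  }
}
```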

Commercial gains

Lower infrastructure overhead, fewer specialist runtime constraints, and a delivery model that no longer depended on fragile operational ceremony.

Technical gains

Infrastructure as code, containerised delivery, shared sessions, scalable runtime behaviour, and a platform that could be operated with clearer guardrails.

At this stage the application was live internally, not yet the public source of truth.

That internal live period mattered. It gave the team the space to test performance, observe behaviour under real load, and deal with the inevitable operational issues before asking end users to depend on the new platform.

The final step was the cutover.