Data Migration Strategies¶
Introduction¶
Data migration is often the most critical, complex, and risky aspect of cloud exit. Unlike compute resources that can be recreated or applications that can be redeployed, data carries the history, state, and value of an organization's operations. Loss, corruption, or extended unavailability of data during migration can have catastrophic business consequences.
This chapter covers comprehensive strategies for migrating data across the hybrid continuum—from Azure cloud storage to on-premises systems—with specific attention to minimizing downtime, ensuring data integrity, maintaining consistency during transition periods, and meeting compliance requirements. The goal is to move data safely, verify it completely, and maintain business continuity throughout the process.
Data Migration Risk Profile
Data migration failures represent the highest risk category in cloud exit projects. The combination of data volume, complexity, and business criticality demands extensive planning, testing, and validation. Never proceed without verified backup and rollback procedures.
Data Migration Challenges¶
Data Volume and Transfer Times¶
The Challenge:
Large datasets require substantial time to transfer over network connections. A 10 TB database over a 1 Gbps connection theoretically takes 22+ hours, but real-world factors (network overhead, encryption, packet loss, retransmissions) often double or triple this estimate.
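The arithmetic behind these estimates is worth scripting for capacity planning. A minimal sketch follows; the efficiency and overhead factors are assumptions to tune against your own link measurements:
def estimate_transfer_hours(data_tb: float, link_gbps: float,
                            efficiency: float = 0.7, overhead_factor: float = 2.0) -> float:
    """Estimate wall-clock transfer time for a bulk data migration.

    efficiency: usable fraction of the link after TCP/encryption overhead (assumed)
    overhead_factor: multiplier for retries, checkpointing, throttling (assumed)
    """
    data_bits = data_tb * 1e12 * 8            # decimal terabytes to bits
    effective_bps = link_gbps * 1e9 * efficiency
    return data_bits / effective_bps / 3600 * overhead_factor

# 10 TB over a 1 Gbps link: ~22 h in theory, 60+ h with real-world factors
print(f"{estimate_transfer_hours(10, 1):.1f} hours")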
Impact:
- Extended migration windows
- Increased risk of consistency issues during long transfers
- Potential for interruption requiring restart
- Business pressure to complete migration faster than technically feasible
Mitigation Strategies:
- Bandwidth planning: Dedicate migration network separate from production traffic
- Compression: Enable data compression during transfer (often 30-50% reduction)
- Parallel transfers: Split data into segments transferred simultaneously
- Physical transfer: Use Azure Data Box for datasets >20 TB
- Incremental synchronization: Initial bulk transfer + delta synchronization closer to cutover
Data Consistency During Migration¶
The Challenge:
Applications continue to write data during migration. Ensuring consistency between source (Azure) and target (Azure Local) systems while data is actively changing requires careful coordination.
Consistency Models:
| Model | Description | Use Case | Downtime |
|---|---|---|---|
| Snapshot-based | Take point-in-time snapshot, transfer, restore | Low-change databases, acceptable data loss | Minimal (read-only during final sync) |
| Replication-based | Continuous replication to target | High-change databases, minimal data loss tolerance | Minimal (seconds during cutover) |
| Dual-write | Application writes to both source and target | Zero data loss requirement | None (during transition period) |
| Quiesce-and-transfer | Stop writes, transfer, resume | Batch systems, scheduled maintenance | Hours (duration of transfer) |
Choosing the Right Model:
- Mission-critical databases with high transaction volume: Replication-based
- Data warehouses and analytics databases: Snapshot-based
- E-commerce and financial systems: Dual-write or replication-based
- Development/test environments: Quiesce-and-transfer
Schema and Format Compatibility¶
The Challenge:
Azure PaaS services sometimes use proprietary formats, extensions, or features not available in open-source alternatives. Cosmos DB's multi-model capabilities, Azure SQL's temporal tables, and Azure Storage's hierarchical namespaces may require transformation during migration.
Assessment Questions:
- Does the target system support all features used in the source?
- Are there proprietary extensions requiring code changes?
- Can data be exported in a standard format (SQL, JSON, CSV, Parquet)?
- Are character encodings, date formats, and collations compatible?
Mitigation:
- Schema comparison tools (e.g., SQL Server Data Tools schema compare); see the sketch after this list
- Feature parity analysis before migration
- Transformation scripts for incompatible features
- Comprehensive testing with production-like data
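For a scripted first pass at schema comparison, the sketch below diffs column metadata from INFORMATION_SCHEMA, assuming pyodbc and SQL Server-compatible endpoints on both sides; the connection strings are placeholders:
import pyodbc

SCHEMA_SQL = """
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE,
       COALESCE(CHARACTER_MAXIMUM_LENGTH, -1) AS MAX_LEN
FROM INFORMATION_SCHEMA.COLUMNS
"""

def schema_snapshot(conn_str):
    """Return {(table, column): (type, nullable, max_len)} for one database."""
    with pyodbc.connect(conn_str) as conn:
        rows = conn.cursor().execute(SCHEMA_SQL).fetchall()
    return {(r[0], r[1]): (r[2], r[3], r[4]) for r in rows}

source = schema_snapshot(source_conn_str)   # placeholder connection strings
target = schema_snapshot(target_conn_str)

for key in sorted(source.keys() - target.keys()):
    print(f"missing on target: {key}")
for key in sorted(source.keys() & target.keys()):
    if source[key] != target[key]:
        print(f"definition differs: {key}: {source[key]} vs {target[key]}")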
Encryption and Key Management¶
The Challenge:
Azure-managed encryption keys (Azure Key Vault, Storage Account keys) are not accessible after disconnection. Data must be re-encrypted with locally managed keys, or encryption keys must be exported before disconnection.
Key Migration Strategies:
# Note: az keyvault key backup produces a protected blob that can only be
# restored into Azure Key Vault within the same subscription and geography;
# it cannot be imported into a third-party KMS. For portable keys, plan for
# re-encryption or application-managed keys instead.
az keyvault key backup --vault-name my-keyvault --name database-encryption-key --file dek-backup.key
# Create a new locally managed key in HashiCorp Vault (transit engine) and
# re-encrypt data under it
vault write transit/keys/database-encryption-key type=aes256-gcm96
Encryption Transition Patterns:
- Re-encryption: Decrypt with Azure keys during export, re-encrypt with local keys during import; see the sketch after this list
- Key export: Export Azure keys before disconnection, import into local key management
- Transparent transition: Use application-level encryption independent of storage layer
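For the re-encryption pattern, envelope encryption keeps the work proportional to the number of keys rather than the volume of data: only the wrapped data-encryption key (DEK) is re-protected, never the bulk ciphertext. A minimal sketch, assuming an RSA key-wrapping setup in Key Vault; the vault URL, key name, and metadata file paths are illustrative:
import os

from azure.identity import DefaultAzureCredential
from azure.keyvault.keys import KeyClient
from azure.keyvault.keys.crypto import CryptographyClient, KeyWrapAlgorithm
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# 1. While still connected, unwrap the DEK with the Azure Key Vault key
credential = DefaultAzureCredential()
kv_key = KeyClient("https://my-keyvault.vault.azure.net", credential).get_key("database-encryption-key")
crypto = CryptographyClient(kv_key, credential)
wrapped_dek = open("dek-wrapped.bin", "rb").read()        # hypothetical metadata file
dek = crypto.unwrap_key(KeyWrapAlgorithm.rsa_oaep, wrapped_dek).key

# 2. Re-wrap the DEK under a locally managed key-encryption key (KEK)
local_kek = os.urandom(32)                                # in practice: from the local KMS/HSM
nonce = os.urandom(12)
rewrapped_dek = AESGCM(local_kek).encrypt(nonce, dek, None)
with open("dek-rewrapped.bin", "wb") as f:
    f.write(nonce + rewrapped_dek)
# The data ciphertext never moves or changes; only the key wrapping does.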
Migration Strategies by Data Type¶
Relational Databases: Azure SQL → SQL Server / PostgreSQL¶
Online Migration with Minimal Downtime¶
Approach: Use transactional replication or log shipping to continuously synchronize data from Azure SQL to target system.
For SQL Server:
-- Step 1: Create a linked server pointing at the Azure Local target
-- (note: only Azure SQL Managed Instance can act as a replication
-- publisher; single Azure SQL databases can only be push subscribers)
EXEC sp_addlinkedserver
@server = 'AZURE_LOCAL_SQL',
@srvproduct = '',
@provider = 'MSOLEDBSQL',
@datasrc = 'sql-mi.azurelocal.local';
-- Step 2: Set up transactional replication
EXEC sp_replicationdboption
@dbname = 'ProductionDB',
@optname = 'publish',
@value = 'true';
EXEC sp_addpublication
@publication = 'ProductionDB_Publication',
@sync_method = 'concurrent',
@repl_freq = 'continuous',
@status = 'active';
-- Step 3: Add articles (tables) to publication
EXEC sp_addarticle
@publication = 'ProductionDB_Publication',
@article = 'Orders',
@source_object = 'Orders',
@destination_table = 'Orders';
-- Repeat for all tables
-- Step 4: Create subscription on target
EXEC sp_addsubscription
@publication = 'ProductionDB_Publication',
@subscriber = 'AZURE_LOCAL_SQL',
@destination_db = 'ProductionDB',
@subscription_type = 'push';
Cutover Process:
-- Monitor replication lag (illustrative; the monitoring tables and columns
-- vary by SQL Server version, so verify against your distribution database)
SELECT
subscriber_db,
DATEDIFF(SECOND, last_sync_timestamp, GETDATE()) AS ReplicationLagSeconds
FROM distribution.dbo.MSdistribution_status;
-- When lag is <5 seconds:
-- 1. Set application to read-only mode
-- 2. Wait for replication to catch up completely
-- 3. Update connection strings to point to Azure Local
-- 4. Resume write operations
-- 5. Disable replication
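A sketch of automating this cutover sequence is shown below; it assumes the illustrative lag query above matches your distribution database, and the application-facing helpers are hypothetical hooks into your configuration system:
import time

import pyodbc

# Illustrative lag query; monitoring tables vary by SQL Server version
LAG_SQL = """SELECT MAX(DATEDIFF(SECOND, last_sync_timestamp, GETDATE()))
FROM distribution.dbo.MSdistribution_status"""

def wait_for_replication(conn, max_lag_seconds=5):
    """Block until replication lag drops to the threshold."""
    while True:
        lag = conn.cursor().execute(LAG_SQL).fetchval()
        if lag is not None and lag <= max_lag_seconds:
            return
        time.sleep(5)

# set_application_readonly() and update_connection_strings() are hypothetical
dist_conn = pyodbc.connect(distributor_conn_str)      # placeholder connection string
set_application_readonly(True)                        # 1. stop writes at the application tier
wait_for_replication(dist_conn, max_lag_seconds=0)    # 2. let the subscriber fully catch up
update_connection_strings("sql-mi.azurelocal.local")  # 3. point applications at Azure Local
set_application_readonly(False)                       # 4. resume writes on the new primary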
Alternative: Database Migration Service (DMS). Note: DMS is designed primarily for migrations into Azure targets, so confirm your specific source/target pair is supported before relying on it for a reverse (cloud-to-on-premises) migration.
# Create DMS project
az dms project create \
--resource-group migration-rg \
--service-name my-dms-service \
--name db-migration-project \
--source-platform SQL \
--target-platform SQLDB \
--location eastus
# Create migration task
az dms project task create \
--resource-group migration-rg \
--service-name my-dms-service \
--project-name db-migration-project \
--name migration-task-01 \
--source-connection-json @source-connection.json \
--target-connection-json @target-connection.json \
--database-options-json @database-options.json \
--task-type OnlineMigration
Offline Migration with Backup/Restore¶
Approach: Export database as backup file, transfer to Azure Local, restore.
For Large Databases (Faster Restore):
-- On Azure SQL: Export database
-- Use Azure Data Studio or SSMS to generate BACPAC file
-- Or use native backup (if available on your tier)
-- Alternative: Use bcp for individual tables
bcp "SELECT * FROM Orders" queryout "Orders.dat" -S server.database.windows.net -d ProductionDB -U username -P password -n
-- Transfer files to Azure Local (network copy or Data Box)
-- On Azure Local SQL: Restore database
RESTORE DATABASE ProductionDB
FROM DISK = '\\nas.azurelocal.local\backups\ProductionDB.bak'
WITH MOVE 'ProductionDB' TO 'C:\SQLData\ProductionDB.mdf',
MOVE 'ProductionDB_log' TO 'C:\SQLData\ProductionDB_log.ldf',
REPLACE,
STATS = 10;
-- Verify restore
SELECT name, state_desc FROM sys.databases WHERE name = 'ProductionDB';
For PostgreSQL Migration:
# Export from Azure Database for PostgreSQL
pg_dump -h myserver.postgres.database.azure.com \
-U myuser@myserver \
-d productiondb \
-Fc -b -v -f productiondb.backup
# Transfer to Azure Local
scp productiondb.backup azurelocal:/var/lib/postgresql/backups/
# Restore on Azure Local PostgreSQL
pg_restore -h localhost \
-U postgres \
-d productiondb \
-v productiondb.backup
Blob/Object Storage: Azure Blob → MinIO / Local Storage¶
Online Migration with AzCopy¶
Approach: Use AzCopy to synchronize blobs from Azure Storage to target object storage (MinIO, Ceph) or file shares.
Step 1: Install AzCopy and Configure Authentication
# Download AzCopy
wget https://aka.ms/downloadazcopy-v10-linux
tar -xvf downloadazcopy-v10-linux
sudo mv azcopy_linux_amd64_*/azcopy /usr/local/bin/
# Authenticate to Azure Storage
export AZCOPY_AUTO_LOGIN_TYPE=SPN
export AZCOPY_SPA_APPLICATION_ID=<app-id>
export AZCOPY_SPA_CLIENT_SECRET=<secret>
export AZCOPY_TENANT_ID=<tenant-id>
Step 2: Bulk Copy with Validation
# Note: AzCopy writes only to Azure endpoints and local file systems, so an
# S3-compatible target such as MinIO needs a local staging step (or a tool
# such as rclone, which speaks both the Blob and S3 APIs natively)
# Initial bulk copy to local staging storage
azcopy copy \
"https://mystorage.blob.core.windows.net/production-data/*?<SAS-token>" \
"/mnt/staging/production-data/" \
--recursive=true \
--check-length=true \
--log-level=INFO
# Incremental sync (copy only changed/new files)
azcopy sync \
"https://mystorage.blob.core.windows.net/production-data/?<SAS-token>" \
"/mnt/staging/production-data/" \
--recursive=true \
--delete-destination=false \
--compare-hash=MD5
# Push the staged data into MinIO
mc mirror /mnt/staging/production-data/ minio/production-data/
Step 3: Verify Transfer Completeness
# Count files in source (blob list pages at 5000 results by default;
# pass --num-results "*" to enumerate everything)
az storage blob list --account-name mystorage --container-name production-data --num-results "*" --query "length(@)"
# Count files in destination (MinIO via mc command-line client)
mc ls --recursive minio/production-data/ | wc -l
# Compare file counts and spot-check file integrity
Step 4: Application Configuration Update
// Before: Azure Blob Storage
BlobServiceClient blobServiceClient = new BlobServiceClient(
"DefaultEndpointsProtocol=https;AccountName=mystorage;AccountKey=<key>;EndpointSuffix=core.windows.net");
// After: MinIO S3-compatible storage
var s3Client = new AmazonS3Client(
new BasicAWSCredentials("minioadmin", "minioadmin"),
new AmazonS3Config
{
ServiceURL = "http://minio.azurelocal.local:9000",
ForcePathStyle = true
});
Offline Migration with Azure Data Box¶
When to Use:
- Data volume >20 TB
- Network bandwidth insufficient for acceptable transfer time
- Regulatory requirements for offline data transfer
Process:
# Step 1: Order an Azure Data Box *export* job from the Azure Portal
# Microsoft copies the selected storage account data onto the device
# and ships it to your location
# Step 2: Connect the Data Box to the local network
# The exported data is exposed via the device's SMB/NFS shares
# Copy from the Data Box shares to local storage (device share path is illustrative)
robocopy \\databox-device\production-data\ E:\production-data\ /E /Z /MT:32
# Step 3: Ship the Data Box back to Microsoft, which securely erases the device
# Note: the standard (import) Data Box flow moves data in the opposite
# direction, into Azure, and does not apply to cloud exit
Queue/Message Data: Azure Service Bus → RabbitMQ / NATS¶
Drain-and-Replay Pattern¶
Approach: Stop message producers, allow consumers to drain queues, replay messages to new queue system during cutover.
Step 1: Deploy Target Message Broker
# Deploy RabbitMQ on Kubernetes
helm install rabbitmq bitnami/rabbitmq \
--set auth.username=admin \
--set auth.password=<secure-password> \
--set replicaCount=3 \
--set persistence.size=20Gi \
--namespace messaging
# Create queues and exchanges matching Azure Service Bus topology
kubectl exec -it rabbitmq-0 -n messaging -- rabbitmqadmin declare queue name=order-processing durable=true
kubectl exec -it rabbitmq-0 -n messaging -- rabbitmqadmin declare exchange name=order-events type=topic durable=true
Step 2: Dual-Write Pattern During Transition
// Temporarily write to both Azure Service Bus and RabbitMQ
public async Task PublishOrderEvent(OrderEvent orderEvent)
{
// Publish to Azure Service Bus
await serviceBusSender.SendMessageAsync(new ServiceBusMessage(JsonSerializer.Serialize(orderEvent)));
// Also publish to RabbitMQ
var body = Encoding.UTF8.GetBytes(JsonSerializer.Serialize(orderEvent));
channel.BasicPublish(exchange: "order-events", routingKey: "order.created", body: body);
}
// Update consumers to read from RabbitMQ
// Validate processing works correctly
// Remove Azure Service Bus publishers after validation period
Step 3: Message Archive and Replay
// If messages must be preserved and replayed:
// 1. Export messages from Azure Service Bus
ServiceBusReceiver receiver = serviceBusClient.CreateReceiver(queueName);
List<ServiceBusReceivedMessage> messages = new List<ServiceBusReceivedMessage>();
// ReceiveMessagesAsync(maxMessages) returns a batch (IReadOnlyList), not an async stream
foreach (ServiceBusReceivedMessage message in await receiver.ReceiveMessagesAsync(maxMessages: 100))
{
messages.Add(message);
await receiver.CompleteMessageAsync(message);
}
// 2. Store message bodies to durable storage (the raw received-message
// object is not directly JSON-serializable)
File.WriteAllText("message-archive.json", JsonSerializer.Serialize(messages.Select(m => m.Body.ToString())));
// 3. Replay to RabbitMQ
foreach (var message in messages)
{
var body = Encoding.UTF8.GetBytes(message.Body.ToString());
channel.BasicPublish(exchange: "", routingKey: queueName, body: body);
}
File Shares: Azure Files → SMB/NFS Shares¶
Robocopy Migration for Windows File Shares¶
Approach: Use Robocopy to mirror Azure Files to on-premises file server with incremental synchronization.
# Mount Azure Files as network drive
$storageAccountKey = ConvertTo-SecureString -String "<storage-key>" -AsPlainText -Force
$credential = New-Object System.Management.Automation.PSCredential -ArgumentList "Azure\mystorageaccount", $storageAccountKey
New-PSDrive -Name Z -PSProvider FileSystem -Root "\\mystorageaccount.file.core.windows.net\share" -Credential $credential -Persist
# Initial bulk copy with Robocopy
robocopy Z:\ \\azurelocal-fs\share\ /E /Z /MT:32 /R:3 /W:10 /LOG:C:\Migration\robocopy-log.txt
# Incremental sync (run multiple times as cutover approaches)
robocopy Z:\ \\azurelocal-fs\share\ /E /Z /MT:32 /R:3 /W:10 /XO /LOG+:C:\Migration\robocopy-incremental.txt
# Final sync during cutover window
robocopy Z:\ \\azurelocal-fs\share\ /E /Z /MT:32 /R:3 /W:10 /LOG+:C:\Migration\robocopy-final.txt
Verify File Integrity:
# Compare file counts
(Get-ChildItem -Path Z:\ -Recurse -File).Count
(Get-ChildItem -Path \\azurelocal-fs\share\ -Recurse -File).Count
# Spot-check file hashes
Get-FileHash Z:\ImportantFile.pdf -Algorithm SHA256
Get-FileHash \\azurelocal-fs\share\ImportantFile.pdf -Algorithm SHA256
rsync Migration for Linux NFS Shares¶
# Mount Azure Files as NFS (if enabled)
sudo mount -t nfs mystorageaccount.file.core.windows.net:/mystorageaccount/share /mnt/azure-files
# Initial bulk copy
rsync -avz --progress --stats /mnt/azure-files/ /mnt/azurelocal-nfs/
# Incremental sync
rsync -avz --progress --stats --delete /mnt/azure-files/ /mnt/azurelocal-nfs/
# Verify with checksums
# Use relative paths so the two manifests are comparable (md5sum embeds the path)
(cd /mnt/azure-files && find . -type f -exec md5sum {} \; | sort -k 2) > /tmp/source-checksums.txt
(cd /mnt/azurelocal-nfs && find . -type f -exec md5sum {} \; | sort -k 2) > /tmp/target-checksums.txt
diff /tmp/source-checksums.txt /tmp/target-checksums.txt
Data Validation and Integrity Verification¶
Database Validation Strategies¶
Row Count Comparison¶
-- Generate row counts for all tables
SELECT
t.name AS TableName,
SUM(p.rows) AS RowCount
FROM sys.tables t
INNER JOIN sys.partitions p ON t.object_id = p.object_id
WHERE p.index_id IN (0,1)
GROUP BY t.name
ORDER BY t.name;
-- Run on both source and target, compare results
-- Difference indicates incomplete migration or ongoing writes
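The comparison can be automated by running the same catalog query against both servers and diffing the results; a minimal sketch follows (the connection strings are placeholders):
import pyodbc

ROWCOUNT_SQL = """
SELECT t.name, SUM(p.rows)
FROM sys.tables t
JOIN sys.partitions p ON t.object_id = p.object_id
WHERE p.index_id IN (0, 1)
GROUP BY t.name
"""

def table_counts(conn_str):
    """Return {table_name: row_count} using the catalog query above."""
    with pyodbc.connect(conn_str) as conn:
        return {name: count for name, count in conn.cursor().execute(ROWCOUNT_SQL).fetchall()}

source = table_counts(source_conn_str)  # placeholder ODBC connection strings
target = table_counts(target_conn_str)
for table in sorted(source):
    if source[table] != target.get(table):
        print(f"{table}: source={source[table]} target={target.get(table)}")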
Checksum Validation¶
-- SQL Server: CHECKSUM_AGG for entire table
SELECT 'Orders' AS TableName, CHECKSUM_AGG(CHECKSUM(*)) AS Checksum
FROM Orders
UNION ALL
SELECT 'Customers', CHECKSUM_AGG(CHECKSUM(*)) FROM Customers
UNION ALL
SELECT 'Products', CHECKSUM_AGG(CHECKSUM(*)) FROM Products;
-- Run on both source and target
-- Matching checksums indicate identical data
-- Caveat: CHECKSUM-based comparison is collision-prone and offsetting changes
-- can cancel out; use HASHBYTES over key columns for stronger verification
Application-Level Validation¶
// Critical data validation queries
public async Task<ValidationReport> ValidateMigration()
{
var report = new ValidationReport();
// Verify critical business metrics match
report.SourceOrderCount = await sourceDb.Orders.CountAsync();
report.TargetOrderCount = await targetDb.Orders.CountAsync();
report.SourceRevenue = await sourceDb.Orders.SumAsync(o => o.TotalAmount);
report.TargetRevenue = await targetDb.Orders.SumAsync(o => o.TotalAmount);
// Verify recent data transferred
var recentDate = DateTime.UtcNow.AddDays(-7);
report.SourceRecentOrders = await sourceDb.Orders.Where(o => o.OrderDate >= recentDate).CountAsync();
report.TargetRecentOrders = await targetDb.Orders.Where(o => o.OrderDate >= recentDate).CountAsync();
// Verify referential integrity
var orphanedOrders = await targetDb.Orders
.Where(o => !targetDb.Customers.Any(c => c.CustomerId == o.CustomerId))
.CountAsync();
report.OrphanedRecords = orphanedOrders;
return report;
}
Blob/Object Storage Validation¶
File Count and Size Comparison¶
# Azure Blob Storage inventory
az storage blob list \
--account-name mystorage \
--container-name production-data \
--query "[].{name:name, size:properties.contentLength}" \
--output json > azure-inventory.json
# Count and total size
jq '[.[] | .size] | add' azure-inventory.json # Total bytes
jq 'length' azure-inventory.json # File count
# MinIO inventory
mc ls --recursive --json minio/production-data/ > minio-inventory.json
# Compare counts and sizes
# Build custom script to compare manifests
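A minimal manifest-comparison sketch, assuming the two inventory files produced above (the mc JSON field names should be verified against your client version):
import json

# Azure inventory: JSON array of {name, size} from `az storage blob list`
with open("azure-inventory.json") as f:
    azure = {b["name"]: b["size"] for b in json.load(f)}

# MinIO inventory: one JSON object per line from `mc ls --recursive --json`
minio = {}
with open("minio-inventory.json") as f:
    for line in f:
        entry = json.loads(line)
        if entry.get("type") == "file":
            minio[entry["key"]] = entry["size"]

missing = sorted(set(azure) - set(minio))
mismatched = sorted(k for k in azure.keys() & minio.keys() if azure[k] != minio[k])
print(f"{len(missing)} blobs missing from MinIO, {len(mismatched)} size mismatches")
for name in (missing + mismatched)[:20]:  # show a sample for triage
    print(" ", name)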
Checksum Verification¶
import hashlib
import random

import boto3
from azure.storage.blob import BlobServiceClient

def verify_blob_migration(azure_conn_str, azure_container, minio_bucket, sample_size=100):
    """
    Verify blob migration by comparing MD5 hashes of a random sample of files.
    """
    container = BlobServiceClient.from_connection_string(azure_conn_str).get_container_client(azure_container)
    s3 = boto3.client(
        "s3",
        endpoint_url="http://minio.azurelocal.local:9000",
        aws_access_key_id="minioadmin",
        aws_secret_access_key="minioadmin",
    )
    # Get a random sample of blob names from Azure
    azure_blobs = [blob.name for blob in container.list_blobs()]
    sample_blobs = random.sample(azure_blobs, min(sample_size, len(azure_blobs)))
    mismatches = []
    for blob_name in sample_blobs:
        # Download from Azure and compute hash
        azure_content = container.download_blob(blob_name).readall()
        azure_hash = hashlib.md5(azure_content).hexdigest()
        # Download from MinIO and compute hash
        minio_content = s3.get_object(Bucket=minio_bucket, Key=blob_name)["Body"].read()
        minio_hash = hashlib.md5(minio_content).hexdigest()
        if azure_hash != minio_hash:
            mismatches.append({
                'blob': blob_name,
                'azure_md5': azure_hash,
                'minio_md5': minio_hash
            })
    return mismatches
Message Queue Validation¶
Validation Approach:
- Monitor dead-letter queues for message processing failures
- Compare message counts (if messages are persisted)
- End-to-end transaction testing: send test messages through entire pipeline (see the sketch after the commands below)
# RabbitMQ queue depth
kubectl exec -it rabbitmq-0 -n messaging -- rabbitmqctl list_queues name messages
# Verify no messages in dead-letter queue
kubectl exec -it rabbitmq-0 -n messaging -- rabbitmqctl list_queues name messages | grep dlq
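For the end-to-end test, a round-trip sketch with pika; the hostname, credentials, and queue bindings are assumptions carried over from the deployment shown earlier, and running it against a test vhost avoids consuming real messages:
import time
import uuid

import pika

params = pika.ConnectionParameters(
    host="rabbitmq.messaging.svc",                      # assumed in-cluster hostname
    credentials=pika.PlainCredentials("admin", "<secure-password>"),
)
connection = pika.BlockingConnection(params)
channel = connection.channel()

# Publish a uniquely tagged test message through the production topology
marker = str(uuid.uuid4())
channel.basic_publish(exchange="order-events", routing_key="order.created", body=marker.encode())

# Poll the bound queue for the marker (assumes order-processing is bound to order-events)
for _ in range(10):
    method, properties, body = channel.basic_get(queue="order-processing", auto_ack=True)
    if body is not None and body.decode() == marker:
        print("end-to-end path OK")
        break
    time.sleep(0.5)
else:
    print("test message not observed; check bindings and consumers")
connection.close()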
Minimal Downtime Strategies¶
Blue/Green Deployment Pattern¶
Approach: Run both environments in parallel, switch traffic atomically at cutover.
Architecture:
Phase 1: Blue (Azure) handles 100% traffic
Phase 2: Green (Azure Local) built and validated with shadow traffic
Phase 3: Traffic switched to Green; Blue remains warm for rollback
Phase 4: Blue decommissioned after validation period
DNS Cutover:
# Before: app.company.com → Azure (20.10.5.123)
# After: app.company.com → Azure Local (192.168.10.100)
# Lower TTL 24 hours before cutover
az network dns record-set a update \
--resource-group dns-rg \
--zone-name company.com \
--name app \
--set ttl=60
# Cutover: swap the A record value
az network dns record-set a add-record \
--resource-group dns-rg \
--zone-name company.com \
--record-set-name app \
--ipv4-address 192.168.10.100
az network dns record-set a remove-record \
--resource-group dns-rg \
--zone-name company.com \
--record-set-name app \
--ipv4-address 20.10.5.123
# Rollback if needed (re-add the Azure IP, remove the Azure Local IP)
az network dns record-set a add-record \
--resource-group dns-rg \
--zone-name company.com \
--record-set-name app \
--ipv4-address 20.10.5.123
az network dns record-set a remove-record \
--resource-group dns-rg \
--zone-name company.com \
--record-set-name app \
--ipv4-address 192.168.10.100
Canary Deployment Pattern¶
Approach: Gradually shift traffic from Azure to Azure Local in controlled increments.
Implementation with Traffic Manager:
# Configure Traffic Manager with weighted routing
# Initially: Azure 100%, Azure Local 0%
# Increment 1: Shift 10% to Azure Local
az network traffic-manager endpoint update \
--name azure-endpoint \
--profile-name migration-profile \
--resource-group migration-rg \
--type azureEndpoints \
--weight 90
az network traffic-manager endpoint update \
--name azurelocal-endpoint \
--profile-name migration-profile \
--resource-group migration-rg \
--type externalEndpoints \
--weight 10
# Monitor error rates, latency, and user feedback
# If stable, increase to 25%, then 50%, then 100%
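One way to script the ramp is to wrap the CLI calls above and gate each step on a health probe; a sketch, assuming the profile and endpoint names shown earlier and a hypothetical /healthz endpoint:
import subprocess
import time

import requests

def set_weight(name: str, endpoint_type: str, weight: int) -> None:
    """Apply one endpoint weight via the az CLI (weights must be 1-1000)."""
    subprocess.run(
        ["az", "network", "traffic-manager", "endpoint", "update",
         "--name", name, "--profile-name", "migration-profile",
         "--resource-group", "migration-rg", "--type", endpoint_type,
         "--weight", str(weight)],
        check=True,
    )

# Ramp plan: pause at each step and gate on a basic health probe
for local_weight in (10, 25, 50, 90):
    set_weight("azure-endpoint", "azureEndpoints", 100 - local_weight)
    set_weight("azurelocal-endpoint", "externalEndpoints", local_weight)
    time.sleep(1800)  # observation window; also watch dashboards and error rates
    response = requests.get("https://app.company.com/healthz", timeout=10)  # assumed endpoint
    if response.status_code != 200:
        # Roll back to (almost) all-Azure; weight 0 is not allowed, so a full
        # drain is expressed by disabling the endpoint instead
        set_weight("azure-endpoint", "azureEndpoints", 100)
        set_weight("azurelocal-endpoint", "externalEndpoints", 1)
        raise SystemExit(f"canary failed at weight {local_weight}: HTTP {response.status_code}")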
Database Replication with Read Replica¶
Approach: Establish read replica on Azure Local, promote to primary during cutover.
-- Set up log shipping (SQL Server)
-- On Azure SQL: Configure transaction log backups to accessible location
BACKUP LOG ProductionDB
TO DISK = '\\shared-storage\logs\ProductionDB_log.trn'
WITH NOFORMAT, NOINIT, COMPRESSION;
-- On Azure Local SQL: Restore log backups continuously
RESTORE LOG ProductionDB
FROM DISK = '\\shared-storage\logs\ProductionDB_log.trn'
WITH NORECOVERY;
-- Cutover:
-- 1. Stop application writes
-- 2. Take final log backup and restore
-- 3. Recover database (make it writable)
RESTORE DATABASE ProductionDB WITH RECOVERY;
-- 4. Update application connection strings
-- 5. Resume writes
Data Sovereignty Considerations During Migration¶
Compliance Requirements¶
Questions to Address:
- Geographic restrictions: Can data legally transit through specific regions?
- Encryption in transit: Are there requirements for specific encryption standards?
- Chain of custody: Must data transfer be audited and logged?
- Data residency: Can data temporarily reside in Azure after primary operations move to Azure Local?
Mitigation Strategies:
- Use ExpressRoute with Microsoft peering (private connectivity) instead of public internet
- Implement end-to-end encryption independent of transport layer
- Maintain detailed audit logs of all data movement
- Obtain legal sign-off on migration plan before execution
Regulatory Approval¶
Process:
- Document data flow diagrams showing source, transit paths, and destination
- Identify all data classifications involved (PII, PHI, financial, etc.)
- Obtain compliance team review and approval
- Coordinate with legal counsel for cross-border data movement
- Notify regulatory bodies if required by jurisdiction
Audit Trail¶
# Log all AzCopy operations
azcopy copy \
"source" \
"destination" \
--log-level=INFO \
--output-type=json \
> migration-log-$(date +%Y%m%d-%H%M%S).json
# Capture cryptographic hashes of all migrated data
find /mnt/azurelocal-data/ -type f -exec sha256sum {} \; > migration-hashes.txt
# Store audit trail in immutable storage
# Retention period per compliance requirements (typically 7 years)
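To make the hash capture audit-friendly, a manifest with per-file sizes, SHA-256 digests, and timestamps can be generated in one pass; a minimal sketch over the migrated tree:
import hashlib
import json
import os
from datetime import datetime, timezone

def build_custody_manifest(root: str, out_path: str) -> None:
    """Walk a migrated data tree and record path, size, SHA-256, and timestamp."""
    entries = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            entries.append({
                "path": os.path.relpath(path, root),
                "bytes": os.path.getsize(path),
                "sha256": digest.hexdigest(),
                "verified_at": datetime.now(timezone.utc).isoformat(),
            })
    with open(out_path, "w") as f:
        json.dump({"root": root, "entries": entries}, f, indent=2)

build_custody_manifest("/mnt/azurelocal-data", "migration-manifest.json")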
graph TB
subgraph Azure["☁️ Azure Cloud"]
direction TB
AzSQL[(Azure SQL<br/>Database)]
Blob[(Azure Blob<br/>Storage)]
SBus[Azure Service<br/>Bus Queues]
Files[(Azure<br/>Files)]
end
subgraph Migration["📦 Migration Layer"]
direction TB
DMS[Database Migration<br/>Service]
AzCopy[AzCopy /<br/>Storage Explorer]
QueueDrain[Queue Drain<br/>Scripts]
Robocopy[Robocopy /<br/>rsync]
subgraph Validation["✓ Validation Checkpoints"]
V1[Schema Validation]
V2[Checksum Verification]
V3[Count Validation]
V4[File Integrity Check]
end
end
subgraph OnPrem["🏢 Azure Local / On-Premises"]
direction TB
SQLServer[(SQL Server<br/>on VMs)]
MinIO[(MinIO Object<br/>Storage)]
RabbitMQ[RabbitMQ<br/>Queues]
SMB[(SMB File<br/>Shares)]
end
AzSQL -->|Online Replication| DMS
DMS -->|Initial Sync| V1
V1 -->|Validated| SQLServer
AzSQL -.->|Continuous Sync| SQLServer
Blob -->|Parallel Copy| AzCopy
AzCopy -->|Batch Transfer| V2
V2 -->|Hash Match| MinIO
SBus -->|Message Replay| QueueDrain
QueueDrain -->|Count Check| V3
V3 -->|Verified| RabbitMQ
Files -->|Incremental Sync| Robocopy
Robocopy -->|File Check| V4
V4 -->|Complete| SMB
SQLServer -.->|Final Cutover| Cutover([Production<br/>Cutover])
MinIO -.->|Ready| Cutover
RabbitMQ -.->|Ready| Cutover
SMB -.->|Ready| Cutover
style Azure fill:#0078d4,stroke:#005a9e,stroke-width:2px,color:#fff
style OnPrem fill:#107c10,stroke:#004b1c,stroke-width:2px,color:#fff
style Migration fill:#50e6ff,stroke:#0078d4,stroke-width:2px
style Validation fill:#ffc107,stroke:#f57c00,stroke-width:2px
style Cutover fill:#dc3545,stroke:#a71d2a,stroke-width:3px,color:#fff
Data Migration Tools and Automation¶
Azure Database Migration Service¶
Capabilities:
- Online and offline database migrations
- Support for SQL Server, PostgreSQL, MySQL, MongoDB
- Minimal downtime for mission-critical databases
- Automated schema and data migration
Workflow:
# Create DMS instance
az dms create \
--resource-group migration-rg \
--name my-dms-service \
--location eastus \
--sku-name Premium_4vCores \
--subnet /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Network/virtualNetworks/<vnet>/subnets/<subnet>
# Create migration project and task (as shown earlier)
AzCopy for Blob Transfer¶
Best Practices:
- Use SAS tokens with minimal required permissions
- Enable logging for audit trail
- Implement retry logic for transient failures
- Use the --check-length flag to validate file sizes
- Consider --cap-mbps to avoid saturating network links
Custom Migration Scripts¶
Example: Parallel Table Migration
import concurrent.futures

import pyodbc

def migrate_table(table_name, source_conn_str, target_conn_str):
    """Migrate a single table from source to target."""
    source_conn = pyodbc.connect(source_conn_str)
    target_conn = pyodbc.connect(target_conn_str)
    # Export from source (fetchall loads the whole table into memory;
    # page with fetchmany for very large tables)
    cursor = source_conn.cursor()
    cursor.execute(f"SELECT * FROM {table_name}")
    rows = cursor.fetchall()
    # Import to target
    columns = [column[0] for column in cursor.description]
    placeholders = ','.join(['?' for _ in columns])
    insert_sql = f"INSERT INTO {table_name} ({','.join(columns)}) VALUES ({placeholders})"
    target_cursor = target_conn.cursor()
    target_cursor.fast_executemany = True  # pyodbc bulk-insert optimization
    target_cursor.executemany(insert_sql, rows)
    target_conn.commit()
    source_conn.close()
    target_conn.close()
    print(f"Migrated {len(rows)} rows from {table_name}")

# Parallel execution (source_conn_str / target_conn_str are ODBC connection strings)
tables = ['Orders', 'Customers', 'Products', 'Inventory']
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(migrate_table, table, source_conn_str, target_conn_str)
               for table in tables]
    concurrent.futures.wait(futures)
Post-Migration Data Management¶
Data Retention and Archival¶
Strategy:
- Define retention policies for Azure data after migration
- Archive historical data that won't be migrated (e.g., data older than 7 years)
- Maintain access to Azure Storage for compliance lookback periods
- Plan decommissioning timeline for Azure resources
Ongoing Synchronization (Hybrid Period)¶
If maintaining hybrid operation for extended period:
# Schedule incremental sync job (cron on Linux)
0 2 * * * /usr/local/bin/azcopy sync "https://source/*" "/target/" --delete-destination=false
# Monitor sync status
tail -f /var/log/azcopy-sync.log
Data migration is not a single event but a carefully orchestrated process requiring technical precision, business coordination, and unwavering focus on data integrity. The strategies outlined in this chapter provide a framework for moving data safely across the hybrid continuum, but every organization must adapt these patterns to their specific requirements, risk tolerance, and operational constraints.
Next: Operational Continuity →