Deploy SageMaker Canvas Models with Serverless Inference

Built a machine learning model in SageMaker Canvas? Deploying it doesn’t require ML/DevOps expertise. This guide shows how to deploy Canvas models using SageMaker Serverless Inference—serving predictions without managing servers or paying for idle time.

Why Serverless Inference?

Amazon SageMaker Canvas lets you create ML models without code using existing data sources. SageMaker Serverless Inference completes the journey by automatically provisioning infrastructure based on demand—you pay only for inference requests, not idle capacity.

Feature        | Traditional        | Serverless
Infrastructure | Manual setup       | Automatic
Pricing        | 24/7 instances     | Per-request only
Best for       | Consistent traffic | Variable/intermittent

Step 1: Export to Model Registry

  1. Open SageMaker AI console and launch SageMaker Studio
  2. Launch SageMaker Canvas (opens in new tab)
  3. Locate your model and click options menu (three dots)
  4. Select Add to Model Registry

Cost Tip: After exporting, configure Canvas to auto-shutdown when idle to prevent workspace charges.
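
To confirm the export from code, you can list the versions in the model package group that Canvas created. A minimal boto3 sketch (the group name below is an assumption; Canvas names the group after your model):

import boto3

sm = boto3.client('sagemaker')

# Assumed group name -- Canvas creates a model package group for your model.
packages = sm.list_model_packages(ModelPackageGroupName='canvas-my-model')

for pkg in packages['ModelPackageSummaryList']:
    print(pkg['ModelPackageArn'], pkg['ModelApprovalStatus'])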

Step 2: Approve and Get Deployment Details

  1. In SageMaker Studio, choose Models
  2. Find your model (status: Pending manual approval)
  3. Update status to Approved
  4. Navigate to Deploy tab and record:
    • Container image URI (ECR)
    • Model data location (S3)
    • Environment variables

Critical Environment Variables Example:

SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT: text/csv
SAGEMAKER_INFERENCE_OUTPUT: predicted_label
SAGEMAKER_PROGRAM: tabular_serve.py
SAGEMAKER_SUBMIT_DIRECTORY: /opt/ml/model/code

Copy all variables exactly—missing any will cause inference failures.
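
This step can also be scripted. As a sketch (the model package ARN is a placeholder you copy from the registry), approving the version and reading back the deployment details with boto3:

import boto3

sm = boto3.client('sagemaker')

# Placeholder -- copy the ARN of your model package version from the registry.
model_package_arn = 'arn:aws:sagemaker:...:model-package/...'

# Same effect as setting the status to Approved in Studio.
sm.update_model_package(
    ModelPackageArn=model_package_arn,
    ModelApprovalStatus='Approved'
)

# Read back the details you would otherwise copy from the Deploy tab.
pkg = sm.describe_model_package(ModelPackageName=model_package_arn)
container = pkg['InferenceSpecification']['Containers'][0]
print(container['Image'])                 # ECR container image URI
print(container['ModelDataUrl'])          # S3 model data location
print(container.get('Environment', {}))   # environment variables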

Step 3: Create Model and Deploy

Create Model:

  1. Open SageMaker console → Inference → Models
  2. Click Create model
  3. Add container image URI, S3 model location, and all environment variables
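
The same step in boto3, sketched with placeholder names (the environment variables are the ones recorded in Step 2):

import boto3

sm = boto3.client('sagemaker')

sm.create_model(
    ModelName='canvas-serverless-model',       # placeholder: any unique name
    ExecutionRoleArn='<your SageMaker execution role ARN>',
    PrimaryContainer={
        'Image': '<ECR container image URI>',              # from Step 2
        'ModelDataUrl': 's3://<bucket>/<path>/model.tar.gz',  # from Step 2
        'Environment': {
            'SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT': 'text/csv',
            'SAGEMAKER_INFERENCE_OUTPUT': 'predicted_label',
            'SAGEMAKER_PROGRAM': 'tabular_serve.py',
            'SAGEMAKER_SUBMIT_DIRECTORY': '/opt/ml/model/code'
        }
    }
)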

Create Serverless Endpoint:

  1. Choose Endpoint configurations → Create
  2. Set type to Serverless
  3. Configure memory (1-6 GB) and max concurrency (1-200)
  4. Go to Endpoints → Create endpoint
  5. Select your configuration and deploy (takes 3-5 minutes)

Memory Guide: Start with 2 GB for standard Canvas models. The first request after an idle period will experience cold start latency (10-30 seconds).
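
The equivalent boto3 calls, sketched with assumed names and the 2 GB starting point:

import boto3

sm = boto3.client('sagemaker')

# Serverless endpoint configuration -- memory and concurrency match the console fields.
sm.create_endpoint_config(
    EndpointConfigName='canvas-serverless-config',   # placeholder name
    ProductionVariants=[{
        'VariantName': 'AllTraffic',
        'ModelName': 'canvas-serverless-model',      # the model created above
        'ServerlessConfig': {
            'MemorySizeInMB': 2048,   # 1024-6144, in 1 GB increments
            'MaxConcurrency': 20      # 1-200
        }
    }]
)

sm.create_endpoint(
    EndpointName='canvas-serverless-endpoint',
    EndpointConfigName='canvas-serverless-config'
)

# Block until the endpoint is in service (typically a few minutes).
sm.get_waiter('endpoint_in_service').wait(EndpointName='canvas-serverless-endpoint')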

Step 4: Invoke Your Endpoint

Python Example:

import boto3
from io import StringIO
import csv

def invoke_model(features):
    client = boto3.client('sagemaker-runtime')

    # Serialize the feature row as CSV, the format the Canvas container expects.
    output = StringIO()
    csv.writer(output).writerow(features)

    response = client.invoke_endpoint(
        EndpointName='your-endpoint-name',  # replace with your endpoint name
        ContentType='text/csv',
        Accept='text/csv',
        Body=output.getvalue()
    )

    # The response body is a single CSV row: predicted label, then confidence.
    result = list(csv.reader(
        StringIO(response['Body'].read().decode())
    ))[0]

    return {
        'predicted_label': result[0],
        'confidence': float(result[1])
    }

# Example usage
features = ["Bell", "Base", 14, 6, 11, 11, 
            "GlobalFreight", "Bulk Order", 
            "Atlanta", "2020-09-11", "Express", 109.25]

prediction = invoke_model(features)
print(f"Prediction: {prediction['predicted_label']}")
print(f"Confidence: {prediction['confidence']*100:.1f}%")

Automated Deployment (Optional)

For production workflows, automate endpoint creation using EventBridge and Lambda. When you approve a model in the registry, EventBridge triggers a Lambda function that automatically (see the sketch after this list):

  • Extracts model details (container, S3 location, environment variables)
  • Creates the model configuration
  • Deploys a serverless endpoint with your specified memory and concurrency settings
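
A rough sketch of that Lambda logic, assuming the EventBridge "SageMaker Model Package State Change" event shape (resource names and environment variables here are illustrative assumptions, not the template's actual code):

import os
import boto3

sm = boto3.client('sagemaker')

def handler(event, context):
    detail = event['detail']
    if detail.get('ModelApprovalStatus') != 'Approved':
        return  # deploy only approved versions

    # Extract the container image, model data location, and environment variables.
    pkg = sm.describe_model_package(ModelPackageName=detail['ModelPackageArn'])
    container = pkg['InferenceSpecification']['Containers'][0]

    # e.g. 'canvas-auto-<group>-<version>'
    name = 'canvas-auto-' + '-'.join(detail['ModelPackageArn'].split('/')[-2:])

    sm.create_model(
        ModelName=name,
        ExecutionRoleArn=os.environ['SAGEMAKER_ROLE_ARN'],   # assumed env var
        PrimaryContainer={
            'Image': container['Image'],
            'ModelDataUrl': container['ModelDataUrl'],
            'Environment': container.get('Environment', {})
        }
    )
    sm.create_endpoint_config(
        EndpointConfigName=name,
        ProductionVariants=[{
            'VariantName': 'AllTraffic',
            'ModelName': name,
            'ServerlessConfig': {
                'MemorySizeInMB': int(os.environ.get('MEMORY_MB', '2048')),
                'MaxConcurrency': int(os.environ.get('MAX_CONCURRENCY', '20'))
            }
        }]
    )
    sm.create_endpoint(EndpointName=name, EndpointConfigName=name)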

Deploy the provided CloudFormation template to enable this automation. Configure parameters for memory size (1024-6144 MB), max concurrency (1-200), and authorized SageMaker Studio domain ID for security.

Security Note: The automation uses SSM Parameter Store to validate that only approved SageMaker domains can trigger deployments.

Key Takeaways

Serverless Inference eliminates infrastructure management for Canvas models, making deployment accessible to teams without DevOps expertise. You pay only for inference requests, making it cost-effective for variable workloads. The deployment process—export to registry, approve, deploy, and invoke—takes under 10 minutes manually, or can be fully automated with EventBridge and Lambda.

For workloads with consistent traffic, traditional real-time endpoints may be more cost-effective. For everything else, serverless provides the simplest path from Canvas model to production predictions.

 
