## Why Serverless Inference?
Amazon SageMaker Canvas lets you build ML models from your existing data sources without writing code. SageMaker Serverless Inference completes the journey by automatically provisioning infrastructure on demand: you pay only for inference requests, not for idle capacity.
| Feature | Traditional | Serverless |
|---|---|---|
| Infrastructure | Manual setup | Automatic |
| Pricing | 24/7 instances | Per-request only |
| Best For | Consistent traffic | Variable/intermittent |
## Step 1: Export to Model Registry
- Open the SageMaker AI console and launch SageMaker Studio
- Launch SageMaker Canvas (opens in a new tab)
- Locate your model and click the options menu (three dots)
- Select Add to Model Registry (you can confirm the export from code, as sketched below)
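If you prefer to verify from the SDK, here is a minimal sketch using boto3. The model package group name `canvas-shipping-model` is a hypothetical placeholder; use the group name Canvas shows in the Model Registry:

```python
import boto3

sm = boto3.client('sagemaker')

# List the newest packages in the registry group Canvas created.
# 'canvas-shipping-model' is a hypothetical placeholder name.
packages = sm.list_model_packages(
    ModelPackageGroupName='canvas-shipping-model',
    SortBy='CreationTime',
    SortOrder='Descending',
)
for package in packages['ModelPackageSummaryList']:
    print(package['ModelPackageArn'], package['ModelApprovalStatus'])
```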
## Step 2: Approve and Get Deployment Details
- In SageMaker Studio, choose Models
- Find your model (status: Pending manual approval)
- Update status to Approved
- Navigate to the Deploy tab and record:
  - Container image URI (ECR)
  - Model data location (S3)
  - Environment variables:

```
SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT: text/csv
SAGEMAKER_INFERENCE_OUTPUT: predicted_label
SAGEMAKER_PROGRAM: tabular_serve.py
SAGEMAKER_SUBMIT_DIRECTORY: /opt/ml/model/code
```
Copy all variables exactly; missing any of them will cause inference failures. Approval and detail retrieval can also be scripted, as sketched below.
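A minimal sketch of the scripted approval with boto3, assuming the model package ARN from the registry (the ARN below is a placeholder):

```python
import boto3

sm = boto3.client('sagemaker')

# Placeholder ARN -- substitute the ModelPackageArn from your registry.
package_arn = 'arn:aws:sagemaker:us-east-1:111122223333:model-package/canvas-shipping-model/1'

# Flip the package from PendingManualApproval to Approved.
sm.update_model_package(
    ModelPackageArn=package_arn,
    ModelApprovalStatus='Approved',
)

# describe_model_package returns the deployment details recorded above:
# container image, model data location, and environment variables.
details = sm.describe_model_package(ModelPackageName=package_arn)
container = details['InferenceSpecification']['Containers'][0]
print(container['Image'])          # ECR container image URI
print(container['ModelDataUrl'])   # S3 model data location
print(container.get('Environment', {}))
```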
## Step 3: Create Model and Deploy
Create Model:
- Open SageMaker console → Inference → Models
- Click Create model
- Add container image URI, S3 model location, and all environment variables
Create Serverless Endpoint:
- Choose Endpoint configurations → Create
- Set type to Serverless
- Configure memory (1-6 GB) and max concurrency (1-200)
- Go to Endpoints → Create endpoint
- Select your configuration and deploy (typically takes 3-5 minutes; the equivalent SDK calls are sketched below)
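The same three console steps, sketched with boto3. The model and endpoint names and the execution role ARN are placeholders; the image URI, model data location, and environment variables come from Step 2:

```python
import boto3

sm = boto3.client('sagemaker')

# 1. Create the model from the details recorded in Step 2.
sm.create_model(
    ModelName='canvas-shipping-model',  # placeholder name
    ExecutionRoleArn='arn:aws:iam::111122223333:role/SageMakerExecutionRole',
    PrimaryContainer={
        'Image': '<container image URI from Step 2>',
        'ModelDataUrl': '<S3 model data location from Step 2>',
        'Environment': {
            'SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT': 'text/csv',
            'SAGEMAKER_INFERENCE_OUTPUT': 'predicted_label',
            'SAGEMAKER_PROGRAM': 'tabular_serve.py',
            'SAGEMAKER_SUBMIT_DIRECTORY': '/opt/ml/model/code',
        },
    },
)

# 2. Create a serverless endpoint configuration.
sm.create_endpoint_config(
    EndpointConfigName='canvas-shipping-serverless',
    ProductionVariants=[{
        'VariantName': 'AllTraffic',
        'ModelName': 'canvas-shipping-model',
        'ServerlessConfig': {
            'MemorySizeInMB': 4096,  # 1024-6144, in 1 GB increments
            'MaxConcurrency': 20,    # 1-200
        },
    }],
)

# 3. Deploy the endpoint (typically reaches InService in 3-5 minutes).
sm.create_endpoint(
    EndpointName='canvas-shipping-endpoint',
    EndpointConfigName='canvas-shipping-serverless',
)
```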
## Step 4: Invoke Your Endpoint
```python
import csv
from io import StringIO

import boto3


def invoke_model(features):
    """Send one row of features to the serverless endpoint as CSV."""
    client = boto3.client('sagemaker-runtime')

    # Serialize the feature list as a single CSV row.
    output = StringIO()
    csv.writer(output).writerow(features)

    response = client.invoke_endpoint(
        EndpointName='your-endpoint-name',
        ContentType='text/csv',
        Accept='text/csv',
        Body=output.getvalue()
    )

    # The response body is a CSV row: predicted label, then confidence.
    result = list(csv.reader(
        StringIO(response['Body'].read().decode())
    ))[0]
    return {
        'predicted_label': result[0],
        'confidence': float(result[1])
    }


# Example usage
features = ["Bell", "Base", 14, 6, 11, 11,
            "GlobalFreight", "Bulk Order",
            "Atlanta", "2020-09-11", "Express", 109.25]
prediction = invoke_model(features)
print(f"Prediction: {prediction['predicted_label']}")
print(f"Confidence: {prediction['confidence']*100:.1f}%")
```
## Automated Deployment (Optional)
For production workflows, automate endpoint creation using EventBridge and Lambda. When you approve a model in the registry, EventBridge triggers a Lambda function that automatically:
- Extracts model details (container, S3 location, environment variables)
- Creates the model configuration
- Deploys a serverless endpoint with your specified memory and concurrency settings
Deploy the provided CloudFormation template to enable this automation. Configure parameters for memory size (1024-6144 MB), max concurrency (1-200), and the authorized SageMaker Studio domain ID for security. The event rule driving the flow is sketched below.
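For reference, the event wiring the template sets up can be sketched with boto3; the rule name and Lambda ARN are placeholders:

```python
import json

import boto3

events = boto3.client('events')

# Fire whenever a model package in the registry transitions to Approved.
events.put_rule(
    Name='canvas-model-approved',
    EventPattern=json.dumps({
        'source': ['aws.sagemaker'],
        'detail-type': ['SageMaker Model Package State Change'],
        'detail': {'ModelApprovalStatus': ['Approved']},
    }),
)

# Route matching events to the deployment Lambda (placeholder ARN).
# The function also needs a resource-based permission allowing
# events.amazonaws.com to invoke it.
events.put_targets(
    Rule='canvas-model-approved',
    Targets=[{
        'Id': 'deploy-serverless-endpoint',
        'Arn': 'arn:aws:lambda:us-east-1:111122223333:function:deploy-canvas-endpoint',
    }],
)
```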
## Key Takeaways
Serverless Inference eliminates infrastructure management for Canvas models, putting deployment within reach of teams without DevOps expertise. You pay only for inference requests, which makes it cost-effective for variable workloads. The deployment process (export to registry, approve, deploy, invoke) takes under 10 minutes manually, or it can be fully automated with EventBridge and Lambda.
For workloads with consistent traffic, traditional real-time endpoints may be more cost-effective. For everything else, serverless provides the simplest path from Canvas model to production predictions.




