This enhancement lets data analysts create, deploy, and run AI models using standard SQL, the same language they already use for querying data. The operational overhead that once slowed this work, such as provisioning and managing model-serving infrastructure, is handled for you.
How SQL-Native Inference Works
Step 1: Deploy Your Model
Creating a managed open model in BigQuery requires a single CREATE MODEL statement. Behind the scenes, BigQuery handles deployment to a managed Vertex AI endpoint, typically completing within minutes.
For Hugging Face models, specify the hugging_face_model_id in the format provider_name/model_name. For Vertex AI Model Garden, use model_garden_model_name with the format publishers/publisher/models/model_name@version.
-- Deploy a Hugging Face embedding model
CREATE OR REPLACE MODEL my_dataset.embedding_model
REMOTE WITH CONNECTION DEFAULT
OPTIONS (
  hugging_face_model_id = 'sentence-transformers/all-MiniLM-L6-v2'
);

-- Deploy a Gemma text model from Vertex AI Model Garden
CREATE OR REPLACE MODEL my_dataset.text_model
REMOTE WITH CONNECTION DEFAULT
OPTIONS (
  model_garden_model_name = 'publishers/google/models/gemma3@gemma-3-1b-it'
);
Step 2: Run Batch Inference
Once deployed, you can immediately perform inference using specialized AI functions. BigQuery provides AI.GENERATE_EMBEDDING for text embeddings and AI.GENERATE_TEXT for language model inference. These functions accept your model and a subquery of BigQuery data, keeping your entire machine learning pipeline within SQL.
-- Generate embeddings
SELECT *
FROM AI.GENERATE_EMBEDDING(
  MODEL my_dataset.embedding_model,
  (SELECT text AS content
   FROM `bigquery-public-data.hacker_news.full`
   WHERE text != ''
   LIMIT 10)
);

-- Generate text with LLM
SELECT *
FROM AI.GENERATE_TEXT(
  MODEL my_dataset.text_model,
  (SELECT 'Summarize: ' || text AS prompt
   FROM `bigquery-public-data.hacker_news.full`
   WHERE text != ''
   LIMIT 10)
);
Cost Management and Lifecycle Control
Automated Resource Management
BigQuery offers the endpoint_idle_ttl option to automatically undeploy an endpoint after a specified idle duration, preventing charges for capacity that is sitting unused. For example, setting it to INTERVAL 10 HOUR undeploys the endpoint after ten hours without inference requests, so you only pay while the endpoint is actively serving.
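A minimal sketch of how this might look in practice: the endpoint_idle_ttl option and the INTERVAL 10 HOUR value come from the description above, while pairing it with the embedding model from Step 1 is assumed for illustration.

-- Assumed pairing of endpoint_idle_ttl with the Step 1 model:
-- the endpoint is undeployed after 10 hours with no requests
CREATE OR REPLACE MODEL my_dataset.embedding_model
REMOTE WITH CONNECTION DEFAULT
OPTIONS (
  hugging_face_model_id = 'sentence-transformers/all-MiniLM-L6-v2',
  endpoint_idle_ttl = INTERVAL 10 HOUR
);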
Manual Deployment Control
For immediate cost control, manually undeploy or redeploy endpoints using ALTER MODEL statements:
-- Stop costs immediately
ALTER MODEL my_dataset.embedding_model SET OPTIONS(deploy_model = FALSE);
-- Resume for next job
ALTER MODEL my_dataset.embedding_model SET OPTIONS(deploy_model = TRUE);
Complete Cleanup
When finished, the DROP MODEL statement automatically cleans up all associated Vertex AI resources, including endpoints and backend infrastructure, allowing you to focus on data analysis rather than infrastructure maintenance.
DROP MODEL my_dataset.embedding_model;
Advanced Optimization
For demanding workloads, customize deployment settings during model creation by specifying machine types and replica counts, as sketched below. For specialized hardware needs, use Compute Engine GPU reservations to ensure consistent performance.
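A sketch of what such a customized deployment might look like. The option names machine_type, min_replica_count, and max_replica_count are assumptions modeled on standard Vertex AI deployment parameters, not confirmed by this post; check the BigQuery documentation for the exact option names before use.

-- Assumed option names mirroring Vertex AI deployment settings
-- (machine_type, min_replica_count, max_replica_count);
-- verify against the current BigQuery docs before use.
CREATE OR REPLACE MODEL my_dataset.text_model
REMOTE WITH CONNECTION DEFAULT
OPTIONS (
  model_garden_model_name = 'publishers/google/models/gemma3@gemma-3-1b-it',
  machine_type = 'g2-standard-12',
  min_replica_count = 1,
  max_replica_count = 4
);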
This SQL-native approach democratizes AI by eliminating platform switching, reducing operational friction, and providing granular cost control — making powerful open models accessible to every data analyst.