BigQuery Adds SQL-Native Inference for Open Models
Data teams have traditionally faced a significant challenge when integrating AI models into analytics workflows: constantly switching between platforms, APIs, and scripting languages. Google Cloud’s BigQuery is eliminating this complexity with a groundbreaking feature: SQL-native inference for open models from Hugging Face and Vertex AI Model Garden.

This enhancement allows data analysts to create, deploy, and run AI models using standard SQL syntax — the same language they already use for querying data. The operational overhead that once slowed innovation has been dramatically reduced.

How SQL-Native Inference Works

Step 1: Deploy Your Model

Creating a managed open model in BigQuery requires a single CREATE MODEL statement. Behind the scenes, BigQuery handles deployment to a managed Vertex AI endpoint, typically completing within minutes.

For Hugging Face models, specify the hugging_face_model_id in the format provider_name/model_name. For Vertex AI Model Garden, use model_garden_model_name with the format publishers/publisher/models/model_name@version.

CREATE OR REPLACE MODEL my_dataset.embedding_model
REMOTE WITH CONNECTION DEFAULT
OPTIONS (
  hugging_face_model_id = 'sentence-transformers/all-MiniLM-L6-v2'
);

CREATE OR REPLACE MODEL my_dataset.text_model
REMOTE WITH CONNECTION DEFAULT
OPTIONS (
  model_garden_model_name = 'publishers/google/models/gemma3@gemma-3-1b-it'
);

Step 2: Run Batch Inference

Once deployed, you can immediately perform inference using specialized AI functions. BigQuery provides AI.GENERATE_EMBEDDING for text embeddings and AI.GENERATE_TEXT for language model inference. These functions accept your model and a subquery of BigQuery data, keeping your entire machine learning pipeline within SQL.

-- Generate embeddings
SELECT *
FROM AI.GENERATE_EMBEDDING(
  MODEL my_dataset.embedding_model,
  (SELECT text AS content FROM `bigquery-public-data.hacker_news.full` 
   WHERE text != '' LIMIT 10)
);

-- Generate text with LLM
SELECT *
FROM AI.GENERATE_TEXT(
  MODEL my_dataset.text_model,
  (SELECT 'Summarize: ' || text AS prompt FROM `bigquery-public-data.hacker_news.full` 
   WHERE text != '' LIMIT 10)
);

Cost Management and Lifecycle Control

Automated Resource Management

BigQuery offers the endpoint_idle_ttl option to automatically undeploy an endpoint after a specified idle duration, preventing charges for capacity that is no longer serving requests. For example, setting it to INTERVAL 10 HOUR undeploys the endpoint after ten hours without inference traffic, so you only pay while the model is actually in use.
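As a sketch, the idle TTL can be set alongside the model ID at creation time (this assumes endpoint_idle_ttl is accepted in the same OPTIONS clause as the deployment options; check the current CREATE MODEL reference for your project):

CREATE OR REPLACE MODEL my_dataset.embedding_model
REMOTE WITH CONNECTION DEFAULT
OPTIONS (
  hugging_face_model_id = 'sentence-transformers/all-MiniLM-L6-v2',
  -- Undeploy the backing Vertex AI endpoint after 10 idle hours
  endpoint_idle_ttl = INTERVAL 10 HOUR
);

With this in place, the endpoint tears itself down after quiet periods; a later query that references the model triggers redeployment.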

Manual Deployment Control

For immediate cost control, manually undeploy or redeploy endpoints using ALTER MODEL statements:

-- Stop costs immediately
ALTER MODEL my_dataset.embedding_model SET OPTIONS(deploy_model = FALSE);

-- Resume for next job
ALTER MODEL my_dataset.embedding_model SET OPTIONS(deploy_model = TRUE);

Complete Cleanup

When finished, the DROP MODEL statement automatically cleans up all associated Vertex AI resources, including endpoints and backend infrastructure, allowing you to focus on data analysis rather than infrastructure maintenance.

DROP MODEL my_dataset.embedding_model;

Advanced Optimization

For demanding workloads, customize deployment settings during model creation by specifying machine types and replica counts. For specialized hardware needs, utilize Compute Engine GPU reservations to ensure consistent performance.
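A hedged sketch of what such a tuned deployment might look like: the option names machine_type, min_replica_count, and max_replica_count below are assumptions borrowed from the Vertex AI deployment vocabulary, and the machine type is illustrative; verify both against the current CREATE MODEL documentation before use.

CREATE OR REPLACE MODEL my_dataset.text_model
REMOTE WITH CONNECTION DEFAULT
OPTIONS (
  model_garden_model_name = 'publishers/google/models/gemma3@gemma-3-1b-it',
  -- Assumed option names; pick a machine type sized for your model
  machine_type = 'g2-standard-8',
  min_replica_count = 1,
  max_replica_count = 2
);

Raising the replica range lets the endpoint absorb larger batch inference jobs at the cost of more deployed capacity.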

This SQL-native approach democratizes AI by eliminating platform switching, reducing operational friction, and providing granular cost control — making powerful open models accessible to every data analyst.
