ObjectRef Brings Mixed Data Type Management to BigQuery, Unlocking Unified Analytics

The Challenge of Unstructured Data

Modern enterprises generate vast amounts of unstructured data (images, audio, and documents) alongside traditional structured datasets. Historically, managing these diverse formats has meant juggling fragmented pipelines, which creates inefficiencies and slows AI adoption. Google Cloud’s ObjectRef data type in BigQuery aims to solve this challenge by bridging the gap between structured and unstructured data workflows.

What Makes ObjectRef Different?

With ObjectRef, organizations can reference any object stored in Cloud Storage directly within BigQuery tables. This includes both the URI and rich metadata, enabling structured and unstructured data to coexist, be processed, and even joined in the same row. 

The result? A unified, governed, and scalable platform for modern analytics and machine learning workloads.

Key Capabilities of ObjectRef

  • Multimodal Data Handling: Easily combine structured, semi-structured, and unstructured data in a single BigQuery table. Build ELT pipelines that span all data types.

  • Seamless SQL and Python Support: Use your language of choice, whether SQL or Python DataFrames, for transformations, aggregations, and filtering, all without compatibility headaches.

  • Serverless AI and ML Processing: Harness BigQuery’s serverless infrastructure to run LLMs, ML models, or open-source Python libraries directly on data references, instantly scaling as needed.

  • Unified Governance: Apply consistent access controls, data masking, and delegated permissions across all data types. Say goodbye to siloed governance models.

How ObjectRef Works

The heart of ObjectRef is a STRUCT containing metadata about the referenced Cloud Storage object. When you create an Object Table in BigQuery, a ref column is generated, providing direct references to assets like images or audio files. This design makes it possible to join unstructured and structured data, for example to analyze customer interactions across audio and text in one query.

struct {
    uri string,
    authorizer string,
    version string,
    details json {
        gcs_metadata json
    }
}

For advanced use cases, ObjectRefs can be nested in arrays, enabling tasks such as segmenting audio for call analysis. Built-in support for zero-copy snapshots and clones further enables reproducible analytics and efficient downstream machine learning workflows.
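To make the shape of the STRUCT and the array nesting concrete, here is a minimal local sketch in Python. It only mirrors the four fields listed above in plain dataclasses; it is not a BigQuery client type. The `CallRecord` class, the bucket, the connection name, and the version value are all hypothetical and chosen purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class ObjectRef:
    """Local stand-in for the ObjectRef STRUCT (not a BigQuery API type)."""
    uri: str
    authorizer: str
    version: str
    details: dict[str, Any] = field(default_factory=dict)


@dataclass
class CallRecord:
    """Hypothetical row: structured columns plus an array of ObjectRefs."""
    call_id: int
    customer: str
    segments: list[ObjectRef] = field(default_factory=list)


def segment_uris(record: CallRecord) -> list[str]:
    """Return the Cloud Storage URI of every audio segment in the row."""
    return [ref.uri for ref in record.segments]


# All values below are made up for the example.
record = CallRecord(
    call_id=1,
    customer="example-customer",
    segments=[
        ObjectRef(
            uri=f"gs://example-bucket/calls/1/segment-{i}.wav",
            authorizer="example-project.us.example-connection",
            version="1700000000000000",
            details={"gcs_metadata": {"content_type": "audio/wav"}},
        )
        for i in range(3)
    ],
)

print(segment_uris(record))
```

Keeping the segments in one array on the same row as the structured columns is what lets a single query reason about a whole call and its parts together.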

Python Integration for Custom Workflows

BigQuery’s support for Python UDFs allows data teams to bring open-source libraries into their analytics workflows. The OBJ.GET_ACCESS_URL function gives users secure, signed URLs for interacting with unstructured data, with no direct Cloud Storage permissions required.

This makes it easy to perform operations like audio denoising or extracting file properties entirely within BigQuery’s governed environment.
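As a rough illustration of the "extracting file properties" case, the sketch below shows the kind of function body a Python UDF might wrap. It assumes the signed URL has already been obtained and is passed in as a plain string; how that URL is extracted from the result of OBJ.GET_ACCESS_URL, and how the UDF itself is registered, are left out. The function name `describe_object` is hypothetical, and a `data:` URL stands in for a signed URL so the example runs without network access.

```python
import hashlib
import urllib.request


def describe_object(access_url: str, chunk_size: int = 64 * 1024) -> dict:
    """Stream the object behind a URL and return basic file properties."""
    digest = hashlib.sha256()
    size = 0
    with urllib.request.urlopen(access_url) as response:
        content_type = response.headers.get("Content-Type", "")
        # Read in chunks so large objects are never held fully in memory.
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            size += len(chunk)
    return {
        "size_bytes": size,
        "content_type": content_type,
        "sha256": digest.hexdigest(),
    }


# Stand-in for a signed URL: a data: URL carrying the bytes b"hello".
sample_url = "data:text/plain;base64,aGVsbG8="
info = describe_object(sample_url)
print(info["size_bytes"], info["content_type"])
```

Because the function only needs a readable URL, the caller never handles Cloud Storage credentials, which matches the governed access model described above.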

AI and Machine Learning: Powering Multimodal Workloads

ObjectRef unlocks powerful new scenarios in BigQuery ML and generative AI functions. Highlights include:

  • Passing multiple ObjectRefs to Gemini for advanced multimodal inference, such as comparing original and processed audio files.

  • Transcribing audio or performing speaker identification using AI prompts that reference ObjectRefs.

  • Generating embeddings for images and audio, supporting retrieval-augmented generation (RAG) and other next-gen AI workflows.

Both SQL and Python users benefit, making advanced analytics accessible to a broader range of data professionals.
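The retrieval step behind embedding-based workflows such as RAG can be sketched in a few lines of plain Python. This is a generic cosine-similarity ranking, not BigQuery's implementation; the three-dimensional vectors and the object URIs are invented for the example, and real embeddings would come from a model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the k object URIs whose embeddings are closest to the query."""
    scored = [(cosine_similarity(query, vec), uri) for uri, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [uri for _, uri in scored[:k]]


# Made-up embeddings keyed by hypothetical object URIs.
index = {
    "gs://example-bucket/images/cat.jpg": [0.9, 0.1, 0.0],
    "gs://example-bucket/images/dog.jpg": [0.8, 0.2, 0.1],
    "gs://example-bucket/audio/call.wav": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

print(top_k(query, index))
```

Keying the index by object URI means the retrieved items can be traced back to the referenced files, whatever their modality.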

Scaling Up with Enhanced Object Tables

  • Object Tables can now store over 300 million objects per table.

  • The ref column provides ready-to-use ObjectRefs for analytics tasks.

  • BigQuery DataFrames now support mixed-modality data, making it easier to wrangle and visualize complex datasets.

  • Server-side transformers enable real-time chunking, image processing, and transcription within BigQuery itself.

Getting Started with ObjectRef

Interested users can try ObjectRef in preview today. Google Cloud offers live demos, tutorials, and step-by-step guides for creating Object Tables and integrating Cloud Storage. These new features help organizations eliminate data silos and fully leverage their data for advanced analytics and AI-powered applications.
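As a starting point, the helper below assembles the DDL for an object table over a Cloud Storage path. It only builds a SQL string, leaving execution to whichever client you use. The project, dataset, connection, and bucket names are placeholders, and the function name `object_table_ddl` is hypothetical. Treat it as a sketch to check against the current BigQuery documentation, since preview features can change.

```python
def object_table_ddl(table: str, connection: str, uri_patterns: list) -> str:
    """Build a CREATE EXTERNAL TABLE statement for an object table."""
    if not uri_patterns:
        raise ValueError("at least one Cloud Storage URI pattern is required")
    for value in [table, connection, *uri_patterns]:
        # Reject quote characters rather than attempt to escape them.
        if "'" in value or "`" in value:
            raise ValueError(f"unexpected quote character in {value!r}")
    for pattern in uri_patterns:
        if not pattern.startswith("gs://"):
            raise ValueError(f"not a Cloud Storage URI: {pattern!r}")
    uris = ", ".join(f"'{pattern}'" for pattern in uri_patterns)
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        f"WITH CONNECTION `{connection}`\n"
        "OPTIONS (\n"
        "  object_metadata = 'SIMPLE',\n"
        f"  uris = [{uris}]\n"
        ");"
    )


# Placeholder identifiers; substitute your own project, connection, and bucket.
ddl = object_table_ddl(
    table="example-project.example_dataset.call_audio",
    connection="example-project.us.example-connection",
    uri_patterns=["gs://example-bucket/calls/*"],
)
print(ddl)
```

The resulting statement can be run in the BigQuery console or passed to a client library.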

Takeaway

ObjectRef represents a major advance for cloud-native data analytics, enabling seamless integration and governance of structured and unstructured data. With BigQuery’s unified platform, building intelligent, multimodal applications has never been easier.

Source: Google Cloud Blog

Joshua Berkowitz August 5, 2025