Finding and combining public datasets can be daunting, even for experienced data professionals. Google’s new Python client library for Data Commons offers a more direct route to a large knowledge graph of public statistics spanning health, the economy, demographics, and other domains.
Why Data Commons Stands Out
Data Commons aggregates and organizes publicly available statistics from hundreds of reputable sources. Because it harmonizes data from many fields into a single knowledge graph, one resource can answer questions that would otherwise require stitching together many separate datasets. Google Search itself uses Data Commons to answer statistical queries.
The Next-Generation Python Client
The latest Python client library is built on the Data Commons V2 REST API and was developed in collaboration with The ONE Campaign. It is designed around the everyday needs of researchers and data analysts and integrates with the broader Python data ecosystem.
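Because the client sits on top of the V2 REST API, it can help to see what a raw request looks like. The minimal sketch below only builds (and never sends) a GET URL for the V2 observation endpoint using the standard library. The base URL https://api.datacommons.org/v2, the /observation path, and the key, date, variable.dcids, entity.dcids and select parameters reflect our reading of the public V2 REST documentation and should be treated as assumptions; build_observation_url is a hypothetical helper, not part of the client library.

```python
from urllib.parse import urlencode

# Assumed public V2 REST base URL; check the current Data Commons REST docs.
V2_BASE_URL = "https://api.datacommons.org/v2"


def build_observation_url(api_key: str, variables: list[str], entities: list[str],
                          date: str = "LATEST") -> str:
    """Hypothetical helper: build (but do not send) a V2 observation GET URL."""
    params = [
        ("key", api_key),
        ("date", date),
        # Repeated query parameters are used for multi-valued fields.
        *[("variable.dcids", v) for v in variables],
        *[("entity.dcids", e) for e in entities],
        # Ask for the fields needed to build a tidy table of results.
        *[("select", field) for field in ("entity", "variable", "date", "value")],
    ]
    return f"{V2_BASE_URL}/observation?{urlencode(params)}"


url = build_observation_url("YOUR_API_KEY", ["Count_Person"], ["country/USA"])
print(url)
```

The client library hides this URL construction, authentication, and response parsing behind methods such as observations_dataframe.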
Custom Data Commons Instances: Tailored Data Environments
One of the most significant additions is support for custom Data Commons instances. Organizations such as the UN and ONE can host their own instances, combining their own datasets with public data while keeping full control over them.
The same client can securely query both the public instance and private custom instances, whether they are hosted on-premises or in the cloud, so organizations and researchers can work across both with one tool.
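In the client, the instance is chosen when the client is created (for example, DataCommonsClient(dc_instance="datacommons.one.org") for ONE’s instance). A custom instance serves the same V2 API under its own domain. The sketch below shows that mapping from an instance domain to an API base URL; the /core/api/v2 path for custom instances is our assumption about how instance URLs are resolved, not a documented guarantee, and resolve_base_url is a hypothetical helper.

```python
# Assumed public V2 base URL and custom-instance API path; verify against the docs.
PUBLIC_BASE_URL = "https://api.datacommons.org/v2"
CUSTOM_API_PATH = "/core/api/v2"


def resolve_base_url(dc_instance: str = "datacommons.org") -> str:
    """Hypothetical helper: map an instance domain to its V2 API base URL."""
    domain = dc_instance.strip()
    # Accept a full URL such as "https://example.org/" as well as a bare domain.
    for scheme in ("https://", "http://"):
        domain = domain.removeprefix(scheme)
    domain = domain.rstrip("/")
    if domain == "datacommons.org":
        return PUBLIC_BASE_URL
    return f"https://{domain}{CUSTOM_API_PATH}"


print(resolve_base_url())
print(resolve_base_url("datacommons.one.org"))
```

Keeping the instance choice in one place means the rest of an analysis does not change when you switch between public and custom data.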
Key Features at a Glance
- Pandas integration: Work directly with Data Commons datasets in familiar dataframes for intuitive analysis and visualization.
- Convenience methods: Execute common queries, map entities, and explore data relationships with concise, readable code.
- API key management: Set your API key once when you create the client, and it is reused for every subsequent request.
- Pydantic support: Built-in data validation and serialization for more reliable workflows.
- Multiple response formats: Get results as JSON, Python dictionaries, or lists, whichever best fits your project.
- Extensive dataset access: Tap into over 200,000 variables from more than 200 integrated datasets.
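The dictionary, list, and dataframe formats are different views of the same observation data. As an illustration, the sketch below flattens a nested dictionary shaped like a V2 observation response (nesting byVariable, byEntity, orderedFacets, then observations, which is our understanding of the V2 JSON layout) into a flat list of records, roughly the shape a dataframe is built from, and then serializes it as JSON. The numbers are placeholders, not real statistics; flatten_observations is a hypothetical helper, and keeping only the first listed facet assumes that facet is the preferred source.

```python
import json

# Placeholder payload shaped like a V2 observation response (assumed layout).
# The values are made up for illustration only.
SAMPLE_RESPONSE = {
    "byVariable": {
        "sdg/SI_POV_DAY1": {
            "byEntity": {
                "africa": {"orderedFacets": [{"facetId": "f1", "observations": [
                    {"date": "2019", "value": 1.0},
                    {"date": "2020", "value": 2.0},
                ]}]},
                "asia": {"orderedFacets": [{"facetId": "f1", "observations": [
                    {"date": "2019", "value": 3.0},
                ]}]},
            }
        }
    }
}


def flatten_observations(response: dict) -> list[dict]:
    """Hypothetical helper: turn a nested response dict into flat records."""
    records = []
    for variable, by_variable in response.get("byVariable", {}).items():
        for entity, by_entity in by_variable.get("byEntity", {}).items():
            facets = by_entity.get("orderedFacets", [])
            if not facets:
                continue
            # Keep only the first facet listed for each entity.
            for obs in facets[0].get("observations", []):
                records.append({"variable": variable, "entity": entity,
                                "date": obs["date"], "value": obs["value"]})
    return records


records = flatten_observations(SAMPLE_RESPONSE)
print(json.dumps(records, indent=2))  # the same records as a JSON string
```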
For example, the following code plots the share of the population living below the international poverty line on each continent over time. observations_dataframe fetches every available date (date="all") for every entity of type Continent contained in Earth, pivot reshapes the result into one column per continent, and pandas draws one line per column.
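# Requires: pip install "datacommons-client[Pandas]" matplotlib
import matplotlib.pyplot as plt
from datacommons_client.client import DataCommonsClient

client = DataCommonsClient(api_key="YOUR_API_KEY")  # replace with your own API key
variable = "sdg/SI_POV_DAY1"
variable_name = "Proportion of population below international poverty line"
# Fetch every available date for each continent contained in Earth (long format)
df = client.observations_dataframe(variable_dcids=variable, date="all", parent_entity="Earth", entity_type="Continent")
# Reshape to one row per date and one column per continent
df = df.pivot(index="date", columns="entity_name", values="value")
ax = df.plot(kind="line")
ax.set_xlabel("Year")
ax.set_ylabel("%")
ax.set_title(variable_name)
ax.legend()
plt.show()  # display the figure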
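The pivot call in the example above turns the long table returned by observations_dataframe (one row per entity and date) into a wide table with one column per continent, which DataFrame.plot draws as one line per column. To make that reshaping concrete without a network call or an API key, the sketch below performs the same long-to-wide pivot in plain Python on placeholder rows. pivot_long_to_wide is a hypothetical helper; like pandas.DataFrame.pivot, it rejects duplicate (date, entity_name) pairs instead of silently overwriting them.

```python
# Placeholder long-format rows using the column names the snippet pivots on.
ROWS = [
    {"date": "2019", "entity_name": "Africa", "value": 1.0},
    {"date": "2020", "entity_name": "Africa", "value": 2.0},
    {"date": "2019", "entity_name": "Asia", "value": 3.0},
]


def pivot_long_to_wide(rows, index="date", columns="entity_name", values="value"):
    """Hypothetical helper mirroring DataFrame.pivot: returns {index: {column: value}}."""
    table: dict = {}
    for row in rows:
        cell = table.setdefault(row[index], {})
        if row[columns] in cell:
            raise ValueError(f"duplicate entry for ({row[index]!r}, {row[columns]!r})")
        cell[row[columns]] = row[values]
    # Sort rows by index value so dates come out in order.
    return dict(sorted(table.items()))


wide = pivot_long_to_wide(ROWS)
all_columns = sorted({r["entity_name"] for r in ROWS})
for date, cells in wide.items():
    # Missing combinations print as None, analogous to NaN in pandas.
    print(date, [cells.get(col) for col in all_columns])
```

Gaps in the wide table are expected: not every continent has an observation for every year, which is why some plotted lines start or stop at different dates.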
Getting Started Is Easy
To get started, install the library from PyPI with pip install "datacommons-client[Pandas]" (the Pandas extra adds the dataframe support), then work through the reference documentation and hands-on tutorials, which include Colab notebooks.
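After installing, a quick way to confirm which release you have (useful when following a tutorial written for a specific version) is to read the package metadata with the standard library. The sketch assumes datacommons-client is the PyPI distribution name of the V2 client; installed_version is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "datacommons-client") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


v = installed_version()
print(f"datacommons-client {v}" if v else "datacommons-client is not installed")
```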
Google encourages users of the previous V1 API to upgrade, since ongoing support is moving to the V2 API and its new features.
Building a Collaborative Open Data Community
The Python client library is open source and available on GitHub, where developers and researchers worldwide can contribute improvements. By building in the open, Data Commons aims to widen access to critical data and support impactful research.
Empowering Insightful Data Exploration
The new Python client library for Data Commons brings data access, analysis, and visualization together in one package. Whether you’re a developer, analyst, or researcher, it makes getting from a question to an answer drawn from global public datasets faster and easier, and it shows how much open data can offer when it is easy to use.
Unlocking the Power of Open Data: Exploring the New Python Data Commons Client