API Reference
DataFrame
Bases: DataFrame, Generic[S]
A generic Polars DataFrame with schema validation.
This class extends polars.DataFrame to support schema validation using
Python's type annotations and metadata. It ensures that the DataFrame
conforms to a specified schema, enforcing constraints such as sorting,
uniqueness, and custom validation checks.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The input DataFrame to be validated. | required |
Type Parameters
S : TypedDict
The schema definition as a TypedDict, where fields can have metadata
such as sorting, uniqueness, coercion, and validation checks.
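To illustrate how metadata attached through Annotated can be read back from a TypedDict, here is a minimal standard-library sketch. It does not use polaroids: FieldSketch is a hypothetical stand-in for Field, field_metadata is an illustrative helper, and int stands in for polaroids.types.int32. How the library itself reads the schema is not shown in this reference.

```python
from typing import Annotated, Literal, Optional, TypedDict
from typing import get_args, get_origin, get_type_hints


class FieldSketch(TypedDict, total=False):
    """Hypothetical stand-in for polaroids.Field (not the library's class)."""

    sorted: Literal["ascending", "descending"]
    unique: bool
    coerce: bool


class BasicSchema(TypedDict):
    # Calling a TypedDict class returns a plain dict, used here as metadata.
    a: Annotated[int, FieldSketch(sorted="ascending", unique=True)]
    b: Optional[int]  # no metadata attached


def field_metadata(schema: type) -> dict:
    """Collect the dict-like Annotated metadata for each field of a TypedDict."""
    # include_extras=True keeps the Annotated wrapper instead of stripping it.
    hints = get_type_hints(schema, include_extras=True)
    metadata = {}
    for name, hint in hints.items():
        extras = {}
        if get_origin(hint) is Annotated:
            # get_args returns (base_type, *metadata); skip the base type.
            for extra in get_args(hint)[1:]:
                if isinstance(extra, dict):
                    extras.update(extra)
        metadata[name] = extras
    return metadata


schema_metadata = field_metadata(BasicSchema)
```

With this schema, schema_metadata maps "a" to its sorting and uniqueness settings and "b" to an empty dict.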
Methods:
Name | Description |
---|---|
validate | Validates the DataFrame against the expected schema. |
Example
    from typing import Annotated, TypedDict

    import polars as pl

    from polaroids import DataFrame, Field
    from polaroids.types import int32


    class BasicSchema(TypedDict):
        a: Annotated[
            int32,
            Field(
                sorted="ascending",
                coerce=True,
                unique=True,
                checks=[lambda d: d.ge(0)],  # Ensures values are non-negative
            ),
        ]
        b: int | None  # Optional integer column


    df = pl.DataFrame({"a": [0.0, 1.0], "b": [None, 0]})
    validated_df = DataFrame[BasicSchema](df).validate()
The validate() method ensures that:
- The schema of df matches the TypedDict (with possible coercion).
- Column a is sorted in ascending order.
- Column a only contains non-negative values.
- Column a has unique values.
- Column b allows None values.
Raises:
Type | Description |
---|---|
ValidationError | If the DataFrame does not conform to the expected schema. |
__getattribute__(name)
Dynamically delegate attribute access to the underlying polars.DataFrame.
This method intercepts attribute lookups that are not found on DataFrame
and attempts to retrieve them from the polars.DataFrame superclass. The
result is then converted back into an instance of this DataFrame subclass.
Only a subset of polars.DataFrame methods is intercepted: those that are
not expected to change the schema.
Source code in src/polaroids/dataframe.py
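The delegate-and-rewrap pattern described above can be sketched with toy classes. This is a sketch under stated assumptions, not the code in src/polaroids/dataframe.py: Base stands in for polars.DataFrame, Typed for this subclass, and the _SCHEMA_PRESERVING set is a hypothetical whitelist of methods assumed to keep the schema intact.

```python
class Base:
    """Toy stand-in for polars.DataFrame (hypothetical, for illustration)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def head(self, n):
        # Like many DataFrame methods, this returns a plain Base instance.
        return Base(self.rows[:n])

    def height(self):
        return len(self.rows)


class Typed(Base):
    """Toy subclass that re-wraps results of schema-preserving methods."""

    # Hypothetical whitelist: methods assumed not to change the schema.
    _SCHEMA_PRESERVING = frozenset({"head"})

    def __getattribute__(self, name):
        attr = super().__getattribute__(name)
        if name in Typed._SCHEMA_PRESERVING and callable(attr):

            def wrapper(*args, **kwargs):
                result = attr(*args, **kwargs)
                # Convert a plain Base result back into the subclass.
                if isinstance(result, Base) and not isinstance(result, Typed):
                    return type(self)(result.rows)
                return result

            return wrapper
        # Everything else is returned untouched.
        return attr


typed = Typed([1, 2, 3])
first_two = typed.head(2)  # re-wrapped, so it is a Typed instance
row_count = typed.height()  # not whitelisted, returned as-is
```

Restricting the wrapper to a whitelist keeps methods that may alter columns or dtypes from silently returning an instance that still claims the old schema.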
validate()
Validate the dataframe based on the annotations of the TypedDict.
This function performs various validation checks, including:
- Schema equality: Ensures that the DataFrame matches the expected schema.
- Primary key uniqueness: Verifies that primary key columns contain unique values.
- Unique values: Checks for unique constraints on specific columns.
- Nullable columns: Ensures that columns not declared as nullable do not contain null values.
- Sortedness: Validates whether specified columns are sorted in the expected order.
- Custom checks: Applies user-defined validation functions.
Returns:
Type | Description |
---|---|
Self | The validated DataFrame. |
Raises:
Type | Description |
---|---|
ValidationError | If any validation check fails. |
Field
Bases: TypedDict
TypedDict representing the configuration for a field in a schema.
Attributes:
Name | Type | Description |
---|---|---|
primary_key | bool | Indicates whether the field is a primary key. |
unique | bool | Indicates whether the field values must be unique. |
sorted | {descending, ascending} | Specifies the sorting order for the field. |
coerce | bool | Indicates whether to coerce the field values to the specified type. |
default | Expr | The default value for the field. |
checks | list[Callable[[Expr], Expr]] | A list of validation checks for the field. |