GraphAr Permissions Schema Specification¶
Version: 1.0 Date: 2025-10-21 Status: Draft Purpose: Define GraphAr-compliant storage format for Zanzibar-style permissions
Overview¶
This specification defines how Zanzibar-style permission relation tuples are stored as a GraphAr-compliant graph structure. The permissions graph represents access control relationships as edges between subject and object vertices.
Key Concepts¶
- Subjects: Entities that can have permissions (users, groups, service accounts)
- Objects: Resources that can be accessed (boards, lists, tasks, documents)
- Relations: Permission types (owner, editor, viewer, etc.)
- Tuples: Edges representing "subject has relation to object"
Directory Structure¶
permissions_graph/
├── _metadata.yaml # Graph-level metadata
├── _schema.yaml # Vertex and edge schemas
├── vertices/ # Subject and object vertices
│ ├── user/ # User subjects
│ │ └── part0.parquet
│ ├── group/ # Group subjects (optional)
│ │ └── part0.parquet
│ ├── board/ # Board objects
│ │ └── part0.parquet
│ ├── list/ # List objects
│ │ └── part0.parquet
│ └── task/ # Task objects
│ └── part0.parquet
└── edges/ # Permission relationships
├── owner/ # Owner relations
│ └── part0.parquet
├── editor/ # Editor relations
│ └── part0.parquet
└── viewer/ # Viewer relations
└── part0.parquet
Metadata File Specification¶
_metadata.yaml¶
name: "permissions"
version: "1.0"
directed: true
description: "Zanzibar-style permission graph with relation-based access control"
creator: "ParquetFrame Permissions System"
created_at: "2025-10-21T00:00:00Z"
Fields:
name(required): Graph identifier, always "permissions"version(required): Schema version (semver)directed(required): Alwaystruefor permissionsdescription(optional): Human-readable descriptioncreator(optional): System that created the graphcreated_at(optional): ISO 8601 timestamp
Schema File Specification¶
_schema.yaml¶
version: "1.0"
vertices:
# Subject vertex types
user:
properties:
id:
type: "string"
primary: true
description: "Unique user identifier"
created_at:
type: "timestamp"
nullable: true
group:
properties:
id:
type: "string"
primary: true
description: "Unique group identifier"
created_at:
type: "timestamp"
nullable: true
# Object vertex types
board:
properties:
id:
type: "string"
primary: true
description: "Unique board identifier"
list:
properties:
id:
type: "string"
primary: true
description: "Unique list identifier"
task:
properties:
id:
type: "string"
primary: true
description: "Unique task identifier"
edges:
# Permission relation types
owner:
properties:
src:
type: "string"
source: true
description: "Subject ID (user or group)"
dst:
type: "string"
target: true
description: "Object ID (board, list, or task)"
subject_namespace:
type: "string"
description: "Type of subject (user, group)"
object_namespace:
type: "string"
description: "Type of object (board, list, task)"
created_at:
type: "timestamp"
nullable: true
description: "When permission was granted"
granted_by:
type: "string"
nullable: true
description: "User ID who granted the permission"
editor:
properties:
src:
type: "string"
source: true
description: "Subject ID (user or group)"
dst:
type: "string"
target: true
description: "Object ID (board, list, or task)"
subject_namespace:
type: "string"
description: "Type of subject (user, group)"
object_namespace:
type: "string"
description: "Type of object (board, list, task)"
created_at:
type: "timestamp"
nullable: true
granted_by:
type: "string"
nullable: true
viewer:
properties:
src:
type: "string"
source: true
description: "Subject ID (user or group)"
dst:
type: "string"
target: true
description: "Object ID (board, list, or task)"
subject_namespace:
type: "string"
description: "Type of subject (user, group)"
object_namespace:
type: "string"
description: "Type of object (board, list, task)"
created_at:
type: "timestamp"
nullable: true
granted_by:
type: "string"
nullable: true
Data Type Mapping¶
Parquet Schema¶
| YAML Type | Parquet Type | Notes |
|---|---|---|
string |
STRING |
UTF-8 encoded |
int32 |
INT32 |
Signed 32-bit integer |
int64 |
INT64 |
Signed 64-bit integer |
float64 |
DOUBLE |
IEEE 754 double precision |
boolean |
BOOLEAN |
True/false |
timestamp |
TIMESTAMP_MILLIS |
Milliseconds since epoch |
Permission Tuple Representation¶
Zanzibar Tuple Format¶
RelationTuple(
namespace="board",
object_id="board_123",
relation="owner",
subject_namespace="user",
subject_id="alice"
)
GraphAr Representation¶
As an edge in edges/owner/part0.parquet:
| src | dst | subject_namespace | object_namespace | created_at | granted_by |
|---|---|---|---|---|---|
| alice | board_123 | user | board | 2025-10-21T10:00:00Z | admin |
Vertex entries:
vertices/user/part0.parquet:{id: "alice", created_at: ...}vertices/board/part0.parquet:{id: "board_123"}
Relation Types¶
Standard Relations¶
| Relation | Description | Inherits From |
|---|---|---|
owner |
Full control, can delete | editor, viewer |
editor |
Can modify content | viewer |
viewer |
Read-only access | - |
Custom Relations (Extensible)¶
Applications can define custom relations by adding edge types:
Two-Graph Architecture¶
Application Data Graph (app_graph/)¶
app_graph/
├── _metadata.yaml
├── _schema.yaml
├── vertices/
│ ├── User/
│ ├── Board/
│ ├── TaskList/
│ └── Task/
└── edges/
├── owns/ # Ownership relationships
├── contains/ # Containment relationships
└── assigned_to/ # Assignment relationships
Purpose: Business domain data and entity relationships
Permissions Graph (permissions_graph/)¶
permissions_graph/
├── _metadata.yaml
├── _schema.yaml
├── vertices/ # Subjects and objects
│ ├── user/
│ ├── board/
│ ├── list/
│ └── task/
└── edges/ # Permission tuples
├── owner/
├── editor/
└── viewer/
Purpose: Access control and authorization
Separation Rationale¶
- Concern Separation: Business logic vs authorization
- Independent Evolution: Schema changes don't affect each other
- Performance: Optimize queries for different access patterns
- Security: Different backup/replication strategies
- Compliance: Easier to audit and regulate permissions
Usage Examples¶
Creating Permission Tuples¶
from parquetframe.permissions import TupleStore, RelationTuple
# Create tuple store
store = TupleStore()
# Add permission tuples
store.add_tuple(RelationTuple(
namespace="board",
object_id="board_123",
relation="owner",
subject_namespace="user",
subject_id="alice"
))
# Save as GraphAr structure
store.save("./permissions_graph")
Result:
- Creates
permissions_graph/_metadata.yaml - Creates
permissions_graph/_schema.yaml - Saves vertices to
permissions_graph/vertices/ - Saves edges to
permissions_graph/edges/owner/part0.parquet
Loading Permission Graph¶
import parquetframe as pf
# Load with GraphAr reader
perm_graph = pf.read_graph("./permissions_graph")
# Query permissions using graph algorithms
# Find all users who can access a board
board_viewers = perm_graph.edges.query("dst == 'board_123'")
Checking Permissions¶
from parquetframe.permissions import check
# Check if user has permission
has_access = check(
store=store,
subject_namespace="user",
subject_id="alice",
relation="owner",
object_namespace="board",
object_id="board_123"
)
Migration from Legacy Format¶
Legacy Format (Single File)¶
Migration Script¶
from pathlib import Path
import pandas as pd
from parquetframe.permissions import TupleStore, RelationTuple
# Read legacy format
legacy_df = pd.read_parquet("permissions.parquet")
# Create new tuple store
store = TupleStore()
# Convert to relation tuples
for _, row in legacy_df.iterrows():
store.add_tuple(RelationTuple(
namespace=row["namespace"],
object_id=row["object_id"],
relation=row["relation"],
subject_namespace=row["subject_namespace"],
subject_id=row["subject_id"]
))
# Save as GraphAr structure
store.save("./permissions_graph")
Validation Rules¶
Structure Validation¶
- Required Files:
_metadata.yamlmust exist_schema.yamlmust existvertices/directory must exist-
edges/directory must exist -
Vertex Requirements:
- Each vertex type must have an
idproperty marked asprimary: true -
Vertex IDs must be unique within their namespace
-
Edge Requirements:
- Edges must have
srcanddstproperties srcanddstmust reference valid vertex IDssubject_namespaceandobject_namespacemust match vertex types
Data Validation¶
from parquetframe.permissions.validation import validate_permissions_graph
# Validate GraphAr structure
is_valid = validate_permissions_graph("./permissions_graph")
Checks:
- All subject vertices exist
- All object vertices exist
- No dangling edges
- Schema compliance
- Data type correctness
Performance Considerations¶
Partitioning Strategy¶
For large permission datasets, partition by relation type:
edges/
├── owner/
│ ├── part0.parquet # First 1M edges
│ ├── part1.parquet # Next 1M edges
│ └── ...
├── editor/
│ ├── part0.parquet
│ └── ...
└── viewer/
├── part0.parquet
└── ...
Indexing¶
Use Parquet row groups for efficient filtering:
Compression¶
Recommended compression: SNAPPY or ZSTD
Compatibility¶
Apache GraphAr Compliance¶
This specification is fully compatible with Apache GraphAr v1.0.
Compliance checklist:
- ✅ Directory structure follows GraphAr spec
- ✅ Metadata YAML format compliant
- ✅ Schema YAML format compliant
- ✅ Parquet file format compliant
- ✅ Vertex/edge separation compliant
Tool Support¶
Compatible with:
- ParquetFrame graph reader (
pf.read_graph()) - Apache Arrow tools
- DuckDB for SQL queries
- NetworkX for graph analysis (via conversion)
- Graph visualization tools
References¶
- Apache GraphAr Specification
- Zanzibar: Google's Consistent, Global Authorization System
- Apache Parquet Documentation
Changelog¶
Version 1.0 (2025-10-21)¶
- Initial specification
- Define core vertex and edge schemas
- Document two-graph architecture
- Provide migration guide from legacy format
Status: Ready for implementation
Next Steps: Implement TupleStore.save() and TupleStore.load() with GraphAr format