Quick Start Guide¶
Get up and running with ParquetFrame Phase 2 in minutes!
Note: This guide covers Phase 2 API with the Entity Framework. For legacy Phase 1 documentation, see Legacy Basic Usage.
Installation¶
Phase 2 Core Concepts¶
Phase 2 introduces:
- Multi-Engine Core - Seamlessly switch between pandas, Polars, and Dask
- Entity Framework - Define data models with decorators
- Relationships - Navigate object graphs with ease
- Zanzibar Permissions - Fine-grained access control
- Workflow System - YAML-based ETL pipelines
Basic Usage¶
1. Initialize Core¶
import parquetframe.core as pf2
# Initialize with default engine (pandas)
df = pf2.read()
# Or specify engine explicitly
df = pf2.read(engine="polars") # pandas, polars, or dask
2. Define Entities¶
Use the @entity decorator to define data models:
# path=/Users/temp/Documents/Projects/parquetframe/examples/integration/todo_kanban/models.py start=141
@entity(storage_path="./kanban_data/tasks", primary_key="task_id")
@dataclass
class Task:
"""
Task entity representing an individual task.
Fields:
task_id: Unique task identifier
title: Task title
description: Detailed task description
status: Current status (todo, in_progress, done)
priority: Task priority (low, medium, high)
list_id: ID of the list this task belongs to
assigned_to: Optional ID of the user assigned to this task
created_at: Timestamp when task was created
updated_at: Timestamp when task was last updated
Relationships:
list: Forward relationship to TaskList (parent list)
assigned_user: Forward relationship to User (assignee)
"""
task_id: str
title: str
description: str
status: str = "todo" # todo, in_progress, done
priority: str = "medium" # low, medium, high
list_id: str = ""
assigned_to: str | None = None
position: int = 0 # Position within the list
created_at: datetime = None
updated_at: datetime = None
3. Create and Save Entities¶
from datetime import datetime
# Create a task instance
task = Task(
task_id="task_001",
title="Implement user authentication",
description="Add OAuth2 support with JWT tokens",
status="todo",
priority="high",
list_id="list_001",
assigned_to="user_123"
)
# Save to storage
core.save(task)
# Load back
loaded_task = core.load(Task, task_id="task_001")
print(f"Task: {loaded_task.title} - {loaded_task.status}")
4. Define Relationships¶
Use the @rel decorator to define relationships between entities:
# path=/Users/temp/Documents/Projects/parquetframe/examples/integration/todo_kanban/models.py start=190
@rel("TaskList", foreign_key="list_id")
def list(self):
"""Get the list this task belongs to."""
pass
@rel("User", foreign_key="assigned_to")
def assigned_user(self):
"""Get the user assigned to this task."""
pass
Navigate relationships easily:
# Navigate from task to list to board
task = core.load(Task, task_id="task_001")
task_list = task.list() # Get parent list
board = task_list.board() # Get parent board
print(f"Task '{task.title}' is in list '{task_list.name}' on board '{board.name}'")
# Navigate reverse relationships
board = core.load(Board, board_id="board_001")
all_lists = board.lists() # Get all lists in board
for lst in all_lists:
tasks = lst.tasks() # Get all tasks in list
print(f"List '{lst.name}' has {len(tasks)} tasks")
5. Implement Permissions¶
Use Zanzibar-style permissions for fine-grained access control:
# path=/Users/temp/Documents/Projects/parquetframe/examples/integration/todo_kanban/permissions.py start=110
def check_permission(
self,
user_id: str,
resource_type: str,
resource_id: str,
relation: str,
allow_indirect: bool = True,
) -> bool:
"""
Check if a user has a specific permission.
Uses Zanzibar check() API for both direct and indirect permissions.
Args:
user_id: User ID to check
resource_type: Type of resource (board, list, task)
resource_id: ID of the resource
relation: Relation/permission type to check
allow_indirect: Whether to check indirect permissions via graph traversal
Returns:
True if user has the permission, False otherwise
"""
return check(
store=self.store,
subject_namespace="user",
subject_id=user_id,
relation=relation,
object_namespace=resource_type,
object_id=resource_id,
allow_indirect=allow_indirect,
)
Usage example:
from parquetframe.permissions import PermissionManager
# Initialize permission manager
perm_mgr = PermissionManager()
# Grant board access to a user
perm_mgr.grant_board_access(user_id="user_123", board_id="board_001", role="editor")
# Check if user can edit a task
can_edit = perm_mgr.check_task_access(
user_id="user_123",
task_id="task_001",
list_id="list_001",
board_id="board_001",
required_role="editor"
)
if can_edit:
# Update the task
task.status = "in_progress"
core.save(task)
6. Define Workflows¶
Create YAML workflows for ETL pipelines:
# import_tasks.yml
name: Import Tasks from CSV
description: Load tasks from external CSV file
steps:
- name: Read CSV
action: read
params:
path: "external_tasks.csv"
format: "csv"
- name: Transform Data
action: transform
params:
operations:
- type: filter
condition: "status in ['todo', 'in_progress']"
- type: map
field: priority
mapping:
1: "low"
2: "medium"
3: "high"
- name: Save to ParquetFrame
action: save_entities
params:
entity_type: Task
storage_path: "./kanban_data/tasks"
Run workflows:
from parquetframe.workflow import WorkflowEngine
# Initialize workflow engine
engine = WorkflowEngine(core)
# Run workflow
result = engine.run_workflow("import_tasks.yml")
print(f"Imported {result['records_processed']} tasks")
Multi-Engine Support¶
Switch between compute engines based on your needs. See Engine Selection for details.
# Start with pandas for small datasets
df = pf2.read(engine="pandas")
# Switch to Polars for faster operations
core.switch_engine("polars")
# Or use Dask for distributed computing
core.switch_engine("dask")
# All entity operations work seamlessly across engines
tasks = core.query(Task).filter(status="in_progress").all()
Complete Example¶
Here's a full example bringing it all together:
import parquetframe.core as pf2
from parquetframe.permissions import PermissionManager
from datetime import datetime
# Initialize
df = pf2.read(engine="pandas")
perm_mgr = PermissionManager()
# Create entities
user = User(user_id="user_001", username="alice", email="alice@example.com")
board = Board(
board_id="board_001",
name="Q1 Roadmap",
description="Product features for Q1 2024",
owner_id="user_001"
)
task_list = TaskList(
list_id="list_001",
name="In Progress",
board_id="board_001",
position=1
)
task = Task(
task_id="task_001",
title="Design new login flow",
description="Modernize authentication UX",
status="in_progress",
priority="high",
list_id="list_001",
assigned_to="user_001"
)
# Save all entities
core.save(user)
core.save(board)
core.save(task_list)
core.save(task)
# Set up permissions
perm_mgr.grant_board_access("user_001", "board_001", "owner")
perm_mgr.inherit_board_permissions("board_001", "list_001")
perm_mgr.inherit_list_permissions("list_001", "task_001")
# Navigate relationships
loaded_task = core.load(Task, task_id="task_001")
print(f"Task: {loaded_task.title}")
print(f"List: {loaded_task.list().name}")
print(f"Board: {loaded_task.list().board().name}")
print(f"Owner: {loaded_task.list().board().owner().username}")
# Check permissions
can_edit = perm_mgr.check_task_access(
user_id="user_001",
task_id="task_001",
list_id="list_001",
board_id="board_001",
required_role="editor"
)
if can_edit:
loaded_task.status = "done"
core.save(loaded_task)
print("Task marked as done!")
Next Steps¶
Explore the Todo/Kanban Tutorial:
- Full Todo/Kanban Walkthrough - Complete application example with all Phase 2 features
Learn More:
- Entity Framework Guide - Deep dive into entities and decorators
- Relationships Guide - Master object navigation
- Permissions System - Comprehensive Zanzibar permission examples
- Workflows - YAML-based ETLFor existing users of ParquetFrame v1.x, please see the Migration Guide.
Core Concepts¶
ParquetFrame is built around a few key concepts:
- DataFrameProxy: The main entry point for data operations.
- Entities: Define schema and relationships for your data.
- Permissions: Manage access control with Zanzibar-style tuples.
API Documentation:
- Core API Reference - Complete core API documentation
- Entity API Reference - Entity framework APIs
- Permissions API Reference - Zanzibar permission APIs
Need Help?¶
- Browse examples for common use cases
- Check the Migration Guide if coming from Phase 1
- Report issues on GitHub
Happy building! 🚀 ```