# Entity Framework API Reference
Complete API documentation for ParquetFrame's Entity Framework (Phase 2).
## Decorators

### @entity
Decorator that transforms a dataclass into a persistent entity with automatic CRUD operations.
Parameters:
- `storage_path` (str): Directory path where entity data will be stored
- `primary_key` (str): Name of the field to use as primary key
- `format` (str, optional): Storage format - "parquet", "avro", or "csv". Default: "parquet"
- `engine` (str, optional): Processing engine - "pandas", "polars", or "dask". Default: "pandas"
Returns:
Decorated class with added methods:
- save(): Persist entity to storage
- delete(): Remove entity from storage
- find(pk_value): Load entity by primary key
- find_all(): Load all entities
- find_by(**filters): Query entities with filters
- count(): Count total entities
Example:
```python path=/Users/temp/Documents/Projects/parquetframe/examples/integration/todo_kanban/models.py start=19
@entity(storage_path="./kanban_data/users", primary_key="user_id")
@dataclass
class User:
    user_id: str
    username: str
    email: str
    created_at: datetime = None
```
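To make the decorator's effect concrete, the following is a minimal sketch of how a class decorator can attach `save()`, `find()`, and `count()` to a dataclass. It is not ParquetFrame's implementation: `toy_entity` and `ToyUser` are hypothetical names, and an in-memory dict stands in for the files under `storage_path`.

```python
from dataclasses import dataclass, asdict


def toy_entity(primary_key):
    """Hypothetical stand-in for @entity; keeps records in a dict, not on disk."""

    def wrap(cls):
        store = {}  # primary key value -> record dict (assumed storage model)

        def save(self):
            # Insert or overwrite the record stored under this primary key.
            store[getattr(self, primary_key)] = asdict(self)

        def find(pk_value):
            # Rebuild an instance from the stored record, or return None.
            record = store.get(pk_value)
            return cls(**record) if record is not None else None

        def count():
            return len(store)

        cls.save = save
        cls.find = staticmethod(find)
        cls.count = staticmethod(count)
        return cls

    return wrap


@toy_entity(primary_key="user_id")
@dataclass
class ToyUser:
    user_id: str
    username: str


ToyUser(user_id="u001", username="alice").save()
ToyUser(user_id="u001", username="alice2").save()  # same key: update, not insert
```

Saving twice under the same key leaves a single record, which mirrors the create-or-update behaviour described for `save()`.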
### @rel

Decorator that defines a relationship between entities.
Parameters:
- `target_entity` (str): Name of the related entity class
- `foreign_key` (str): Name of the foreign key field
- `reverse` (bool, optional): Whether this is a reverse relationship (one-to-many). Default: False
Returns:
Method decorator that enables relationship navigation.
Forward Relationship Example:
```python path=/Users/temp/Documents/Projects/parquetframe/examples/integration/todo_kanban/models.py start=190
@rel("TaskList", foreign_key="list_id")
def list(self):
    """Get the list this task belongs to."""
    pass
```
**Reverse Relationship Example:**
```python path=/Users/temp/Documents/Projects/parquetframe/examples/integration/todo_kanban/models.py start=90
@rel("TaskList", foreign_key="board_id", reverse=True)
def lists(self):
    """Get all task lists in this board."""
    pass
```
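As a rough illustration of what relationship navigation involves, the sketch below resolves a forward relationship by looking up the target whose primary key equals the foreign key, and a reverse relationship by collecting the targets whose foreign key points back at the current entity. Everything here is hypothetical (`REGISTRY`, `toy_rel`, the extra `pk` argument, and the in-memory lists); it is not how ParquetFrame implements `@rel`.

```python
from dataclasses import dataclass

# Hypothetical in-memory registry: class name -> list of instances.
REGISTRY = {"ToyBoard": [], "ToyList": []}


def toy_rel(target_entity, foreign_key, pk, reverse=False):
    """Hypothetical @rel look-alike; `pk` names the primary key of the 'one' side."""

    def decorator(method):
        def navigate(self):
            targets = REGISTRY[target_entity]
            if reverse:
                # One-to-many: targets whose foreign key points at this entity.
                own_key = getattr(self, pk)
                return [t for t in targets if getattr(t, foreign_key) == own_key]
            # Many-to-one: the single target whose primary key matches our foreign key.
            wanted = getattr(self, foreign_key)
            return next((t for t in targets if getattr(t, pk) == wanted), None)

        navigate.__doc__ = method.__doc__
        return navigate

    return decorator


@dataclass
class ToyBoard:
    board_id: str

    @toy_rel("ToyList", foreign_key="board_id", pk="board_id", reverse=True)
    def lists(self):
        """Get all task lists in this board."""


@dataclass
class ToyList:
    list_id: str
    board_id: str

    @toy_rel("ToyBoard", foreign_key="board_id", pk="board_id")
    def board(self):
        """Get the board this list belongs to."""


board = ToyBoard("b1")
REGISTRY["ToyBoard"].append(board)
REGISTRY["ToyList"] += [ToyList("l1", "b1"), ToyList("l2", "b1"), ToyList("l3", "b2")]
```

The decorated method body is never executed in this sketch, which is consistent with the `pass` bodies in the examples above: the decorator replaces it with the lookup.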
## Entity Methods

### save()

Persists the entity to storage. Creates a new record if the entity doesn't exist and updates the existing record if it does.
Example:
```python path=null start=null
user = User(user_id="u001", username="alice", email="alice@example.com")
user.save()
```
### delete()

Removes the entity from storage.
Example:
```python path=null start=null
user = User.find("u001")
user.delete()
```
### find(pk_value)

Loads an entity by its primary key value.
Parameters:
- `pk_value`: Value of the primary key to search for
Returns:
Entity instance if found, None otherwise.
Example:
```python path=null start=null
user = User.find("u001")
if user:
    print(f"Found: {user.username}")
```
### find_all()

Loads all entities from storage.
Returns:
List of all entity instances.
Example:
```python path=null start=null
all_users = User.find_all()
for user in all_users:
    print(user.username)
```
### find_by(**filters)

Queries entities with field filters.
Parameters:
- `**filters`: Keyword arguments where keys are field names and values are filter values
Returns:
List of matching entity instances.
Example:
```python path=null start=null
alice_users = User.find_by(username="alice")
active_tasks = Task.find_by(status="in_progress", priority="high")
```
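The filter semantics can be sketched in plain Python. This assumes that each keyword is an equality test and that multiple filters are combined with AND, which is what the two calls above suggest but this reference does not state outright; `matches` and the sample rows are hypothetical.

```python
def matches(record, **filters):
    """Return True when every filter equals the record's value for that field."""
    return all(record.get(field) == value for field, value in filters.items())


tasks = [
    {"task_id": "t1", "status": "in_progress", "priority": "high"},
    {"task_id": "t2", "status": "in_progress", "priority": "low"},
    {"task_id": "t3", "status": "done", "priority": "high"},
]

# Equivalent in spirit to Task.find_by(status="in_progress", priority="high").
active_high = [t for t in tasks if matches(t, status="in_progress", priority="high")]
```

Under these assumptions, calling `matches` with no filters accepts every record, so an unfiltered query would behave like `find_all()`.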
### count()

Returns the total number of entities in storage.
Returns:
Integer count of entities.
Example:
```python path=null start=null
total_users = User.count()
print(f"Total users: {total_users}")
```
### delete_all()

Removes all entities from storage. Use with caution!
Example:
```python path=null start=null
# Clear all test data
TestUser.delete_all()
```
## Relationship Navigation
### Forward Relationships (Many-to-One)
Navigate from child to parent entity.
**Example:**
```python path=null start=null
task = Task.find("task_001")
task_list = task.list()    # Get parent TaskList
board = task_list.board()  # Get parent Board
owner = board.owner()      # Get User who owns the board
```
### Reverse Relationships (One-to-Many)
Navigate from parent to child entities.
Example:
```python path=null start=null
board = Board.find("board_001")
lists = board.lists()  # Get all TaskLists in board

for lst in lists:
    tasks = lst.tasks()  # Get all Tasks in each list
    print(f"{lst.name}: {len(tasks)} tasks")
```
## Storage Formats
### Parquet (Default)
```python path=null start=null
@entity(storage_path="./data/users", primary_key="user_id", format="parquet")
@dataclass
class User:
    user_id: str
    name: str
```
**Advantages:**

- Columnar format, efficient for analytics
- Fast read/write performance
- Excellent compression
- Wide ecosystem support

### Avro
```python path=null start=null
@entity(storage_path="./data/users", primary_key="user_id", format="avro")
@dataclass
class User:
    user_id: str
    name: str
```
**Advantages:**
- Schema evolution support
- Compact binary format
- Fast serialization
- Cross-language compatibility
### CSV
```python path=null start=null
@entity(storage_path="./data/users", primary_key="user_id", format="csv")
@dataclass
class User:
    user_id: str
    name: str
```
**Advantages:**

- Human-readable
- Universal compatibility
- Easy debugging
- Simple to inspect

## Engine Selection

### Pandas (Default)
```python path=null start=null
@entity(storage_path="./data/users", primary_key="user_id", engine="pandas")
```
**Best for:**
- Small to medium datasets (< 1GB)
- Interactive analysis
- Compatibility with existing pandas code
### Polars
```python path=null start=null
@entity(storage_path="./data/users", primary_key="user_id", engine="polars")
```
**Best for:**

- Medium to large datasets (1GB - 100GB)
- Lazy evaluation
- Faster query performance

### Dask
```python path=null start=null
@entity(storage_path="./data/users", primary_key="user_id", engine="dask")
```
**Best for:**
- Very large datasets (> 100GB)
- Distributed processing
- Out-of-core computation
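
The size ranges listed for the three engines can be folded into a small helper. This is only a sketch: `pick_engine` is a hypothetical function, not part of ParquetFrame, and it assumes decimal gigabytes. Sizes of exactly 1 GB and exactly 100 GB fall to Polars, following the "1GB - 100GB" range as written.

```python
GB = 10**9  # assumption: decimal gigabytes


def pick_engine(size_bytes: int) -> str:
    """Map an estimated dataset size to an engine name, per the ranges above."""
    if size_bytes < 1 * GB:
        return "pandas"
    if size_bytes <= 100 * GB:
        return "polars"
    return "dask"
```

The returned string could then be passed as the `engine` argument of `@entity`. Size is only one of the listed criteria; the need for distributed processing or compatibility with existing pandas code may matter more in practice.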
## Best Practices
### 1. Use Descriptive Storage Paths
```python path=null start=null
# Good: organized by entity type
@entity(storage_path="./data/entities/users", primary_key="user_id")

# Avoid: generic paths
@entity(storage_path="./data", primary_key="user_id")
```
### 2. Choose Appropriate Primary Keys
```python path=null start=null
# Good: UUID or unique identifier
@entity(storage_path="./data/users", primary_key="user_id")
class User:
    user_id: str  # UUID

# Avoid: non-unique fields
@entity(storage_path="./data/users", primary_key="username")  # usernames can change
```
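One way to get a primary key that never changes is to generate a UUID when the object is created. The sketch below uses only the standard library and leaves out the `@entity` decorator so that it runs on its own; `UuidUser` is a hypothetical class.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class UuidUser:
    username: str
    # A fresh random UUID per instance; stays fixed even if username changes.
    user_id: str = field(default_factory=lambda: str(uuid.uuid4()))


a = UuidUser(username="alice")
b = UuidUser(username="alice")  # same username, different key
a.username = "alice_renamed"    # renaming does not touch the key
```

Using `default_factory` rather than a plain default matters here: a plain default would be evaluated once and shared by every instance.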
### 3. Add Timestamps
```python path=null start=null
@entity(storage_path="./data/users", primary_key="user_id")
@dataclass
class User:
    user_id: str
    username: str
    created_at: datetime = None
    updated_at: datetime = None

    def __post_init__(self):
        now = datetime.now()
        if self.created_at is None:
            self.created_at = now
        if self.updated_at is None:
            self.updated_at = now
```
### 4. Validate Data in `__post_init__`
```python path=/Users/temp/Documents/Projects/parquetframe/examples/integration/todo_kanban/models.py start=174
def __post_init__(self):
    now = datetime.now()
    if self.created_at is None:
        self.created_at = now
    if self.updated_at is None:
        self.updated_at = now

    # Validate status
    if self.status not in ["todo", "in_progress", "done"]:
        raise ValueError(f"Invalid status: {self.status}")

    # Validate priority
    if self.priority not in ["low", "medium", "high"]:
        raise ValueError(f"Invalid priority: {self.priority}")
```
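
For completeness, here is a self-contained version of the validation pattern that can be run without ParquetFrame. `ToyTask` is a hypothetical class with no `@entity` decorator, and its default status and priority are assumptions; the allowed values are taken from the snippet above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

VALID_STATUSES = ("todo", "in_progress", "done")
VALID_PRIORITIES = ("low", "medium", "high")


@dataclass
class ToyTask:
    task_id: str
    status: str = "todo"        # assumed default
    priority: str = "medium"    # assumed default
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None

    def __post_init__(self):
        # Fill in timestamps that were not supplied by the caller.
        now = datetime.now()
        if self.created_at is None:
            self.created_at = now
        if self.updated_at is None:
            self.updated_at = now
        # Reject values outside the allowed sets at construction time.
        if self.status not in VALID_STATUSES:
            raise ValueError(f"Invalid status: {self.status}")
        if self.priority not in VALID_PRIORITIES:
            raise ValueError(f"Invalid priority: {self.priority}")


try:
    ToyTask(task_id="t1", status="blocked")
    error = None
except ValueError as exc:
    error = str(exc)
```

Because the check runs when the object is constructed, an invalid task fails before `save()` could ever be called on it.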
## See Also
- Entity Framework Guide - Complete entity framework tutorial
- Todo/Kanban Example - Full application example
- Examples Gallery - More entity framework examples