Relationship Management

ParquetFrame's Entity Framework uses a relationship log (edge list) to track connections between entities. This document describes the schema, operations, and best practices.

Relationship Log Schema

All relationships are stored in a Parquet-based edge list with the following schema:

Core Schema

Column	Type	Description	Required
`source_id`	string	ID of the source entity	✓
`target_id`	string	ID of the target entity	✓
`relation_type`	string	Type of relationship (e.g., "OWNS", "MEMBER_OF")	✓
`timestamp`	datetime64[ns]	When the relationship was created	✓
`metadata`	string (JSON)	Additional relationship properties	✗

Example Data

import pandas as pd

relationships = pd.DataFrame({
    "source_id": ["user_123", "user_123", "user_456"],
    "target_id": ["order_789", "group_abc", "order_101"],
    "relation_type": ["PURCHASED", "MEMBER_OF", "PURCHASED"],
    "timestamp": pd.to_datetime([
        "2024-01-15 10:30:00",
        "2024-01-16 14:20:00",
        "2024-01-17 09:15:00"
    ]),
    "metadata": [
        '{"amount": 99.99, "currency": "USD"}',
        '{"role": "admin"}',
        '{"amount": 149.50, "currency": "USD"}'
    ]
})
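The schema table above can be checked mechanically before writing to the log. The following is a minimal sketch in plain pandas, not a ParquetFrame API: `REQUIRED_COLUMNS` and `validate_edge_list` are hypothetical names, and the check compares pandas dtype kinds (`"O"` for object/string, `"M"` for datetime64) rather than exact dtypes.

```python
import pandas as pd

# Required columns and the pandas dtype kind each should have
# ("O" = object/string, "M" = datetime64). `metadata` is optional.
REQUIRED_COLUMNS = {
    "source_id": "O",
    "target_id": "O",
    "relation_type": "O",
    "timestamp": "M",
}


def validate_edge_list(df: pd.DataFrame) -> list:
    """Return a list of schema problems; an empty list means the frame conforms."""
    problems = []
    for column, kind in REQUIRED_COLUMNS.items():
        if column not in df.columns:
            problems.append(f"missing required column: {column}")
        elif df[column].dtype.kind != kind:
            problems.append(f"{column} has dtype {df[column].dtype}, expected kind {kind!r}")
        elif df[column].isna().any():
            problems.append(f"{column} contains null values")
    return problems
```

Applied to the `relationships` frame above, this returns an empty list; a `timestamp` column left as plain strings would be reported.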

Creating Relationships

Basic Relationship

from parquetframe.entity import Entity, add_relationship

# Define entities
@Entity(name="User")
class User:
    id: str
    name: str

@Entity(name="Order")
class Order:
    id: str
    total: float

# Create relationship
user = User(id="user_123", name="Alice")
order = Order(id="order_789", total=99.99)

add_relationship(
    source=user,
    target=order,
    relation_type="PURCHASED",
    metadata={"payment_method": "credit_card"}
)

Bulk Relationship Creation

from parquetframe.entity import bulk_add_relationships

relationships = [
    {"source_id": "user_123", "target_id": "order_789", "relation_type": "PURCHASED"},
    {"source_id": "user_123", "target_id": "order_790", "relation_type": "PURCHASED"},
    {"source_id": "user_456", "target_id": "order_791", "relation_type": "PURCHASED"},
]

bulk_add_relationships(relationships)
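Conceptually, a bulk insert turns the record dicts into rows of the edge list and appends them in one write. A sketch of that idea in plain pandas, assuming (not confirmed by this page) that records without a `timestamp` receive one shared creation time; `append_edges` is a hypothetical helper, not part of ParquetFrame.

```python
import pandas as pd


def append_edges(log: pd.DataFrame, records: list) -> pd.DataFrame:
    """Hypothetical helper: append edge records (dicts) to an edge-list frame."""
    new = pd.DataFrame.from_records(records)
    if "timestamp" not in new.columns:
        # One shared creation time for the whole batch, stored as naive UTC
        new["timestamp"] = pd.Timestamp.now(tz="UTC").tz_localize(None)
    # Align to the log's columns; absent optional columns (e.g. metadata) become NaN
    new = new.reindex(columns=log.columns)
    return pd.concat([log, new], ignore_index=True)
```

Batching matters for Parquet storage: appending many rows at once produces fewer, larger files than one write per relationship.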

Querying Relationships

Find All Relationships

from parquetframe.entity import get_relationships

# Get all relationships from a source entity
purchases = get_relationships(
    source_id="user_123",
    relation_type="PURCHASED"
)

# Get all relationships to a target entity
buyers = get_relationships(
    target_id="order_789",
    relation_type="PURCHASED"
)

Filter by Time Range

from datetime import datetime, timedelta

# Get recent purchases
recent_date = datetime.now() - timedelta(days=30)

recent_purchases = get_relationships(
    source_id="user_123",
    relation_type="PURCHASED",
    since=recent_date
)
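On a loaded DataFrame, the lookups above (by source, by target, by relation type, and since a date) reduce to combined boolean masks. A minimal pandas sketch; `filter_edges` is a hypothetical name, and it treats `since` as inclusive, which may differ from how `get_relationships` interprets it.

```python
import pandas as pd


def filter_edges(df, source_id=None, target_id=None, relation_type=None, since=None):
    """Hypothetical pandas equivalent of the lookups above; None means 'no filter'."""
    mask = pd.Series(True, index=df.index)
    if source_id is not None:
        mask &= df["source_id"] == source_id
    if target_id is not None:
        mask &= df["target_id"] == target_id
    if relation_type is not None:
        mask &= df["relation_type"] == relation_type
    if since is not None:
        # Inclusive lower bound on the creation time
        mask &= df["timestamp"] >= pd.Timestamp(since)
    return df[mask]
```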

Complex Queries with DataFrames

from datetime import datetime, timedelta

import parquetframe as pf

# Load relationship log as DataFrame
relationships_df = pf.read("relationships.parquet")

# Complex query: users who purchased in the last week
last_week = datetime.now() - timedelta(days=7)

recent_buyers = (
    relationships_df
    .query("relation_type == 'PURCHASED' and timestamp > @last_week")
    .groupby("source_id")
    .agg({"target_id": "count", "timestamp": "max"})
    .rename(columns={"target_id": "purchase_count", "timestamp": "last_purchase"})
)

Relationship Types

Standard Relationship Types

ParquetFrame follows a naming convention for relationship types:

Pattern	Example	Description
`VERB`	`OWNS`, `MANAGES`	Action-based relationships
`NOUN_OF`	`MEMBER_OF`, `PART_OF`	Membership/composition
`HAS_NOUN`	`HAS_ADDRESS`, `HAS_ROLE`	Possession
`IS_NOUN`	`IS_ADMIN`, `IS_ACTIVE`	State/classification
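The naming patterns in this table are simple enough to check automatically, for example in a test or a pre-commit hook. A sketch using only the standard library; `classify_relation_type` is hypothetical, and ParquetFrame may not enforce these patterns itself.

```python
import re

# Uppercase words joined by underscores, e.g. "MEMBER_OF"
_RELATION_NAME = re.compile(r"[A-Z]+(?:_[A-Z]+)*")


def classify_relation_type(name: str) -> str:
    """Map a relation type onto the naming patterns in the table above."""
    if not _RELATION_NAME.fullmatch(name):
        raise ValueError(f"not an UPPER_SNAKE_CASE relation type: {name!r}")
    if name.startswith("HAS_"):
        return "HAS_NOUN"
    if name.startswith("IS_"):
        return "IS_NOUN"
    if name.endswith("_OF"):
        return "NOUN_OF"
    if "_" not in name:
        return "VERB"
    return "other"  # valid name, but none of the standard patterns
```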

Custom Relationship Types

# Define custom relationship types
CUSTOM_RELATIONS = {
    "REVIEWED": {"inverse": "REVIEWED_BY"},
    "FOLLOWS": {"inverse": "FOLLOWED_BY"},
    "TAGGED_WITH": {"inverse": "TAG_OF"},
}

# Use a custom relationship (`product` is a Product entity instance)
add_relationship(
    source=user,
    target=product,
    relation_type="REVIEWED",
    metadata={"rating": 5, "comment": "Great product!"}
)
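The `inverse` entries in `CUSTOM_RELATIONS` can answer reverse questions (for example "who reviewed this product?") without writing a second row per edge. A pandas sketch, assuming the mapping has the shape shown above; `inverse_view` is a hypothetical helper, not a ParquetFrame function.

```python
import pandas as pd

# Same shape as the CUSTOM_RELATIONS mapping above
CUSTOM_RELATIONS = {
    "REVIEWED": {"inverse": "REVIEWED_BY"},
    "FOLLOWS": {"inverse": "FOLLOWED_BY"},
    "TAGGED_WITH": {"inverse": "TAG_OF"},
}


def inverse_view(edges: pd.DataFrame, relations: dict) -> pd.DataFrame:
    """Derive inverse edges at read time: swap endpoints and rename the relation type."""
    inverse_names = {name: spec["inverse"] for name, spec in relations.items()}
    known = edges[edges["relation_type"].isin(list(inverse_names))]
    # The keyword values are computed from `known` before assign runs, so the swap is safe
    return known.assign(
        source_id=known["target_id"],
        target_id=known["source_id"],
        relation_type=known["relation_type"].map(inverse_names),
    ).reset_index(drop=True)
```

Relation types without an inverse entry (such as `PURCHASED` here) are simply left out of the view.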

Bidirectional Relationships

Some relationships have natural inverses:

from parquetframe.entity import add_bidirectional_relationship

# Automatically creates both directions
add_bidirectional_relationship(
    entity_a=user,
    entity_b=group,
    relation_type_a_to_b="MEMBER_OF",
    relation_type_b_to_a="HAS_MEMBER"
)

# This creates two entries:
# user_123 --MEMBER_OF--> group_abc
# group_abc --HAS_MEMBER--> user_123
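The two entries listed above are ordinary rows in the edge list. A standard-library sketch of the pair a single call might produce, under the assumption that both rows share one timestamp so the pair is easy to match up later; `bidirectional_records` is hypothetical.

```python
from datetime import datetime, timezone


def bidirectional_records(a_id, b_id, a_to_b, b_to_a, when=None):
    """Build the two edge records for one bidirectional relationship."""
    if when is None:
        when = datetime.now(timezone.utc)
    return [
        {"source_id": a_id, "target_id": b_id, "relation_type": a_to_b, "timestamp": when},
        {"source_id": b_id, "target_id": a_id, "relation_type": b_to_a, "timestamp": when},
    ]
```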

Metadata Schema

The metadata column stores JSON-encoded relationship properties:

Common Metadata Patterns

# Weighted relationships (e.g., friendship strength)
metadata = {"weight": 0.85}

# Timestamped attributes
metadata = {
    "created_by": "user_admin",
    "approved_at": "2024-01-15T10:30:00Z"
}

# Contextual information
metadata = {
    "context": "web_app",
    "ip_address": "192.168.1.1",
    "user_agent": "Mozilla/5.0..."
}

# Business logic
metadata = {
    "status": "active",
    "expires_at": "2024-12-31T23:59:59Z",
    "auto_renew": True
}
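Because the column holds strings, each dict above must be serialized before it is stored. A standard-library sketch of one reasonable encoding; `encode_metadata` is a hypothetical helper, and storing null instead of `"{}"` for empty metadata is a design choice of this sketch. Note that Python's `True` becomes JSON `true`.

```python
import json
from datetime import datetime


def _to_json(value):
    """Fallback for json.dumps: datetimes become ISO-8601 strings."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"cannot JSON-encode {type(value).__name__}")


def encode_metadata(metadata):
    """Serialize a metadata dict into the string stored in the `metadata` column."""
    if not metadata:
        return None  # the column is optional, so store null rather than "{}"
    # sort_keys and compact separators give a stable, small encoding
    return json.dumps(metadata, sort_keys=True, separators=(",", ":"), default=_to_json)
```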

Querying Metadata

import json

import pandas as pd

# Parse metadata column (null entries become empty dicts)
relationships_df["parsed_metadata"] = relationships_df["metadata"].apply(
    lambda x: json.loads(x) if pd.notna(x) else {}
)

# Extract specific fields; coerce to numeric so rows without "amount" become NaN
relationships_df["amount"] = pd.to_numeric(
    relationships_df["parsed_metadata"].apply(lambda x: x.get("amount")),
    errors="coerce",
)

# Filter by metadata
high_value_purchases = relationships_df[
    relationships_df["amount"] > 100
]
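When several metadata fields are needed, `pandas.json_normalize` can expand every key into its own column in one step instead of one `apply` per field. A sketch with a small stand-in frame in place of the loaded log; the `meta_` prefix is an arbitrary choice to avoid clashing with the core columns.

```python
import json

import pandas as pd

# Stand-in for the loaded relationship log
relationships_df = pd.DataFrame({
    "relation_type": ["PURCHASED", "MEMBER_OF", "PURCHASED"],
    "metadata": ['{"amount": 99.99, "currency": "USD"}', None,
                 '{"amount": 149.50, "currency": "USD"}'],
})

# Parse each JSON string once (nulls become empty dicts), then expand keys into columns
parsed = relationships_df["metadata"].map(lambda s: json.loads(s) if isinstance(s, str) else {})
meta_cols = pd.json_normalize(parsed.tolist())
meta_cols.index = relationships_df.index  # realign with the original rows
expanded = relationships_df.join(meta_cols.add_prefix("meta_"))

high_value_purchases = expanded[expanded["meta_amount"] > 100]
```

Rows whose metadata lacks a key get NaN in that column, so numeric comparisons simply exclude them.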

Storage Layout

Default Location

project_root/
└── .parquetframe/
    └── relationships/
        ├── relationships.parquet    # Main edge list
        ├── _metadata               # Summary metadata: schema plus row-group info
        └── _common_metadata        # Shared schema only (no row-group info)

Partitioned Storage

For large-scale applications, partition by relation type:

from parquetframe.entity import configure_relationship_storage

configure_relationship_storage(
    partition_cols=["relation_type"],
    partition_strategy="hive"
)

# Results in:
# relationships/
# ├── relation_type=PURCHASED/
# │   └── data.parquet
# ├── relation_type=MEMBER_OF/
# │   └── data.parquet
# └── relation_type=FOLLOWS/
#     └── data.parquet
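In a Hive-style layout, the partition value is encoded in the directory name (`column=value`) and is not repeated inside the files. A standard-library sketch of how rows map to those directories; `hive_partitions` is hypothetical, and unlike real writers (for example pandas `DataFrame.to_parquet(..., partition_cols=[...])` with pyarrow) it does not escape special characters in values.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def hive_partitions(rows, partition_cols, root="relationships"):
    """Group edge rows by Hive-style directory (col=value/...).

    Partition columns are encoded in the path, so they are dropped from the rows.
    """
    groups = defaultdict(list)
    for row in rows:
        parts = [f"{col}={row[col]}" for col in partition_cols]
        path = str(PurePosixPath(root, *parts))
        groups[path].append({k: v for k, v in row.items() if k not in partition_cols})
    return dict(groups)
```

Queries that filter on `relation_type` can then skip whole directories without opening their files.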

Performance Optimization

Indexing

from parquetframe.entity import create_relationship_index

# Create indices for common queries
create_relationship_index(
    columns=["source_id", "relation_type"],
    name="source_relation_idx"
)

create_relationship_index(
    columns=["target_id", "relation_type"],
    name="target_relation_idx"
)
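Parquet itself offers pruning aids (row-group statistics, page indexes, optional Bloom filters) but no lookup index from a key to its rows, so an index like the ones above has to live alongside the data. How `create_relationship_index` stores its indices is not described here; the following is only a conceptual, in-memory sketch with a hypothetical `build_index` helper.

```python
from collections import defaultdict


def build_index(rows, columns):
    """Map each combination of `columns` values to the positions of matching rows.

    A lookup becomes one dict access instead of a scan over the whole edge list.
    """
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[tuple(row[c] for c in columns)].append(position)
    return dict(index)
```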

Caching

from parquetframe.entity import enable_relationship_cache

# Enable in-memory caching for frequently accessed relationships
enable_relationship_cache(
    max_size_mb=100,
    ttl_seconds=300  # 5 minutes
)
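A cache with both a size bound and a time-to-live combines LRU eviction with per-entry expiry: the TTL caps how stale a cached relationship lookup can be after the log changes. A standard-library sketch; `TTLCache` is hypothetical, and it bounds the number of entries rather than megabytes to keep the example simple.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Least-recently-used cache whose entries also expire after `ttl_seconds`."""

    def __init__(self, max_entries=1024, ttl_seconds=300.0, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable for testing
        self._data = OrderedDict()  # key -> (expires_at, value)

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]  # stale: drop it and report a miss
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._data[key] = (self._clock() + self.ttl_seconds, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry
```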

Graph Traversal

Finding Paths

from parquetframe.entity import find_path

# Find shortest path between entities
path = find_path(
    start_id="user_123",
    end_id="product_789",
    max_depth=3
)

# Example result:
# [
#   ("user_123", "MEMBER_OF", "group_abc"),
#   ("group_abc", "HAS_ACCESS_TO", "catalog_xyz"),
#   ("catalog_xyz", "CONTAINS", "product_789")
# ]
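On an unweighted graph, breadth-first search finds a path with the fewest edges, and the result format above (a list of `(source, relation, target)` triples) falls out naturally. A standard-library sketch; `shortest_path` is a hypothetical stand-in for `find_path`, and it follows edges only in their stored direction, which may differ from the library.

```python
from collections import deque


def shortest_path(edges, start_id, end_id, max_depth=3):
    """Breadth-first search over directed (source, relation, target) triples.

    Returns a shortest path as a list of triples, [] if start == end,
    or None if no path of at most `max_depth` edges exists.
    """
    outgoing = {}
    for source, relation, target in edges:
        outgoing.setdefault(source, []).append((source, relation, target))

    queue = deque([(start_id, [])])
    visited = {start_id}
    while queue:
        node, path = queue.popleft()
        if node == end_id:
            return path
        if len(path) == max_depth:
            continue  # do not expand beyond the depth limit
        for edge in outgoing.get(node, []):
            target = edge[2]
            if target not in visited:
                visited.add(target)
                queue.append((target, path + [edge]))
    return None
```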

Neighborhood Queries

from parquetframe.entity import get_neighbors

# Get all entities within N hops
neighbors = get_neighbors(
    entity_id="user_123",
    relation_types=["FOLLOWS", "FRIEND_OF"],
    max_hops=2
)
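A neighborhood query is a breadth-first search that stops after `max_hops` levels and only follows the requested relation types. A standard-library sketch; `neighbors_within` is hypothetical, returns each neighbor's hop distance, and (as an assumption) follows outgoing edges only.

```python
from collections import deque


def neighbors_within(edges, entity_id, relation_types, max_hops=2):
    """Return {entity_id: hop distance} for entities reachable in 1..max_hops
    steps along outgoing edges whose relation type is in `relation_types`."""
    allowed = set(relation_types)
    outgoing = {}
    for source, relation, target in edges:
        if relation in allowed:
            outgoing.setdefault(source, []).append(target)

    distances = {entity_id: 0}
    queue = deque([entity_id])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # reached the hop limit; do not expand further
        for target in outgoing.get(node, []):
            if target not in distances:
                distances[target] = distances[node] + 1
                queue.append(target)
    del distances[entity_id]  # the starting entity is not its own neighbor
    return distances
```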

Integration with Permissions

The relationship log integrates with ParquetFrame's Zanzibar-style permissions:

from parquetframe.permissions import check_permission

# Check if user can access resource via relationship
has_access = check_permission(
    user="user_123",
    permission="view",
    resource="document_456"
)

# Under the hood, this checks relationships like:
# user_123 --MEMBER_OF--> group_abc --CAN_VIEW--> document_456

See Permissions System for details.
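The chain in the comment above (user `MEMBER_OF` group, group `CAN_VIEW` document) can be evaluated directly against the edge list. This is a deliberately simplified, hand-coded rule: real Zanzibar-style systems evaluate configurable rewrite rules recursively (nested groups, unions of permissions), and `can_view` is a hypothetical helper, not `check_permission` itself.

```python
def can_view(edges, user_id, resource_id):
    """Grant "view" if the user holds CAN_VIEW on the resource directly or through
    a group they are a MEMBER_OF (one level of group membership only)."""
    edge_set = set(edges)
    if (user_id, "CAN_VIEW", resource_id) in edge_set:
        return True
    groups = {t for s, r, t in edges if s == user_id and r == "MEMBER_OF"}
    return any((g, "CAN_VIEW", resource_id) in edge_set for g in groups)
```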

Best Practices

1. Use Consistent Naming

# Good: Clear, verb-based
"PURCHASED", "OWNS", "MANAGES"

# Avoid: Vague or ambiguous
"RELATED_TO", "CONNECTED", "LINK"

2. Include Timestamps

Always include timestamps for audit trails and temporal queries:

from datetime import datetime

add_relationship(
    source=user,
    target=order,
    relation_type="PURCHASED",
    timestamp=datetime.now()  # Explicit timestamp
)

3. Use Metadata Judiciously

Keep metadata lightweight; avoid storing large objects:

# Good: Small, relevant metadata
metadata = {"amount": 99.99, "currency": "USD"}

# Bad: Large, nested objects
metadata = {"entire_order_details": {...}}  # Store in separate entity instead

4. Partition Large Graphs

For graphs with >10M relationships, use partitioning:

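# Partition by a coarse time bucket for temporal queries; partitioning on the raw
# timestamp would create one directory per distinct value
configure_relationship_storage(
    partition_cols=["year_month"]  # hypothetical derived column, e.g. "2024-01"
)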

# Or by relation type for filtered queries
configure_relationship_storage(
    partition_cols=["relation_type"]
)
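Partitioning on the raw nanosecond `timestamp` would produce roughly one directory per edge, so time partitioning normally uses a coarse derived column. A pandas sketch of deriving a monthly bucket, assuming you add it to the log before writing; the column name `year_month` is illustrative, not a ParquetFrame convention.

```python
import pandas as pd

# Stand-in for a loaded relationship log
edges = pd.DataFrame({
    "source_id": ["user_123", "user_123", "user_456"],
    "target_id": ["order_789", "group_abc", "order_101"],
    "relation_type": ["PURCHASED", "MEMBER_OF", "PURCHASED"],
    "timestamp": pd.to_datetime([
        "2024-01-15 10:30:00", "2024-02-16 14:20:00", "2024-02-17 09:15:00",
    ]),
})

# Bucket by calendar month: one partition per month, not per distinct timestamp
edges["year_month"] = edges["timestamp"].dt.strftime("%Y-%m")
rows_per_partition = edges.groupby("year_month").size()
```

With pyarrow installed, `edges.to_parquet("relationships", partition_cols=["year_month"])` would then write one `year_month=...` directory per month.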

5. Clean Up Old Relationships

Implement TTL or archival for stale relationships:

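from datetime import datetime, timedelta

# Archive relationships older than 1 year
cutoff = datetime.now() - timedelta(days=365)
is_old = relationships_df["timestamp"] < cutoff

# Write the stale edges to an archive file named after the cutoff date
old_relationships = relationships_df[is_old]
old_relationships.to_parquet(f"relationships_archive_before_{cutoff:%Y-%m-%d}.parquet")

# Keep only recent edges; persist this frame as the new active log
relationships_df = relationships_df[~is_old]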

Examples

E-commerce Purchase Graph

from parquetframe.entity import Entity, add_relationship, get_relationships
import pandas as pd

# Entities
@Entity(name="User")
class User:
    id: str
    email: str

@Entity(name="Product")
class Product:
    id: str
    name: str
    price: float

@Entity(name="Order")
class Order:
    id: str
    total: float
    status: str

# Build relationship graph
user = User(id="user_123", email="alice@example.com")
product = Product(id="prod_456", name="Widget", price=29.99)
order = Order(id="order_789", total=29.99, status="completed")

# User placed order
add_relationship(user, order, "PLACED")

# Order contains product
add_relationship(order, product, "CONTAINS")

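# Query: What did this user buy? (user --PLACED--> order --CONTAINS--> product)
user_purchases = (
    get_relationships(source_id="user_123", relation_type="PLACED")
    .merge(
        get_relationships(relation_type="CONTAINS"),
        left_on="target_id",
        right_on="source_id",
        suffixes=("_placed", "_contains"),
    )
)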
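The same two-hop join can be tried in plain pandas with an in-memory frame standing in for the relationship log. Both sides of the merge share the `source_id`, `target_id`, and `relation_type` columns, so explicit suffixes make it clear which hop each column came from; the product IDs here are illustrative.

```python
import pandas as pd

# Stand-in edge list: one order containing two products
edges = pd.DataFrame({
    "source_id": ["user_123", "order_789", "order_789"],
    "target_id": ["order_789", "prod_456", "prod_457"],
    "relation_type": ["PLACED", "CONTAINS", "CONTAINS"],
})

placed = edges[edges["relation_type"] == "PLACED"]
contains = edges[edges["relation_type"] == "CONTAINS"]

# user --PLACED--> order --CONTAINS--> product: join on the shared order id
user_purchases = placed[placed["source_id"] == "user_123"].merge(
    contains,
    left_on="target_id",
    right_on="source_id",
    suffixes=("_placed", "_contains"),
)
products = user_purchases["target_id_contains"].tolist()
```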

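Social Network Graph

from parquetframe.entity import add_relationship, check_relationship, get_relationships

# Users (reusing the User entity defined above)
user_alice = User(id="user_alice", email="alice@example.com")
user_bob = User(id="user_bob", email="bob@example.com")

# Follow relationships: Alice follows Bob
add_relationship(
    source=user_alice,
    target=user_bob,
    relation_type="FOLLOWS",
    metadata={"since": "2024-01-01"}
)

# Mutual follows: Alice already follows Bob, so check whether Bob follows Alice
if check_relationship(user_bob, user_alice, "FOLLOWS"):
    print("Alice and Bob are mutual followers")

# Find Alice's followers
followers = get_relationships(
    target_id="user_alice",
    relation_type="FOLLOWS"
)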