ADR-0003: Rust-First Integration Strategy

Status: Proposed

Date: 2025-01-20

Deciders: ParquetFrame Core Team

Technical Story: Integrating Rust as a performance-critical backend while maintaining Python ergonomics and 100% backward compatibility

Context and Problem Statement

ParquetFrame has successfully implemented Phase 2 with multi-engine support (pandas, Polars, Dask), achieving 2-5x performance improvements. However, certain operations remain CPU-bound and memory-intensive, particularly:

Graph Operations: Adjacency structure building (CSR/CSC), BFS/DFS traversals, PageRank iterations on graphs with 10M+ edges
I/O Operations: Parquet/Avro metadata parsing, predicate pushdown preparation, columnar filtering
Transform Kernels: Boolean masking, type-safe arithmetic, reductions across large Arrow buffers
Workflow Orchestration: DAG execution, concurrency management, resource scheduling

While Python engines provide good performance, they hit fundamental limitations:

- GIL Constraints: Python's Global Interpreter Lock limits true parallelism
- Memory Overhead: Python objects have significant memory overhead (24-56 bytes per object)
- Type Safety: Runtime type checking adds overhead and potential for errors
- Performance Ceiling: Even with optimized libraries, Python can't match systems languages

Users working with large-scale data (100M+ rows, 10GB+ files, complex graph traversals) need:

- 5-20x speedups on graph algorithms
- 2-5x faster I/O metadata operations
- 3-10x improvements on columnar transforms
- 30-60% lower peak memory usage

Decision Drivers

Performance Requirements

Large Dataset Support: Efficient handling of 1B+ edge graphs, 100GB+ Parquet files
Memory Efficiency: Reduce memory overhead for graph structures and large dataframes
Parallel Scalability: Utilize multi-core processors effectively without GIL constraints
Predictable Performance: Consistent, deterministic performance characteristics

Technical Requirements

Backward Compatibility: 100% API compatibility with existing Phase 2 code
Graceful Degradation: Automatic fallback to Python when Rust unavailable
Zero-Copy Interop: Efficient data exchange via Arrow/NumPy without serialization overhead
Production Ready: Enterprise-grade error handling, logging, and monitoring

Development Experience

Modern Toolchain: Leverage Rust's type system, borrow checker, and cargo ecosystem
PyO3 Integration: Seamless Python bindings with automatic type conversion
Incremental Adoption: Phase-by-phase migration, not big-bang rewrite
Developer Friendly: Clear documentation, examples, and onboarding guides

Project Standards

Conventional Commits: All commits follow conventional commit format
Git Best Practices: Feature branch workflow with descriptive commits
Test Coverage: Maintain ≥85% coverage for Rust components
Documentation: Comprehensive API docs, tutorials, and migration guides

Decision

We will integrate Rust as a fourth engine alongside pandas, Polars, and Dask, with Rust serving as the preferred backend for performance-critical operations while maintaining full Python fallback.

Implementation Approach

1. Rust-First with Transparent Fallback

# Default behavior - uses Rust when available, falls back to Python
import parquetframe as pf

graph = pf.GraphFrame.from_edges(edges)  # Uses Rust CSR builder
distances = graph.bfs(source=0)          # Uses Rust BFS algorithm
df = pf.read_parquet("large_file.pqt")   # Uses Rust metadata parser

# To disable Rust globally, set this environment variable in the shell
# before starting Python (execution then falls back to Python):
#   export PARQUETFRAME_DISABLE_RUST=1

# Explicit backend control
graph = pf.GraphFrame.from_edges(edges, engine='rust')     # Rust only
graph = pf.GraphFrame.from_edges(edges, engine='pandas')   # Pure Python
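
The detection and fallback behavior shown above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual contents of rust_backend.py: the extension module name `parquetframe._rustic`, the helper names `rust_available` and `resolve_engine`, the choice of pandas as the default fallback engine, and treating only the value `1` as "disabled" are all hypothetical.

```python
import importlib.util
import os

# Hypothetical name for the compiled PyO3 extension module.
RUST_MODULE = "parquetframe._rustic"


def rust_available(environ=None, module_name=RUST_MODULE):
    """Return True if the Rust extension can be used in this process."""
    env = os.environ if environ is None else environ
    # Global opt-out via the environment variable named in this ADR.
    if env.get("PARQUETFRAME_DISABLE_RUST", "").strip() == "1":
        return False
    try:
        # find_spec locates the module without executing the extension itself.
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Parent package missing or malformed name: treat as unavailable.
        return False


def resolve_engine(requested=None, environ=None, module_name=RUST_MODULE):
    """Pick an engine name, preferring Rust when nothing was requested."""
    available = rust_available(environ, module_name)
    if requested == "rust":
        if not available:
            raise RuntimeError("engine='rust' requested but the Rust backend is unavailable")
        return "rust"
    if requested is not None:
        return requested  # Explicit Python engine: honor it unchanged.
    # Assumed default: fall back to pandas when Rust cannot be used.
    return "rust" if available else "pandas"


if __name__ == "__main__":
    print(resolve_engine())
```

Keeping the decision in one function means an explicit `engine='rust'` request fails loudly, while the default path degrades silently, which matches the "Rust only" versus "falls back to Python" distinction in the example above.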

2. Architecture: Cargo Workspace + PyO3

parquetframe/
├── crates/
│   ├── pf-graph-core/     # Graph algorithms (CSR/CSC, BFS, PageRank)
│   ├── pf-io-core/         # I/O operations (Parquet/Avro metadata)
│   ├── pf-workflow-core/   # Workflow DAG executor
│   └── pf-py/              # PyO3 Python bindings
├── src/parquetframe/
│   ├── backends/
│   │   └── rust_backend.py # Detection and fallback logic
│   └── ...
└── Cargo.toml              # Workspace configuration

3. Data Interchange: Zero-Copy via Arrow

Primary: Arrow RecordBatch / Arrays (zero-copy between Rust ↔ Python)
Secondary: NumPy arrays (single copy to Arrow, then zero-copy)
Graph Data: Typed buffers (offsets, indices, weights) via memoryview
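
To make the "typed buffers via memoryview" idea concrete, here is a small standard-library sketch of a CSR adjacency layout (offsets plus indices) whose neighbor lists are exposed as memoryview slices rather than copies. It only illustrates the layout and the no-copy property; the function names `build_csr` and `neighbors` are hypothetical, weights are omitted, and the real Rust builder may differ.

```python
from array import array


def build_csr(num_nodes, edges):
    """Build CSR (offsets, indices) typed buffers from a directed edge list."""
    counts = [0] * (num_nodes + 1)
    for src, _dst in edges:
        counts[src + 1] += 1          # out-degree of each source node
    for i in range(num_nodes):
        counts[i + 1] += counts[i]    # prefix sum turns degrees into offsets
    offsets = array("q", counts)      # 64-bit signed integers
    indices = array("q", [0] * len(edges))
    cursor = list(counts[:-1])        # next free slot for each node
    for src, dst in edges:
        indices[cursor[src]] = dst
        cursor[src] += 1
    return offsets, indices


def neighbors(offsets, indices_view, node):
    """Return the neighbors of `node` as a slice of the shared buffer."""
    return indices_view[offsets[node]:offsets[node + 1]]


edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
offsets, indices = build_csr(3, edges)
view = memoryview(indices)            # shares memory with `indices`
nbrs = neighbors(offsets, view, 0)    # slicing a memoryview does not copy
print(list(offsets), list(indices), list(nbrs))
```

Because the slice shares memory with the underlying buffer, a write to `indices` is visible through `nbrs`; the same property is what lets a native extension hand large offset and index arrays to Python without serialization.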

4. Phased Rollout

Phase	Component	Version	Timeline
Phase 0	Foundation & Build Infrastructure	v1.1.0	1 week
Phase 1	Graph Core (CSR/CSC, BFS, DFS)	v1.2.0	2-3 weeks
Phase 2	I/O Fast-Paths (Metadata, Filters)	v1.3.0	2-3 weeks
Phase 3	Advanced Algorithms (PageRank, Dijkstra)	v1.4.0	2-3 weeks
Phase 4	Transform Kernels (Filters, Projections)	v1.5.0	2-3 weeks
Phase 5	Workflow Executor (DAG, Concurrency)	v1.6.0	2-3 weeks
Phase 6	Entity Persistence & Polish	v2.0.0	2-3 weeks

v2.0.0 Target: Rust-first by default, delivering 5-20x performance improvements

Concurrency Strategy

Rayon: Data parallelism for CPU-bound operations (configurable thread pool)
Tokio: Async/await for I/O-bound operations (optional, feature-gated)
GIL Release: Python::allow_threads for long-running Rust operations
Thread Control: Respect RAYON_NUM_THREADS and PARQUETFRAME_RUST_THREADS environment variables
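
One way the thread-control variables could be resolved on the Python side is sketched below. The ADR names both variables but does not specify precedence, so giving `PARQUETFRAME_RUST_THREADS` priority over `RAYON_NUM_THREADS`, ignoring invalid values, and defaulting to the CPU count are assumptions; `resolve_thread_count` is a hypothetical helper.

```python
import os


def resolve_thread_count(environ=None):
    """Choose a worker thread count from the environment.

    Assumed precedence: PARQUETFRAME_RUST_THREADS, then RAYON_NUM_THREADS,
    then the number of CPUs reported by the operating system.
    """
    env = os.environ if environ is None else environ
    for name in ("PARQUETFRAME_RUST_THREADS", "RAYON_NUM_THREADS"):
        raw = env.get(name, "").strip()
        # Accept only positive integers; skip anything else.
        if raw.isdigit() and int(raw) > 0:
            return int(raw)
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(resolve_thread_count())
```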

Alternatives Considered

Alternative A: Pure Python Optimization (Rejected)

Approach: Focus on optimizing pure Python with NumPy, Numba, and better algorithms.

Pros:

- No additional build complexity
- No new languages for contributors
- Easier deployment (no Rust toolchain)

Cons:

- Performance Ceiling: Can't match Rust's performance (max 2-3x improvements)
- GIL Limitations: True parallelism remains out of reach for pure-Python code paths
- Memory Overhead: Python object overhead remains
- Maintenance: Complex NumPy/Numba code is harder to maintain than clear Rust

Why Rejected: Can't achieve target 5-20x performance improvements

Alternative B: Cython Implementation (Rejected)

Approach: Use Cython to compile performance-critical paths to C.

Pros:

- Closer to Python syntax
- Good NumPy integration
- Proven technology (used by pandas, scikit-learn)

Cons:

- Syntax Complexity: Cython's type annotation syntax is verbose and error-prone
- Memory Safety: Manual memory management, no borrow checker
- Tooling: Weaker tooling compared to Rust (cargo, clippy, rustfmt)
- Async Support: Poor async/await support compared to Tokio
- Maintenance: Cython code harder to read/maintain than Rust

Why Rejected: Rust provides better developer experience and safety guarantees

Alternative C: C++ Extensions (Rejected)

Approach: Write C++ extensions using pybind11.

Pros:

- Maximum performance potential
- Mature ecosystem (Arrow C++, Parquet C++)
- Good interoperability

Cons:

- Memory Safety: Manual memory management, no borrow checker
- Build Complexity: CMake, compiler flags, platform dependencies
- Async Support: No native async/await (requires external libraries)
- Modern Features: C++20 adoption slower than Rust
- Learning Curve: C++ complexity higher than Rust for contributors

Why Rejected: Rust provides memory safety without runtime overhead

Alternative D: Rust-First Integration (Selected) ✅

Approach: Integrate Rust via PyO3 with automatic fallback to Python.

Pros:

- Performance: Achieves target 5-20x improvements
- Memory Safety: Borrow checker prevents memory errors at compile time
- Concurrency: Native parallelism without GIL constraints
- Modern Toolchain: Cargo, clippy, rustfmt, excellent IDE support
- Zero-Copy: Arrow support enables efficient data exchange
- Async/Await: Tokio provides production-grade async runtime
- Backward Compatibility: Transparent fallback maintains compatibility
- Growing Ecosystem: Python/Rust integration increasingly common (Polars, Ruff, uv)

Cons:

- Build Complexity: Requires Rust toolchain for development
- Learning Curve: Contributors need to learn Rust (mitigated by gradual adoption)
- Binary Distribution: Need to build wheels for multiple platforms (handled by maturin + CI)

Why Selected: Best long-term solution for performance, safety, and maintainability

Consequences

Positive Consequences

✅ Dramatic Performance Improvements: 5-20x speedups on graph operations, 2-5x on I/O

✅ Memory Efficiency: 30-60% lower peak memory usage on large workloads

✅ True Parallelism: Multi-core utilization without GIL constraints

✅ Memory Safety: Borrow checker prevents entire classes of bugs

✅ Modern Toolchain: Cargo, clippy, rustfmt provide excellent developer experience

✅ Future-Proof: Rust adoption growing in data science (Polars, uv, Ruff)

✅ Competitive Advantage: Positions ParquetFrame as high-performance alternative to pure Python libraries

Negative Consequences

❌ Build Complexity: Contributors need Rust toolchain installed

❌ Learning Curve: New contributors must learn Rust basics

❌ CI/CD Complexity: Need to build Rust wheels for multiple platforms

❌ Debugging: Cross-language debugging more complex

Neutral Consequences

⚪ Binary Wheels: Increased wheel size (~5-10MB per platform) - acceptable tradeoff

⚪ Development Time: Initial Rust implementation slower, but faster iteration long-term

⚪ Maintenance: Two languages to maintain, but Rust code more maintainable than equivalent Python

Risk Mitigation

Risk: Low Rust Adoption by Contributors

Mitigation: Comprehensive Rust onboarding guide with examples
Mitigation: Core team maintains Rust code, Python contributions remain primary
Mitigation: Clear separation: Rust for performance, Python for features
Mitigation: 3-6 month transition period for team to learn Rust

Risk: Build/Deployment Complexity

Mitigation: Maturin simplifies Python packaging with Rust
Mitigation: Pre-built wheels for all major platforms (manylinux, macOS, Windows)
Mitigation: Fallback to pure Python if Rust wheels unavailable
Mitigation: CI/CD automation for wheel building (GitHub Actions)

Risk: Performance Regression

Mitigation: Comprehensive benchmark suite comparing Rust vs Python
Mitigation: CI fails on >10% performance regression
Mitigation: Property-based testing (hypothesis + quickcheck)
Mitigation: Golden-result tests ensure correctness
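
The ">10% regression" gate could be expressed as a small comparison over benchmark timings, as in the sketch below. It assumes timings are recorded as seconds per named benchmark in plain dictionaries; the function name `find_regressions` and the data shapes are hypothetical, not an existing ParquetFrame tool.

```python
def find_regressions(baseline, current, threshold=0.10):
    """Return {benchmark: relative slowdown} for runs slower than the threshold.

    `baseline` and `current` map benchmark names to wall-clock seconds.
    A slowdown of 0.25 means the current run took 25% longer than baseline.
    """
    regressions = {}
    for name, base_seconds in baseline.items():
        cur_seconds = current.get(name)
        if cur_seconds is None or base_seconds <= 0:
            continue  # Benchmark missing or baseline unusable: skip it.
        slowdown = (cur_seconds - base_seconds) / base_seconds
        if slowdown > threshold:
            regressions[name] = slowdown
    return regressions


if __name__ == "__main__":
    baseline = {"bfs": 1.0, "pagerank": 2.0}
    current = {"bfs": 1.05, "pagerank": 2.5}
    failed = find_regressions(baseline, current)
    # A CI job would exit non-zero when `failed` is not empty.
    print(failed)
```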

Risk: Cross-Language Debugging Difficulty

Mitigation: Extensive logging at Python/Rust boundary
Mitigation: Environment variable for verbose Rust logging (PARQUETFRAME_RUST_LOG=debug)
Mitigation: Clear error messages with context
Mitigation: Separate Rust unit tests from Python integration tests

Implementation Checklist

Phase 0: Foundation (v1.1.0) - 1 week

Create ADR documenting Rust integration decision
Setup Cargo workspace with 4 crates (graph-core, io-core, workflow-core, py)
Configure maturin for PyO3 Python bindings
Implement rust_backend.py detection with fallback logic
Update .gitignore for Rust artifacts
Add Rust toolchain configuration (.rust-toolchain.toml)
Update CI/CD for Rust checks and wheel builds
Add benchmarking framework
Documentation: README section, rust-integration.md guide

Phase 1: Graph Core (v1.2.0) - 2-3 weeks

Implement CSR/CSC adjacency builders in Rust
Implement BFS/DFS traversal algorithms
PyO3 bindings with automatic backend selection
Integration with existing GraphFrame API
Comprehensive parity tests vs pandas/networkx
Benchmarks showing 5-10x improvements
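
A parity test for this phase might compare a candidate backend against a simple pure-Python reference, as sketched here. The reference BFS operates on CSR offsets and indices; `bfs_distances` and `assert_parity` are hypothetical names, and representing unreachable nodes as -1 is an assumption rather than the documented GraphFrame behavior.

```python
from collections import deque


def bfs_distances(offsets, indices, source):
    """Reference BFS over a CSR graph; unreachable nodes get distance -1."""
    num_nodes = len(offsets) - 1
    dist = [-1] * num_nodes
    dist[source] = 0
    queue = deque([source])
    while queue:
        node = queue.popleft()
        # Neighbors of `node` occupy indices[offsets[node]:offsets[node + 1]].
        for k in range(offsets[node], offsets[node + 1]):
            nxt = indices[k]
            if dist[nxt] == -1:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def assert_parity(reference_fn, candidate_fn, *args):
    """Run both implementations on the same inputs and require equal output."""
    expected = reference_fn(*args)
    actual = candidate_fn(*args)
    assert actual == expected, f"backend mismatch: {actual!r} != {expected!r}"
    return expected


if __name__ == "__main__":
    # Edges 0->1, 0->2, 1->3, 2->3; node 4 is isolated.
    offsets = [0, 2, 3, 4, 4, 4]
    indices = [1, 2, 3, 3]
    # In a real test the candidate would be the Rust-backed BFS.
    print(assert_parity(bfs_distances, bfs_distances, offsets, indices, 0))
```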

Phase 2-6: Progressive Rollout (v1.3.0-2.0.0) - 10-15 weeks

I/O fast-paths (Parquet/Avro metadata)
Advanced graph algorithms (PageRank, Dijkstra, Connected Components)
Transform kernels (filters, projections, groupby)
Workflow DAG executor with concurrency
Entity persistence optimizations
Complete documentation and migration guide

This ADR builds on:

- ADR-0001: Next-Generation Architecture (Phase 2 multi-engine framework)
- ADR-0002: Make Phase 2 Default API (v1.0.0 multi-engine release)

This ADR informs:

- Future ADRs about specific Rust component implementations
- Version 2.0.0 release planning (Rust-first by default)

References

CONTEXT_RUSTIC.md - Detailed Rust integration roadmap
PyO3 Documentation - Python-Rust bindings
Maturin Documentation - Building Python wheels with Rust
Apache Arrow Rust - Arrow implementation in Rust
Rayon Documentation - Data parallelism in Rust
Polars - Reference: Successful Python library with Rust backend

Success Metrics

This decision will be considered successful when:

Performance: Achieve 5-20x speedups on graph operations (measured in benchmarks)
Memory: Reduce peak memory usage by 30-60% on large datasets
Compatibility: 100% backward compatibility maintained (all existing tests pass)
Adoption: ≥50% of operations use Rust backend when wheels installed
Quality: Zero correctness regressions (Rust produces identical results to Python)
Developer Experience: Rust contribution guide has ≥80% positive feedback
Deployment: Pre-built wheels available for all major platforms (Linux, macOS, and Windows on x86_64 and arm64)
Coverage: Rust code maintains ≥85% test coverage

Timeline

v1.1.0 (Week 1): Phase 0 Foundation - Build infrastructure complete
v1.2.0 (Week 4): Phase 1 Graph Core - CSR/CSC + BFS/DFS complete
v1.3.0 (Week 7): Phase 2 I/O - Parquet/Avro metadata fast-paths
v1.4.0 (Week 10): Phase 3 Algorithms - PageRank, Dijkstra, Components
v1.5.0 (Week 13): Phase 4 Transforms - Filters, projections, groupby
v1.6.0 (Week 16): Phase 5 Workflows - DAG executor with concurrency
v2.0.0 (Week 19): Phase 6 Complete - Rust-first by default

Total Estimated Time: 13-19 weeks (~3-4.5 months), based on 1 week for Phase 0 plus 2-3 weeks for each of Phases 1-6