Backend Selection (Phase 1 - Legacy)

Deprecated API

This page describes backend selection in the deprecated Phase 1 API. Phase 2 offers a multi-engine architecture that selects automatically among pandas, Polars, and Dask. See Phase 2 Multi-Engine Core for details.

Migration Guide: See Phase 1 → Phase 2 Migration

Choose the right processing backend for your use case.

Available Backends

ParquetFrame Phase 1 supports two processing backends, pandas and Dask, to handle datasets of different sizes.

Pandas Backend

Best for:

- Small to medium datasets (< 1 GB)
- Interactive analysis
- Fast single-machine processing

Dask Backend

Best for:

- Large datasets (> 1 GB)
- Distributed processing
- Memory-constrained environments

Automatic Selection

ParquetFrame can automatically choose the optimal backend based on:

- File size
- Available memory
- System resources
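
The factors listed above could be combined in a simple heuristic. The following is a minimal sketch, not ParquetFrame's actual selection logic: the names `choose_backend` and `choose_backend_for_path`, the parameter names, and the 4x memory expansion factor are all hypothetical. Only the 1 GB threshold is taken from the backend descriptions on this page.

```python
from pathlib import Path

# Rule-of-thumb size threshold from the backend descriptions (1 GB).
SIZE_THRESHOLD_BYTES = 1024 ** 3

# Hypothetical factor: assume a Parquet file may take several times its
# on-disk size once decompressed into memory. The value 4 is an assumption.
MEMORY_EXPANSION_FACTOR = 4


def choose_backend(file_size_bytes, available_memory_bytes=None,
                   size_threshold_bytes=SIZE_THRESHOLD_BYTES):
    """Return "pandas" or "dask" for a file of the given size (illustrative only)."""
    if file_size_bytes < 0:
        raise ValueError("file_size_bytes must be non-negative")

    # Files above the threshold go to dask regardless of free memory.
    if file_size_bytes > size_threshold_bytes:
        return "dask"

    # If free memory is known, fall back to dask when the estimated
    # in-memory footprint would not fit.
    if available_memory_bytes is not None:
        estimated_footprint = file_size_bytes * MEMORY_EXPANSION_FACTOR
        if estimated_footprint > available_memory_bytes:
            return "dask"

    return "pandas"


def choose_backend_for_path(path, available_memory_bytes=None):
    """Look up the on-disk size of `path` and apply the heuristic above."""
    return choose_backend(Path(path).stat().st_size, available_memory_bytes)
```

Passing the file size and free memory as arguments keeps the sketch free of platform-specific calls. How free memory is measured is left to the caller.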

Summary

As a rule of thumb, use pandas for datasets under 1 GB and for interactive analysis, use Dask for larger datasets or memory-constrained environments, and rely on automatic selection when unsure.

Examples

import parquetframe as pf

# Automatic backend selection (recommended)
df = pf.read("data.parquet")

# Force pandas backend
df = pf.read("data.parquet", backend="pandas")

# Force dask backend
df = pf.read("data.parquet", backend="dask")

# Check which backend is being used
print(f"Using backend: {df.backend}")