
ParquetFrame


leechristophermurray/parquetframe

Backend Selection

Choose the right processing backend for your use case.

Available Backends

ParquetFrame supports multiple processing backends to handle different scale requirements.

Pandas Backend

Best for:
- Small to medium datasets (< 1 GB)
- Interactive analysis
- Fast single-machine processing

Dask Backend

Best for:
- Large datasets (> 1 GB)
- Distributed processing
- Memory-constrained environments
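Dask's advantage in memory-constrained environments comes from working on one partition at a time and combining partial results, instead of loading the whole table at once. The stdlib-only sketch below illustrates that idea for a mean. It is not Dask's or ParquetFrame's API; `iter_partitions` and `partitioned_mean` are hypothetical names, and the fixed-size chunks stand in for Parquet row groups or Dask partitions.

```python
from typing import Iterable, Iterator, List


def iter_partitions(values: Iterable[float], partition_size: int) -> Iterator[List[float]]:
    """Yield consecutive fixed-size chunks (hypothetical stand-in for partitions)."""
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")
    chunk: List[float] = []
    for v in values:
        chunk.append(v)
        if len(chunk) == partition_size:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly shorter, partition.
        yield chunk


def partitioned_mean(values: Iterable[float], partition_size: int) -> float:
    """Compute a mean while holding at most one partition in memory."""
    total = 0.0
    count = 0
    for part in iter_partitions(values, partition_size):
        # Reduce each partition to (sum, count) before reading the next one.
        total += sum(part)
        count += len(part)
    if count == 0:
        raise ValueError("mean of empty input")
    return total / count


# A generator stands in for a dataset too large to materialize at once.
print(partitioned_mean((float(i) for i in range(1_000_001)), partition_size=100_000))  # prints 500000.0
```

Because only the running `(sum, count)` pair survives between partitions, peak memory depends on the partition size rather than the dataset size; Dask applies the same split-reduce-combine pattern across many partitions and, optionally, many machines.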

Automatic Selection

ParquetFrame can automatically choose the optimal backend based on:
- File size
- Available memory
- System resources
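ParquetFrame's actual selection logic is not spelled out on this page, so the following is only a hypothetical heuristic covering the first two criteria (file size and available memory). The 1 GB threshold comes from the guidelines above; the 3x in-memory expansion factor for compressed Parquet is an assumption, and `choose_backend` / `choose_backend_for_file` are illustrative names, not ParquetFrame functions. The caller supplies available memory, for example from the third-party `psutil` package (`psutil.virtual_memory().available`).

```python
import os
from typing import Optional

MB = 1024 ** 2
ONE_GB = 1024 ** 3  # treats the "1 GB" guideline above as 1 GiB (assumption)


def choose_backend(file_size_bytes: int,
                   available_memory_bytes: int,
                   requested: Optional[str] = None,
                   size_threshold_bytes: int = ONE_GB,
                   expansion_factor: float = 3.0) -> str:
    """Hypothetical heuristic returning "pandas" or "dask"; not ParquetFrame's real code."""
    if requested is not None:
        # An explicit choice wins, like backend="pandas" / backend="dask" in the examples.
        if requested not in ("pandas", "dask"):
            raise ValueError(f"unknown backend: {requested!r}")
        return requested
    if file_size_bytes < 0 or available_memory_bytes <= 0:
        raise ValueError("file size must be >= 0 and available memory > 0")
    # Files above the size threshold go to Dask regardless of memory.
    if file_size_bytes > size_threshold_bytes:
        return "dask"
    # Parquet is usually compressed on disk, so estimate the decoded size (assumed factor).
    if file_size_bytes * expansion_factor > available_memory_bytes:
        return "dask"
    return "pandas"


def choose_backend_for_file(path: str, available_memory_bytes: int, **kwargs) -> str:
    """Convenience wrapper that reads the on-disk size with os.path.getsize."""
    return choose_backend(os.path.getsize(path), available_memory_bytes, **kwargs)


print(choose_backend(200 * MB, 16 * ONE_GB))                         # pandas
print(choose_backend(5 * ONE_GB, 64 * ONE_GB))                       # dask: above the size threshold
print(choose_backend(500 * MB, 1 * ONE_GB))                          # dask: ~1.5 GiB estimate exceeds memory
print(choose_backend(5 * ONE_GB, 64 * ONE_GB, requested="pandas"))  # pandas: forced
```

Checking memory as well as size matters because a file under the threshold can still be too large to decode on a small machine, which is the case the third call illustrates.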

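Summary

Matching the backend to your data size and available memory keeps performance predictable: pandas for data that fits comfortably in memory, Dask for data that does not or that you want to process in a distributed way.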

Examples

import parquetframe as pf

# Automatic backend selection (recommended)
df = pf.read("data.parquet")

# Force pandas backend
df = pf.read("data.parquet", backend="pandas")

# Force dask backend
df = pf.read("data.parquet", backend="dask")

# Check which backend is being used
print(f"Using backend: {df.backend}")

Further Reading