# Performance Tips
Optimize ParquetFrame for maximum performance.
ParquetFrame is engineered for high performance, leveraging intelligent backend selection and a powerful Rust acceleration layer. This guide provides insights and tips to get the most out of your data processing workflows.
## 1. Intelligent Backend Selection
ParquetFrame automatically chooses the most suitable execution engine based on data size, workload characteristics, and available system resources. Understanding these backends can help you optimize further.
- Pandas: Ideal for small to medium-sized datasets that fit comfortably in memory. It offers a rich API and is highly optimized for single-machine, in-memory operations.
- Dask: Best suited for large datasets that exceed available memory or require distributed processing. Dask provides parallel computing capabilities, allowing you to process data across multiple cores or machines.
- Polars: A high-performance DataFrame library written in Rust, offering significant speedups for many operations, especially on single-node, multi-core systems. Polars excels in memory efficiency and execution speed, often outperforming Pandas and even Dask for certain workloads.
**ParquetFrame's Auto-selection Logic:**
By default, ParquetFrame employs a memory-aware auto-selection mechanism. For instance, it might use Pandas for files under a certain size threshold (e.g., 50MB) and switch to Dask or Polars for larger files. You can influence this behavior:
```python
import parquetframe as pf

# Read a file, letting ParquetFrame choose the backend
df = pf.read("my_data.parquet")

# Force the Dask backend for lazy evaluation
df_dask = pf.read("large_data.csv", islazy=True)

# Force the Pandas backend
df_pandas = pf.read("small_data.json", islazy=False)

# Set a custom threshold for backend selection (e.g., use Dask/Polars for files > 100MB)
df_threshold = pf.read("data.orc", threshold_mb=100)

# Check the current backend
print(f"Is Dask backend? {df.islazy}")
print(f"Is Polars backend? {df.ispolars}")
```
## 2. Rust Acceleration
ParquetFrame's Rust backend provides 10-50x speedups for performance-critical operations by offloading them to highly optimized Rust code. This acceleration is transparent and automatically utilized when available.
### Key Accelerated Areas
- Workflow Engine: Parallel execution of DAGs (Directed Acyclic Graphs) in workflows, with resource-aware scheduling. (10-15x speedup)
- Graph Algorithms: High-performance implementations of algorithms like BFS, PageRank, and shortest paths. (15-25x speedup)
- I/O Operations: Lightning-fast Parquet metadata reading, row counting, and column statistics extraction. (5-10x speedup; see the sketch below)
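To give a feel for why metadata operations are so much faster than full scans, here is a minimal sketch using pyarrow directly (not ParquetFrame's Rust API, whose surface may differ): the row count, schema, and per-column min/max statistics all come from the file footer, so no data pages are read.

```python
import pyarrow.parquet as pq

# Open the file and read only its footer metadata -- no data pages are scanned
parquet_file = pq.ParquetFile("my_data.parquet")
meta = parquet_file.metadata

print(f"rows: {meta.num_rows}, row groups: {meta.num_row_groups}")
print(f"columns: {parquet_file.schema_arrow.names}")

# Per-column min/max statistics from the first row group, when present
rg = meta.row_group(0)
for i in range(rg.num_columns):
    col = rg.column(i)
    if col.statistics is not None and col.statistics.has_min_max:
        print(col.path_in_schema, col.statistics.min, col.statistics.max)
```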
### Performance Benchmarks (Illustrative)
| Operation | Python (ms) | Rust (ms) | Speedup |
|---|---|---|---|
| Workflow (10 steps, parallel) | 850 | 65 | 13.1x |
| PageRank (100K nodes) | 2300 | 95 | 24.2x |
| BFS (1M nodes) | 1800 | 105 | 17.1x |
| Parquet metadata (1GB file) | 180 | 22 | 8.2x |
| Connected components (500K edges) | 3100 | 115 | 27.0x |
For detailed benchmarks and how to run them on your system, refer to the Rust Acceleration Guide and Benchmark Results.
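The table above is illustrative; timings vary with hardware, file layout, and library versions. To measure on your own system, a small wall-clock harness is enough. The sketch below uses only the standard library plus `pf.read`; it makes no other assumptions about ParquetFrame's API:

```python
import statistics
import time

import parquetframe as pf

def bench(fn, repeats=5):
    """Return the median wall-clock time of fn() in milliseconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append((time.perf_counter() - start) * 1000)
    return statistics.median(times)

print(f"read: {bench(lambda: pf.read('my_data.parquet')):.1f} ms")
```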
## 3. Memory Optimization
Efficient memory management is crucial for processing large datasets.
- Lazy Loading: When using Dask or Polars, operations are often lazily evaluated, meaning data is not loaded into memory until explicitly requested (e.g., `df.compute()` for Dask, `df.collect()` for Polars). This prevents out-of-memory errors.
- Column Pruning: Read only the columns you need. This significantly reduces memory footprint and I/O.
- Predicate Pushdown: Apply filters as early as possible, ideally during the data loading phase. This reduces the amount of data that needs to be read and processed. The sketch after this list shows all three techniques together.
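As a concrete illustration, here is a minimal sketch of all three techniques using Polars directly (the equivalent `pf.read(...)` options appear in the Examples section below):

```python
import polars as pl

lazy = (
    pl.scan_parquet("huge_file.parquet")            # lazy: nothing is read yet
    .select(["date", "category", "value"])          # column pruning
    .filter(pl.col("date") >= pl.date(2023, 1, 1))  # predicate pushdown
)

# Data is only read at collect() -- and only the pruned columns
# and the row groups whose statistics can match the filter
result = lazy.collect()
```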
## 4. File Format Optimization
Leveraging the features of columnar file formats like Parquet can dramatically improve performance.
- Parquet: Preferred format for performance. It supports:
    - Columnar Storage: Only reads necessary columns.
    - Compression: Reduces file size and I/O.
    - Statistics: Allows for predicate pushdown without reading full data.
- Optimal Compression Settings: Experiment with different compression codecs (e.g., Snappy, Gzip, Zstd) to find the best balance between file size and read/write performance for your data; the sketch below shows one way to compare them.
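A quick way to compare codecs on your own data is to write the same DataFrame with each one and compare the resulting file sizes. The sketch below uses pandas' `to_parquet` (with the pyarrow engine) rather than a ParquetFrame-specific writer, which may expose these options differently, and the sample DataFrame is a hypothetical stand-in for your data:

```python
import os

import pandas as pd

# Hypothetical sample data -- substitute a representative slice of your own
df = pd.DataFrame({"id": range(1_000_000), "value": [1.5] * 1_000_000})

for codec in ["snappy", "gzip", "zstd"]:
    path = f"data_{codec}.parquet"
    df.to_parquet(path, compression=codec)
    print(f"{codec}: {os.path.getsize(path) / 1e6:.1f} MB")
```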
## Summary
Achieving optimal performance with ParquetFrame involves a combination of:
- Intelligent Backend Selection: Letting ParquetFrame choose, or explicitly guiding it towards Pandas, Dask, or Polars.
- Leveraging Rust Acceleration: For critical I/O, graph, and workflow tasks.
- Memory-Efficient Practices: Using lazy loading, column pruning, and predicate pushdown.
- File Format Optimization: Utilizing columnar formats like Parquet with appropriate compression.
## Examples
```python
import parquetframe as pf

# Lazy loading for large files (will use Dask or Polars based on configuration)
df_lazy = pf.read("huge_file.parquet", islazy=True)

# Read only specific columns (applies to all backends)
df_cols = pf.read("data.parquet", columns=["col1", "col2"])

# Filter during read (predicate pushdown, highly efficient for Parquet)
df_filtered = pf.read("data.parquet", filters=[("date", ">=", "2023-01-01")])
df_filtered = df_filtered.compute()  # materialize if the Dask backend was chosen (.collect() for Polars)

# Using Polars explicitly for a specific task
polars_df = pf.read("another_data.csv", ispolars=True)
result = polars_df.group_by("category").agg(pf.col("value").sum()).collect()
```