# Integration Guide

Integrate ParquetFrame with your existing data stack and workflows.

## Integration Patterns

ParquetFrame is designed to integrate seamlessly with popular data tools and frameworks.

### Python Ecosystem

Integration with core Python data tools:

- Pandas: native compatibility and conversion
- NumPy: array operations and computations
- Scikit-learn: machine learning pipelines
- Matplotlib/Seaborn: data visualization
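The sketch below shows a typical round trip through these tools: load with ParquetFrame, convert to pandas, then hand the data to NumPy and Matplotlib. The file path and the `value` column are placeholders, not part of any real dataset.

```python
import matplotlib.pyplot as plt
import numpy as np

import parquetframe as pf

# Load with ParquetFrame, then convert to a plain pandas DataFrame
df = pf.read("data.parquet")
pdf = df.to_pandas()

# NumPy: column statistics ("value" is a placeholder column name)
values = pdf["value"].to_numpy()
print(f"mean={np.mean(values):.2f}, p95={np.percentile(values, 95):.2f}")

# Matplotlib: quick look at the distribution of the same column
plt.hist(values, bins=30)
plt.xlabel("value")
plt.savefig("value_hist.png")
```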

### Big Data Ecosystem

Work with distributed computing frameworks:

- Dask: distributed processing
- Apache Spark: large-scale data processing
- Ray: distributed machine learning
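As a rough sketch of the Dask path, the example below reads a Parquet dataset lazily and aggregates it out of core. It uses `dask.dataframe` directly rather than any ParquetFrame-specific wrapper; the glob pattern and the `category`/`amount` columns are placeholders.

```python
import dask.dataframe as dd

# Lazily read a (possibly larger-than-memory) Parquet dataset;
# nothing is materialized until .compute() is called.
ddf = dd.read_parquet("data/*.parquet")

# Out-of-core aggregation over placeholder columns
summary = ddf.groupby("category")["amount"].mean().compute()
print(summary)
```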

### Database Integration

Connect with databases and data warehouses:

- SQL query support
- Database connectors
- Data pipeline integration
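A minimal sketch of moving data between Parquet and a SQL database via SQLAlchemy. SQLite, the table name, and the `amount` filter are illustrative placeholders; any SQLAlchemy-supported database URL works the same way.

```python
import pandas as pd
from sqlalchemy import create_engine

import parquetframe as pf

# SQLite for illustration; substitute any SQLAlchemy URL
engine = create_engine("sqlite:///warehouse.db")

# Parquet -> database: load, convert to pandas, write a table
df = pf.read("data.parquet")
df.to_pandas().to_sql("staging_table", engine, if_exists="replace", index=False)

# Database -> pandas: query the table back for further processing
result = pd.read_sql("SELECT * FROM staging_table WHERE amount > 100", engine)
```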

### Cloud Platforms

Deploy on cloud platforms:

- AWS S3 and Lambda
- Google Cloud Storage and Functions
- Azure Blob Storage and Functions
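For object storage, fsspec-style URLs generally work anywhere a local path does. The sketch below uses plain pandas with `s3fs` installed; the bucket and key names are placeholders, and credentials are assumed to come from the standard AWS sources.

```python
import pandas as pd  # requires s3fs for s3:// URLs: pip install s3fs

# Credentials are resolved from the usual AWS sources (environment
# variables, ~/.aws/credentials, or an instance role).
df = pd.read_parquet("s3://my-bucket/raw/data.parquet")

# ... transform the data ...

df.to_parquet("s3://my-bucket/processed/output.parquet")
```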

## Summary

ParquetFrame's flexible architecture enables integration across the entire data ecosystem.

## Examples

```python
from sklearn.model_selection import train_test_split
from sqlalchemy import create_engine

import parquetframe as pf

# Integration with pandas
df = pf.read("data.parquet")
pandas_df = df.to_pandas()

# Machine learning pipeline (use the pandas representation for scikit-learn)
X = pandas_df[["feature1", "feature2", "feature3"]]
y = pandas_df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Database integration (SQLite shown for illustration)
engine = create_engine("sqlite:///example.db")
df.to_sql("processed_data", engine)

# Cloud storage: fsspec-style URLs work in place of local paths
df = pf.read("s3://bucket/data.parquet")
df.save("s3://bucket/processed/output.parquet")
```

## Further Reading