# Integration Guide

Integrate ParquetFrame with your existing data stack and workflows.

## Integration Patterns

ParquetFrame is designed to integrate seamlessly with popular data tools and frameworks.

### Python Ecosystem

Integration with core Python data tools:

- Pandas: native compatibility and conversion
- NumPy: array operations and computations
- Scikit-learn: machine learning pipelines
- Matplotlib/Seaborn: data visualization
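The sketch below shows a typical round trip through these tools: load with ParquetFrame, convert to pandas, then hand the data to NumPy and Matplotlib. The file path and the `value` column are placeholders, not part of any real dataset.

```python
import matplotlib.pyplot as plt
import numpy as np

import parquetframe as pf

# Load with ParquetFrame, then convert to a plain pandas DataFrame
df = pf.read("data.parquet")
pdf = df.to_pandas()

# NumPy: column statistics ("value" is a placeholder column name)
values = pdf["value"].to_numpy()
print(f"mean={np.mean(values):.2f}, p95={np.percentile(values, 95):.2f}")

# Matplotlib: quick look at the distribution of the same column
plt.hist(values, bins=30)
plt.xlabel("value")
plt.savefig("value_hist.png")
```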

### Big Data Ecosystem

Work with distributed computing frameworks:

- Dask: distributed processing
- Apache Spark: large-scale data processing
- Ray: distributed machine learning
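As a rough sketch of the Dask path, the example below reads a Parquet dataset lazily and aggregates it out of core. It uses `dask.dataframe` directly rather than any ParquetFrame-specific wrapper; the glob pattern and the `category`/`amount` columns are placeholders.

```python
import dask.dataframe as dd

# Lazily read a (possibly larger-than-memory) Parquet dataset;
# nothing is materialized until .compute() is called.
ddf = dd.read_parquet("data/*.parquet")

# Out-of-core aggregation over placeholder columns
summary = ddf.groupby("category")["amount"].mean().compute()
print(summary)
```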

### Database Integration

Connect with databases and data warehouses:

- SQL query support
- Database connectors
- Data pipeline integration
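A minimal sketch of moving data between Parquet and a SQL database via SQLAlchemy. SQLite, the table name, and the `amount` filter are illustrative placeholders; any SQLAlchemy-supported database URL works the same way.

```python
import pandas as pd
from sqlalchemy import create_engine

import parquetframe as pf

# SQLite for illustration; substitute any SQLAlchemy URL
engine = create_engine("sqlite:///warehouse.db")

# Parquet -> database: load, convert to pandas, write a table
df = pf.read("data.parquet")
df.to_pandas().to_sql("staging_table", engine, if_exists="replace", index=False)

# Database -> pandas: query the table back for further processing
result = pd.read_sql("SELECT * FROM staging_table WHERE amount > 100", engine)
```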

### Cloud Platforms

Deploy on cloud platforms:

- AWS S3 and Lambda
- Google Cloud Storage and Functions
- Azure Blob Storage and Functions
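For object storage, fsspec-style URLs generally work anywhere a local path does. The sketch below uses plain pandas with `s3fs` installed; the bucket and key names are placeholders, and credentials are assumed to come from the standard AWS sources.

```python
import pandas as pd  # requires s3fs for s3:// URLs: pip install s3fs

# Credentials are resolved from the usual AWS sources (environment
# variables, ~/.aws/credentials, or an instance role).
df = pd.read_parquet("s3://my-bucket/raw/data.parquet")

# ... transform the data ...

df.to_parquet("s3://my-bucket/processed/output.parquet")
```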

## Summary

ParquetFrame's flexible architecture enables integration across the entire data ecosystem.

## Examples

```python
from sklearn.model_selection import train_test_split
from sqlalchemy import create_engine

import parquetframe as pf

# Integration with pandas
df = pf.read("data.parquet")
pandas_df = df.to_pandas()

# Machine learning pipeline (use the pandas representation for scikit-learn)
X = pandas_df[["feature1", "feature2", "feature3"]]
y = pandas_df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Database integration (SQLite shown for illustration)
engine = create_engine("sqlite:///example.db")
df.to_sql("processed_data", engine)

# Cloud storage: fsspec-style URLs work in place of local paths
df = pf.read("s3://bucket/data.parquet")
df.save("s3://bucket/processed/output.parquet")
```

## Further Reading