Skip to content

Engines

This module provides ClickHouse table engine classes for use in table definitions.

Engine

Base class for ClickHouse table engines.

All engine classes inherit from this base class. Engines are registered automatically in engine_map when subclasses are defined.

Attributes:

Name Type Description
name str

The engine class name.

full_engine str

The full SQL engine expression.

Examples:

>>> from pyclickhouse.engines import MergeTree, build_engine
>>> engine = MergeTree(order_by="id")
>>> print(engine)
MergeTree ORDER BY id
>>> parsed = build_engine("MergeTree ORDER BY id")
>>> print(parsed.full_engine)
MergeTree ORDER BY id

from_sql(full_engine) classmethod

Create an Engine instance from a ClickHouse full_engine SQL expression.

Parameters:

Name Type Description Default
full_engine str

The full engine SQL string (e.g., "MergeTree ORDER BY id").

required

Returns:

Type Description
Engine | None

An Engine instance with name and full_engine set, or None if the

Engine | None

engine type is unknown.

Examples:

>>> engine = Engine.from_sql("MergeTree ORDER BY id")
>>> engine.name
'MergeTree'

to_sql()

Create the full SQL engine expression from the engine name and arguments.

Returns:

Type Description
str

The full SQL engine expression.


Engine Reference

Simple Engines

Engine Description Required Args
Memory Stores data in RAM -
Null Writes but discards data -
Log Basic logging engine -
StripeLog Stripe-logged storage -
TinyLog Lightweight logging -
Set Set storage -
Dictionary Dictionary engine dictionary
Merge Merge family tables db_name, tables_regexp
File File-based storage fmt
Distributed Distributed tables cluster, database, table

MergeTree Family

Engine Description Required Args
MergeTree Main engine for storing and merging data order_by
SummingMergeTree Pre-aggregating data order_by
AggregatingMergeTree Pre-aggregating with aggregate functions order_by
ReplacingMergeTree Deduplication order_by
CollapsingMergeTree Incremental calculation sign, order_by
VersionedCollapsingMergeTree Incremental calculation with versions sign, version, order_by
GraphiteMergeTree Graphite data storage config_section, order_by
CoalescingMergeTree Coalescing identical rows order_by
SharedMergeTree Shared metadata order_by

Replicated Engines

Engine Description Required Args
ReplicatedMergeTree ZooKeeper-based replication order_by
ReplicatedAggregatingMergeTree Replicated pre-aggregation order_by
ReplicatedSummingMergeTree Replicated pre-summarization order_by

Shared Variants

Engine Description Required Args
SharedReplacingMergeTree Shared replacing order_by
SharedAggregatingMergeTree Shared aggregating order_by
SharedSummingMergeTree Shared summing order_by
SharedVersionedCollapsingMergeTree Shared versioned collapsing order_by
SharedGraphiteMergeTree Shared graphite order_by

Integration Engines

Engine Description Required Args
Kafka Kafka integration broker_list, topic_list, group_name, format
PostgreSQL PostgreSQL integration host_port, database, table, user, password

Simple Engines

Memory(*, settings=None) dataclass

Bases: Engine

Memory engine that stores data in RAM.

Data is stored in RAM and is not persisted. Useful for testing or temporary data that doesn't need persistence.

Parameters:

Name Type Description Default
settings dict[str, Any] | None

Optional engine settings as a dictionary.

None

Examples:

>>> engine = Memory()
>>> print(engine)
Memory

Null() dataclass

Bases: Engine

Null engine that writes data but immediately discards it.

All data is written but never stored. Useful for testing write performance or discarding unwanted data.

Examples:

>>> engine = Null()
>>> print(engine)
Null

Log(*, settings=None) dataclass

Bases: Engine

Log engine with basic table logging functionality.

Stores data in a log file with basic support for concurrent access. Suitable for small to medium tables with frequent inserts.

Parameters:

Name Type Description Default
settings dict[str, Any] | None

Optional engine settings as a dictionary.

None

Examples:

>>> engine = Log()
>>> print(engine)
Log

StripeLog(*, settings=None) dataclass

Bases: Engine

StripeLog engine optimized for sequential writes.

Stores data in stripes, optimized for bulk insert operations. Better write performance than Log for sequential writes.

Parameters:

Name Type Description Default
settings dict[str, Any] | None

Optional engine settings as a dictionary.

None

Examples:

>>> engine = StripeLog()
>>> print(engine)
StripeLog

TinyLog(*, settings=None) dataclass

Bases: Engine

TinyLog engine with minimal storage overhead.

Lightweight logging engine with minimal metadata storage. Good for tables with many small columns.

Parameters:

Name Type Description Default
settings dict[str, Any] | None

Optional engine settings as a dictionary.

None

Examples:

>>> engine = TinyLog()
>>> print(engine)
TinyLog

Set(*, settings=None) dataclass

Bases: Engine

Set engine for storing unique values.

Stores a set of unique values that can be used for IN queries and set operations.

Parameters:

Name Type Description Default
settings dict[str, Any] | None

Optional engine settings as a dictionary.

None

Examples:

>>> engine = Set()
>>> print(engine)
Set

Dictionary(*, dictionary) dataclass

Bases: Engine

Dictionary engine for accessing external dictionaries.

Provides access to data from external dictionaries configured in ClickHouse.

Parameters:

Name Type Description Default
dictionary str

The name of the dictionary to use.

required

Examples:

>>> engine = Dictionary(dictionary="my_dict")
>>> print(engine)
Dictionary('my_dict')

Merge(*, db_name, tables_regexp) dataclass

Bases: Engine

Merge engine for reading from multiple tables.

Allows reading from multiple tables matching a regular expression. The actual tables must be on the same server.

Parameters:

Name Type Description Default
db_name str

The database name or a regex pattern.

required
tables_regexp str

Regular expression to match table names.

required

Examples:

>>> engine = Merge(db_name="mydb", tables_regexp=".*")
>>> print(engine)
Merge(mydb, '.*')

File(*, fmt) dataclass

Bases: Engine

File engine for reading/writing files.

Allows reading and writing files in various formats directly.

Parameters:

Name Type Description Default
fmt str

The file format (e.g., 'CSV', 'JSON', 'Parquet').

required

Examples:

>>> engine = File(fmt="CSV")
>>> print(engine)
File('CSV')

Distributed(*, cluster, database, table, sharding_key=None, policy_name=None, settings=None) dataclass

Bases: Engine

Distributed engine for querying remote servers.

Provides a way to query data across multiple remote servers. The table itself doesn't store data, just coordinates queries.

Parameters:

Name Type Description Default
cluster str

The cluster name in the ClickHouse configuration.

required
database str

The database name on remote servers.

required
table str

The table name on remote servers.

required
sharding_key str | None

Optional sharding key for write operations.

None
policy_name str | None

Optional storage policy name.

None
settings dict[str, Any] | None

Optional engine settings as a dictionary.

None

Examples:

>>> engine = Distributed(cluster="mycluster", database="mydb", table="mytable")
>>> print(engine)
Distributed(mycluster, mydb, mytable)

MergeTree Family

MergeTree(*, order_by='tuple()', primary_key=None, partition_by=None, sample_by=None, ttl=None, settings=None) dataclass

Bases: Engine

MergeTree main engine for storing and merging data.

The most commonly used engine for large tables. Supports: - Custom sorting keys via ORDER BY - Partitioning via PARTITION BY - Data sampling via SAMPLE BY - TTL for automatic data expiration

Parameters:

Name Type Description Default
order_by str

Sorting key expression. Defaults to 'tuple()'.

'tuple()'
primary_key str | None

Primary key expression (can differ from order_by).

None
partition_by str | None

Partition key expression for organizing data.

None
sample_by str | None

Sampling expression for approximate queries.

None
ttl str | None

TTL expression for automatic data expiration.

None
settings dict[str, Any] | None

Optional engine settings as a dictionary.

None

Examples:

>>> engine = MergeTree(order_by="id", partition_by="toYYYYMM(date)")
>>> print(engine)
MergeTree ORDER BY id PARTITION BY toYYYYMM(date)

SummingMergeTree(*, columns=None, order_by=None, partition_by=None, sample_by=None, settings=None) dataclass

Bases: Engine

SummingMergeTree engine for pre-aggregating data.

Automatically sums values with the same sorting key. Useful for maintaining aggregated data incrementally.

Parameters:

Name Type Description Default
columns str | None

Columns to sum (optional).

None
order_by str | None

Sorting key expression.

None
partition_by str | None

Partition key expression.

None
sample_by str | None

Sampling expression.

None
settings dict[str, Any] | None

Optional engine settings as a dictionary.

None

Examples:

>>> engine = SummingMergeTree(columns="amount", order_by="id")
>>> print(engine)
SummingMergeTree(amount) ORDER BY id

AggregatingMergeTree(*, order_by=None, partition_by=None, sample_by=None, ttl=None, settings=None) dataclass

Bases: Engine

AggregatingMergeTree engine for pre-aggregating with aggregate functions.

Stores data using AggregateFunction columns. Supports custom aggregation functions for incremental computation.

Parameters:

Name Type Description Default
order_by str | None

Sorting key expression.

None
partition_by str | None

Partition key expression.

None
sample_by str | None

Sampling expression.

None
ttl str | None

TTL expression for automatic data expiration.

None
settings dict[str, Any] | None

Optional engine settings as a dictionary.

None

Examples:

>>> engine = AggregatingMergeTree(order_by="id")
>>> print(engine)
AggregatingMergeTree ORDER BY id

ReplacingMergeTree(*, ver=None, order_by=None, primary_key=None, partition_by=None, sample_by=None, settings=None) dataclass

Bases: Engine

ReplacingMergeTree engine for deduplication.

Removes duplicate entries with the same sorting key, keeping only the latest or a specific version.

Parameters:

Name Type Description Default
ver str | None

Version column to select which row to keep.

None
order_by str | None

Sorting key expression.

None
primary_key str | None

Primary key expression.

None
partition_by str | None

Partition key expression.

None
sample_by str | None

Sampling expression.

None
settings dict[str, Any] | None

Optional engine settings as a dictionary.

None

Examples:

>>> engine = ReplacingMergeTree(ver="version", order_by="id")
>>> print(engine)
ReplacingMergeTree(version) ORDER BY id

CollapsingMergeTree(*, sign, order_by=None, partition_by=None, sample_by=None, settings=None) dataclass

Bases: Engine

CollapsingMergeTree engine for incremental calculation.

Deduplicates data based on a sign column. Rows with opposite signs cancel each other out.

Parameters:

Name Type Description Default
sign str

Sign column name (typically 'sign' with values 1 and -1).

required
order_by str | None

Sorting key expression.

None
partition_by str | None

Partition key expression.

None
sample_by str | None

Sampling expression.

None
settings dict[str, Any] | None

Optional engine settings as a dictionary.

None

Examples:

>>> engine = CollapsingMergeTree(sign="sign", order_by="id")
>>> print(engine)
CollapsingMergeTree(sign) ORDER BY id

VersionedCollapsingMergeTree(*, sign, version, order_by=None, partition_by=None, sample_by=None, settings=None) dataclass

Bases: Engine

VersionedCollapsingMergeTree for incremental calculation with versions.

Similar to CollapsingMergeTree but uses a version column for better handling of concurrent writes.

Parameters:

Name Type Description Default
sign str

Sign column name.

required
version str

Version column name.

required
order_by str | None

Sorting key expression.

None
partition_by str | None

Partition key expression.

None
sample_by str | None

Sampling expression.

None
settings dict[str, Any] | None

Optional engine settings as a dictionary.

None

Examples:

>>> engine = VersionedCollapsingMergeTree(sign="sign", version="ver", order_by="id")
>>> print(engine)
VersionedCollapsingMergeTree(sign, ver) ORDER BY id

GraphiteMergeTree(*, config_section, order_by=None, primary_key=None, partition_by=None, sample_by=None, settings=None) dataclass

Bases: Engine

GraphiteMergeTree engine for storing Graphite data.

Optimized for storing and querying Graphite monitoring data with automatic data rollup support.

Parameters:

Name Type Description Default
config_section str

Configuration section name in graphite.config.

required
order_by str | None

Sorting key expression.

None
primary_key str | None

Primary key expression.

None
partition_by str | None

Partition key expression.

None
sample_by str | None

Sampling expression.

None
settings dict[str, Any] | None

Optional engine settings as a dictionary.

None

Examples:

>>> engine = GraphiteMergeTree(config_section="graphite_config", order_by="path")
>>> print(engine)
GraphiteMergeTree(graphite_config) ORDER BY path

CoalescingMergeTree(*, columns=None, order_by=None, partition_by=None, sample_by=None, settings=None) dataclass

Bases: Engine

CoalescingMergeTree engine for coalescing identical rows.

Merges rows that have identical values in specified columns.

Parameters:

Name Type Description Default
columns str | None

Columns to check for identical values.

None
order_by str | None

Sorting key expression.

None
partition_by str | None

Partition key expression.

None
sample_by str | None

Sampling expression.

None
settings dict[str, Any] | None

Optional engine settings as a dictionary.

None

Examples:

>>> engine = CoalescingMergeTree(columns="col1", order_by="id")
>>> print(engine)
CoalescingMergeTree(col1) ORDER BY id

SharedMergeTree(*, order_by='tuple()', primary_key=None, partition_by=None, sample_by=None, ttl=None, settings=None) dataclass

Bases: MergeTree

SharedMergeTree engine with shared metadata.

Similar to MergeTree but shares metadata across replicas.

Examples:

>>> engine = SharedMergeTree(order_by="id")
>>> print(engine)
SharedMergeTree ORDER BY id

Replicated Engines

ReplicatedMergeTree(*, zk_path=None, replica=None, order_by='tuple()', primary_key=None, partition_by=None, sample_by=None, settings=None) dataclass

Bases: Engine

ReplicatedMergeTree engine with ZooKeeper-based replication.

Provides data replication across multiple servers using ZooKeeper for coordination. Supports automatic recovery and replica failover.

Parameters:

Name Type Description Default
zk_path str | None

ZooKeeper path for replica coordination.

None
replica str | None

Replica name in ZooKeeper.

None
order_by str

Sorting key expression. Defaults to 'tuple()'.

'tuple()'
primary_key str | None

Primary key expression.

None
partition_by str | None

Partition key expression.

None
sample_by str | None

Sampling expression.

None
settings dict[str, Any] | None

Optional engine settings as a dictionary.

None

Examples:

>>> engine = ReplicatedMergeTree(zk_path="/clickhouse/tables/mytable", replica="replica1")
>>> print(engine)
ReplicatedMergeTree('/clickhouse/tables/mytable', 'replica1') ORDER BY tuple()

ReplicatedAggregatingMergeTree(*, zk_path=None, replica=None, order_by='tuple()', primary_key=None, partition_by=None, sample_by=None, settings=None) dataclass

Bases: ReplicatedMergeTree

ReplicatedAggregatingMergeTree for replicated pre-aggregation.

Combines replication with aggregating merge tree functionality.

Examples:

>>> engine = ReplicatedAggregatingMergeTree(zk_path="/clickhouse/tables/mytable", replica="replica1", order_by="id")
>>> print(engine)
ReplicatedAggregatingMergeTree('/clickhouse/tables/mytable', 'replica1') ORDER BY id

ReplicatedSummingMergeTree(*, zk_path=None, replica=None, order_by='tuple()', primary_key=None, partition_by=None, sample_by=None, settings=None) dataclass

Bases: ReplicatedMergeTree

ReplicatedSummingMergeTree for replicated pre-summarization.

Combines replication with summing merge tree functionality.

Examples:

>>> engine = ReplicatedSummingMergeTree(zk_path="/clickhouse/tables/mytable", replica="replica1", order_by="id")
>>> print(engine)
ReplicatedSummingMergeTree('/clickhouse/tables/mytable', 'replica1') ORDER BY id

Shared Variants

SharedReplacingMergeTree(*, ver=None, order_by=None, primary_key=None, partition_by=None, sample_by=None, settings=None) dataclass

Bases: ReplacingMergeTree

SharedReplacingMergeTree with shared metadata.

Combines shared metadata with replacing merge tree functionality.

Examples:

>>> engine = SharedReplacingMergeTree(order_by="id")
>>> print(engine)
SharedReplacingMergeTree ORDER BY id

SharedAggregatingMergeTree(*, order_by=None, partition_by=None, sample_by=None, ttl=None, settings=None) dataclass

Bases: AggregatingMergeTree

SharedAggregatingMergeTree with shared metadata.

Combines shared metadata with aggregating merge tree functionality.

Examples:

>>> engine = SharedAggregatingMergeTree(order_by="id")
>>> print(engine)
SharedAggregatingMergeTree ORDER BY id

SharedSummingMergeTree(*, columns=None, order_by=None, partition_by=None, sample_by=None, settings=None) dataclass

Bases: SummingMergeTree

SharedSummingMergeTree with shared metadata.

Combines shared metadata with summing merge tree functionality.

Examples:

>>> engine = SharedSummingMergeTree(columns="amount", order_by="id")
>>> print(engine)
SharedSummingMergeTree(amount) ORDER BY id

SharedVersionedCollapsingMergeTree(*, sign, version, order_by=None, partition_by=None, sample_by=None, settings=None) dataclass

Bases: VersionedCollapsingMergeTree

SharedVersionedCollapsingMergeTree with shared metadata.

Combines shared metadata with versioned collapsing merge tree functionality.

Examples:

>>> engine = SharedVersionedCollapsingMergeTree(sign="sign", version="ver", order_by="id")
>>> print(engine)
SharedVersionedCollapsingMergeTree(sign, ver) ORDER BY id

SharedGraphiteMergeTree(*, config_section, order_by=None, primary_key=None, partition_by=None, sample_by=None, settings=None) dataclass

Bases: GraphiteMergeTree

SharedGraphiteMergeTree with shared metadata.

Combines shared metadata with graphite merge tree functionality.

Examples:

>>> engine = SharedGraphiteMergeTree(config_section="graphite_config", order_by="path")
>>> print(engine)
SharedGraphiteMergeTree(graphite_config) ORDER BY path

Integration Engines

Kafka(*, broker_list, topic_list, group_name, format, settings=None) dataclass

Bases: Engine

Kafka engine for reading from Kafka topics.

Allows ClickHouse to read data from Kafka streams. Can be used for real-time data ingestion from Kafka.

Parameters:

Name Type Description Default
broker_list str

Kafka broker addresses (comma-separated if multiple).

required
topic_list str

Kafka topic or topics to read from.

required
group_name str

Consumer group name.

required
format str

Message format (e.g., 'JSONEachRow', 'CSV', 'Avro').

required
settings dict[str, Any] | None

Optional engine settings as a dictionary.

None

Examples:

>>> engine = Kafka(broker_list="localhost:9092", topic_list="my_topic", group_name="my_group", format="JSONEachRow")
>>> print(engine)
Kafka('localhost:9092', 'my_topic', 'my_group', 'JSONEachRow')

PostgreSQL(*, host_port, database, table, user, password, schema=None, settings=None) dataclass

Bases: Engine

PostgreSQL engine for reading from PostgreSQL tables.

Allows ClickHouse to query data from PostgreSQL tables directly. Useful for data federation and migration scenarios.

Parameters:

Name Type Description Default
host_port str

PostgreSQL host and port (e.g., 'localhost:5432').

required
database str

PostgreSQL database name.

required
table str

PostgreSQL table name.

required
user str

PostgreSQL user name.

required
password str

PostgreSQL user password.

required
schema str | None

PostgreSQL schema name (optional, defaults to 'public').

None
settings dict[str, Any] | None

Optional engine settings as a dictionary.

None

Examples:

>>> engine = PostgreSQL(host_port="localhost:5432", database="mydb", table="users", user="admin", password="secret")
>>> print(engine)
PostgreSQL('localhost:5432', 'mydb', 'users', 'admin', 'secret')