Engines
This module provides ClickHouse table engine classes for use in table definitions.
Engine
Base class for ClickHouse table engines.
All engine classes inherit from this base class. Engines are registered
automatically in engine_map when subclasses are defined.
Attributes:
| Name | Type | Description |
|---|---|---|
name |
str
|
The engine class name. |
full_engine |
str
|
The full SQL engine expression. |
Examples:
>>> from pyclickhouse.engines import MergeTree, build_engine
>>> engine = MergeTree(order_by="id")
>>> print(engine)
MergeTree ORDER BY id
>>> parsed = build_engine("MergeTree ORDER BY id")
>>> print(parsed.full_engine)
MergeTree ORDER BY id
from_sql(full_engine)
classmethod
Create an Engine instance from a ClickHouse full_engine SQL expression.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
full_engine
|
str
|
The full engine SQL string (e.g., "MergeTree ORDER BY id"). |
required |
Returns:
| Type | Description |
|---|---|
Engine | None
|
An Engine instance with name and full_engine set, or None if the |
Engine | None
|
engine type is unknown. |
Examples:
to_sql()
Create the full SQL engine expression from the engine name and arguments.
Returns:
| Type | Description |
|---|---|
str
|
The full SQL engine expression. |
Engine Reference
Simple Engines
| Engine | Description | Required Args |
|---|---|---|
| Memory | Stores data in RAM | - |
| Null | Writes but discards data | - |
| Log | Basic logging engine | - |
| StripeLog | Stripe-logged storage | - |
| TinyLog | Lightweight logging | - |
| Set | Set storage | - |
| Dictionary | Dictionary engine | dictionary |
| Merge | Merge family tables | db_name, tables_regexp |
| File | File-based storage | fmt |
| Distributed | Distributed tables | cluster, database, table |
MergeTree Family
| Engine | Description | Required Args |
|---|---|---|
| MergeTree | Main engine for storing and merging data | order_by |
| SummingMergeTree | Pre-aggregating data | order_by |
| AggregatingMergeTree | Pre-aggregating with aggregate functions | order_by |
| ReplacingMergeTree | Deduplication | order_by |
| CollapsingMergeTree | Incremental calculation | sign, order_by |
| VersionedCollapsingMergeTree | Incremental calculation with versions | sign, version, order_by |
| GraphiteMergeTree | Graphite data storage | config_section, order_by |
| CoalescingMergeTree | Coalescing identical rows | order_by |
| SharedMergeTree | Shared metadata | order_by |
Replicated Engines
| Engine | Description | Required Args |
|---|---|---|
| ReplicatedMergeTree | ZooKeeper-based replication | order_by |
| ReplicatedAggregatingMergeTree | Replicated pre-aggregation | order_by |
| ReplicatedSummingMergeTree | Replicated pre-summarization | order_by |
Shared Variants
| Engine | Description | Required Args |
|---|---|---|
| SharedReplacingMergeTree | Shared replacing | order_by |
| SharedAggregatingMergeTree | Shared aggregating | order_by |
| SharedSummingMergeTree | Shared summing | order_by |
| SharedVersionedCollapsingMergeTree | Shared versioned collapsing | order_by |
| SharedGraphiteMergeTree | Shared graphite | order_by |
Integration Engines
| Engine | Description | Required Args |
|---|---|---|
| Kafka | Kafka integration | broker_list, topic_list, group_name, format |
| PostgreSQL | PostgreSQL integration | host_port, database, table, user, password |
Simple Engines
Memory(*, settings=None)
dataclass
Null()
dataclass
Log(*, settings=None)
dataclass
Bases: Engine
Log engine with basic table logging functionality.
Stores data in a log file with basic support for concurrent access. Suitable for small to medium tables with frequent inserts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
settings
|
dict[str, Any] | None
|
Optional engine settings as a dictionary. |
None
|
Examples:
StripeLog(*, settings=None)
dataclass
Bases: Engine
StripeLog engine optimized for sequential writes.
Stores data in stripes, optimized for bulk insert operations. Better write performance than Log for sequential writes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
settings
|
dict[str, Any] | None
|
Optional engine settings as a dictionary. |
None
|
Examples:
TinyLog(*, settings=None)
dataclass
Set(*, settings=None)
dataclass
Dictionary(*, dictionary)
dataclass
Bases: Engine
Dictionary engine for accessing external dictionaries.
Provides access to data from external dictionaries configured in ClickHouse.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
dictionary
|
str
|
The name of the dictionary to use. |
required |
Examples:
Merge(*, db_name, tables_regexp)
dataclass
Bases: Engine
Merge engine for reading from multiple tables.
Allows reading from multiple tables matching a regular expression. The actual tables must be on the same server.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
db_name
|
str
|
The database name or a regex pattern. |
required |
tables_regexp
|
str
|
Regular expression to match table names. |
required |
Examples:
File(*, fmt)
dataclass
Distributed(*, cluster, database, table, sharding_key=None, policy_name=None, settings=None)
dataclass
Bases: Engine
Distributed engine for querying remote servers.
Provides a way to query data across multiple remote servers. The table itself doesn't store data, just coordinates queries.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
cluster
|
str
|
The cluster name in the ClickHouse configuration. |
required |
database
|
str
|
The database name on remote servers. |
required |
table
|
str
|
The table name on remote servers. |
required |
sharding_key
|
str | None
|
Optional sharding key for write operations. |
None
|
policy_name
|
str | None
|
Optional storage policy name. |
None
|
settings
|
dict[str, Any] | None
|
Optional engine settings as a dictionary. |
None
|
Examples:
>>> engine = Distributed(cluster="mycluster", database="mydb", table="mytable")
>>> print(engine)
Distributed(mycluster, mydb, mytable)
MergeTree Family
MergeTree(*, order_by='tuple()', primary_key=None, partition_by=None, sample_by=None, ttl=None, settings=None)
dataclass
Bases: Engine
MergeTree main engine for storing and merging data.
The most commonly used engine for large tables. Supports: - Custom sorting keys via ORDER BY - Partitioning via PARTITION BY - Data sampling via SAMPLE BY - TTL for automatic data expiration
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
order_by
|
str
|
Sorting key expression. Defaults to 'tuple()'. |
'tuple()'
|
primary_key
|
str | None
|
Primary key expression (can differ from order_by). |
None
|
partition_by
|
str | None
|
Partition key expression for organizing data. |
None
|
sample_by
|
str | None
|
Sampling expression for approximate queries. |
None
|
ttl
|
str | None
|
TTL expression for automatic data expiration. |
None
|
settings
|
dict[str, Any] | None
|
Optional engine settings as a dictionary. |
None
|
Examples:
>>> engine = MergeTree(order_by="id", partition_by="toYYYYMM(date)")
>>> print(engine)
MergeTree ORDER BY id PARTITION BY toYYYYMM(date)
SummingMergeTree(*, columns=None, order_by=None, partition_by=None, sample_by=None, settings=None)
dataclass
Bases: Engine
SummingMergeTree engine for pre-aggregating data.
Automatically sums values with the same sorting key. Useful for maintaining aggregated data incrementally.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
columns
|
str | None
|
Columns to sum (optional). |
None
|
order_by
|
str | None
|
Sorting key expression. |
None
|
partition_by
|
str | None
|
Partition key expression. |
None
|
sample_by
|
str | None
|
Sampling expression. |
None
|
settings
|
dict[str, Any] | None
|
Optional engine settings as a dictionary. |
None
|
Examples:
>>> engine = SummingMergeTree(columns="amount", order_by="id")
>>> print(engine)
SummingMergeTree(amount) ORDER BY id
AggregatingMergeTree(*, order_by=None, partition_by=None, sample_by=None, ttl=None, settings=None)
dataclass
Bases: Engine
AggregatingMergeTree engine for pre-aggregating with aggregate functions.
Stores data using AggregateFunction columns. Supports custom aggregation functions for incremental computation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
order_by
|
str | None
|
Sorting key expression. |
None
|
partition_by
|
str | None
|
Partition key expression. |
None
|
sample_by
|
str | None
|
Sampling expression. |
None
|
ttl
|
str | None
|
TTL expression for automatic data expiration. |
None
|
settings
|
dict[str, Any] | None
|
Optional engine settings as a dictionary. |
None
|
Examples:
ReplacingMergeTree(*, ver=None, order_by=None, primary_key=None, partition_by=None, sample_by=None, settings=None)
dataclass
Bases: Engine
ReplacingMergeTree engine for deduplication.
Removes duplicate entries with the same sorting key, keeping only the latest or a specific version.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
ver
|
str | None
|
Version column to select which row to keep. |
None
|
order_by
|
str | None
|
Sorting key expression. |
None
|
primary_key
|
str | None
|
Primary key expression. |
None
|
partition_by
|
str | None
|
Partition key expression. |
None
|
sample_by
|
str | None
|
Sampling expression. |
None
|
settings
|
dict[str, Any] | None
|
Optional engine settings as a dictionary. |
None
|
Examples:
>>> engine = ReplacingMergeTree(ver="version", order_by="id")
>>> print(engine)
ReplacingMergeTree(version) ORDER BY id
CollapsingMergeTree(*, sign, order_by=None, partition_by=None, sample_by=None, settings=None)
dataclass
Bases: Engine
CollapsingMergeTree engine for incremental calculation.
Deduplicates data based on a sign column. Rows with opposite signs cancel each other out.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
sign
|
str
|
Sign column name (typically 'sign' with values 1 and -1). |
required |
order_by
|
str | None
|
Sorting key expression. |
None
|
partition_by
|
str | None
|
Partition key expression. |
None
|
sample_by
|
str | None
|
Sampling expression. |
None
|
settings
|
dict[str, Any] | None
|
Optional engine settings as a dictionary. |
None
|
Examples:
>>> engine = CollapsingMergeTree(sign="sign", order_by="id")
>>> print(engine)
CollapsingMergeTree(sign) ORDER BY id
VersionedCollapsingMergeTree(*, sign, version, order_by=None, partition_by=None, sample_by=None, settings=None)
dataclass
Bases: Engine
VersionedCollapsingMergeTree for incremental calculation with versions.
Similar to CollapsingMergeTree but uses a version column for better handling of concurrent writes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
sign
|
str
|
Sign column name. |
required |
version
|
str
|
Version column name. |
required |
order_by
|
str | None
|
Sorting key expression. |
None
|
partition_by
|
str | None
|
Partition key expression. |
None
|
sample_by
|
str | None
|
Sampling expression. |
None
|
settings
|
dict[str, Any] | None
|
Optional engine settings as a dictionary. |
None
|
Examples:
>>> engine = VersionedCollapsingMergeTree(sign="sign", version="ver", order_by="id")
>>> print(engine)
VersionedCollapsingMergeTree(sign, ver) ORDER BY id
GraphiteMergeTree(*, config_section, order_by=None, primary_key=None, partition_by=None, sample_by=None, settings=None)
dataclass
Bases: Engine
GraphiteMergeTree engine for storing Graphite data.
Optimized for storing and querying Graphite monitoring data with automatic data rollup support.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config_section
|
str
|
Configuration section name in graphite.config. |
required |
order_by
|
str | None
|
Sorting key expression. |
None
|
primary_key
|
str | None
|
Primary key expression. |
None
|
partition_by
|
str | None
|
Partition key expression. |
None
|
sample_by
|
str | None
|
Sampling expression. |
None
|
settings
|
dict[str, Any] | None
|
Optional engine settings as a dictionary. |
None
|
Examples:
>>> engine = GraphiteMergeTree(config_section="graphite_config", order_by="path")
>>> print(engine)
GraphiteMergeTree(graphite_config) ORDER BY path
CoalescingMergeTree(*, columns=None, order_by=None, partition_by=None, sample_by=None, settings=None)
dataclass
Bases: Engine
CoalescingMergeTree engine for coalescing identical rows.
Merges rows that have identical values in specified columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
columns
|
str | None
|
Columns to check for identical values. |
None
|
order_by
|
str | None
|
Sorting key expression. |
None
|
partition_by
|
str | None
|
Partition key expression. |
None
|
sample_by
|
str | None
|
Sampling expression. |
None
|
settings
|
dict[str, Any] | None
|
Optional engine settings as a dictionary. |
None
|
Examples:
>>> engine = CoalescingMergeTree(columns="col1", order_by="id")
>>> print(engine)
CoalescingMergeTree(col1) ORDER BY id
SharedMergeTree(*, order_by='tuple()', primary_key=None, partition_by=None, sample_by=None, ttl=None, settings=None)
dataclass
Bases: MergeTree
SharedMergeTree engine with shared metadata.
Similar to MergeTree but shares metadata across replicas.
Examples:
Replicated Engines
ReplicatedMergeTree(*, zk_path=None, replica=None, order_by='tuple()', primary_key=None, partition_by=None, sample_by=None, settings=None)
dataclass
Bases: Engine
ReplicatedMergeTree engine with ZooKeeper-based replication.
Provides data replication across multiple servers using ZooKeeper for coordination. Supports automatic recovery and replica failover.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
zk_path
|
str | None
|
ZooKeeper path for replica coordination. |
None
|
replica
|
str | None
|
Replica name in ZooKeeper. |
None
|
order_by
|
str
|
Sorting key expression. Defaults to 'tuple()'. |
'tuple()'
|
primary_key
|
str | None
|
Primary key expression. |
None
|
partition_by
|
str | None
|
Partition key expression. |
None
|
sample_by
|
str | None
|
Sampling expression. |
None
|
settings
|
dict[str, Any] | None
|
Optional engine settings as a dictionary. |
None
|
Examples:
>>> engine = ReplicatedMergeTree(zk_path="/clickhouse/tables/mytable", replica="replica1")
>>> print(engine)
ReplicatedMergeTree('/clickhouse/tables/mytable', 'replica1') ORDER BY tuple()
ReplicatedAggregatingMergeTree(*, zk_path=None, replica=None, order_by='tuple()', primary_key=None, partition_by=None, sample_by=None, settings=None)
dataclass
Bases: ReplicatedMergeTree
ReplicatedAggregatingMergeTree for replicated pre-aggregation.
Combines replication with aggregating merge tree functionality.
Examples:
>>> engine = ReplicatedAggregatingMergeTree(zk_path="/clickhouse/tables/mytable", replica="replica1", order_by="id")
>>> print(engine)
ReplicatedAggregatingMergeTree('/clickhouse/tables/mytable', 'replica1') ORDER BY id
ReplicatedSummingMergeTree(*, zk_path=None, replica=None, order_by='tuple()', primary_key=None, partition_by=None, sample_by=None, settings=None)
dataclass
Bases: ReplicatedMergeTree
ReplicatedSummingMergeTree for replicated pre-summarization.
Combines replication with summing merge tree functionality.
Examples:
>>> engine = ReplicatedSummingMergeTree(zk_path="/clickhouse/tables/mytable", replica="replica1", order_by="id")
>>> print(engine)
ReplicatedSummingMergeTree('/clickhouse/tables/mytable', 'replica1') ORDER BY id
Shared Variants
SharedReplacingMergeTree(*, ver=None, order_by=None, primary_key=None, partition_by=None, sample_by=None, settings=None)
dataclass
Bases: ReplacingMergeTree
SharedReplacingMergeTree with shared metadata.
Combines shared metadata with replacing merge tree functionality.
Examples:
>>> engine = SharedReplacingMergeTree(order_by="id")
>>> print(engine)
SharedReplacingMergeTree ORDER BY id
SharedAggregatingMergeTree(*, order_by=None, partition_by=None, sample_by=None, ttl=None, settings=None)
dataclass
Bases: AggregatingMergeTree
SharedAggregatingMergeTree with shared metadata.
Combines shared metadata with aggregating merge tree functionality.
Examples:
>>> engine = SharedAggregatingMergeTree(order_by="id")
>>> print(engine)
SharedAggregatingMergeTree ORDER BY id
SharedSummingMergeTree(*, columns=None, order_by=None, partition_by=None, sample_by=None, settings=None)
dataclass
Bases: SummingMergeTree
SharedSummingMergeTree with shared metadata.
Combines shared metadata with summing merge tree functionality.
Examples:
>>> engine = SharedSummingMergeTree(columns="amount", order_by="id")
>>> print(engine)
SharedSummingMergeTree(amount) ORDER BY id
SharedVersionedCollapsingMergeTree(*, sign, version, order_by=None, partition_by=None, sample_by=None, settings=None)
dataclass
Bases: VersionedCollapsingMergeTree
SharedVersionedCollapsingMergeTree with shared metadata.
Combines shared metadata with versioned collapsing merge tree functionality.
Examples:
>>> engine = SharedVersionedCollapsingMergeTree(sign="sign", version="ver", order_by="id")
>>> print(engine)
SharedVersionedCollapsingMergeTree(sign, ver) ORDER BY id
SharedGraphiteMergeTree(*, config_section, order_by=None, primary_key=None, partition_by=None, sample_by=None, settings=None)
dataclass
Bases: GraphiteMergeTree
SharedGraphiteMergeTree with shared metadata.
Combines shared metadata with graphite merge tree functionality.
Examples:
>>> engine = SharedGraphiteMergeTree(config_section="graphite_config", order_by="path")
>>> print(engine)
SharedGraphiteMergeTree(graphite_config) ORDER BY path
Integration Engines
Kafka(*, broker_list, topic_list, group_name, format, settings=None)
dataclass
Bases: Engine
Kafka engine for reading from Kafka topics.
Allows ClickHouse to read data from Kafka streams. Can be used for real-time data ingestion from Kafka.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
broker_list
|
str
|
Kafka broker addresses (comma-separated if multiple). |
required |
topic_list
|
str
|
Kafka topic or topics to read from. |
required |
group_name
|
str
|
Consumer group name. |
required |
format
|
str
|
Message format (e.g., 'JSONEachRow', 'CSV', 'Avro'). |
required |
settings
|
dict[str, Any] | None
|
Optional engine settings as a dictionary. |
None
|
Examples:
>>> engine = Kafka(broker_list="localhost:9092", topic_list="my_topic", group_name="my_group", format="JSONEachRow")
>>> print(engine)
Kafka('localhost:9092', 'my_topic', 'my_group', 'JSONEachRow')
PostgreSQL(*, host_port, database, table, user, password, schema=None, settings=None)
dataclass
Bases: Engine
PostgreSQL engine for reading from PostgreSQL tables.
Allows ClickHouse to query data from PostgreSQL tables directly. Useful for data federation and migration scenarios.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
host_port
|
str
|
PostgreSQL host and port (e.g., 'localhost:5432'). |
required |
database
|
str
|
PostgreSQL database name. |
required |
table
|
str
|
PostgreSQL table name. |
required |
user
|
str
|
PostgreSQL user name. |
required |
password
|
str
|
PostgreSQL user password. |
required |
schema
|
str | None
|
PostgreSQL schema name (optional, defaults to 'public'). |
None
|
settings
|
dict[str, Any] | None
|
Optional engine settings as a dictionary. |
None
|
Examples:
>>> engine = PostgreSQL(host_port="localhost:5432", database="mydb", table="users", user="admin", password="secret")
>>> print(engine)
PostgreSQL('localhost:5432', 'mydb', 'users', 'admin', 'secret')