- Feature Name:
- Start Date: 2020-01-21
- Tremor Issue: wayfair-tremor/tremor-runtime#0015
- RFC PR: wayfair-tremor/tremor-rfcs#0008
Pull data from Postgres tables.
We see use cases a function of features supported by PostgreSQL out of the box, as well as additional features introduced as extensions or otherwise packaged as completely separate products. Primary examples that we are looking at are TimescaleDB and PipelineDB.
The ultimate goal is to be able to reliably pipe in data from PostgreSQL core to Tremor engine.
PostgreSQL is a standard Tremor onramp. It is configured by passing in
standard PostgreSQL connection string arguments, such as
In addition to connection string arguments, we require:
interval in milliseconds that specifies the time we wait before executing the next query
consume_from specifies the timestamp from which to backfill data
query in form of standard
SELECT query which retrieves rows from a table
Unlike other onramps implemented so far, PostgreSQL onramp will persist
consume_until, allowing recovery from error conditions or other failures.
id: db type: postgres codec: json config: host: localhost port: 5432 user: postgres password: example query: "SELECT id, name from events WHERE produced_at <= $1 AND produced_at > $2" interval_ms: 1000 dbname: sales cache: path: "/path/to/cache.json" size: 4096
Cache is a memory mapped region, either file-based or anonymous.
The onramp does not provide an implementation of all types Postgres supports and the entirety of query language (for example, LISTEN/NOTIFY semantics).
Rationale and alternatives¶
As a starting point, or first implementation, supporting basic
statements with a time interval as additional
WHERE clause supports the major use
case: ingesting rows from a table or view.
Alternative approach would utilize PostgreSQL single row mode suitable for ingestion of results returned by queries that span a large number of rows. Ensuring at-most-once ingestion of rows would be a potential problem and a time-consuming/API-breaking implementation.
Other approaches, such as trigger-based watches or binary log readers are also a possibility.
For users that utilize PostgreSQL as an event store, which seems to be a more
common use case lately, support for
LISTEN/NOTIFY would be essential to
become one of modes of this onramp.
Support multiple queries in the same onramp.
More flexible means of specifying parameters.