diff --git a/apps/docs/content/valkey/overview.mdx b/apps/docs/content/valkey/overview.mdx
index 1b1a8616..d0cbae90 100644
--- a/apps/docs/content/valkey/overview.mdx
+++ b/apps/docs/content/valkey/overview.mdx
@@ -25,39 +25,128 @@ Import configuration version:
 
 Zerops offers Valkey in two deployment configurations to meet different availability requirements.
 
-### Non-HA Setup
+### Single Setup
 
 - Single node deployment on port `6379` (non-TLS) and `6380` (TLS)
-- No backup mechanism beyond Zerops infrastructure reliability
-- Data persists unless the hardware node fails
 - Suitable for development or non-critical workloads
 
+See [Persistence](#persistence) for how data is stored and recovered.
+
 ### HA (High Availability) Setup
 
-Our HA implementation uses a unique approach to ensure high availability while maintaining compatibility with all Redis clients:
-
-- 3-node configuration (1 master + 2 replicas)
-- Access ports:
-  - `6379` - read/write operations (non-TLS, routed to master)
-  - `6380` - read/write operations over TLS (routed to master)
-  - `7000` - read-only operations (non-TLS)
-  - `7001` - read-only operations over TLS
-- Implementation details:
-  - All nodes are configured identically and listen on standard ports
-  - First node in the cluster is designated as the master
-  - On replica nodes, ports `6379`/`6380` traffic is forwarded to the master
-  - Ports `7000`/`7001` are mapped locally to each node for direct replica access
-  - When a master fails, a replica is promoted and routing is updated automatically
-  - DNS entries are updated for seamless client connection
-  - This implementation provides traffic forwarding to master (not natively supported by Valkey)
+The HA deployment is a 3-node cluster with automatic failover, fronted by an HAProxy load balancer on every node.
+
+- 3-node configuration: 1 primary + 2 replicas
+- Client-facing ports (available on every node):
+  - `6379` — read/write (non-TLS), routed to the current primary
+  - `6380` — read/write over TLS, routed to the current primary
+  - `7000` — read-only (non-TLS), load-balanced across replicas
+  - `7001` — read-only over TLS, load-balanced across replicas
+- Failover is handled by a built-in [Sentinel](https://valkey.io/topics/sentinel/) cluster. When the primary becomes unreachable, a replica is promoted automatically and HAProxy starts routing writes to it.
+- TLS is terminated at HAProxy.
+- Connect your application to the standard ports — the address never changes when the primary moves.
 
 :::note
-Be aware that replica data may lag slightly behind the master due to asynchronous replication.
+Replica reads (ports `7000`/`7001`) can lag slightly behind the primary due to asynchronous replication.
 :::
 
+**Failover client impact:** expect roughly 10–15 seconds of write unavailability while a new primary is elected and HAProxy reconverges. Read traffic on surviving replicas is unaffected.
+
 :::tip Trusting the TLS certificate
-The certificates served on the TLS ports (`6380` and `7001`) are signed by the Zerops Certificate Authority. To verify them from outside Zerops, download and trust the [Zerops CA](/references/networking/zerops-ca) — e.g. `redis-cli --tls --cacert ./zerops-ca.pem -h -p 6380`.
+The certificates served on the TLS ports (`6380` and `7001`) are signed by the Zerops Certificate Authority. To verify them from outside Zerops, download and trust the [Zerops CA](/references/networking/zerops-ca) — e.g. `redis-cli --tls --cacert ./zerops-ca.pem -h -p 6380 -a `.
+:::
+
+## Connecting
+
+Zerops generates the connection details as environment variables on the Valkey service. Reference them from another service in the same project as `${_}` — for a service named `db`, the connection string is `${db_connectionString}`. The examples below assume the hostname `db`.
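As a minimal sketch of consuming these variables from application code: the snippet assumes the Valkey service is named `db` and that its `hostname`, `port`, `portTls`, and `password` variables are visible to the process under the names `db_hostname`, `db_port`, `db_portTls`, and `db_password`. The helper `valkey_url` and the sample password are hypothetical, for illustration only. In practice the ready-made `db_connectionString` value is usually the simpler choice.

```python
from urllib.parse import quote


def valkey_url(env, service="db", tls=False):
    """Build a Valkey URL from Zerops-style variables (variable names are assumptions)."""
    host = env[f"{service}_hostname"]
    password = env[f"{service}_password"]
    # `port` is the plain port, `portTls` the TLS port.
    port = env[f"{service}_portTls" if tls else f"{service}_port"]
    scheme = "rediss" if tls else "redis"
    # Percent-encode the password so characters such as '@' or ':' stay valid in a URL.
    return f"{scheme}://default:{quote(password, safe='')}@{host}.zerops:{port}"


# Hypothetical values standing in for the generated variables.
example_env = {
    "db_hostname": "db",
    "db_port": "6379",
    "db_portTls": "6380",
    "db_password": "s3cr3t@pw",
}

print(valkey_url(example_env))            # redis://default:s3cr3t%40pw@db.zerops:6379
print(valkey_url(example_env, tls=True))  # rediss://default:s3cr3t%40pw@db.zerops:6380
```

Passing a plain mapping instead of reading `os.environ` inside the helper keeps it easy to test; in an application, `valkey_url(os.environ)` would be the typical call.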
+
+| Variable | Example value | Notes |
+|---|---|---|
+| `hostname` | `db` | Service hostname; reachable as `db.zerops` inside the project |
+| `port` | `6379` | Plain (non-TLS) port |
+| `portTls` | `6380` | TLS port |
+| `password` | *(generated)* | Password for the `default` user (sensitive) |
+| `connectionString` | `redis://default:@db.zerops:6379` | Ready-to-use non-TLS URL |
+| `connectionTlsString` | `rediss://default:@db.zerops:6380` | Ready-to-use TLS URL |
+
+In **HA mode** four additional variables expose the read-only replica endpoints (load-balanced across replicas):
+
+| Variable | Example value | Notes |
+|---|---|---|
+| `portReplicas` | `7000` | Read-only plain port |
+| `portTlsReplicas` | `7001` | Read-only TLS port |
+| `connectionStringReplicas` | `redis://default:@db.zerops:7000` | Read-only non-TLS URL |
+| `connectionTlsStringReplicas` | `rediss://default:@db.zerops:7001` | Read-only TLS URL |
+
+The connection string format is `redis://default:@.zerops:` (or `rediss://` for TLS). The username is always `default`.
+
+:::note Authentication
+Valkey requires a password. It is generated automatically, exposed as the sensitive `${db_password}` variable, and already embedded in the `connectionString` variables above. Connect with it directly — e.g. `redis-cli -h db.zerops -p 6379 -a "$db_password"`.
+
+Services created **without** a `password` variable (older deployments) keep working without authentication and are unaffected. **All deployments created since this release require the password.**
+:::
+
+Idle connections are closed after 5 minutes of inactivity. Use a client connection pool or TCP keep-alive if your application holds long-lived idle connections.
+
+## Persistence
+
+Valkey persists data to disk with **AOF (append-only file)**, so the dataset survives restarts and is rebuilt automatically on startup.
+
+- **AOF is enabled** (`appendonly yes`) and synced to disk **every second** (`appendfsync everysec`). After an unclean crash you lose at most ~1 second of the most recent writes.
+- **RDB snapshots are disabled** (`save ""`) — durability relies on AOF, not periodic snapshots.
+
+**Durability by mode:**
+- **Single:** the AOF lives on the node's local disk. Data survives service restarts but is lost if the underlying hardware node fails and no backup exists.
+- **HA:** writes are additionally replicated to two replicas, so the dataset survives the loss of any single node via automatic failover.
+
+:::note Backups
+Platform-managed encrypted backups are available for both Single and HA setups. They are **disabled by default** — enable them on the service if you need point-in-time recovery beyond AOF and replication.
 :::
 
+## Memory and Autoscaling
+
+You don't set `maxmemory` directly. Zerops sizes it at **80% of the container's available RAM** — precisely 80% of the *smaller* of your configured maximum RAM and the cgroup-allocated RAM. It is re-evaluated and adjusted automatically about every 30 seconds, so `maxmemory` tracks the container as it scales vertically (subject to the `noeviction` caveat below). The remaining 20% covers Valkey's internal overhead (fork on AOF rewrite / replica sync, fragmentation) and the OS.
+
+:::warning Keep minimum free RAM above 20% when customizing autoscaling
+If you edit the autoscaling configuration, keep the **minimum free RAM above 20%**. Zerops caps `maxmemory` at 80% of available RAM, so the dataset alone can never push free RAM below 20%. If your minimum free RAM threshold is at or below 20%, the scale-up trigger may **never fire at all** — free RAM never crosses it, so the service stays stuck at its current size and starts evicting keys (or rejecting writes under `noeviction`) instead of scaling up. Setting the threshold above 20% lets the dataset's growth toward the 80% cap cross the trigger, so the service scales up in time and keeps headroom for the fork during an AOF rewrite or replica sync. The built-in profiles all keep this threshold above 20%.
+:::
+
+:::note Check the logs for OOM events
+Watch the service's runtime logs for out-of-memory events — typically the kernel OOM-killer terminating and restarting Valkey when a fork during an AOF rewrite or replica sync briefly inflates memory. Recurring OOMs mean the reserved headroom isn't enough for your workload's peaks. Raise the **minimum free RAM** (more headroom) or the **minimum RAM** (a higher floor) until they stop.
+:::
+
+## Tunable Parameters
+
+Two Valkey settings are exposed as **editable** environment variables on the Valkey service. Update them from the service's *Environment variables* page (or via `zcli`/API) and Zerops applies the change live — **no service restart**, no client reconnect. In HA mode the change is rolled out to every node.
+
+### `VALKEY_MAXMEMORY_POLICY`
+
+Default: `allkeys-lru`. Controls what Valkey does when the dataset reaches `maxmemory`.
+
+| Value | Behavior | When to use |
+|---|---|---|
+| `noeviction` | Reject writes with an OOM error | Datasets where every key must be preserved (session storage without TTL, job queues). Requires careful capacity planning. |
+| `allkeys-lru` | Evict least-recently-used keys | General-purpose caching — the safe default |
+| `allkeys-lfu` | Evict least-frequently-used keys | Hot/cold workloads where access frequency matters more than recency |
+| `allkeys-random` | Evict random keys | Uniform access patterns (rare) |
+| `volatile-lru` | Evict LRU keys *with a TTL set* | Mixed workloads: persistent keys without TTL are protected, cache keys with TTL are evictable |
+| `volatile-lfu` | Evict LFU keys with a TTL | Same as `volatile-lru`, frequency-based |
+| `volatile-random` | Evict random keys with a TTL | Rarely appropriate |
+| `volatile-ttl` | Evict keys with the shortest remaining TTL | When TTL reflects priority |
+
+:::warning `noeviction` and vertical autoscaling
+With `noeviction`, automatic vertical **scale-down** of the container is disabled — Valkey cannot free memory through eviction, so a smaller allocation would cause all writes to fail with OOM errors. Scale-up still works normally. Switch to one of the eviction policies above if you want automatic scale-down.
+:::
+
+### `VALKEY_LAZYFREE_LAZY_USER_DEL`
+
+Default: `yes`. Allowed values: `yes`, `no`.
+
+When `yes`, client `DEL` commands free memory asynchronously (equivalent to `UNLINK`), keeping Valkey responsive even when deleting very large keys (e.g. a sorted set with millions of members). Set to `no` only if your application specifically relies on synchronous deletes — the overhead of lazy-free is otherwise negligible.
+
+## Metrics
+
+Prometheus-compatible metrics are exported by default for scraping, on the port given by the `ZEROPS_PROMETHEUS_PORT` variable (`db:9121`).
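For a quick look at the exported metrics without a full Prometheus setup, the text exposition format can be parsed with a few lines of Python. This is a rough sketch, not a complete parser: the metric names in `SAMPLE` are hypothetical placeholders, and in a real setup the text would come from an HTTP GET against the exporter at `db:9121` (the `/metrics` path is a common convention that this page does not confirm).

```python
def parse_metrics(text):
    """Parse Prometheus text exposition lines into a {series: value} dict."""
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Labelled series: keep everything up to the closing brace as the key.
            head, _, rest = line.rpartition("}")
            series = head + "}"
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if fields:
            # The first field is the value; an optional timestamp may follow.
            samples[series] = float(fields[0])
    return samples


# Hypothetical payload; real metric names depend on the exporter in use.
SAMPLE = """\
# HELP example_up Hypothetical metric, for illustration only
# TYPE example_up gauge
example_up 1
example_memory_bytes{instance="db"} 1048576
"""

metrics = parse_metrics(SAMPLE)
print(metrics["example_up"])  # 1.0
```

The sketch ignores escaping rules inside label values and keeps only the last sample per series, which is enough for ad-hoc inspection but not for production scraping.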
+
 ## Learn More
 
 - [Official Valkey Documentation](https://valkey.io/docs) - Comprehensive guide to Valkey features
diff --git a/apps/docs/src/css/_docusaurus.css b/apps/docs/src/css/_docusaurus.css
index e9aabc61..6a1358cd 100644
--- a/apps/docs/src/css/_docusaurus.css
+++ b/apps/docs/src/css/_docusaurus.css
@@ -318,12 +318,32 @@ th {
 }
 
 td {
-  width: 100%;
   padding: 8px;
   border: 1px solid #b6b9bd;
   @apply bg-[#F2F5F7] dark:bg-[#1B1B1F];
 }
 
+/* Infima's default is `table { display: block; overflow: auto }`, which turns
+   the <table> into a block scroll-container whose actual grid becomes an
+   anonymous, shrink-to-fit table box. `width: 100%` then only sizes the outer
+   block — it never reaches the grid — so the table refuses to fill the page,
+   and forcing a cell width (the old `td { width: 100% }`) just bloats one
+   column instead of widening the table. On tablet/desktop, restore a real
+   table box so `width: 100%` + auto layout distribute columns and fill the
+   parent. Below 768px we keep Infima's display:block + overflow-x scroll so
+   wide tables stay swipeable on mobile. */
+@media (min-width: 768px) {
+  table {
+    display: table;
+    table-layout: auto;
+  }
+
+  th,
+  td {
+    overflow-wrap: anywhere;
+  }
+}
+
 @keyframes shimmer {
   0% {
     transform: translateX(-100%);