ClickHouse: wrong-result oracle suite (31 oracles), recursive type system, and generator expansion#4
Open
fm4v wants to merge 185 commits into
Open
Conversation
The ClickHouse provider previously supported only the five TLP variants (Where / Distinct / GroupBy / Aggregate / Having) and NoREC. This change adds three more general-purpose oracles to ClickHouseOracleFactory. PQS (Pivoted Query Synthesis, Rigger & Su, OSDI 2020). The classical SQLancer PQS implementation requires every AST node to expose a Java-side getExpectedValue() that mirrors the DBMS' evaluation semantics. ClickHouse's expression AST does not provide this for most of the generated tree, and reproducing ClickHouse's coercion / NULL / arithmetic rules in Java would be an open-ended effort. This implementation delegates rectification to the server: for each randomly-generated predicate the pivot row's values are embedded as literals in a one-row subquery and ClickHouse itself evaluates the predicate. Based on TRUE / FALSE / NULL the predicate is kept, negated, or wrapped in IS NULL so the conjunction is guaranteed to hold for the pivot row. Containment is checked with INTERSECT, which handles NULL semantics correctly in ClickHouse. CERT (Cardinality Estimation Restriction Testing). Builds a random SELECT, mutates it through a WHERE / AND / OR / DISTINCT toggle with a known monotonicity direction, then asserts the actual row count moves the way the mutation predicts. A plan-similarity gate on EXPLAIN PLAN output skips cases where the plan diverges enough that the comparison stops being meaningful. ClickHouse doesn't surface single-number cardinality estimates that the JDBC client can read, so this uses actual row counts -- still catches optimizer-driven row loss such as predicate-pushdown bugs and faulty DISTINCT dedup. JOIN / GROUPBY / HAVING / LIMIT mutators are skipped: LIMIT isn't serialized by the visitor, the others need richer query shapes than the existing generator produces. CODDTest (Cross-Optimization Decision Differential Testing). Runs the same query twice with a random subset of optimizer flags toggled on vs off (injected as a per-query SETTINGS clause to avoid leaking state into neighbouring oracle runs sharing the same connection) and asserts the two result sets are identical. The flag list is deliberately conservative -- rewrites with high blast radius (analyzer enable/disable, JOIN algorithm) are excluded because they tend to surface stylistic differences rather than correctness bugs. ClickHouseSchema.ClickHouseRowValue's constructor is promoted from package-private to public so the PQS oracle in oracle/pqs/ can construct it for the base class diagnostic logging. Smoke-tested against a release ClickHouse 26.5 server, single thread, 30-second budget per oracle: PQS ~10 q/s with 94% successful statements; CERT ~12 q/s with 98%; CODDTest ~16 q/s with 97%. No false positives in any run. Checkstyle clean (`mvn checkstyle:check`), naming convention check passes (`python src/check_names.py`).
fm4v
added a commit
to ClickHouse/ClickHouse
that referenced
this pull request
May 15, 2026
Removes the ci/docker/sqlancer-test/overlay/ Java sources, the Dockerfile COPY step that overlays them onto the cloned fork, and the PQS/CERT/CODDTest entries from the TESTS array. The three new oracles are being added to the ClickHouse fork of SQLancer directly (ClickHouse/sqlancer#4). Once that lands, this PR will bump the pinned fork commit and re-add the names to TESTS.
The initial CERT and CODDTest implementations diverged from their papers in ways that defeated the test signal: CERT was using actual row counts from running the queries and a bidirectional mutator framework. Per Ba and Rigger, ICSE 2024 the property under test is `EstCard(Q', D) <= EstCard(Q, D)` -- the *estimator's* projection, with Q' strictly more restrictive than Q, and "CERT eschews executing queries". This rewrite: * Reads cardinality from `EXPLAIN ESTIMATE`, summing `rows` across the per-table tuples it returns. The query is never executed. * Restricts mutations to one direction. `mutateWhere`/`mutateAnd` always AND-tighten or introduce a WHERE; `mutateOr` drops an OR operand (per the paper's restrictive-OR rule) or falls back to AND; `mutateDistinct` only promotes ALL -> DISTINCT. All return `increase=false`. * Skips runs where the estimator returns nothing (non-MergeTree engines, `ORDER BY tuple()`, unsupported expressions), and skips runs where the structural-similarity gate on `EXPLAIN PLAN` shows too much drift. CODDTest was toggling random optimizer flags via per-query `SETTINGS` clauses and comparing results. Per Zhang and Rigger, SIGMOD 2025 the oracle is constant-folding-driven: take a subexpression E in Q, evaluate E to a value V via an auxiliary query A, build a folded query F by substituting V for E, then assert results of Q and F are identical. This rewrite implements the scalar-subquery variant (same as DuckDBCODDTestOracle in the upstream PR sqlancer#1054): aux: SELECT min(c)/max(c) FROM t -> V Q: SELECT * FROM t WHERE col op (SELECT min/max(c) FROM t) F: SELECT * FROM t WHERE col op V Only `Int32`/`String` columns are folded since they are the only types the existing schema generator and `ClickHouseSchema.getConstant` support; NULL auxiliary results are skipped (NULL-propagation would make the predicate UNKNOWN for every row and the equivalence does not hold). Verified locally against a release ClickHouse 26.5 server: * CERT: ~6 q/s effective (most attempts skip because no estimate responds to the random mutation), 0 false positives in a 30s window. * CODDTest: ~22 q/s, 96-97% successful statements, 0 false positives. `mvn checkstyle:check` clean, `mvn package -Dmaven.test.skip=true` succeeds. Papers: CERT https://doi.org/10.1145/3597503.3639076 CODDTest https://doi.org/10.1145/3709674
1 task
…pqs-cert-coddtest
Extends the PQS oracle to cover three paper elements that the initial landing skipped, with no change to the rectification contract or the existing single-table behavior. * Multi-table pivot rows (Section 3.1): 1-3 random non-empty tables. Predicates are generated over the union of their table-qualified columns, the rectification probe stitches one literal-alias subquery per pivot table, and the rectified query uses an implicit cross-join with all rectified predicates in WHERE. * Optional elaborations (Section 3.2) attached probabilistically: DISTINCT, GROUP BY of all pivot columns, ORDER BY. Each preserves the pivot row's presence in the result set by construction. * IS NULL rectification path is now reachable: ClickHouseExpressionGenerator gains an opt-in allowNullLiterals flag (default false to keep existing oracles unchanged) which, when enabled by PQS, occasionally injects a NULL leaf so the probe can legitimately return SQL NULL. Validated against ClickHouse 26.5.1.111: 4 threads x 5 minutes = 21,211 oracle queries; multi-table FROM hits 11,518 (3-table: 2,762), DISTINCT 510, GROUP BY 474, ORDER BY 511, IS NULL rectifications 154, zero AssertionError / reportMissingPivotRow / Exception.
Empirical probe of `EXPLAIN ESTIMATE` shows it only responds to predicates that prune MergeTree primary-key granules; LIMIT, DISTINCT, JOIN-type, and bare GROUP BY are invariant. With the default `index_granularity=8192` and the ~10-30 row inserts the schema generator emits, every table fits in a single granule and the estimate cannot move regardless of restriction, so the oracle was trivially passing every run. This commit fixes that and also covers the only paper rule (HAVING) that moves the estimate. * Bulk-load 50,000 rows from `numbers()` into the chosen table on each check if it sits below that threshold. Idempotent and bounded; tables already amplified are left alone. Verified by inspection: 8 of 9 test tables hit 50k rows on a 60s smoke run; the 9th had MATERIALIZED columns that legitimately rejected `INSERT SELECT`. * Query `system.columns.is_in_primary_key` to discover the table's PK columns and duplicate them 4x in the column list passed to the expression generator. A random leaf is now ~67% likely to be a PK column (1 PK out of 3 cols, otherwise 33%), so a generated predicate much more often hits the granule pruner. * Sometimes (25%) build Q with `GROUP BY <pk_col>` so the new HAVING mutator can fire. The HAVING mutator AND-tightens HAVING with a fresh PK-biased predicate; ClickHouse pushes PK predicates in HAVING down through the optimizer to the scan, making them granule-prune-capable. Falls back to AND-tightening WHERE when no GROUP BY is present so the mutator is never a no-op. * Apply 1-3 random restriction rules per attempt (paper allows multiple). JOIN, bare GROUPBY, and LIMIT remain excluded because they are invariant under ClickHouse's `EXPLAIN ESTIMATE` and would add no bug-finding power. This decision is captured inline.
JDK 26 is the current release (March 2026); CI was pinned to JDK 11 and the pom enforced `source/target=11` via the Eclipse compiler (ecj). The latest ecj on Maven Central (3.45.0) only goes up to JDK 24, so to move forward we switch the maven-compiler-plugin to standard javac, drop the ecj/plexus dependencies and the `org.eclipse.jdt.core.prefs` compiler arguments, and set `<release>26</release>`. The `.settings/` directory remains for IDE use only. * `maven-compiler-plugin` 3.10.1 -> 3.13.0, `<release>26</release>`, no compilerId override, no ecj deps. * `maven-javadoc-plugin` 3.4.1 -> 3.11.2 and `<source>` bumped to 26. * `.github/workflows/main.yml` and `release.yml`: every `java-version: '11'` -> `'26'` (27 occurrences total). Distribution remains Adoptium Temurin, which ships JDK 26 binaries. Verified locally on Temurin 26.0.1: `mvn package -Dmaven.test.skip=true` clean, sqlancer jar runs both TLPWhere and CERT oracles against ClickHouse 26.5 with no JVM-level errors. The few remaining warnings (`System::loadLibrary` from the ClickHouse JDBC LZ4 native, `Unsafe` from Guava) are advisory and unrelated to this change.
Use JDK 25 (the current LTS, released Sept 2025) rather than JDK 26 (non-LTS, released March 2026). Same javac-via-maven-compiler-plugin setup as the previous commit; just flips the source/release level and the CI java-version. * `pom.xml`: maven-compiler-plugin `<release>` 26 -> 25; release profile maven-javadoc-plugin `<source>` 26 -> 25. * `.github/workflows/main.yml` and `release.yml`: every `java-version: '26'` -> `'25'` (27 occurrences). Distribution remains Adoptium Temurin. Verified locally on Temurin 25.0.3: `mvn clean compile test-compile` green; produced jar is class-file major version 69 (Java 25).
The initial implementation covered only the scalar non-correlated subquery case (Section 3.1 case 2 of Zhang & Rigger SIGMOD '25). Extend to follow Algorithm 1 from the paper, picking one mode uniformly per check. Modes: 1. Constant expression (Section 3.1 case 1, was missing). Generates a random column-free expression via the existing `ClickHouseExpressionGenerator`, evaluates it with `SELECT toTypeName(phi), phi`, and substitutes the literal back. The generator's `generateExpressionWithExpression` is seeded with a few typed constant leaves -- this is necessary because `generateExpressionWithColumns` short-circuits to a single constant when called with an empty column list. 2. Scalar non-correlated subquery (Section 3.1 case 2). The previous implementation's `min/max(col)` path, restated in the new framework. 3. Dependent expression (Section 3.2, was missing). Generates a random expression over one outer column k, builds a `SELECT DISTINCT k, phi FROM t` mapping, folds phi to a `CASE WHEN k = v_i THEN r_i ...` wrapped in `cast(..., 'expectedType')` so the folded predicate sees the same operand type as the original through compound predicates. The outer predicate template is also varied (bare comparison, AND/OR compounds, NOT) so phi passes through richer constant-folding paths than the previous fixed `col op phi`. Validated against ClickHouse 26.5.1.111: 4 threads, 5 minutes, 64,071 queries executed, 98% successful statement rate, 0 false positives.
Replace the flat (ClickHouseDataType, String) representation in
ClickHouseLancerDataType with a recursive ClickHouseType ADT (Primitive,
Nullable, LowCardinality, Unknown) plus a four-predicate capability layer.
Re-route every dispatch site that previously AssertionError'd on anything
outside {Int32, String}, add a defensive reflection parser, and extend
ClickHouseCast to cover every v1 primitive kind via a propagating
ClickHouseUnsupportedConstant sentinel.
Activates two new feature flags (--test-nullable-types,
--test-lowcardinality-types, both on by default) so the generator now emits
Nullable and LowCardinality columns. CODDTest's filter and legacy string
parser, CERT's generatorExprFor, and the table generator's PARTITION/SAMPLE/
ORDER clause emission are all rewritten to dispatch via the new capabilities.
Live SQLancer smoke against ClickHouse 26.5 (10 min, 4 oracles, 70k+
queries) surfaced three v1-introduced rejections and they are now handled:
allow_suspicious_low_cardinality_types is set on the JDBC URL when the LC
flag is on; allow_nullable_key=1 is added to MergeTree SETTINGS so wrapped
columns can participate in PARTITION/ORDER/SAMPLE; the
CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN family is added to ClickHouseErrors.
Plan and brainstorm documents that drove the implementation are included
under docs/. CI test enumeration in .github/workflows/main.yml is extended
to run the seven new test classes.
This fork only ships changes to the ClickHouse provider, so the per-DBMS matrix in .github/workflows/main.yml was 19 jobs we never read. Removes citus, cockroachdb, databend, datafusion, duckdb, hive, spark, hsqldb, mariadb, materialize, mysql, oceanbase, postgres, presto, sqlite, tidb, yugabyte, and doris. Keeps `misc` (project-wide style/PMD/Checkstyle/SpotBugs via `mvn verify` plus the misc unit tests and naming convention check) and `clickhouse` (the DBMS job that exercises the type-system foundation tests).
Add two complementary differential-testing capabilities: 1. SEMR oracle (--oracle SEMR) picks one "should-be-result-preserving" ClickHouse optimizer setting from a curated list, runs the same generated SELECT once with the setting forced 0 and once forced 1, and fails when the two multisets diverge. Targets cross-configuration consistency bugs of the shape documented at ClickHouseTLPHavingOracle.java:42 (ClickHouse#12264). 2. --random-session-settings + --random-session-settings-budget apply a random subset of a curated execution-mode catalog via SET k=v on the per-database JDBC connection. Every other oracle (TLP*, NoREC, PQS, CERT, CODDTest) implicitly runs under a different setting profile each database. The two features are mutually exclusive in a single run (rejected at startup with a single clear error). The catalog excludes optimizer-rewrite settings from the randomization list to protect CERT/CODDTest invariants, and excludes settings hardcoded by TLPHaving/TLPAggregate from both lists. Setting churn (unknown setting, out-of-range value) is absorbed via a new expected-error catalog so it never surfaces as an oracle failure. Plan: docs/plans/2026-05-17-001-feat-clickhouse-semr-oracle-settings-randomization-plan.md
The expression generator picked column leaves and operators independently of type, so a String column could feed an arithmetic operator and a Float column could feed gcd/lcm/intDiv. Against ClickHouse 26.2, system.query_log showed ~96% of SQLancer failures were ILLEGAL_TYPE_OF_ARGUMENT (Code 43) from this mismatch, with smaller contributions from NO_COMMON_TYPE join keys (386) and typed-comparison constants (53/32). Four mechanical fixes against the same workload (--oracle TLPDistinct --random-session-settings true, 400 queries, seed 12345): * generateExpressionWithColumns filters to numeric columns and the recursive descent stays in the numeric pool. Falls back to an Int32 constant when the table has only non-numeric columns. * BINARY_FUNCTION splits into integer-only (intDiv/gcd/lcm with plain integer column refs) and any-numeric (max2/min2/pow with the recursive descent). ClickHouse promotes most math wrappers (sin, cos, sqrt, log...) to Float64, so the integer-only branch keeps leaves as bare column refs to stay integer-typed end to end. generateExpressionWithExpression also routes through getRandomAnyNumeric since its pre-built expression leaves are usually aggregate Floats. * generateExpression(type, depth) now defaults rightLeafType to leftLeafType, inverting the previous "force same type with low probability" coin flip that produced Int32-vs-String comparisons. * generateJoinClause enumerates (left, right) column pairs, prefers same-type, falls back to numeric-vs-numeric, and throws IgnoreMeException when no compatible key combination exists. Avoids server roundtrips for joins that would error with NO_COMMON_TYPE. * Off-by-one in four column-picker call sites: getNotCachedInteger(0, size-1) excluded the last index; corrected to size. Result: SELECT failure rate against ClickHouse 26.2.17.31 dropped from 41.6% to 0.09% on the same seeded workload, with the remaining 4 failures being runtime division-by-zero (out of scope for type fixes) and stray edge cases.
…atalog entry Two infrastructure changes that benefit every ClickHouse oracle: - Bump the CI ClickHouse image from 24.3.1.2672 to :head so wrong-result bugs in the active stable line surface earlier. The pin sacrificed reproducibility for stability; we now accept slight CI churn in exchange for catching regressions before they reach a tagged release. - Add "is found in GROUP BY in query" and "(ILLEGAL_AGGREGATION)" to the expected-expression-error catalog. ClickHouse 26's new analyzer raises a different error string than the 24.x branch when a positional GROUP BY reference (GROUP BY 1) resolves to an aggregate SELECT-list column -- the old "Illegal value (aggregate function) for positional argument in GROUP BY" pattern was the 24.x form; both must be absorbed so the generator's harmless aggregate-positional output doesn't surface as an oracle finding in 26+. Surfaced via the EET HAVING-mode regression run but benefits TLPHaving and any future HAVING-using oracle equally.
Add the SIGMOD '25 paper's companion to CODDTest. Where CODDTest folds a sub-expression to its precomputed value and asserts the result is unchanged, EET goes the inverse direction: inject an expression that should fold to a fixed value (tautology, contradiction, or algebraic identity) and assert the rewrite is semantics-preserving. Same target bug class (optimizer constant-folding / short-circuit / partial-eval), orthogonal attack axis. Selectable via --oracle EET. Each check() picks one of four modes uniformly: - WHERE injection. Generate a base predicate `predQ` and random `e`; conjoin `pred AND (3VL-tautology over e)` and assert rows unchanged, or `pred AND (3VL-contradiction over e)` and assert rows empty. The 3VL shapes are `(((e) OR NOT (e)) OR (e) IS NULL)` and `(((e) AND NOT (e)) AND (e) IS NOT NULL)` with binding-tight parens on every reference to `e` -- ClickHouse's parser binds OR looser than NOT and tighter than AND, so an unparenthesized injection inside `pred AND ...` would parse the wrong way. - HAVING injection. Same shapes injected into an aggregated query's HAVING clause. Reuses TLPHaving's `aggregate_functions_null_for_empty=1, enable_optimize_predicate_expression=0` SETTINGS suffix on both sides of the comparison to dodge ClickHouse issue #12264; not applying it produces false positives indistinguishable from EET findings. - Expression-position rewrite. Pick a SELECT-list column `x`, probe its runtime type via `toTypeName`, wrap as `if(taut, x, x)`, `multiIf(taut, x, junk, x)`, or `CASE WHEN taut THEN x ELSE x END` (and the contradiction-negated form). Both arms share `x`'s type; the junk-branch value is `defaultValueOfTypeName(typeOfX)` -- a typed non-NULL default, picked because `cast(NULL, 'LowCardinality(...)')` is rejected at parse time (LowCardinality is not nullable). Each rewrite is wrapped in `cast(..., 'TypeOfX')` to neutralize the type widening some identities introduce. - Algebraic identity. Type-safe substitution from a five-entry catalog (`ClickHouseEETIdentities`): `plus(x,0)`, `multiply(x,1)`, `concat(x,'')`, `coalesce(x,x)`, `if(true,x,x)`. Each entry carries a predicate that gates application to a safe type family. Float and Decimal are excluded from `plus`/`multiply` (NaN / -0.0 formatting and scale-coercion false positives). String only for `concat`. Reuses `CODDTestBase` for failure-attribution fields; the naming mismatch is a deliberate trade-off acknowledged in the plan rather than mechanically duplicating six fields for the second oracle in this family. Validated against ClickHouse 26.5.1.111 with a 27K-query burn-in plus the 1000-query integration test (T18_, --num-threads 1). No oracle assertion failures. Plan in docs/plans/2026-05-18-001-feat-clickhouse-eet-oracle-plan.md. Paper: Zhang and Rigger, "Constant Optimization Driven Database System Testing", SIGMOD '25 (DOI 10.1145/3709674).
Adds max_execution_time=120 to the JDBC URL. Without this cap, occasional heavyweight random queries hit the 300s socket_timeout and produce ambiguous client-side timeout exceptions instead of clean server-side error codes (3 such timeouts observed in a 15-min 2026-05-18 baseline run). The server-side cap surfaces as TIMEOUT_EXCEEDED, absorbed by the matching "Timeout exceeded: elapsed" + "(TIMEOUT_EXCEEDED)" multi-word substrings added to ClickHouseErrors.
Adds the implementation plan for three orthogonal query-generator additions: aggregate combinator chains (-If, -OrNull, -OrDefault, -Distinct, -Array, -State, -Merge, -ForEach, -Resample, -Map), set operations with explicit ALL/DISTINCT keywords (UNION ALL/DISTINCT, INTERSECT, EXCEPT) plus a new ClickHouseTLPSetOpOracle, and ARRAY JOIN structural plumbing (blocked on type-system v2 for activation). Sequenced as commit-level milestones on this branch, with per-phase yield gates measured against a pre-Phase-A baseline. Deepened against five reviewer agents; auto-fixes applied silently, strategic decisions integrated based on user direction (full combinator matrix, single-PR bundling, per-phase yield gates, EXCEPT operator coverage).
Adds compress=false to the JDBC URL. clickhouse-jdbc 0.9.6 has a defect in its LZ4-over-chunked-HTTP decoder (ClickHouseLZ4InputStream + ChunkedInputStream interaction) that fires MalformedChunkCodingException: CRLF expected at end of chunk mid-response, surfaced at the JDBC layer as SQLException: Failed to read value for column. Observed 16 times across the 2026-05-18 15-min baseline (0.33% per-query rate); validated server-side via clickhouse-client (native protocol) which returns valid data for every failing query — confirming the bug is in the driver, not in ClickHouse. With compression off the buggy code path is bypassed entirely: the response stream becomes the raw chunked HTTP body, no LZ4 frame parsing. Trade-off: ~3x larger responses on the wire, but SQLancer's queries are small and the connection is loopback, so net throughput is unaffected. Revisit when clickhouse-jdbc fixes the LZ4 decoder upstream.
The 0.9.8 driver tightened URL-param validation (ClientConfigProperties at 0.9.8 rejects unknown keys with ClientMisconfigurationException, whereas 0.9.6 silently forwarded them as server settings). The pre-existing URL params allow_suspicious_low_cardinality_types, allow_experimental_analyzer, and max_execution_time are ClickHouse server settings and must now be prefixed with `clickhouse_setting_` to pass through. Reconfirmed during the bump: ClickHouseLZ4InputStream.class is byte-identical between 0.9.6 and 0.9.8 (md5 3519c1f7…), so the LZ4-over-chunked-HTTP decoder bug persists. `compress=false` remains load-bearing.
…umbing Add three orthogonal generator-surface expansions and two new TLP-family oracles. Set operations: - ClickHouseSetOperation AST with explicit ALL/DISTINCT keyword variants for UNION / INTERSECT / EXCEPT (six SetOpKind values). Visitor + ToString folded so nested set-ops auto-parenthesise; a top-level set-op renders without outer parens. - ClickHouseTLPSetOpOracle exercises four invariants on the canonical TLP partition (p, NOT p, p IS NULL): UNION ALL multiset equality, UNION DISTINCT set equality, INTERSECT pairwise disjointness, EXCEPT coverage + pairwise disjointness. Renders explicit operator keywords; SETTINGS pinning is belt-and-suspenders. Local guards reject aggregate fetch-columns, non-deterministic predicates, and multi-column shapes that would mask bugs. Startup probe disables the oracle when *_default_mode settings are unknown. Aggregate combinators: - ClickHouseAggregateCombinator + chain field on ClickHouseAggregate. Backward-compatible: empty chain renders plain SUM(x) form; non-empty folds the suffixes into the camelCase function name (sumIf, sumIfArray, etc.) and appends per-suffix extra args inside one paren group. Order-preserving. - Generator emits chains under --test-aggregate-combinators, default off. Weighted suffix picker; per-suffix extra-arg grammar (-If takes one boolean, -Resample takes three integers, all others none). - ClickHouseTLPCombinatorOracle catalog: sumIf, countIf, avgOrNull, sumOrNull, minIf, maxIf. -OrNull family forces aggregate_functions_null_for_empty=0 to avoid double-encoding the empty-NULL semantics. ARRAY JOIN structural plumbing: - arrayJoinExprs + arrayJoinLeft on ClickHouseSelect, emitted between FROM and any regular JOIN clauses. Default empty -- the generator never populates it until type-system v2 introduces Array column generation. Error catalog: - getSetOpErrors / getCombinatorErrors / getArrayJoinErrors with multi-word substring discipline. UNKNOWN_SETTING family deliberately excluded so the set-op startup probe's signal stays visible to future audits. Oracle factory: --oracle=SetOpTLP, --oracle=CombinatorTLP wired through. Options: --test-set-op-tlp, --test-aggregate-combinators, --test-combinator-tlp, --test-array-join (all default off).
Iterated against clickhouse-server:head (26.5.1) and surfaced three issues in my own work plus one candidate ClickHouse behaviour anomaly. SetOpTLP — drop unsound pairwise invariants. The pairwise `Tp INTERSECT Tnp ≡ ∅` and `DISTINCT(Tp) EXCEPT DISTINCT(Tnp) ≡ DISTINCT(Tp)` forms in the original plan are unsound on projections: TLP partitions rows, but a SELECT-list expression can collapse rows from disjoint partitions to identical projected values (constant fetchCol, `c0/c0`, etc.). Replace with: - INTERSECT subset: `DISTINCT(branch) INTERSECT DISTINCT(T) ≡ DISTINCT(branch)`. - EXCEPT coverage only (the chained 4-way form already valid). SetOpTLP — NaN/Infinity guard. SQL equality says `NaN != NaN`, so a single NaN value breaks INTERSECT/EXCEPT-routed comparisons even when row sets are correct. Skip via `IgnoreMeException` when any result row is `NaN`, `inf`, `Infinity`, or their negative forms. Applies to UNION_DISTINCT, INTERSECT, and EXCEPT modes; UNION_ALL multiset equality remains sound. CombinatorTLP — fix countIf identity. `countIf` over empty input always returns 0 (count's identity), but `sum(toUInt64(c))` over empty with `aggregate_functions_null_for_empty=1` returns NULL. The two sides diverge exactly on empty branches (e.g., LEFT ANTI JOIN with no unmatched rows). Pin `countIf`'s identity to `null_for_empty=0` so the sum-based rewrite also returns 0 on empty. Other -If identities (sumIf, minIf, maxIf) stay at =1 because their aggregate's empty-input return (NULL) coincides on both sides. Table generator — restrict engines to MergeTree only. Log/TinyLog/StripeLog and Memory engines diverge from MergeTree on parts, projections, skipping indexes, and mutation semantics; oracle-level false positives swamp any bug-finding signal. Validation also surfaced one candidate ClickHouse anomaly worth filing separately: with `LEFT ANTI JOIN`, the value of `right_side.<join-key>` in the SELECT projection differs between the no-WHERE form (returns 0 / default) and the with-WHERE form (returns the left-side join key value). Reproducible on 26.5.1.761 with both old and new analyzer. Not addressed by this commit.
… buffering Root-cause analysis of the ~10% JDBC-stream failures from the 2026-05-18 SetOpTLP/TLPWhere baselines: the underlying cause is *not* the JDBC driver or the LZ4 path -- it's a fundamental quirk of ClickHouse's HTTP protocol. When a query starts producing rows, the server has already committed to `HTTP/1.1 200 + Transfer-Encoding: chunked`. If execution then errors on some row (e.g., a generated `intDiv(c0, c0)` row hits c0=0 → ILLEGAL_DIVISION mid-stream), the server can't change its mind on the HTTP status -- it writes the plain-text exception into the already-binary response body and closes the connection without the proper terminating chunk. clickhouse-jdbc 0.9.8's `BinaryStreamReader.readDoubleLE` / `readIntLE` then hits EOF and surfaces as `SQLException: Failed to read value for column ...` with `ConnectionClosedException: Premature end of chunk coded message body` underneath. Verified deterministically with a 10M-row table whose mid-row triggers ILLEGAL_DIVISION: hits the exact same exception chain we saw in the wild. Fix: raise `http_response_buffer_size` to 100 MB and pin `wait_end_of_query=1` on the JDBC connection URL. With this, ClickHouse buffers the full response server-side before sending; if execution errors, the server can still emit HTTP 500 + a parseable error body, and the JDBC driver decodes it as a proper `ServerException`. `wait_end_of_query=1` alone is insufficient -- it only takes effect when the response fits the default buffer (a few MB). 100 MB covers every SQLancer-generated result observed so far without giving the server license to allocate gigabytes per query under concurrent load. Verification: 4-thread / 3-min SetOpTLP burn-in went from ~10% JDBC-stream failure rate to zero. Throughput per database appears lower in the new runs because per-DB sessions now complete their full --num-queries budget instead of terminating early on transport failures -- aggregate query throughput is unchanged. The earlier LZ4-decoder hypothesis (kept for the `compress=false` rationale) was a contributing path on 0.9.6 but not the root cause; the LZ4 stack trace was visually similar but came from a separate driver defect.
New oracles registered in ClickHouseOracleFactory:
- QccCache — cross-query cache-poisoning differential (ClickHouse#104781)
- SortedUnionLimitBy — sorted UNION ALL + outer LIMIT BY/DISTINCT (#103231)
- RowPolicy — `USING p` filters identically to `WHERE p` (#97076)
- TableFunctionIN — numbers(N) + IN-vs-OR equivalence (#103835)
- ViewEquivalence — view read == inlined SELECT (#100390)
Generator surface:
- PREWHERE clause now emitted by TLPBase (10%); unlocks #104781 shape.
- FINAL emitted on engines where it is legal — Replacing/Summing/
Aggregating/Collapsing variants. Engine tracked via system.tables.
- Skip indexes (bloom_filter, set, minmax, ngrambf_v1) and projections
(column-subset / aggregating) emitted at CREATE TABLE.
- ReplacingMergeTree and SummingMergeTree added to the engine pool.
- Bare large-integer literals (256/65536/2147483648) folded into
predicates ~10% — targets #101287.
SEMR expansion (Settings-Equivalence-Multiset-Result oracle):
- use_query_condition_cache, use_skip_indexes_on_data_read,
use_index_for_in_with_subqueries, optimize_use_implicit_projections —
cache and skip-index toggles.
- transform_null_in, lazy_columns_replication, compile_expressions,
compile_aggregate_expressions, optimize_aggregators_of_group_by_keys,
optimize_trivial_count_query — bug-history-driven additions covering
#95674, #94339, #103809, #105054, #100794.
Schema:
- ClickHouseTable now records the engine string (read from
system.tables) and exposes supportsFinal() for shape gating.
Verification: ~4000-query multi-oracle burn on ClickHouse 26.5.1.111
(TLPDistinct, TLPGroupBy, TLPHaving, SEMR, QccCache, SortedUnionLimitBy,
RowPolicy, TableFunctionIN, ViewEquivalence) — zero regressions.
v1 emitted only Primitive(Int32|String) optionally wrapped in Nullable/LowCardinality. v2 extends the ADT with four new constructors (FixedString(N), Decimal(P,S), DateTime64(prec), Array(T)) plus a new DateTime Kind, and widens the generator's pick set to every entry of Kind with weighted distribution biased toward bug-bait surfaces (UInt32/UInt64 for mixed-width JOIN keys, Date/DateTime for partition expressions, FixedString/Decimal/DateTime64 at low rate). ReplacingMergeTree(ver) and SummingMergeTree(col) engine args now pick a viable column from the new type pool; previously both were emitted with empty args because the only numeric the v1 picker chose (Int32) was signed. ARRAY JOIN, plumbed earlier behind --test-array-join, now fires when the FROM table carries an Array column. Refactor: pre-build the dummy column list with its final types and hand that to ColumnBuilder so the in-memory column list matches what the server sees -- the previous flow generated types twice and engine args picked from a list that didn't match the emitted DDL. Cast rendering uses the compound type (FixedString(5), Decimal(9,3)) not the bare JDBC enum tag, otherwise the cast text drops the parameter slots and CH rejects with NUMBER_OF_ARGUMENTS_DOESNT_MATCH. Error catalog grows with the new emission surface: BAD_ARGUMENTS substrings for engine-arg / PK overlap, decimal "Too many digits" + ARGUMENT_OUT_OF_BOUND for over-magnitude decimal literals (also clamped at emission), and Date / DateTime parse failures. Smoke run on live ClickHouse 26.5.1.111: 4324 queries / 98% successful-statement rate on TLPWhere with all v2 flags on, zero uncaught AssertionErrors across TLPWhere / QccCache / PQS.
Ten profile/fix iterations against clickhouse-server:head (26.5.1.779), 6
sqlancer threads, --oracle TLPWhere. Two-run 90s A/B with stock HEAD vs.
this branch: mean CH-side stmts 13,942 -> 15,581 (+11.8%); thread-death
rate 2/3 runs -> 1/3 runs.
ClickHouseErrors.java
Absorb TOO_LARGE_STRING_SIZE (Code 131). CH 26.5 surfaces oversized
FixedString literals with that wrapper rather than FIXED_STRING; without
it the generator's expected-errors filter let two of six threads die on
the very first INSERT batch (~33% capacity gone at t=0).
ClickHouseProvider.java
- DROP DATABASE ... SYNC + drop the two Thread.sleep(1000) calls in
createDatabase. The sleeps date to the 2020 module rewrite with no
justifying comment; modern Atomic-engine rename-on-drop makes them
unnecessary, and they were costing ~84 thread-seconds per 1080s budget.
- Move clickhouse_setting_max_execution_time / allow_experimental_analyzer
/ allow_suspicious_low_cardinality_types off the JDBC URL into SET
statements at connect time. http_response_buffer_size and
wait_end_of_query stay on the URL: the former is consumed at the moment
the server commits to a chunked response (re-SETting too late) and the
latter is HTTP-only (SET returns UNKNOWN_SETTING).
ComparatorHelper.java
Replace replaceAll("[\\.]0+$", "") in getResultSetFirstColumnAsString
with a constant-time char-from-end scan. Was ~24% of all sqlancer-side
CPU samples because it ran on every row of every oracle result set
(Pattern.compile + Matcher.<init> showed up 400+ times per profile).
Main.java
Drop per-write flush in StateLogger.write. The current-database log
file is closed (and therefore implicitly flushed) on both success and
AssertionError paths of DBMSExecutor.run, so reproducer integrity is
preserved -- the only behavior we lose is the very last few queries
being durable if the JVM is hard-killed before the finally-block close
runs.
common/log/SQLLoggableFactory.java
Fast-path createLoggable for newline-free input (the overwhelming
common case): a single LoggedString allocation when the query already
ends with ';' and has no suffix, otherwise one StringBuilder pass that
escapes \n / \r inline instead of two separate String.replace passes.
Each logged statement funnels through here twice when
--log-execution-time=true.
common/query/SQLQueryAdapter.java
setEscapeProcessing(false) on every Statement / PreparedStatement.
SQLancer never emits JDBC escape syntax ({fn ...}, {call ...},
{escape '\'}), so the driver's escape-to-native preprocessor is pure
overhead -- worth ~70 String.replaceAll samples per profile on
clickhouse-jdbc 0.9.8. Wrapped in try/catch in case a driver throws
on the setter.
common/visitor/ToStringVisitor.java
Pre-size the shared StringBuilder to 512 chars. Observed sqlancer SQL
is p50=49, p90=176, p99=223, max=769; the default capacity of 16 was
forcing seven grow-and-arraycopy cycles per AST -> SQL render.
Captures the non-obvious traps from the perf workstream so a future
session (human or agent) doesn't have to rediscover them:
- ClickHouse head container needs CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT=1
+ CLICKHOUSE_SKIP_USER_SETUP=1 or the default user has no network
access ("password is incorrect" surface error).
- wait_end_of_query is HTTP-only; http_response_buffer_size is consumed
at the moment the server commits to a chunked response, so both have
to stay on the JDBC URL. Other clickhouse_setting_* params are safe to
apply via SET after connect.
- mvn package must include -Djacoco.skip=true under JDK 25 (JaCoCo
0.8.12 chokes on class file major version 69).
- Sqlancer's CLI parser is positional: global options before the DBMS
subcommand, DBMS options after, otherwise jcommander rejects them.
- This shell ships a poisoned JAVA_TOOL_OPTIONS / ASAN_OPTIONS that
every java/mvn/jfr call has to unset.
…solidation) Correctness (from adversarial review of the two prior commits): - JoinUseNulls: restrict to exactly ONE join clause. With chained joins, a later ON can reference a column an earlier OUTER/ANTI join null-extended, and that predicate legitimately evaluates differently under join_use_nulls=0 vs 1 (fill 0 equi-matches a genuine 0 key; fill NULL never matches; IS NULL conjuncts flip) -- documented behavior that would have been reported as a bug. Same dropped-column family as the #107073 gate. - ExtendedDatetime: SYSTEM STOP MERGES on the private table for the gated non-merged pre-1970 arm -- a background merge would otherwise re-form the filed #106419 shape behind the gate. CLAUDE.md #106419 entry now records the gate (removal condition). - SEMR: drop use_skip_indexes_if_final_exact_mode -- its own doc says 0 returns approximate FINAL results by contract, i.e. result-CHANGING, not result-preserving. use_skip_indexes_if_final stays (safe with exact_mode at its server default). - JoinUseNulls + QueryCache: compare with ComparisonMode.MULTISET, not the SET default -- a wrong fill colliding with a genuine value keeps the value-set equal and only the multiset catches it. Consolidation: - ClickHouseTypeFilters.isExactIntegerFamily: one shared exact-integer predicate (float noise rule) replacing three per-oracle copies (JoinUseNulls, AggregateState- Roundtrip, WindowEquivalence). - JoinAlgorithmOracle's CAPS / isAlgorithmDeterministic / resource-cap tolerances made package-visible and reused by JoinUseNulls instead of duplicated. - query_plan_use_logical_join_step tombstone moved to the catalog's exclusion block; enable_lazy_columns_replication comment rewritten to stand alone.
…; #107073 gate is permanent Smoke-run reclassification (the settings plan's verification discipline working as designed): the first focused smoke (461k queries / 15 min) surfaced 3 SEMR/SEMRMulti cardinality mismatches, all pinned to query_plan_convert_join_to_in. Its own catalog doc says "May cause wrong results with non-ANY JOINs (e.g. ALL JOINs which is the default)" -- converting an ALL JOIN to IN collapses row multiplicity on duplicate keys (confirmed by hand on head 26.6.1.634: count() 13 vs 5 on a 5-row dup-key self-join). Documented result-CHANGING, not a CH bug; removed with a do-not-re-add note and a test pin. #107073 closure fallout: upstream closed the JoinReorder ANTI/SEMI finding as by-design -- columns read from a SEMI/ANTI join's eliminated side are ANY-like (filled from whichever matching row arrives first), so any legal plan change flips the result. The JoinReorder dropped-key-ref gate is therefore a PERMANENT soundness rule now, not a temporary known-bug pin: oracle comments, the --join-reorder-allow-dropped-key-ref description, and CLAUDE.md updated (open-bugs entry replaced by a triage-section note). Real-use-case demos verified on head with default settings only: tmp/107073-real-use-case.md.
Remove every comment form (// line, /* */ block, /** */ Javadoc) from all 1208 .java files under src/ and test/ via a lexical state-machine stripper that preserves string/char literals and text blocks. Whole-line comments collapse to blank lines, then runs of blank lines coalesce to one. 541 files changed. Verified `mvn compile test-compile` succeeds (exit 0).
…uery-validity fixes
Operational hardening:
- run-sqlancer.sh now ALWAYS pulls clickhouse/clickhouse-server:head (no opt-out;
--no-pull is a deprecated warn-only no-op). 'head' is mutable and advances ~daily
while CH retains no per-build version tags, so a stale local image silently fuzzes
an old build and makes findings unreproducible. The resolved SELECT version() and
image RepoDigest are now stamped at the top of every logs/runs/sqlancer-*.log and
printed in the run summary so reproducers stay attributable after head moves.
- run-1h-all-oracles.sh / run-per-oracle-validation.sh drop their --no-pull usage.
- CLAUDE.md: document the always-pull policy and add the no-comments-in-code banner.
TLP oracle query validity (per the per-oracle query_log audit):
- TLPAggregate: drop the ORDER-BY-over-bare-columns emission on a no-GROUP-BY
aggregate query — it produced ~15% NOT_AN_AGGREGATE invalid queries (wasted, not
false positives) with no oracle value.
- TLPHaving: project the GROUP BY keys plus exact MIN/MAX/SUM aggregates instead of
arbitrary aggregate fetch columns, mirroring the TLPGroupBy soundness fix so row
identity tracks group identity.
Tooling:
- Add .claude/collect-oracle-validity.sh + oracle-validity-workflow.js: per-oracle
query-validity collection (truncate query_log, run each oracle briefly, dump the
error-code distribution + sample failing-query texts) and a workflow that judges
each oracle's generated-query validity from those dumps.
- .gitignore: ignore database*.lck (HSQLDB scratch lock files), matching the
existing database*.{log,properties,script} entries.
Extends the existing TextIndexLike + JsonSkipIndex coverage toward the full ClickHouse text-index surface. Validated on dev-vm head 26.6.1.734: a 1-hour full-fleet run (167k queries, all ~45 oracles) produced 0 false positives from any of these units. Soundness rules below were established empirically there. ClickHouse#3 Pure-optimization toggle arm: TextIndexLike gains a DIRECT_READ_OFF arm (query_plan_direct_read_from_text_index=0, query_plan_text_index_add_hint=0). A pure optimization toggle must never change results; UNKNOWN_SETTING is already tolerated so it degrades on older builds. ClickHouse#4 Merge + delete-masked-part topology: TextIndexLike optionally issues a synchronous lightweight DELETE (lightweight_deletes_sync=2) and/or OPTIMIZE TABLE ... FINAL after the inserts, recording topology in assertion messages. The Java ground truth now counts over the live (non-deleted) rows. Targets the #107309 class (index honoring a delete-masked part). ClickHouse#6 General-fleet predicate injection: generateTextSearchPredicate emits startsWith / endsWith / multiSearchAny over plain String columns from a fixed vocabulary, gated by --text-search-predicate-emission (default on). These three are the functions proven index==scan-equivalent across ALL tokenizers (incl. array). hasToken/hasAllTokens/hasAnyTokens are deliberately NOT emitted here: hasToken diverges index-vs-scan on the array tokenizer (whole-value token) and hasAllTokens/hasAnyTokens diverge on ngrams (the multi-token needle's space-spanning grams) -- by-design tokenizer-semantics differences, not bugs. Sound for NoREC because renderSkipIndex never carries a preprocessor. #7a renderSkipIndex breadth: full tokenizer matrix (splitByNonAlpha, ngrams(N), array, asciiCJK, splitByString, sparseGrams) and text() eligibility extended to FixedString / LowCardinality(String) / Nullable(String) / Array(String) (bare) and Map(String,*) via mapKeys/mapValues expression indexes. New ClickHouseErrors.getTextIndexErrors() tolerances wired into table creation. ClickHouse#5 TextIndexPreprocessor oracle (NEW): builds a private table with INDEX(s) preprocessor=lower(s), then asserts that a forced direct read (force_data_skipping_indices + direct_read=1, add_hint=0) over a mixed-case corpus equals a Java lower()-token-membership ground truth. (The documented INDEX(lower(s)) equivalent form cannot be force-engaged for hasToken(s,...) on 26.6.1.734 -- it raises INDEX_NOT_USED -- so the ground-truth comparison replaces a second-table comparison.) #7b TextIndexContainer oracle (NEW): Array(String) + array tokenizer with exact has/hasAny/hasAll List ground truth, and Map(String,String) key-vs-value isolation via mapContainsKey/mapContainsValue, across DEFAULT / ignore_data_skipping_indices / use_skip_indexes=0 arms. ALTER lifecycle: ClickHouseAlterGenerator gains ADD_INDEX / MATERIALIZE_INDEX / CLEAR_INDEX / DROP_INDEX kinds. New ClickHouseTextIndexLifecycle oracle asserts CREATE-with-index == (index-free + ALTER ADD INDEX + MATERIALIZE INDEX SETTINGS mutations_sync=2) == use_skip_indexes=0 scan over a predicate battery (LIKE + hasToken always; hasAllTokens/hasAnyTokens only on the splitByNonAlpha arm, where they are index==scan-equivalent), exercising the distinct MATERIALIZE-over-historical-parts code path. All three new oracles registered in ClickHouseOracleFactory and the run / validity oracle-name lists. All oracles use private per-iteration tables, fixed token vocabulary, and integer-key/count comparisons to stay clear of the repo's known false-positive families.
…ules Document the 4 FTS oracles, the general-fleet predicate injection, and the non-obvious index-vs-scan tokenizer-soundness rules for text-search functions (startsWith/endsWith/multiSearchAny safe on all tokenizers; hasToken diverges on array; hasAllTokens/hasAnyTokens diverge on ngrams). Validated 0 FP over a 1h full-fleet dev-vm run on head 26.6.1.734.
ClickHouse#107186 (OPEN, confirmed bug): hasToken / hasAllTokens / hasAnyTokens return WRONG results with default settings via exact direct read from a text index whose tokenizer != splitByNonAlpha (asciiCJK / array / ngrams / splitByString / sparseGrams) or that carries a preprocessor. The exact direct read answers from the index posting lists using the INDEX tokenizer, not hasToken's fixed splitByNonAlpha semantics; since query_plan_direct_read_from_text_index=1 is the default, queries are wrong with no special settings. This is exactly the index-vs-scan divergence the first FTS pass MISCLASSIFIED as "by-design / results-may-differ" and engineered around. It is a real wrong result. New ClickHouseTextIndexDirectReadOracle builds a private table per iteration with one of splitByNonAlpha(control) / asciiCJK / array / ngrams / sparseGrams / splitByString / preprocessor=lower(s), then asserts hasToken-family predicates give identical keys under default (direct_read=1) vs use_skip_indexes=0. Per the operator's choice it is DEFAULT-ON and fires every run (not gated like #106419) to also surface relatives/regressions in the direct-read code path; triage by the "#107186" assertion string. Validated on dev-vm head 26.6.1.735: all six non-control scenarios fire (incl. the issue's asciiCJK repro), the SPLIT_CONTROL arm stays clean. When #107186 is fixed the oracle falls silent (SPLIT_CONTROL-only firing would then indicate a NEW bug). Removes ClickHouseTextIndexPreprocessorOracle, which wrongly codified the buggy direct-read answer (lower()-token-membership) as its expected ground truth and would have broken when #107186 is fixed; its preprocessor coverage is subsumed by the PREPROCESSOR_LOWER scenario here. ClickHouse#6 (generateTextSearchPredicate) keeps only startsWith/endsWith/multiSearchAny (index==scan on all tokenizers). Registered in ClickHouseOracleFactory + the run/validity oracle-name lists; #107186 documented in CLAUDE.md (filed-bugs list + FTS section).
…ide-int types + EET roundtrip identities) Implements the plan's Phase-1 P0 fast wins from docs/plans/2026-06-13-001-feat-clickhouse-coverage-backlog-30-ideas-plan.md. New differential / ground-truth oracles (each default-on, gated by a --flag, self-contained, soundness-checklist-compliant): - ClickHouse#2 PrewhereEquivalence WHERE == PREWHERE == optimize_move_to_prewhere=0 (multiset) - ClickHouse#3 ReadInOrderToggle optimize_read_in_order/aggregation_in_order on==off - ClickHouse#4 CountOptimization trivial/implicit/use_projections on==off + countIf cross-check + GROUP-BY-key count arm (hardens #106573, #106125) - ClickHouse#5 LazyMaterializationToggle query_plan_optimize_lazy_materialization on==off (positional) - sqlancer#12 ReplacingDedup ReplacingMergeTree(ver) FINAL == argMax(val,ver) GROUP BY key - sqlancer#19 QuantileConsistency quantileExact==medianExact, monotone-in-level, Low<=Exact<=High - sqlancer#20 UniqExactness uniqExact == count(DISTINCT) == length(groupUniqArray) - sqlancer#21 ArgExtremum argMax / arraySort(groupArray) / groupArraySorted vs Java truth - sqlancer#28 MaterializedColumn MATERIALIZED/ALIAS col == defining expr (single-snapshot) Generator/catalog: - sqlancer#16 emit signed Int128/Int256 + Decimal256 (Decimal precision 39..76); Interval column emission deliberately NOT added (ClickHouse forbids storing Interval in tables); ClickHouseTypeParser gains an Interval read-back branch for free. - sqlancer#23 EET roundtrip identities: unhex(hex), base64Decode(base64Encode), tryBase64Decode(base64Encode), toIPv4(IPv4NumToString(toUInt32)), toIPv6(IPv6NumToString) -- each type-gated; +7 unit-test cases. Wired into ClickHouseOracleFactory, ClickHouseOptions (--*-oracle flags) and .claude/run-sqlancer.sh ALL_ORACLES. mvn compile green; EETIdentitiesTest 20/20. Probe-on-head items (recorded for dev-vm validation): query_plan_optimize_lazy_materialization, read_in_order_use_buffering, optimize_trivial_count_query family, groupArraySorted/array text format, IPv6 IPv4-mapped roundtrip.
…ickHouse#6,7,8,9,10) Five self-contained query-structure oracles (default-on, --flag gated, raw-SQL, no general-fleet hot-path edits -- that emission is a separate gated pass): - ClickHouse#6 GroupingDecomposition GROUP BY WITH ROLLUP: detail(GROUPING=0)==plain GROUP BY, super-agg(GROUPING=1)==grand count(), sum(group counts)==grand - ClickHouse#7 LimitRanking LIMIT a,b==LIMIT b OFFSET a; LIMIT n prefix-of LIMIT n WITH TIES; LIMIT n BY k <= n rows/key (deterministic total ORDER BY) - ClickHouse#8 WindowFrame default frame==explicit RANGE/ROWS UNBOUNDED..CURRENT; lagInFrame ==one-preceding frame (unique-key fixture, single-snapshot compares) - ClickHouse#9 SemiJoinRewrite LEFT SEMI==IN, LEFT ANTI==NOT IN (preserved-side projection only per #107073), LEFT ANY JOIN cardinality==left count - sqlancer#10 ColumnTransformer * EXCEPT/APPLY/COLUMNS(regex)==explicit list; DISTINCT ON cardinality Wired into factory/options/run-sqlancer.sh; mvn compile green. Probe-on-head (dev-vm): lagInFrame/leadInFrame, empty-frame NULL boundary, WITH TIES/LIMIT BY syntax.
…les (ClickHouse#1,11,13,14) Four self-contained engine oracles (default-on, --flag gated, raw-SQL fixtures, young-engine-absence tolerated to IgnoreMe; no pickEngine hot-path edits): - ClickHouse#1 EngineEquivalence MergeTree mirror == Memory/TinyLog/StripeLog/Log (multiset, identical inserted rows, same read; exact-multiset engines only) - sqlancer#11 CoalescingFinal CoalescingMergeTree FINAL == argMaxIf(col,seq,isNotNull(col)) last-non-null ground truth (unique seq, merge-formed) - sqlancer#13 JoinGetSet x IN Set-engine == x IN (subquery); joinGet(Join engine) == ANY LEFT JOIN - sqlancer#14 RemoteLocalEquivalence remote('127.0.0.1',db,t) == local read (single-node distributed path) + numbers(n) ground truth Wired into factory/options/run-sqlancer.sh; mvn compile green. Probe-on-head (dev-vm): CoalescingMergeTree existence + last-non-null semantics, Set/Join engine + joinGet signature, remote() no-credential self-connect, toString(Array) text format.
…ate oracles (sqlancer#15,17,18,22) Four self-contained, reader-safe oracles (every projected wire value toString-wrapped so the client-v2 RowBinary reader never sees a raw container/Variant/AggregateFunction column; default-on, --flag gated): - sqlancer#15 MapTupleContainer Map/Tuple/Array extractions == Java ground truth (private fixture) - sqlancer#17 GeoMetamorphic pointInPolygon/polygonArea/greatCircleDistance/intersection metamorphic identities, integer coords, 1e-6 float tolerance - sqlancer#18 VariantSubcolumn Variant/Dynamic/JSON subcolumn roundtrip (toString-wrapped reads only; raw-column projection emission stays gated off per plan) - sqlancer#22 AggregateStateExpansion finalizeAggregation(arrayReduce('<agg>State',groupArray(x)))==<agg>(x) for sum/min/max/uniqExact/quantileExact/groupArray + AggregatingMergeTree arm Young-type unavailability (Variant/JSON/Dynamic/geo) tolerated to IgnoreMe. Wired into factory/options/run-sqlancer.sh; mvn compile green. Probe-on-head: geo function names (polygonAreaCartesian etc.), Variant subcolumn syntax (v.Int64 vs variantElement), arrayReduce/finalizeAggregation signatures.
…es (sqlancer#24,25,26,27,29,30) Seven self-contained oracles (default-on, --flag gated, feature-absence tolerated): - sqlancer#24 SequenceFunnel windowFunnel/sequenceCount/sequenceMatch/retention == Java truth (tiny deterministic fixture, far-from-boundary timestamps) - sqlancer#25 PartitionLifecycle DETACH+ATTACH/DROP/REPLACE/MOVE PARTITION row-set invariants (SYSTEM STOP MERGES pins topology) - sqlancer#26 AlterModifyConsistency data-preserving ALTER MODIFY COLUMN/CODEC/TTL/SETTING + MATERIALIZE must not change the visible multiset (modulo pre-applied widen cast) - sqlancer#27 TtlDeterminism TTL DELETE survivors == non-expired bucket; far-from-now() date buckets keep the result wall-clock-independent - sqlancer#29 InsertDedup identical block re-insert leaves count unchanged (insert_deduplicate default-on); distinct block grows; optional async arm - sqlancer#30 TokenBf hasToken/=/IN with tokenbf_v1 index == use_skip_indexes=0 scan - sqlancer#30 VectorIndexRecall vector_similarity HNSW top-1 == brute-force top-1 + top-k containment (never exact set-equality for ANN) Completes all 30 plan ideas across 5 waves: 29 new oracles + Int128/256/Decimal256 emission + EET roundtrip identities. ALL_ORACLES now 76 tokens; mvn package green (3.9M jar). Probe-on-head (dev-vm): tokenbf_v1/vector_similarity grammar, windowFunnel/retention signatures, TTL materialize.
- sqlancer#22 AggregateStateExpansion: AggregatingMergeTree arm used finalizeAggregation(sumMerge(s)) which double-finalizes (Code 43 ILLEGAL_TYPE_OF_ARGUMENT); sumMerge(s)/uniqExactMerge(u) already return the finalized scalar. Verified on head: sumMerge=10, uniqExactMerge=3. - ClickHouseProvider: pin allow_experimental_{variant,dynamic,json}_type + allow_experimental_vector_similarity_index per connection so the VariantSubcolumn (sqlancer#18) and VectorIndexRecall (sqlancer#30) fixtures CREATE reliably (the oracle-issued SET could land on a different pooled connection than the CREATE). All four settings confirmed to exist on head. Probe (head 26.6.1.743) confirmed all other 27 new-oracle constructs work as written: Int128/Int256/Decimal(P,S), geo fns, windowFunnel/sequenceCount/retention, lagInFrame, WITH ROLLUP/GROUPING/TOTALS, LIMIT WITH TIES/BY, *EXCEPT/APPLY/COLUMNS, SEMI/ANTI/ANY, CoalescingMergeTree, Set/Join/joinGet, remote(), tokenbf_v1, vector_similarity, TTL DELETE, unhex/base64/IPv4/IPv6 roundtrips.
…acle smoke) Smoke of the 29 new oracles surfaced 6 firing; 23 clean. Fixes: - UniqExactness (12 FP): excluded Nullable/LowCardinality(Nullable) columns -- uniqExact, count(DISTINCT), groupUniqArray, and tuple-distinct handle NULL differently, so the identity does not hold on Nullable columns. - LimitRanking (24 FP): WITH TIES positional-prefix check replaced with sub-multiset containment -- rows tied on the ORDER BY key may legally reorder between the plain and WITH TIES queries, so only containment (plain subset-of withTies) is sound. - MaterializedColumn (5 FP): recompute now CASTs the defining expression to the column's declared type (system.columns.type); a MATERIALIZED Int32 stored in a UInt64 column had the expr re-evaluated in the wrong type. - ColumnTransformer (3 FP): DISTINCT ON cardinality compared against a NULL-inclusive GROUP BY key count (DISTINCT ON keeps the NULL group; count(DISTINCT k) drops it). - PrewhereEquivalence (1 FP): predicate wrapped in explicit (...)!=0 so a float/NaN predicate (c0/c0) is coerced to a deterministic UInt8 identically in WHERE and PREWHERE, removing the implicit float-truthiness divergence (likely a real CH quirk; re-surfaces only if it diverges even on an explicit boolean). - SequenceFunnel (2047 FP): DEFAULT OFF. windowFunnel monotonicity asserted backwards (it is non-decreasing in step count) and the exact-value Java models are unvalidated; needs rework + 1h-clean re-validation before re-enabling. 23/29 new oracles were clean in the smoke. mvn compile green.
…residuals) Re-smoke (SequenceFunnel off) surfaced InsertDedup at 1093 FP (latent: SequenceFunnel had hogged worker deaths in round 1). Root causes + fixes: - InsertDedup (1093 FP): non-replicated MergeTree does NOT deduplicate inserts by default, so re-inserting an identical block doubled the count. Added SETTINGS non_replicated_deduplication_window=1000 to the fixture (verified on head: plain MergeTree count=6 vs window-enabled count=3 for a double-inserted 3-row block). - LimitRanking (2 FP): the only new oracle missing ClickHouseErrors.addExpectedExpressionErrors; a pre-existing generator/ALTER part-read error (Code 32 ATTEMPT_TO_READ_AFTER_EOF, 'Cannot parse Int32 from String' -- already fleet-tolerated) reached the assertion. Added the expression-error set so it no-ops like the rest of the fleet. - PrewhereEquivalence (2 FP): logically sound (clean-table probe: WHERE==PREWHERE== move_to_prewhere=0 = 8 rows); the 10-vs-5 came from a persistent corrupt part (same Code-32 ALTER-rewrite class). Added a re-read-on-mismatch stability guard: a mismatch must reproduce identically on a second read or the iteration is IgnoreMe (unstable/corrupt-part read). mvn compile green.
…-op + over-tolerance) 6-agent review of the new oracles surfaced latent issues an empirical 0-FP smoke would miss: - JoinGetSet (HIGH, real FP): joinGet returns the column default on a MISSING key while any(v) WHERE k=missing returns empty -> divergence. Probe only present keys (present.keySet()). - ReadInOrderToggle + LazyMaterializationToggle (MEDIUM, real FP): 0.0/-0.0 tie in ORDER BY renders as distinct '0'/'-0' under positional compare and the plan toggle can flip tied-row order. Excluded Float32/Float64 from the ORDER BY key (LazyMat: from projection too). - InsertDedup async arm (no-op): standalone SET async_insert=1 does not persist on the stateless client-v2 HTTP transport -> the arm silently tested sync again. Inlined the settings on the async INSERT (SETTINGS async_insert=1, wait_for_async_insert=1). - TokenBf (no-op): s='word'/s IN(...) never matched multi-token rows. Needles now drawn from real inserted row values so equality/IN arms exercise the bloom index against live rows. - GeoMetamorphic (low): self-intersection area equality uses a relative tolerance. - PrewhereEquivalence / WindowFrame (over-tolerance): dropped the bare 'PREWHERE'/'frame' substrings that could mask a real crash error; kept ILLEGAL_PREWHERE / 'Window frame'. mvn compile green. (Pre-existing repo checkstyle violations in h2/tidb/sqlite3 are unrelated.)
All 30 ideas implemented as 29 new oracles + Int128/Int256/Decimal256 emission + EET roundtrip identities across 5 waves; validated 0-false-positive from the new oracles over a 3h / 535849-query --oracles all run on head 26.6.1.764 (all 6221 firing reproducers are the by-design #107186 TextIndexDirectRead detector). SequenceFunnel (sqlancer#24) shipped default-off pending windowFunnel ground-truth rework.
Replace the single processors_profile_log TTL cap with a combined system_logs_disabled.xml that removes (remove=remove) the heavy non-diagnostic system logs entirely: metric_log, asynchronous_metric_log, query_metric_log, processors_profile_log, query_thread_log, query_views_log, opentelemetry_span_log, latency_log, blob_storage_log, backup_log. query_log/text_log/part_log/error_log/crash_log stay enabled for reproducer triage. Mirrors the existing trace_log_disabled.xml mechanism. Mount wired into run-sqlancer.sh and collect-oracle-validity.sh; CLAUDE.md config-set note updated (former system_log_ttl.xml removed, processors_profile_log now fully removed rather than TTL-capped).
SAMPLE (Phase 1):
- Widen SAMPLE BY generation: a bare UInt* column in a composite ORDER BY's
primary-key prefix now makes the table sample-eligible (was: only when ORDER
BY is a single bare UInt* column). New bareUnsignedIntColumnName/firstBareUnsignedIntIn
helpers in ClickHouseTableGenerator.
- Record sampling_key on ClickHouseTable (read from system.tables); add
getSamplingKey()/hasSamplingKey(). fromConnection now fetches name+engine+sampling_key
in one query (getTableMeta).
- New ClickHouseSampleClauseOracle: sound, dedicated, never in the general fleet.
Arms: SAMPLE 1 == full read (identity), SAMPLE k subset-of full, SAMPLE 1/k
OFFSET i/k tiles each subset-of full; gated statistical _sample_factor arm
(--sample-factor-arm, default off). Restricted to plain MergeTree (stable row
multiset); self-creates a fixture when no schema table is sampleable.
Distributed (Phase 2):
- New ClickHouseDistributedTableOracle: self-contained local MergeTree +
Distributed('default', db, local) wrapper; asserts read equivalence (multiset),
exact-integer aggregation equivalence (count/sum/min/max, GROUP BY UInt key),
and foreground INSERT routing (count delta). Distributed stays OUT of the
general engine pool.
Both registered in ClickHouseOracleFactory + ClickHouseOptions (default on) and
added to run-sqlancer.sh ALL_ORACLES.
dev-vm smoke surfaced 16k Code 27 'Cannot parse input: expected ( before SETTINGS' from the insert-routing arm: ClickHouse requires the SETTINGS clause BEFORE the VALUES keyword in INSERT. Because Code 27 is globally tolerated the arm silently no-opped (skipped its count-delta assertion) rather than failing. Move SETTINGS distributed_foreground_insert=1 ahead of VALUES so the arm actually executes and the routing assertion runs.
…BE, correlated subquery, bit/array/string/datetime ground-truth, AggregateFunction-column, ARRAY JOIN, window-frame, WITH FILL)
…); PasteJoin compare via concat not whole-tuple text
…efault 0 not NULL; WithFill project raw sort column x (fill populates sort col, not derived toString(x))
…onfig; redundant with database*.log reproducers, pure disk overhead)
…validation sweep Per-oracle validation sweep (all 91 oracles, dev-vm, head 26.6.1.950) surfaced oracle-soundness false positives; root-caused and fixed: - JoinAlgorithm: skip Float32/Float64 join ON-keys in generateJoinClause. Float-equality joins are not invariant across join algorithms (hash vs partial_merge gave 13 vs 19 rows; integer-cast key -> all algorithms 127, proven on fresh CH). Validated 0 repros after fix. - TLPHaving: aggregate argument is now a bare integer column, so SUM/MIN/MAX are exact and order-invariant and the HAVING predicate evaluates identically across the p / NOT p / p IS NULL partition re-executions (float aggregates made p non-deterministic -> partition over-count). Validated 0 repros. - DDL toleration (ClickHouseErrors): tolerate valid rejections of fuzzer- generated invalid DDL that were killing workers (CANNOT_UPDATE_COLUMN on materialized cols, Cannot drop/incompatible statistics, impossible MODIFY/convert, no-such-column in ALTER UPDATE). Validated clean: SchemaRoundtrip, SortedUnionLimitBy, StatsToggle, DynamicSubcolumn. Harness (collect-oracle-validity.sh): full 91-oracle list (branch oracles first), env-configurable DUR/THREADS/HEAP/CH sizing, per-oracle reproducer preservation, timeout -k SIGKILL fallback + RESUME guard (a hung JVM, e.g. MutationAnalyzer, previously stalled the whole sweep).
…Equivalence oracles Closes the three coverage gaps the ~95-oracle suite genuinely lacked. P0 SettingFlip: generic data-driven catalog of 23 result-neutral settings (optimize_read_in_order / query_plan_* / compile_* / max_threads / group_by_two_level_threshold / external_group_by / ...) flipped between two values over a ProjectionToggle-style integer-aggregate read; inherits that oracle's float-tolerant SET comparison. Deliberately excludes optimize_use_implicit_projections (known #106573) and join-reorder settings (by-design non-determinism, #107073). Adding a setting = one row in the catalog. P1 ConcurrentMutation: the first true-concurrency oracle (all others are single-snapshot). Background threads each open their own ClickHouseClientV2Transport and hammer multiset-preserving churn (OPTIMIZE FINAL / ALTER DELETE WHERE 0 SETTINGS mutations_sync=0 / concurrent SELECTs) on a private multi-part MergeTree while the main connection re-reads and asserts every sample equals the pre-churn baseline. Reaches the read-vs-merge/mutation race class. Churn uses a local step counter, never the non-thread-safe shared Randomly; transient tolerated read errors are skipped (getResultSetFirstColumnAsString throws IgnoreMe, never returns partial rows). P2 LowCardinalityEquivalence: private fixture with paired plain/LowCardinality columns (Int32, String, Nullable(Int32), FixedString(4)) holding identical values; any read (row projection / GROUP BY / uniqExact / predicate) over the plain columns must equal the same read over the LowCardinality twins. All three are gated by default-true Options flags (checked at the top of check()), wired into ClickHouseOracleFactory, and appended to ALL_ORACLES in run-sqlancer.sh. The self-contained oracles log their CREATE/INSERT (and P1 a record of the off-connection churn ops) so any future finding is replayable. Also removes the extended-datetime-known-overflow-arm gate (ClickHouse#106419 is fixed on head); coupled to the ClickHouseOptions change so they ship together. Validated 0-FP on dev-vm head 26.6.1.982: 15-min run of the three oracles, 47,619 queries, exit 0, 0 reproducers, 0 thread deaths, 99% successful statements.
…6426/#106262/#106124) All four are CLOSED/FIXED upstream (per the "open only" rule in this doc): #106262 fixed 2026-06-05, #106426 fixed 2026-06-12, #106124 fixed 2026-06-16 (PR #106136), #106419 fixed (ExtendedDatetime gate removed in 9a8469f).
…PHEMERAL columns Indexes (ClickHouseTableGenerator): - add tokenbf_v1 skip index for String columns - broaden minmax from Int32/String to all comparable scalar columns (isBareKeyColumn) — minmax is an exact index, sound on any orderable type - vector_similarity intentionally left oracle-only (ANN is approximate and diverges from a full scan, an oracle false-positive source) MergeTree part-size / merge SETTINGS (new renderMergeTreeSettings): - min_bytes/rows_for_wide_part (wide vs compact part storage) - min_bytes/rows_for_full_part_storage (full vs packed part storage) - index_granularity_bytes, enable_vertical_merge_algorithm + vertical_merge_algorithm_min_rows/columns_to_activate, merge_max_block_size, min/max_compress_block_size (plus the pre-existing knobs) - physical-layout only, no result-semantics change → no oracle FP risk EPHEMERAL columns: - new mutually-exclusive column kind in ClickHouseColumnBuilder - ClickHouseSchema reader excludes EPHEMERAL columns from the queryable set (with an all-ephemeral fallback) since SELECT on them errors Materialized columns and aliases were already supported. Validated on dev-vm head 26.6.1, full fleet (minus TextIndexDirectRead): 205,545 queries, 0 reproducers, 0 threads shut down, 98% statement success. All new constructs confirmed firing in query_log.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Transforms the ClickHouse provider from a minimal port (6 TLP/NoREC oracles, a flat two-type schema generator) into a wrong-result-focused fuzzing suite: 31 oracles, a recursive type-system ADT, a much wider generator (types, DDL, joins, functions, windows, CTEs, dictionaries), and the client-v2/RowBinary transport.
This is part of an ongoing investigation to improve result-correctness fuzzing of ClickHouse with SQLancer, following the wrong-result bug analysis in clickhouse-private#58385.
Scope: 119 files, ~+15.5k/−1k lines. Validated by repeated multi-hour all-oracle runs on ClickHouse
head(latest: 26.6.x); the suite has already surfaced several genuine server bugs (see Bugs found below).Oracles (31)
Existing TLP / NoREC —
TLPWhere,TLPDistinct,TLPGroupBy,TLPAggregate,TLPHaving,NoREC. Hardened: a multiset-aware comparator and TLP fixes (project group-by keys not arbitrary fetch columns; drop inner GROUP BY in TLPAggregate) removed whole false-positive classes.Paper oracles
IS NULL), containment viaINTERSECT.EstCard(Q') ≤ EstCard(Q)read fromEXPLAIN ESTIMATE; query never executed.CASE).Differential / setting-equivalence (same data, two ways, assert equal)
compile_expressions, analyzer,optimize_move_to_prewhere, filter pushdown, …) and diff.hash/partial_merge/grace_hash.materialize()+use_skip_indexes=0to disable granule pruning, diffed against the pruned scan.accurateCastvsaccurateCastOrNullagreement on non-NULL rows.NOT NULLsurvivesdata_type_default_nullable={0,1}.USING pfilters identically to explicitWHERE p.count() … FINALequals count afterOPTIMIZE … FINAL.Memory/Logtemp table and diff against the inline query. Targets the optimizer-rule-disabled-by-materialization wrong-result class (CH #106080/#106082/#106083/#106084/#105717).Identity / equivalence
sumIf/countIf/… identities), WindowEquivalence (count() OVER ()==count(), …), AggregateStateRoundtrip (finalizeAggregation(arrayReduce('sumState', groupArray(c)))==sum(c)), DictGetVsJoin (dictGet==LEFT JOINsource), SortedUnionLimitBy, TableFunctionIN, DynamicSubcolumn.Type system (recursive ADT)
Replaces the flat
(ClickHouseDataType, String)model, which could not encode parameters (Decimal(9,2)collapsed toDecimal) or wrappers, so every oracle bailed on anything outside{Int32, String}.ClickHouseType— sealed ADT:Primitive(Kind)(Int8…Int256, UInt*, Float32/64, Bool, String, UUID, Date/Date32, Time/Time64, Enum8/16, Decimal, IPv4/6),Nullable(inner),LowCardinality(inner)withcanWraprules, andUnknown(raw)fallback.isNumeric,supportsLiteralEmission,hasNullSemantics), a never-throwing recursive-descentClickHouseTypeParser, and anUnsupportedsentinel that routes undefined coercions toIgnoreMeExceptioninstead ofAssertionError.Generator & SQL surface
ORDER BY/PARTITION BY/SAMPLE BYkeys with validity guards (no float-result partition keys; SAMPLE BY derived from the sort key), CODEC/Statistics/constraint emission.toNullable/LowCardinality(Nullable)), and richer JOINONclauses (= … AND … IS NULL).ALTERcolumn ops,ALTER UPDATE/DELETE+ lightweightDELETEwith mutation barriers, dictionaries, access/row-policy DDL.Transport & operational config
RowBinaryWithNamesAndTypes(replaces jdbc-v2); fixes UInt64→long overflow and DateTimeInstantexceptions in oracle row reads;guavapinned for the reader.max_result_rowscap (kills OOM-by-materialization across oracles) and always enable the analyzer..claude/clickhouse-config/, mounted into the test container):async_insertoff, andalter_sync=2/mutations_sync=2(inusers.d/, where<profiles>are actually loaded) so ALTER/mutations are synchronous and intra-iteration reads are deterministic.Validation
Driven by repeated all-oracle runs on CH
head(see.claude/run-sqlancer.sh+run-1h-all-oracles.sh), with a documented triage workflow that separates real wrong-result signal from known false-positive families (float-ULP, NaN coalescence, NaN-in-SUM). The new oracles each ship with soundness guards derived from triage; the latest 30-min/222k-query run had zero false positives attributable to the newest oracles.Bugs found
USINGexpression →LOGICAL_ERROR"Duplicate column name … in row policy actions output" (26.x regression, query-plan optimization).DISTINCTdoes not deduplicate NaN on AArch64 (fastops regression).LEFT ANTI JOIN: ON-equality projection collapses to default0when aWHEREis added (JOIN regression).__bitWrapperFuncinternal assertion on a FloatWHEREpredicate against aTYPE set(N)skip index.