<p>---
title: DDSQL Reference
description: &gt;-
  Complete reference for DDSQL syntax, data types, functions, operators, and
  statements for querying Datadog data with SQL.
breadcrumbs: Docs &gt; DDSQL Reference
---

# DDSQL Reference
Available for:
{% icon name="icon-ddsql" /%}
 DDSQL Editor | 
{% icon name="icon-notebook" /%}
 Notebooks 
## Overview{% #overview %}

DDSQL is SQL for Datadog data. It implements several standard SQL operations, such as `SELECT`, and allows queries against unstructured data. You can perform actions like getting exactly the data you want by writing your own `SELECT` statement, or querying tags as if they are standard table columns.

You can run DDSQL queries from AI agents using the [Datadog MCP Server](https://docs.datadoghq.com/mcp_server.md) `ddsql` toolset (Preview).

This documentation covers the SQL support available and includes:

- Syntax compatible with PostgreSQL
- Data types
- Type literals
- Arrays
- SQL functions
- Regular expressions
- Window functions
- JSON functions
- Network address functions
- Table functions
- Tags

{% image
   source="https://docs.dd-static.net/images/logs/workspace/sql_reference/sql_syntax_analysis_cell.0508237133cb7ed03597d35cb532ddd3.png?auto=format&amp;fit=max&amp;w=850 1x, https://docs.dd-static.net/images/logs/workspace/sql_reference/sql_syntax_analysis_cell.0508237133cb7ed03597d35cb532ddd3.png?auto=format&amp;fit=max&amp;w=850&amp;dpr=2 2x"
   alt="Example workspace cell with SQL syntax" /%}

## Syntax{% #syntax %}

The following SQL syntax is supported:

{% dl %}

{% dt %}
`SELECT (DISTINCT)` (DISTINCT: Optional)
{% /dt %}

{% dd %}
Retrieves rows from a database, with `DISTINCT` filtering out duplicate records.
```sql
SELECT DISTINCT customer_id
FROM orders 
```

{% /dd %}

{% dt %}
`JOIN`
{% /dt %}

{% dd %}
Combines rows from two or more tables based on a related column between them. Supports FULL JOIN, INNER JOIN, LEFT JOIN, RIGHT JOIN.
```sql
SELECT orders.order_id, customers.customer_name
FROM orders
JOIN customers
ON orders.customer_id = customers.customer_id 
```

{% /dd %}

{% dt %}
`GROUP BY`
{% /dt %}

{% dd %}
Groups rows that have the same values in specified columns into summary rows.
```sql
SELECT product_id, SUM(quantity)
FROM sales
GROUP BY product_id 
```

{% /dd %}

{% dt %}
`||` (concat)
{% /dt %}

{% dd %}
Concatenates two or more strings together.
```sql
SELECT first_name || ' ' || last_name AS full_name
FROM employees 
```

{% /dd %}

{% dt %}
`WHERE` (Includes support for `LIKE`, `IN`, `ON`, `OR` filters)
{% /dt %}

{% dd %}
Filters records that meet a specified condition.
```sql
SELECT *
FROM employees
WHERE department = 'Sales' AND name LIKE 'J%' 
```

{% /dd %}

{% dt %}
`CASE`
{% /dt %}

{% dd %}
Provides conditional logic to return different values based on specified conditions.
```sql
SELECT order_id,
  CASE
    WHEN quantity &gt; 10 THEN 'Bulk Order'
    ELSE 'Standard Order'
  END AS order_type
FROM orders 
```

{% /dd %}

{% dt %}
`WINDOW`
{% /dt %}

{% dd %}
Performs a calculation across a set of table rows that are related to the current row.
```sql
SELECT
  timestamp,
  service_name,
  cpu_usage_percent,
  AVG(cpu_usage_percent) OVER (PARTITION BY service_name ORDER BY timestamp ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_cpu
FROM
  cpu_usage_data 
```

{% /dd %}

{% dt %}
`IS NULL` / `IS NOT NULL`
{% /dt %}

{% dd %}
Checks if a value is null or not null.
```sql
SELECT *
FROM orders
WHERE delivery_date IS NULL 
```

{% /dd %}

{% dt %}
`LIMIT`
{% /dt %}

{% dd %}
Specifies the maximum number of records to return.
```sql
SELECT *
FROM customers
LIMIT 10 
```

{% /dd %}

{% dt %}
`OFFSET`
{% /dt %}

{% dd %}
Skips a specified number of records before starting to return records from the query.
```sql
SELECT *
FROM employees
OFFSET 20 
```

{% /dd %}

{% dt %}
`ORDER BY`
{% /dt %}

{% dd %}
Sorts the result set of a query by one or more columns. Includes ASC, DESC for sorting order.
```sql
SELECT *
FROM sales
ORDER BY sale_date DESC 
```

{% /dd %}

{% dt %}
`HAVING`
{% /dt %}

{% dd %}
Filters records that meet a specified condition after grouping.
```sql
SELECT product_id, SUM(quantity)
FROM sales
GROUP BY product_id
HAVING SUM(quantity) &gt; 10 
```

{% /dd %}

{% dt %}
`IN`, `ON`, `OR`
{% /dt %}

{% dd %}
Used for specified conditions in queries. Available in `WHERE`, `JOIN` clauses.
```sql
SELECT *
FROM orders
WHERE order_status IN ('Shipped', 'Pending') 
```

{% /dd %}

{% dt %}
`USING`
{% /dt %}

{% dd %}
This clause is a shorthand for joins where the join columns have the same name in both tables. It takes a comma-separated list of those columns and creates a separate equality condition for each matching pair. For example, joining `T1` and `T2` with `USING (a, b)` is equivalent to `ON T1.a = T2.a AND T1.b = T2.b`.
```sql
SELECT orders.order_id, customers.customer_name
FROM orders
JOIN customers
USING (customer_id) 
```

{% /dd %}

{% dt %}
`AS`
{% /dt %}

{% dd %}
Renames a column or table with an alias.
```sql
SELECT first_name AS name
FROM employees 
```

{% /dd %}

{% dt %}
Arithmetic Operations
{% /dt %}

{% dd %}
Performs basic calculations using operators like `+`, `-`, `*`, `/`.
```sql
SELECT price, tax, (price * tax) AS total_cost
FROM products 
```

{% /dd %}

{% dt %}
`INTERVAL value unit`
{% /dt %}

{% dd %}
Interval representing a time duration specified in a given unit. Supported units: - `milliseconds` / `millisecond` - `seconds` / `second` - `minutes` / `minute` - `hours` / `hour` - `days` / `day`
{% /dd %}

{% /dl %}

## Data types{% #data-types %}

DDSQL supports the following data types:

| Data Type   | Description                                                               |
| ----------- | ------------------------------------------------------------------------- |
| `BIGINT`    | 64-bit signed integers.                                                   |
| `BOOLEAN`   | `true` or `false` values.                                                 |
| `DECIMAL`   | Floating-point numbers.                                                   |
| `INET`      | Network address values (IPv4 and IPv6, with optional CIDR prefix length). |
| `INTERVAL`  | Time duration values.                                                     |
| `JSON`      | JSON data.                                                                |
| `TIMESTAMP` | Date and time values.                                                     |
| `VARCHAR`   | Variable-length character strings.                                        |

### Array types{% #array-types %}

All data types support array types. See Arrays for array literals, element access, and array functions.

## Type literals{% #type-literals %}

DDSQL supports explicit type literals using the syntax `[TYPE] [value]`.

| Type        | Syntax                  | Example                                |
| ----------- | ----------------------- | -------------------------------------- |
| `BIGINT`    | `BIGINT 'value'`        | `BIGINT '1234567'`                     |
| `BOOLEAN`   | `BOOLEAN 'value'`       | `BOOLEAN 'true'`                       |
| `DECIMAL`   | `DECIMAL 'value'`       | `DECIMAL '3.14159'`                    |
| `INET`      | `INET 'value'`          | `INET '192.168.1.5/24'`                |
| `INTERVAL`  | `INTERVAL 'value unit'` | `INTERVAL '30 minutes'`                |
| `JSON`      | `JSON 'value'`          | `JSON '{"key": "value", "count": 42}'` |
| `TIMESTAMP` | `TIMESTAMP 'value'`     | `TIMESTAMP '2023-12-25 10:30:00'`      |
| `VARCHAR`   | `VARCHAR 'value'`       | `VARCHAR 'hello world'`                |

The type prefix can be omitted and the type is automatically inferred from the value. For example, `'hello world'` is inferred as `VARCHAR`, `123` as `BIGINT`, and `true` as `BOOLEAN`. Use explicit type prefixes when values could be ambiguous; for example,`TIMESTAMP '2025-01-01'` would be inferred as `VARCHAR` without the prefix.

### Example{% #example %}

```sql
-- Using type literals in queries
SELECT
    VARCHAR 'Product Name: ' || name AS labeled_name,
    price * DECIMAL '1.08' AS price_with_tax,
    created_at + INTERVAL '7 days' AS expiry_date
FROM products
WHERE created_at &gt; TIMESTAMP '2025-01-01';
```

## Arrays{% #arrays %}

Arrays are ordered collections of values that all share the same data type. Every DDSQL base type has a corresponding array type.

### Array literals{% #array-literals %}

Use the `ARRAY[value1, value2, ...]` syntax to construct an array literal. The array's type is automatically inferred from the values.

```sql
SELECT ARRAY['apple', 'banana', 'cherry'] AS fruits;  -- VARCHAR array
SELECT ARRAY[1, 2, 3] AS numbers;                     -- BIGINT array
SELECT ARRAY[true, false, true] AS flags;             -- BOOLEAN array
SELECT ARRAY[1.1, 2.2, 3.3] AS decimals;              -- DECIMAL array
```

### Element access{% #element-access %}

Access individual array elements with a 1-based subscript. Accessing an index that is out of bounds returns `NULL`.

```sql
SELECT ARRAY['a', 'b', 'c'][1];   -- Returns 'a'
SELECT ARRAY['a', 'b', 'c'][2];   -- Returns 'b'
SELECT ARRAY['a', 'b', 'c'][10];  -- Returns NULL (out of bounds)
```

To access elements of an array column, use the same subscript syntax:

```sql
SELECT recipients[1] AS first_recipient
FROM emails
```

### Array functions{% #array-functions %}

The following functions operate on arrays:

| Function                                      | Return Type         | Description                                                                                        |
| --------------------------------------------- | ------------------- | -------------------------------------------------------------------------------------------------- |
| `CARDINALITY(array a)`                        | `BIGINT`            | Returns the number of elements in the array.                                                       |
| `ARRAY_POSITION(array a, typeof_array value)` | `BIGINT`            | Returns the 1-based index of the first occurrence of `value` in the array, or `NULL` if not found. |
| `STRING_TO_ARRAY(string s, string delimiter)` | `VARCHAR[]`         | Splits a string into an array of strings on the given delimiter.                                   |
| `ARRAY_TO_STRING(array a, string delimiter)`  | `VARCHAR`           | Joins array elements into a string with the given delimiter.                                       |
| `ARRAY_AGG(expression e)`                     | array of input type | Aggregates values from multiple rows into an array.                                                |
| `UNNEST(array a [, array b...])`              | rows of a [, b&acirc;&#128;&brvbar;]    | Expands one or more arrays into a set of rows. Only valid in a `FROM` clause.                      |

{% collapsible-section %}
### Examples

### `CARDINALITY`{% #cardinality %}

```sql
SELECT
  CARDINALITY(recipients) AS recipient_count
FROM
  emails
```

### `ARRAY_POSITION`{% #array_position %}

```sql
SELECT
  ARRAY_POSITION(recipients, 'hello@example.com') AS position
FROM
  emails
```

### `STRING_TO_ARRAY`{% #string_to_array %}

```sql
SELECT
  STRING_TO_ARRAY('a,b,c,d,e,f', ',') AS parts
```

### `ARRAY_TO_STRING`{% #array_to_string %}

```sql
SELECT
  ARRAY_TO_STRING(ARRAY['a', 'b', 'c'], ',') AS joined_string
```

### `ARRAY_AGG`{% #array_agg %}

```sql
SELECT
  sender,
  ARRAY_AGG(subject) AS subjects,
  ARRAY_AGG(DISTINCT subject) AS distinct_subjects
FROM
  emails
GROUP BY
  sender
```

### `UNNEST`{% #unnest %}

```sql
SELECT
  sender,
  recipient
FROM
  emails,
  UNNEST(recipients) AS recipient
```

{% /collapsible-section %}

## Functions{% #functions %}

The following SQL functions are supported. For Window function, see the separate Window function section in this documentation.

| Function                                                                    | Return Type                     | Description                                                                                                                                                                                                                                      |
| --------------------------------------------------------------------------- | ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `MIN(variable v)`                                                           | typeof v                        | Returns the smallest value in a set of data.                                                                                                                                                                                                     |
| `MAX(variable v)`                                                           | typeof v                        | Returns the maximum value across all input values.                                                                                                                                                                                               |
| `COUNT(any a)`                                                              | numeric                         | Returns the number of input values that are not null.                                                                                                                                                                                            |
| `SUM(numeric n)`                                                            | numeric                         | Returns the summation across all input values.                                                                                                                                                                                                   |
| `AVG(numeric n)`                                                            | numeric                         | Returns the average value (arithmetic mean) across all input values.                                                                                                                                                                             |
| `BOOL_AND(boolean b)`                                                       | boolean                         | Returns whether all non-null input values are true.                                                                                                                                                                                              |
| `BOOL_OR(boolean b)`                                                        | boolean                         | Returns whether any non-null input value is true.                                                                                                                                                                                                |
| `CEIL(numeric n)` / `CEILING(numeric n)`                                    | numeric                         | Returns the value rounded up to the nearest integer. Both `CEIL` and `CEILING` are supported as aliases.                                                                                                                                         |
| `FLOOR(numeric n)`                                                          | numeric                         | Returns the value rounded down to the nearest integer.                                                                                                                                                                                           |
| `ROUND(numeric n)`                                                          | numeric                         | Returns the value rounded to the nearest integer.                                                                                                                                                                                                |
| `POWER(numeric base, numeric exponent)`                                     | numeric                         | Returns the value of base raised to the power of exponent.                                                                                                                                                                                       |
| `LOWER(string s)`                                                           | string                          | Returns the string as lowercase.                                                                                                                                                                                                                 |
| `UPPER(string s)`                                                           | string                          | Returns the string as uppercase.                                                                                                                                                                                                                 |
| `ABS(numeric n)`                                                            | numeric                         | Returns the absolute value.                                                                                                                                                                                                                      |
| `COALESCE(args a)`                                                          | typeof first non-null a OR null | Returns the first non-null value or null if all are null.                                                                                                                                                                                        |
| `CAST(value AS type)`                                                       | type                            | Converts the given value to the specified data type.                                                                                                                                                                                             |
| `LENGTH(string s)`                                                          | integer                         | Returns the number of characters in the string.                                                                                                                                                                                                  |
| `TRIM(string s)`                                                            | string                          | Removes leading and trailing whitespace from the string.                                                                                                                                                                                         |
| `REPLACE(string s, string from, string to)`                                 | string                          | Replaces occurrences of a substring within a string with another substring.                                                                                                                                                                      |
| `SUBSTRING(string s, int start, int length)`                                | string                          | Extracts a substring from a string, starting at a given position and for a specified length.                                                                                                                                                     |
| `REVERSE(string s)`                                                         | string                          | Returns the string with characters in reverse order.                                                                                                                                                                                             |
| `STRPOS(string s, string substring)`                                        | integer                         | Returns the first index position of the substring in a given string, or 0 if there is no match.                                                                                                                                                  |
| `SPLIT_PART(string s, string delimiter, integer index)`                     | string                          | Splits the string on the given delimiter and returns the string at the given position counting from one.                                                                                                                                         |
| `EXTRACT(unit from timestamp/interval)`                                     | numeric                         | Extracts a part of a date or time field (such as year or month) from a timestamp or interval.                                                                                                                                                    |
| `TO_TIMESTAMP(string timestamp, string format)`                             | timestamp                       | Converts a string to a timestamp according to the given format.                                                                                                                                                                                  |
| `TO_TIMESTAMP(numeric epoch)`                                               | timestamp                       | Converts a UNIX epoch timestamp (in seconds) to a timestamp.                                                                                                                                                                                     |
| `TO_CHAR(timestamp t, string format)`                                       | string                          | Converts a timestamp to a string according to the given format.                                                                                                                                                                                  |
| `DATE_BIN(interval stride, timestamp source, timestamp origin)`             | timestamp                       | Aligns a timestamp (source) to buckets of even length (stride). Returns the start of the bucket containing the source, calculated as the largest timestamp that is less than or equal to source and is a multiple of stride lengths from origin. |
| `DATE_TRUNC(string unit, timestamp t)`                                      | timestamp                       | Truncates a timestamp to a specified precision based on the provided unit.                                                                                                                                                                       |
| `CURRENT_SETTING(string setting_name)`                                      | string                          | Returns the current value of the specified setting. Supports the parameters `dd.time_frame_start` and `dd.time_frame_end`, which return the start and end of the global time frame, respectively.                                                |
| `NOW()`                                                                     | timestamp                       | Returns the current UTC timestamp at the start of the current query.                                                                                                                                                                             |
| `CARDINALITY(array a)`                                                      | integer                         | Returns the number of elements in the array.                                                                                                                                                                                                     |
| `ARRAY_POSITION(array a, typeof_array value)`                               | integer                         | Returns the index of the first occurrence of the value found in the array, or null if value is not found.                                                                                                                                        |
| `STRING_TO_ARRAY(string s, string delimiter)`                               | array of strings                | Splits the given string into an array of strings using the given delimiter.                                                                                                                                                                      |
| `ARRAY_TO_STRING(array a, string delimiter)`                                | string                          | Converts an array to a string by concatenating elements with the given delimiter.                                                                                                                                                                |
| `ARRAY_AGG(expression e)`                                                   | array of input type             | Creates an array by collecting all the input values.                                                                                                                                                                                             |
| `APPROX_PERCENTILE(double percentile) WITHIN GROUP (ORDER BY expression e)` | typeof expression               | Computes an approximate percentile value. The percentile must be between 0.0 and 1.0 (inclusive). Requires the `WITHIN GROUP (ORDER BY ...)` syntax.                                                                                             |
| `UNNEST(array a [, array b...])`                                            | rows of a [, b&acirc;&#128;&brvbar;]                | Expands arrays into a set of rows. This form is only allowed in a FROM clause.                                                                                                                                                                   |

{% collapsible-section %}
### Examples

### `MIN`{% #min %}

```sql
SELECT MIN(response_time) AS min_response_time
FROM logs
WHERE status_code = 200
```

### `MAX`{% #max %}

```sql
SELECT MAX(response_time) AS max_response_time
FROM logs
WHERE status_code = 200
```

### `COUNT`{% #count %}

```sql
SELECT COUNT(request_id) AS total_requests
FROM logs
WHERE status_code = 200 
```

### `SUM`{% #sum %}

```sql
SELECT SUM(bytes_transferred) AS total_bytes
FROM logs
GROUP BY service_name
```

### `AVG`{% #avg %}

```sql
SELECT AVG(response_time)
AS avg_response_time
FROM logs
WHERE status_code = 200
GROUP BY service_name
```

### `BOOL_AND`{% #bool_and %}

```sql
SELECT BOOL_AND(status_code = 200) AS all_success
FROM logs
```

### `BOOL_OR`{% #bool_or %}

```sql
SELECT BOOL_OR(status_code = 200) AS some_success
FROM logs
```

### `CEIL`{% #ceil %}

```sql
SELECT CEIL(price) AS rounded_price
FROM products
```

### `FLOOR`{% #floor %}

```sql
SELECT FLOOR(price) AS floored_price
FROM products
```

### `ROUND`{% #round %}

```sql
SELECT ROUND(price) AS rounded_price
FROM products
```

### `POWER`{% #power %}

```sql
SELECT POWER(response_time, 2) AS squared_response_time
FROM logs
```

### `LOWER`{% #lower %}

```sql
SELECT LOWER(customer_name) AS lowercase_name
FROM customers
```

### `UPPER`{% #upper %}

```sql
SELECT UPPER(customer_name) AS uppercase_name
FROM customers
```

### `ABS`{% #abs %}

```sql
SELECT ABS(balance) AS absolute_balance
FROM accounts
```

### `COALESCE`{% #coalesce %}

```sql
SELECT COALESCE(phone_number, email) AS contact_info
FROM users
```

### `CAST`{% #cast %}

Supported cast target types:

- `BIGINT`
- `DECIMAL`
- `INET`
- `TIMESTAMP`
- `VARCHAR`

```sql
SELECT
  CAST(order_id AS VARCHAR) AS order_id_string,
  'Order-' || CAST(order_id AS VARCHAR) AS order_label
FROM
  orders
```

### `LENGTH`{% #length %}

```sql
SELECT
  customer_name,
  LENGTH(customer_name) AS name_length
FROM
  customers
```

### `INTERVAL`{% #interval %}

```sql
SELECT
  TIMESTAMP '2023-10-01 10:00:00' + INTERVAL '30 days' AS future_date,
  INTERVAL '1 MILLISECOND 2 SECONDS 3 MINUTES 4 HOURS 5 DAYS'
```

### `TRIM`{% #trim %}

```sql
SELECT
  TRIM(name) AS trimmed_name
FROM
  users
```

### `REPLACE`{% #replace %}

```sql
SELECT
  REPLACE(description, 'old', 'new') AS updated_description
FROM
  products
```

### `SUBSTRING`{% #substring %}

```sql
SELECT
  SUBSTRING(title, 1, 10) AS short_title
FROM
  books
```

### `REVERSE`{% #reverse %}

```sql
SELECT
  REVERSE(username) AS reversed_username
FROM
  users
LIMIT 5
```

### `STRPOS`{% #strpos %}

```sql
SELECT
  STRPOS('foobar', 'bar')
```

### `SPLIT_PART`{% #split_part %}

```sql
SELECT
  SPLIT_PART('aaa-bbb-ccc', '-', 2)
```

### `EXTRACT`{% #extract %}

Supported extraction units:

| Literal           | Input Type               | Description                                                                                        |
| ----------------- | ------------------------ | -------------------------------------------------------------------------------------------------- |
| `day`             | `timestamp` / `interval` | day of the month                                                                                   |
| `dow`             | `timestamp`              | day of the week `1` (Monday) to `7` (Sunday)                                                       |
| `doy`             | `timestamp`              | day of the year (`1` - `366`)                                                                      |
| `epoch`           | `timestamp` / `interval` | seconds since 1970-01-01 00:00:00 UTC (for timestamps), or total number of seconds (for intervals) |
| `hour`            | `timestamp` / `interval` | hour of the day (`0` - `23`)                                                                       |
| `minute`          | `timestamp` / `interval` | minute of the hour (`0` - `59`)                                                                    |
| `second`          | `timestamp` / `interval` | second of the minute (`0` - `59`)                                                                  |
| `week`            | `timestamp`              | week of the year (`1` - `53`)                                                                      |
| `month`           | `timestamp`              | month of the year (`1` - `12`)                                                                     |
| `quarter`         | `timestamp`              | quarter of the year (`1` - `4`)                                                                    |
| `year`            | `timestamp`              | year                                                                                               |
| `timezone_hour`   | `timestamp`              | hour of the time zone offset                                                                       |
| `timezone_minute` | `timestamp`              | minute of the time zone offset                                                                     |

```sql
SELECT
  EXTRACT(year FROM purchase_date) AS purchase_year
FROM
  sales
```

```sql
-- Get the Unix epoch of a timestamp
SELECT EXTRACT(epoch FROM TIMESTAMP '2021-01-01 00:00:00+00')
-- Returns: 1609459200
```

```sql
-- Get the total seconds in an interval
SELECT EXTRACT(epoch FROM INTERVAL '1 day 2 hours')
-- Returns: 93600
```

```sql
-- Calculate how many seconds ago each event occurred
SELECT
  event_time,
  EXTRACT(epoch FROM now()) - EXTRACT(epoch FROM event_time) AS seconds_ago
FROM
  events
```

### `TO_TIMESTAMP`{% #to_timestamp %}

`TO_TIMESTAMP` has two forms:

**Form 1: Convert string to timestamp with format**

Supported patterns for date/time formatting:

| Pattern     | Description                          |
| ----------- | ------------------------------------ |
| `YYYY`      | year (4 digits)                      |
| `YY`        | year (2 digits)                      |
| `MM`        | month number (01 - 12)               |
| `DD`        | day of month (01 - 31)               |
| `HH24`      | hour of day (00 - 23)                |
| `HH12`      | hour of day (01 - 12)                |
| `HH`        | hour of day (01 - 12)                |
| `MI`        | minute (00 - 59)                     |
| `SS`        | second (00 - 59)                     |
| `MS`        | millisecond (000 - 999)              |
| `TZ`        | time-zone abbreviation               |
| `OF`        | time-zone offset from UTC            |
| `AM` / `am` | meridiem indicator (without periods) |
| `PM` / `pm` | meridiem indicator (without periods) |

```sql
SELECT
  TO_TIMESTAMP('25/12/2025 04:23 pm', 'DD/MM/YYYY HH:MI am') AS ts
```

**Form 2: Convert UNIX epoch timestamp to timestamp**

```sql
SELECT
  TO_TIMESTAMP(1735142580) AS ts_from_epoch
```

### `TO_CHAR`{% #to_char %}

Supported patterns for date/time formatting:

| Pattern     | Description                          |
| ----------- | ------------------------------------ |
| `YYYY`      | year (4 digits)                      |
| `YY`        | year (2 digits)                      |
| `MM`        | month number (01 - 12)               |
| `DD`        | day of month (01 - 31)               |
| `HH24`      | hour of day (00 - 23)                |
| `HH12`      | hour of day (01 - 12)                |
| `HH`        | hour of day (01 - 12)                |
| `MI`        | minute (00 - 59)                     |
| `SS`        | second (00 - 59)                     |
| `MS`        | millisecond (000 - 999)              |
| `TZ`        | time-zone abbreviation               |
| `OF`        | time-zone offset from UTC            |
| `AM` / `am` | meridiem indicator (without periods) |
| `PM` / `pm` | meridiem indicator (without periods) |

```sql
SELECT
  TO_CHAR(order_date, 'MM-DD-YYYY') AS formatted_date
FROM
  orders
```

### `DATE_BIN`{% #date_bin %}

```sql
SELECT DATE_BIN('15 minutes', TIMESTAMP '2025-09-15 12:34:56', TIMESTAMP '2025-01-01')
-- Returns 2025-09-15 12:30:00

SELECT DATE_BIN('1 day', TIMESTAMP '2025-09-15 12:34:56', TIMESTAMP '2025-01-01')
-- Returns 2025-09-15 00:00:00
```

### `DATE_TRUNC`{% #date_trunc %}

Supported truncations:

- `milliseconds`
- `seconds` / `second`
- `minutes` / `minute`
- `hours` / `hour`
- `days` / `day`
- `weeks` / `week`
- `months` / `month`
- `quarters` / `quarter`
- `years` / `year`

```sql
SELECT
  DATE_TRUNC('month', event_time) AS month_start
FROM
  events
```

### `CURRENT_SETTING`{% #current_setting %}

Supported setting parameters:

- `dd.time_frame_start`: Returns the start of the selected time frame in RFC 3339 format (`YYYY-MM-DD HH:mm:ss.sss&Acirc;&plusmn;HH:mm`).
- `dd.time_frame_end`: Returns the end of the selected time frame in RFC 3339 format (`YYYY-MM-DD HH:mm:ss.sss&Acirc;&plusmn;HH:mm`).

```sql
-- Define the current analysis window
WITH bounds AS (
  SELECT CAST(CURRENT_SETTING('dd.time_frame_start') AS TIMESTAMP) AS time_frame_start,
         CAST(CURRENT_SETTING('dd.time_frame_end')   AS TIMESTAMP) AS time_frame_end
),
-- Define the immediately preceding window of equal length
     previous_bounds AS (
  SELECT time_frame_start - (time_frame_end - time_frame_start) AS prev_time_frame_start,
         time_frame_start                                       AS prev_time_frame_end
  FROM bounds
)
SELECT * FROM bounds, previous_bounds
```

### `NOW`{% #now %}

```sql
SELECT
  *
FROM
  sales
WHERE
  purchase_date &gt; NOW() - INTERVAL '1 hour'
```

### `APPROX_PERCENTILE`{% #approx_percentile %}

```sql
-- Calculate the median (50th percentile) response time
SELECT
  APPROX_PERCENTILE(0.5) WITHIN GROUP (ORDER BY response_time) AS median_response_time
FROM
  logs

-- Calculate 95th and 99th response time percentiles by service
SELECT
  service_name,
  APPROX_PERCENTILE(0.95) WITHIN GROUP (ORDER BY response_time) AS p95_response_time,
  APPROX_PERCENTILE(0.99) WITHIN GROUP (ORDER BY response_time) AS p99_response_time
FROM
  logs
GROUP BY
  service_name
```

{% /collapsible-section %}

## Regular expressions{% #regular-expressions %}

### Flavor{% #flavor %}

All regular expression (regex) functions use the International Components for Unicode (ICU) flavor:

- [Metacharacters](https://unicode-org.github.io/icu/userguide/strings/regexp.html#regular-expression-metacharacters)
- [Operators](https://unicode-org.github.io/icu/userguide/strings/regexp.html#regular-expression-operators)
- [Set Expressions (Character Classes)](https://unicode-org.github.io/icu/userguide/strings/regexp.html#set-expressions-character-classes)
- [Flag Options for in-pattern flags](https://unicode-org.github.io/icu/userguide/strings/regexp.html#flag-options). Refer to the flags section below for function-level flags.
- [Find and Replace (using capture groups)](https://unicode-org.github.io/icu/userguide/strings/regexp.html#find-and-replace)

### Functions{% #functions-1 %}

| Function                                                                                                         | Return Type      | Description                                                                                                                                                                                                                                                  |
| ---------------------------------------------------------------------------------------------------------------- | ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `REGEXP_LIKE(string input, string pattern)`                                                                      | Boolean          | Evaluates whether a string matches a regular expression pattern.                                                                                                                                                                                             |
| `REGEXP_MATCH(string input, string pattern [, string flags ])`                                                   | array of strings | Returns substrings of the first pattern match in the string.This function searches the input string using the given pattern and returns captured substrings (capture groups) from the first match. If no capture groups are present, returns the full match. |
| `REGEXP_REPLACE(string input, string pattern, string replacement [, string flags ])`                             | string           | Replaces the substring that is the first match to the pattern, or all such matches if you use the optional `g` flag.                                                                                                                                         |
| `REGEXP_REPLACE (string input, string pattern, string replacement, integer start, integer N [, string flags ] )` | string           | Replaces the substring that is the Nth match to the pattern, or all such matches if `N` is zero, starting from `start`.                                                                                                                                      |

{% collapsible-section %}
### Examples

### `REGEXP_LIKE`{% #regexp_like %}

```sql
SELECT
  *
FROM
  emails
WHERE
  REGEXP_LIKE(email_address, '@example\.com$')
```

### `REGEXP_MATCH`{% #regexp_match %}

```sql
SELECT regexp_match('foobarbequebaz', '(bar)(beque)');
-- {bar,beque}

SELECT regexp_match('foobarbequebaz', 'barbeque');
-- {barbeque}

SELECT regexp_match('abc123xyz', '([a-z]+)(\d+)(x(.)z)');
-- {abc,123,xyz,y}
```

### `REGEXP_REPLACE`{% #regexp_replace %}

```sql
SELECT regexp_replace('Auth success token=abc123XYZ789', 'token=\w+', 'token=***');
-- Auth success token=***

SELECT regexp_replace('status=200 method=GET', 'status=(\d+) method=(\w+)', '$2: $1');
-- GET: 200

SELECT regexp_replace('INFO INFO INFO', 'INFO', 'DEBUG', 1, 2);
-- INFO DEBUG INFO
```

{% /collapsible-section %}

### Function-level flags{% #function-level-flags %}

You can use the following flags with regular expression functions:

{% dl %}

{% dt %}
`i`
{% /dt %}

{% dd %}
Case-insensitive matching
{% /dd %}

{% dt %}
`n` or `m`
{% /dt %}

{% dd %}
Newline-sensitive matching
{% /dd %}

{% dt %}
`g`
{% /dt %}

{% dd %}
Global; replace *all* matching substrings rather than only the first one
{% /dd %}

{% /dl %}

{% collapsible-section %}
### Examples

### `i` flag{% #i-flag %}

```sql
SELECT regexp_match('INFO', 'info')
-- NULL

SELECT regexp_match('INFO', 'info', 'i')
-- ['INFO']
```

### `n` flag{% #n-flag %}

```sql
SELECT regexp_match('a
b', '^b');
-- NULL

SELECT regexp_match('a
b', '^b', 'n');
-- ['b']
```

### `g` flag{% #g-flag %}

```sql
SELECT icu_regexp_replace('Request id=12345 completed, id=67890 pending', 'id=\d+', 'id=XXX');
-- Request id=XXX completed, id=67890 pending

SELECT regexp_replace('Request id=12345 completed, id=67890 pending', 'id=\d+', 'id=XXX', 'g');
-- Request id=XXX completed, id=XXX pending
```

{% /collapsible-section %}

## Window functions{% #window-functions %}

This table provides an overview of the supported window functions. For comprehensive details and examples, see the [PostgreSQL documentation](https://www.postgresql.org/docs/current/functions-window.html).

| Function                      | Return Type   | Description                                                                         |
| ----------------------------- | ------------- | ----------------------------------------------------------------------------------- |
| `OVER`                        | N/A           | Defines a window for a set of rows for other window functions to operate on.        |
| `PARTITION BY`                | N/A           | Divides the result set into partitions, specifically for applying window functions. |
| `RANK()`                      | integer       | Assigns a rank to each row within a partition, with gaps for ties.                  |
| `ROW_NUMBER()`                | integer       | Assigns a unique sequential number to each row within a partition.                  |
| `LEAD(column n)`              | typeof column | Returns the value from the next row in the partition.                               |
| `LAG(column n)`               | typeof column | Returns the value from the previous row in the partition.                           |
| `FIRST_VALUE(column n)`       | typeof column | Returns the first value in an ordered set of values.                                |
| `LAST_VALUE(column n)`        | typeof column | Returns the last value in an ordered set of values.                                 |
| `NTH_VALUE(column n, offset)` | typeof column | Returns the value at the specified offset in an ordered set of values.              |

## JSON functions and operators{% #json-functions-and-operators %}

| Name                                          | Return type  | Description                                                                                                                                                                                                                                                                                                                                                              |
| --------------------------------------------- | ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| json_extract_path_text(text json, text path&acirc;&#128;&brvbar;) | text         | Extracts a JSON sub-object as text, defined by the path. Its behavior is equivalent to the [Postgres function with the same name](https://www.postgresql.org/docs/current/functions-json.html). For example, `json_extract_path_text(col, 'forest')` returns the value of the key `forest` for each JSON object in `col`. See the example below for a JSON array syntax. |
| json_extract_path(text json, text path&acirc;&#128;&brvbar;)      | JSON         | Same functionality as `json_extract_path_text`, but returns a column of JSON type instead of text type.                                                                                                                                                                                                                                                                  |
| json_array_elements(text json)                | rows of JSON | Expands a JSON array into a set of rows. This form is only allowed in a FROM clause.                                                                                                                                                                                                                                                                                     |
| json_array_elements_text(text json)           | rows of text | Expands a JSON array into a set of rows. This form is only allowed in a FROM clause.                                                                                                                                                                                                                                                                                     |

## Network address functions and operators{% #network-address-functions-and-operators %}

The `inet` type represents IPv4 and IPv6 network addresses with an optional CIDR prefix length (for example, `192.168.1.5/24` or `::1`). Create `inet` values with the type literal syntax `INET 'value'` or by casting a string with `CAST(column AS inet)`.

### Functions{% #functions-2 %}

| Function               | Return Type | Description                                                     |
| ---------------------- | ----------- | --------------------------------------------------------------- |
| `host(inet addr)`      | `VARCHAR`   | Returns the IP address as text, without the prefix length.      |
| `network(inet addr)`   | `INET`      | Returns the network part of the address, with host bits zeroed. |
| `netmask(inet addr)`   | `INET`      | Returns the network mask for the address.                       |
| `masklen(inet addr)`   | `BIGINT`    | Returns the prefix length of the network mask.                  |
| `broadcast(inet addr)` | `INET`      | Returns the broadcast address of the network.                   |
| `family(inet addr)`    | `BIGINT`    | Returns the address family: `4` for IPv4, `6` for IPv6.         |

### Operators{% #operators %}

| Operator            | Return Type | Description                                              |
| ------------------- | ----------- | -------------------------------------------------------- |
| `inet a &lt;&lt; inet b`  | `BOOLEAN`   | Returns `true` if `a` is strictly contained within `b`.  |
| `inet a &lt;&lt;= inet b` | `BOOLEAN`   | Returns `true` if `a` is contained within or equals `b`. |
| `inet a &gt;&gt; inet b`  | `BOOLEAN`   | Returns `true` if `a` strictly contains `b`.             |
| `inet a &gt;&gt;= inet b` | `BOOLEAN`   | Returns `true` if `a` contains or equals `b`.            |
| `inet a &amp;&amp; inet b`  | `BOOLEAN`   | Returns `true` if the subnets of `a` and `b` overlap.    |

{% collapsible-section %}
### Examples

### `host`{% #host %}

```sql
SELECT host(INET '192.168.1.5/24')
-- Returns: 192.168.1.5
```

### `network`{% #network %}

```sql
SELECT network(INET '192.168.1.5/24')
-- Returns: 192.168.1.0/24
```

### `netmask`{% #netmask %}

```sql
SELECT netmask(INET '192.168.1.5/24')
-- Returns: 255.255.255.0
```

### `masklen`{% #masklen %}

```sql
SELECT masklen(INET '192.168.1.5/24')
-- Returns: 24
```

### `broadcast`{% #broadcast %}

```sql
SELECT broadcast(INET '192.168.1.5/24')
-- Returns: 192.168.1.255/24
```

### `family`{% #family %}

```sql
SELECT family(INET '::1')
-- Returns: 6

SELECT family(INET '192.168.1.5')
-- Returns: 4
```

### Containment operators{% #containment-operators %}

```sql
-- Check if an IP is within a subnet
SELECT INET '192.168.1.5' &lt;&lt; INET '192.168.1.0/24'
-- Returns: true

-- Check containment or equality
SELECT INET '192.168.1.0/24' &lt;&lt;= INET '192.168.1.0/24'
-- Returns: true

-- Check if a subnet contains an IP
SELECT INET '10.0.0.0/8' &gt;&gt; INET '10.1.2.3'
-- Returns: true

-- Check if two subnets overlap
SELECT INET '192.168.1.0/24' &amp;&amp; INET '192.168.1.128/25'
-- Returns: true
```

### Combined usage{% #combined-usage %}

```sql
-- Find all IPs in a private subnet and extract network info
SELECT
  host(CAST(src_ip AS inet)) AS ip,
  masklen(CAST(src_ip AS inet)) AS prefix_len,
  network(CAST(src_ip AS inet)) AS network
FROM connections
WHERE CAST(src_ip AS inet) &lt;&lt; INET '10.0.0.0/8'
  AND family(CAST(src_ip AS inet)) = 4
```

{% /collapsible-section %}

## Table functions{% #table-functions %}

Table functions are used to query logs, metrics, cloud costs, and other data sources.

| Function                                                                                                                                                                                                                             | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | Example                                                                                                                                                                                                                                                                                                                               |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| dd.logs(
    columns =&gt; array &lt; varchar &gt;,
    filter ? =&gt; varchar,
    indexes ? =&gt; array &lt; varchar &gt;,
    storage ? =&gt; varchar,
    from_timestamp ? =&gt; timestamp,
    to_timestamp ? =&gt; timestamp
) AS (column_name type [, ...]) | Returns log data as a table. The columns parameter specifies which log fields to extract. Nested fields are accessed using dot notation, and non-core fields need to be prepended by `@`. The AS clause defines the schema of the returned table. Optional: filtering by index or time range. When time is not specified, DDSQL defaults to the global time setting, which in DDSQL Editor is set to the past 1 hour. Optional: specifying the storage to use (for example, `hot`, `flex_tier`). When not specified, the default is hot storage. | ```sql
  SELECT timestamp, host, service, message, asset_id
  FROM dd.logs(
      filter  =&gt; 'source:java',
      columns =&gt; ARRAY['timestamp','host','service','message','@asset.id']
  ) AS (
      timestamp TIMESTAMP,
      host      VARCHAR,
      service   VARCHAR,
      message   VARCHAR,
      asset_id  VARCHAR
  )
``` |
| dd.metrics_scalar(
    query varchar,
    reducer varchar [, from_timestamp timestamp, to_timestamp timestamp]
)                                                                                                                     | Returns metric data as a scalar value. The function accepts a metrics query (with optional grouping), a reducer to determine how values are aggregated (avg, max, etc.), and optional timestamp parameters (default 1 hour) to define the time range.                                                                                                                                                                                                                                                                                            | ```sql
  SELECT *
  FROM dd.metrics_scalar(
      'avg:system.cpu.user{*} by {service}',
      'avg',
      TIMESTAMP '2025-07-10 00:00:00.000-04:00',
      TIMESTAMP '2025-07-17 00:00:00.000-04:00'
  )
  ORDER BY value DESC;
```                                                                                                 |
| dd.metrics_timeseries(
    query varchar [, from_timestamp timestamp, to_timestamp timestamp]
)                                                                                                                                      | Returns metric data as a timeseries. The function accepts a metrics query (with optional grouping) and optional timestamp parameters (default 1 hour) to define the time range. Returns datapoints over time rather than a single aggregated value.                                                                                                                                                                                                                                                                                              | ```sql
  SELECT *
  FROM dd.metrics_timeseries(
      'avg:system.cpu.user{*} by {service}',
      TIMESTAMP '2025-07-10 00:00:00.000-04:00',
      TIMESTAMP '2025-07-17 00:00:00.000-04:00'
  )
  ORDER BY timestamp, service;
```                                                                                                  |
| dd.cloud_cost_scalar(
    query varchar,
    reducer varchar
    [, from_timestamp timestamp,
    to_timestamp timestamp]
)                                                                                                          | Returns [Cloud Cost Management](https://docs.datadoghq.com/cloud_cost_management.md) data as a scalar value. The function accepts a cloud cost query (with optional grouping), an aggregation reducer (use `sum` for cost data; other reducers such as `avg`, `min`, and `max` are accepted but rarely applicable to cost queries), and optional timestamp parameters (default 1 hour) to define the time range. **Note**: Cloud cost data is typically delayed by 24-48 hours, so recent timestamps may return no results.                      | ```sql
  SELECT *
  FROM dd.cloud_cost_scalar(
      'sum:all.cost{*} by {service}',
      'sum',
      TIMESTAMP '2025-07-10 00:00:00.000-04:00',
      TIMESTAMP '2025-07-17 00:00:00.000-04:00'
  )
  ORDER BY value DESC;
```                                                                                                     |
| dd.cloud_cost_timeseries(
    query varchar
    [, from_timestamp timestamp,
    to_timestamp timestamp]
)                                                                                                                           | Returns [Cloud Cost Management](https://docs.datadoghq.com/cloud_cost_management.md) data as a timeseries. The function accepts a cloud cost query (with optional grouping) and optional timestamp parameters (default 1 hour) to define the time range. Returns cost datapoints over time rather than a single aggregated value. **Note**: Cloud cost data is typically delayed by 24-48 hours, so recent timestamps may return no results.                                                                                                     | ```sql
  SELECT *
  FROM dd.cloud_cost_timeseries(
      'sum:all.cost{*} by {service}',
      TIMESTAMP '2025-07-10 00:00:00.000-04:00',
      TIMESTAMP '2025-07-17 00:00:00.000-04:00'
  )
  ORDER BY timestamp, service;
```                                                                                                      |

{% collapsible-section %}
### Examples

### Absolute timestamps{% #absolute-timestamps %}

```sql
SELECT *
FROM dd.logs(
    columns =&gt; ARRAY['timestamp','host','service','message'],
    from_timestamp =&gt; TIMESTAMP '2025-07-10 00:00:00.000-04:00',
    to_timestamp =&gt; TIMESTAMP '2025-07-17 00:00:00.000-04:00'
) AS (
    timestamp TIMESTAMP,
    host      VARCHAR,
    service   VARCHAR,
    message   VARCHAR
)
```

### Relative timestamps{% #relative-timestamps %}

```sql
SELECT *
FROM dd.logs(
    columns =&gt; ARRAY['timestamp','host','service','message'],
    from_timestamp =&gt; now() - INTERVAL '7 days',
    to_timestamp =&gt; now()
) AS (
    timestamp TIMESTAMP,
    host      VARCHAR,
    service   VARCHAR,
    message   VARCHAR
)
```

### Optional parameters{% #optional-parameters %}

```sql
SELECT *
FROM dd.logs(
    columns =&gt; ARRAY['timestamp','host','service','message'],
    filter  =&gt; 'source:java',
    indexes =&gt; ARRAY['trino'],
    storage =&gt; 'hot'
) AS (
    timestamp TIMESTAMP,
    host      VARCHAR,
    service   VARCHAR,
    message   VARCHAR
)
```

### Nested field access{% #nested-field-access %}

Column aliases cannot contain dots; replace them with underscores or any other valid character when defining the alias.

```sql
SELECT timestamp, host, asset_id, view_url, data_resource_type
FROM dd.logs(
    filter  =&gt; 'service:mcp',
    columns =&gt; ARRAY['timestamp','host','@asset.id','@view.url','@data.resource.type']
) AS (
    timestamp TIMESTAMP,
    host      VARCHAR,
    asset_id  VARCHAR,
    view_url  VARCHAR,
    data_resource_type VARCHAR
)
```

{% /collapsible-section %}

## Tags{% #tags %}

DDSQL exposes tags as an `hstore` type, which is inspired by PostgreSQL. You can access the values for specific tag keys using the PostgreSQL arrow operator. For example:

```sql
SELECT instance_type, count(instance_type)
FROM aws.ec2_instance
WHERE tags-&gt;'region' = 'us-east-1' -- region is a tag, not a column
GROUP BY instance_type
```

Tags are key-value pairs where each key can have zero, one, or multiple tag values corresponding to it. When accessed, the tag value returns a single string, containing *all* corresponding values. When the data has multiple tag values for the same tag key, they are represented as a sorted, comma-separated string. For example:

```sql
SELECT tags-&gt;'team', instance_type, architecture, COUNT(*) as instance_count
FROM aws.ec2_instance
WHERE tags-&gt;'team' = 'compute_provisioning,database_ops'
GROUP BY tags-&gt;'team', instance_type, architecture
ORDER BY instance_count DESC
```

You can also compare tag values as strings or entire tag sets:

```sql
SELECT *
FROM k8s.daemonsets da INNER JOIN k8s.deployments de
ON da.tags = de.tags -- for a specific tag: da.tags-&gt;'app' = de.tags-&gt;'app'
```

Additionally, you can extract tag keys and values into individual arrays of text:

```sql
SELECT akeys(tags), avals(tags)
FROM aws.ec2_instance
```

### HSTORE functions and operators{% #hstore-functions-and-operators %}

| Name               | Return type   | Description                                                           |
| ------------------ | ------------- | --------------------------------------------------------------------- |
| tags -&gt; 'text'     | Text          | Gets the value for a given key. Returns `null` if key is not present. |
| akeys(hstore tags) | Array of text | Gets the keys of an HSTORE as an array                                |
| avals(hstore tags) | Array of text | Gets the values of an HSTORE as an array                              |

## Writing efficient queries{% #writing-efficient-queries %}

Queries that read large amounts of data or run heavy computations can be slow or return resource errors. The patterns below are the most common causes, each with a rewrite that usually resolves the issue.

**Filter early and aggregate early**. Computations in the query filter or a `GROUP BY` summary run against the index. Computations in a `JOIN` or a wide `LIMIT` are held in memory.

### Checklist{% #checklist %}

Before re-running a slow query, ask the following questions:

- Does the query filter include a selective token (not a wildcard like `service:*` or `env:*`)?
- Is the time range as small as it can be for the question?
- Are you selecting only the columns you actually use?
- If you have a large `LIMIT`, could a `GROUP BY` summary work instead?
- If you have many `JOIN` clauses, could the query be rewritten as a single scan?
- When joining across sources, is each side filtered?
- Is the `JOIN` key high-cardinality (user/ request/trace ID)?
- Is the same regex being run more than once per row?

### Filtering{% #filtering %}

#### Filter on the data source{% #filter-on-the-data-source %}

Always include a selective token in your data source query, such as `service:`, `host:`, `env:`, or any `@attribute:value`.

Wildcard filters such as `service:*` or `env:*` match every event, so they don't actually narrow the data. Treat these filters as equivalent to leaving the filter blank.

**Before**

```sql
-- No filter; scans all logs in the time range
SELECT timestamp, service, host, message
FROM dd.logs(
    columns        =&gt; ARRAY['timestamp', 'service', 'host', 'message'],
    from_timestamp =&gt; NOW() - INTERVAL '7 days',
    to_timestamp   =&gt; NOW()
) AS (
    timestamp TIMESTAMP,
    service   VARCHAR,
    host      VARCHAR,
    message   VARCHAR
)
```

**After**

```sql
-- filter =&gt; narrows the scan to matching events before any SQL runs
SELECT timestamp, service, host, message
FROM dd.logs(
    filter         =&gt; 'service:checkout-api env:prod',
    columns        =&gt; ARRAY['timestamp', 'service', 'host', 'message'],
    from_timestamp =&gt; NOW() - INTERVAL '7 days',
    to_timestamp   =&gt; NOW()
) AS (
    timestamp TIMESTAMP,
    service   VARCHAR,
    host      VARCHAR,
    message   VARCHAR
)
```

#### Choose a time range that fits the question{% #choose-a-time-range-that-fits-the-question %}

Scan time grows with the time range. Consider starting with a window that covers the question (often a few hours or a day) and widening only when the question demands more. For long-term trends, a pre-aggregated metric or a daily-summary query runs faster than re-scanning raw events on every run.

*Goal: understand log volume from the billing service over the last month.*

**Before**

```sql
-- Scans 31 days of raw events
SELECT timestamp, message
FROM dd.logs(
    filter         =&gt; 'service:billing',
    columns        =&gt; ARRAY['timestamp', 'message'],
    from_timestamp =&gt; NOW() - INTERVAL '31 days',
    to_timestamp   =&gt; NOW()
) AS (
    timestamp TIMESTAMP,
    message   VARCHAR
)
```

**After**

```sql
-- Aggregate up front for trends; only fetch the timestamp column needed for grouping
SELECT date_trunc('day', timestamp) AS day, count(*) AS events
FROM dd.logs(
    filter         =&gt; 'service:billing',
    columns        =&gt; ARRAY['timestamp'],
    from_timestamp =&gt; NOW() - INTERVAL '7 days',
    to_timestamp   =&gt; NOW()
) AS (
    timestamp TIMESTAMP
)
GROUP BY 1
ORDER BY 1;
```

### Column selection{% #column-selection %}

#### Project only the columns you use{% #project-only-the-columns-you-use %}

Each column in your query is fetched from storage. Trim the column list to what you use downstream. Wide attributes like the raw `message` or full HTTP headers can slow down the query significantly.

**Before**

```sql
-- columns array fetches every field, including the expensive raw message
SELECT *
FROM dd.logs(
    filter         =&gt; 'service:checkout-api',
    columns        =&gt; ARRAY['timestamp', 'service', 'host', 'message', '@http.url', '@http.status_code'],
    from_timestamp =&gt; NOW() - INTERVAL '1 day',
    to_timestamp   =&gt; NOW()
) AS (
    timestamp   TIMESTAMP,
    service     VARCHAR,
    host        VARCHAR,
    message     VARCHAR,
    http_url    VARCHAR,
    status_code VARCHAR
)
```

**After**

```sql
-- Only declare the columns the analysis actually uses
SELECT timestamp, service, host
FROM dd.logs(
    filter         =&gt; 'service:checkout-api',
    columns        =&gt; ARRAY['timestamp', 'service', 'host'],
    from_timestamp =&gt; NOW() - INTERVAL '1 day',
    to_timestamp   =&gt; NOW()
) AS (
    timestamp TIMESTAMP,
    service   VARCHAR,
    host      VARCHAR
)
```

### Aggregations{% #aggregations %}

#### Return summaries, not raw rows{% #return-summaries-not-raw-rows %}

When your goal is to understand the data (such as top-N, counts per category, or distributions), a `GROUP BY` returns a focused result that is faster to compute and easier to work with than scanning millions of raw rows.

**Before**

```sql
SELECT *
FROM dd.logs(
    filter         =&gt; 'service:orders-api',
    columns        =&gt; ARRAY['timestamp', 'service', '@http.status_code', 'message'],
    from_timestamp =&gt; NOW() - INTERVAL '1 day',
    to_timestamp   =&gt; NOW()
) AS (
    timestamp   TIMESTAMP,
    service     VARCHAR,
    status_code VARCHAR,
    message     VARCHAR
)
LIMIT 5000000;
```

**After**

```sql
-- Returns ~10 rows (one per status_code), answers the actual question
SELECT status_code, count(*) AS hits
FROM dd.logs(
    filter         =&gt; 'service:orders-api',
    columns        =&gt; ARRAY['@http.status_code'],
    from_timestamp =&gt; NOW() - INTERVAL '1 day',
    to_timestamp   =&gt; NOW()
) AS (
    status_code VARCHAR
)
GROUP BY status_code
ORDER BY hits DESC
LIMIT 100;
```

**Note:** Aggregating in SQL with `GROUP BY` is more efficient than fetching raw rows and aggregating in a downstream step. The engine filters and summarizes data at the source.

#### Narrow the scan before aggregating on a high-cardinality column{% #narrow-the-scan-before-aggregating-on-a-high-cardinality-column %}

When working with high-cardinality columns (such as emails, IPs, or request IDs):

- Use `SELECT DISTINCT` or `GROUP BY` to return one entry per distinct value across workers. Without a tight filter, the result set grows without bound.
- Narrow the data source filter first so the aggregation runs over fewer rows.
- For wide cardinalities, pre-aggregate to one row per time bucket and key, then count distinct across buckets.

*Goal: find the distinct user emails in the checkout service's logs over the last 7 days.*

**Before**

```sql
-- No service filter; unbounded distinct set across all logs
SELECT DISTINCT user_email
FROM dd.logs(
    columns        =&gt; ARRAY['@usr.email'],
    from_timestamp =&gt; NOW() - INTERVAL '7 days',
    to_timestamp   =&gt; NOW()
) AS (
    user_email VARCHAR
)
```

**After**

```sql
-- Pre-aggregate to one row per (day, email), then count distinct across days
SELECT count(DISTINCT user_email) AS distinct_emails
FROM (
  SELECT date_trunc('day', timestamp) AS day, user_email
  FROM dd.logs(
      filter         =&gt; 'service:checkout-api',
      columns        =&gt; ARRAY['timestamp', '@usr.email'],
      from_timestamp =&gt; NOW() - INTERVAL '7 days',
      to_timestamp   =&gt; NOW()
  ) AS (
      timestamp  TIMESTAMP,
      user_email VARCHAR
  )
  GROUP BY 1, 2
) daily;
```

### Joins{% #joins %}

#### Combine self-joins into a single scan{% #combine-self-joins-into-a-single-scan %}

Joining one source to itself many times to correlate different events is one of the most common causes of slow queries. Most self-joins can be rewritten as a single scan with `CASE` expressions, window functions, or `GROUP BY ... HAVING`.

**Before: 4 self-joins**

```sql
WITH a AS (
    SELECT user_id FROM dd.logs(filter =&gt; 'service:checkout-api',
        columns =&gt; ARRAY['@usr.id'], from_timestamp =&gt; NOW() - INTERVAL '1 day',
        to_timestamp =&gt; NOW()) AS (user_id VARCHAR)
),
b AS (
    SELECT user_id FROM dd.logs(filter =&gt; 'service:payment-api',
        columns =&gt; ARRAY['@usr.id'], from_timestamp =&gt; NOW() - INTERVAL '1 day',
        to_timestamp =&gt; NOW()) AS (user_id VARCHAR)
),
c AS (
    SELECT user_id FROM dd.logs(filter =&gt; 'service:shipping-api',
        columns =&gt; ARRAY['@usr.id'], from_timestamp =&gt; NOW() - INTERVAL '1 day',
        to_timestamp =&gt; NOW()) AS (user_id VARCHAR)
),
d AS (
    SELECT user_id FROM dd.logs(filter =&gt; 'service:orders-api',
        columns =&gt; ARRAY['@usr.id'], from_timestamp =&gt; NOW() - INTERVAL '1 day',
        to_timestamp =&gt; NOW()) AS (user_id VARCHAR)
)
SELECT user_id
FROM a JOIN b USING (user_id)
       JOIN c USING (user_id)
       JOIN d USING (user_id);
```

**After: single scan**

```sql
SELECT user_id
FROM dd.logs(
    filter         =&gt; 'service:(checkout-api OR payment-api OR shipping-api OR orders-api)',
    columns        =&gt; ARRAY['service', '@usr.id'],
    from_timestamp =&gt; NOW() - INTERVAL '1 day',
    to_timestamp   =&gt; NOW()
) AS (
    service VARCHAR,
    user_id VARCHAR
)
GROUP BY user_id
HAVING count(DISTINCT service) = 4;
```

#### Use a high-cardinality join key{% #use-a-high-cardinality-join-key %}

When joining, prefer a high-cardinality key like `user_id`, `request_id`, or `trace_id`. Coarse keys like `service` or `status` can expand a moderate input into billions of intermediate rows. For large datasets, also narrow the data source filter and time range on each side of the join.

#### Filter both sides of a cross-source join{% #filter-both-sides-of-a-cross-source-join %}

When you `JOIN` across two data sources, apply a selective filter on each side. Examples include logs + RUM, logs + traces, and feed + logs. An unfiltered side becomes a full scan that has to be held in memory for the join. Where possible, pre-aggregate each source separately and join the summaries.

**Before**

```sql
-- No filter on the logs side; full scan held in memory for the join
SELECT logs.message, rum.user_id
FROM (
    SELECT message, trace_id
    FROM dd.logs(
        columns        =&gt; ARRAY['message', 'trace_id'],
        from_timestamp =&gt; NOW() - INTERVAL '1 day',
        to_timestamp   =&gt; NOW()
    ) AS (message VARCHAR, trace_id VARCHAR)
) logs
JOIN rum ON logs.trace_id = rum.trace_id
WHERE rum.view_name = 'cart';
```

**After**

```sql
-- Both sides filtered before the join
SELECT logs.message, rum.user_id
FROM (
    SELECT message, trace_id
    FROM dd.logs(
        filter         =&gt; 'service:checkout-api status:error',
        columns        =&gt; ARRAY['message', 'trace_id'],
        from_timestamp =&gt; NOW() - INTERVAL '1 day',
        to_timestamp   =&gt; NOW()
    ) AS (message VARCHAR, trace_id VARCHAR)
) logs
JOIN rum ON logs.trace_id = rum.trace_id
WHERE rum.view_name = 'cart';
```

### Expressions{% #expressions %}

#### Run regular expressions once per row{% #run-regular-expressions-once-per-row %}

If you call `REGEXP_MATCH` once for each output column, the same pattern is evaluated against `message` repeatedly for every row. Run it once into an array, join the captures into a single delimited string, and unpack them with `SPLIT_PART` in a downstream `SELECT`.

**Before**

```sql
-- Same regex evaluated 3 times per row
SELECT
  SPLIT_PART(ARRAY_TO_STRING(REGEXP_MATCH(message, 'user_id=(\S+) latency_ms=(\d+) error=(\S+)'), '|||'), '|||', 1) AS user_id,
  SPLIT_PART(ARRAY_TO_STRING(REGEXP_MATCH(message, 'user_id=(\S+) latency_ms=(\d+) error=(\S+)'), '|||'), '|||', 2) AS latency_ms,
  SPLIT_PART(ARRAY_TO_STRING(REGEXP_MATCH(message, 'user_id=(\S+) latency_ms=(\d+) error=(\S+)'), '|||'), '|||', 3) AS error_code
FROM dd.logs(
    columns        =&gt; ARRAY['message'],
    from_timestamp =&gt; NOW() - INTERVAL '1 day',
    to_timestamp   =&gt; NOW()
) AS (message VARCHAR);
```

**After**

```sql
-- Regex runs once per row; captures unpacked in the outer SELECT
SELECT
  SPLIT_PART(matched, '|||', 1) AS user_id,
  SPLIT_PART(matched, '|||', 2) AS latency_ms,
  SPLIT_PART(matched, '|||', 3) AS error_code
FROM (
  SELECT ARRAY_TO_STRING(REGEXP_MATCH(message, 'user_id=(\S+) latency_ms=(\d+) error=(\S+)'), '|||') AS matched
  FROM dd.logs(
      columns        =&gt; ARRAY['message'],
      from_timestamp =&gt; NOW() - INTERVAL '1 day',
      to_timestamp   =&gt; NOW()
  ) AS (message VARCHAR)
) sub;
```

## Further reading{% #further-reading %}

- [Datadog MCP Server](https://docs.datadoghq.com/mcp_server.md)
- [DDSQL Editor](https://docs.datadoghq.com/ddsql_editor.md)
- [Getting Started with DDSQL Editor](https://learn.datadoghq.com/courses/getting-started-ddsql-editor)
</p>