Grouping Constructs

LAST UPDATED: DECEMBER 24, 2025

PREREQUISITES

Working knowledge of literal matching, character classes, anchors, and quantifiers.

A group, identified by pattern elements enclosed within (), functions as a logical unit. This unit can control which alternatives are matched, be quantified to match repeating multi-character sub-patterns, participate in nested structures for large-pattern composition, and be captured for backreferencing. Multiple groups can exist in a pattern.

Alternations

The | character is used as a logical OR to match one of multiple alternative patterns. Because alternation has lower precedence than concatenation, parentheses explicitly define its scope and prevent unintended matches.

Examples - Alternation

EXAMPLE 1

Input:

CODE

{% set url1='https://docs.d3security.com' %}
{% set url2='docs.d3security.com' %}

{{ url1 | regex_match("https|http") }}
{{ url2 | regex_match("https|http") }}

Return Data:

CODE

True
False

EXAMPLE 2

Input:

CODE

{{ '2026-12-31' | regex_match('2026-(01|02|03)-31') }}
{{ '2026-12-31' | regex_match('2026-01|02|03-31') }}

Return Data:

CODE

False
True

Explanation: The second pattern is split into 2026-01, 02, and 03-31. The 02 alternative succeeds by matching the 02 sequence within the 2026 portion of the input string.
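
The following Python sketch shows where each pattern from Example 2 matches. It uses Python's re module as a stand-in and assumes that regex_match behaves like an unanchored search, as the results above suggest. The helper name where is hypothetical.

```python
import re

def where(pattern: str, text: str):
    """Return (matched_text, start, end) for the first match, or None."""
    m = re.search(pattern, text)
    return (m.group(0), m.start(), m.end()) if m else None

date = "2026-12-31"

# Grouped: the alternation is limited to the month position.
grouped = where(r"2026-(01|02|03)-31", date)

# Ungrouped: the alternatives are "2026-01", "02", and "03-31".
ungrouped = where(r"2026-01|02|03-31", date)

print(grouped)    # None
print(ungrouped)  # ('02', 1, 3)
```

The ungrouped pattern reports a match at positions 1 to 3, which is the 02 inside 2026, not a month.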

EXAMPLE 3

Input:

CODE

{% set ip1='192.168.200.30' %}
{% set ip2='192.168.200.254' %}

{{ ip1 | regex_match('\.(30|40|50)$') }}
{{ ip2 | regex_match('\.(30|40|50)$') }}

Return Data:

CODE

True
False

EXAMPLE 4

Input:

CODE

{% set ip1='10.10.10.30' %}
{% set ip2='10.88.10.30' %}
{% set ip3='10.10.10.99' %}
{% set ip4='10.10.88.99' %}

{% set pattern='(10\.){3}(3[0-9]|[4-9][0-9])' %}

{{ ip1 | regex_match(pattern) }}
{{ ip2 | regex_match(pattern) }}
{{ ip3 | regex_match(pattern) }}
{{ ip4 | regex_match(pattern) }}

Return Data:

CODE

True
False
True
False

EXAMPLE 5

Input:

CODE

{% set file1='demo.dll' %}
{% set file2='demo.pdf' %}
{% set file3='ubuntu-24.04.3-desktop-amd64.iso' %}

{% set sentence1='The ' ~ file1 ~ ' file was found.' %}
{% set sentence2='The ' ~ file2 ~ ' file was found.' %}
{% set sentence3='The ' ~ file3 ~ ' file was found.' %}

{{ sentence1 | regex_match('\.(dll|exe|iso|sh)') }}
{{ sentence2 | regex_match('\.(dll|exe|iso|sh)') }}
{{ sentence3 | regex_match('\.(dll|exe|iso|sh)') }}

~ is the string concatenation operator.

Return Data:

CODE

True
False
True

Nested Groups

Nested groups allow a group to contain another group, in the format (outer(inner)outer). They are useful when a smaller pattern must be repeated or combined as part of a larger pattern.
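
Groups are numbered by the position of their opening parenthesis, so an outer group always receives a lower number than the groups nested inside it. The following Python sketch (Python's re module is used here as an assumed stand-in for the template engine) shows the numbering, and also shows that a quantified group keeps only the text from its last repetition.

```python
import re

m = re.search(r"((\w+)-){2}\w+", "word1-word2-word3")

whole = m.group(0)  # entire match
outer = m.group(1)  # outer group: text from its last repetition
inner = m.group(2)  # inner group: text from its last repetition

print(whole)  # word1-word2-word3
print(outer)  # word2-
print(inner)  # word2
```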

Examples - Nested Groups

EXAMPLE 1

Input:

CODE

{% set value1='word1-word2-word3' %}
{% set value2='word1-word2' %}

{% set pattern='((\w+)-){2}\w+' %}

{{ value1 | regex_match(pattern) }}
{{ value2 | regex_match(pattern) }}

Reminder: \w is equivalent to the [A-Za-z0-9_] character class.

Return Data:

CODE

True
False

EXAMPLE 2

Input:

CODE

{% set version1='v1.2.3.4' %}
{% set version2='v1.2.3' %}
{% set version3='v1.2' %}

{% set pattern='^v((\d+)\.){2}\d+$' %}

{{ version1 | regex_match(pattern) }}
{{ version2 | regex_match(pattern) }}
{{ version3 | regex_match(pattern) }}

Reminder: \d is equivalent to the [0-9] character class.

Return Data:

CODE

False
True
False

EXAMPLE 3 (CHALLENGING)

Input:

CODE

{% set potential_ip1='192.168.200.30' %}
{% set potential_ip2='255.255.255.255' %}
{% set potential_ip3='0.0.0.0' %}
{% set potential_ip4='256.10.10.10' %}
{% set potential_ip5='192.168.010.1' %}
{% set potential_ip6='01.1.1.1' %}
{% set potential_ip7='10.10.10.00' %}

{% set octet='(25[0-5]|2[0-4]\d|1\d{2}|[1-9]\d|[1-9]|0)' %}
{% set pattern='^(' ~ octet ~ '\.){3}' ~ octet ~ '$' %}

{{ potential_ip1 | regex_match(pattern) }}
{{ potential_ip2 | regex_match(pattern) }}
{{ potential_ip3 | regex_match(pattern) }}
{{ potential_ip4 | regex_match(pattern) }}
{{ potential_ip5 | regex_match(pattern) }}
{{ potential_ip6 | regex_match(pattern) }}
{{ potential_ip7 | regex_match(pattern) }}

Explanation: The pattern matches strings consisting of exactly four dot-separated integers from 0 to 255. Leading zeros are rejected implicitly: the only octet alternative that can begin with 0 is the single digit 0. ~ is the string concatenation operator.

Return Data:

CODE

True
True
True
False
False
False
False
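
The octet sub-pattern in Example 3 can be checked exhaustively. The Python sketch below is an illustration using Python's re module rather than the template filters. It tests every integer from 0 to 999 and every zero-padded two-digit and three-digit string. The variable names are chosen for this sketch only.

```python
import re

OCTET = r"(25[0-5]|2[0-4]\d|1\d{2}|[1-9]\d|[1-9]|0)"
octet_re = re.compile(r"^" + OCTET + r"$")

# Every canonical decimal string from 0 to 255 should be accepted,
# and nothing from 256 to 999.
accepted = [n for n in range(1000) if octet_re.search(str(n))]

# Zero-padded forms such as "00", "07", or "010" should be rejected.
padded = [f"{n:0{w}d}" for w in (2, 3) for n in range(10 ** (w - 1))]
padded_accepted = [s for s in padded if octet_re.search(s)]

print(accepted == list(range(256)))  # True
print(padded_accepted)               # []
```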

EXAMPLE 4 (CHALLENGING)

Input:

CODE

{% set potential_mac1='AA:BB:CC:DD:EE:FF' %}
{% set potential_mac2='AA-BB-CC-DD-EE-FF' %}
{% set potential_mac3='AA BB CC DD EE FF' %}
{% set potential_mac4='AABBCCDDEEFF' %}
{% set potential_mac5='aabb.ccdd.eeff' %}
{% set potential_mac6='AA:BB-CC:DD-EE:FF' %}
{% set potential_mac7='AA:BB:CC:DD:EE' %}
{% set potential_mac8='AA:BB:CC:DD:EE:GG' %}

{% set hex='[0-9A-Fa-f]{2}' %}
{% set hex4='[0-9A-Fa-f]{4}' %}

{% set colon='(' ~ hex ~ ':){5}' ~ hex %}
{% set dash='(' ~ hex ~ '-){5}' ~ hex %}
{% set space='(' ~ hex ~ ' ){5}' ~ hex %}
{% set plain='[0-9A-Fa-f]{12}' %}
{% set cisco='(' ~ hex4 ~ '\.){2}' ~ hex4 %}

{% set mac_formats=colon ~ '|' ~ dash ~ '|' ~ space ~ '|' ~ plain ~ '|' ~ cisco %}
{% set pattern='^(' ~ mac_formats ~ ')$' %}

{{ potential_mac1 | regex_match(pattern) }}
{{ potential_mac2 | regex_match(pattern) }}
{{ potential_mac3 | regex_match(pattern) }}
{{ potential_mac4 | regex_match(pattern) }}
{{ potential_mac5 | regex_match(pattern) }}
{{ potential_mac6 | regex_match(pattern) }}
{{ potential_mac7 | regex_match(pattern) }}
{{ potential_mac8 | regex_match(pattern) }}

Explanation: The colon, dash, space, plain, and cisco variables define regex sub-patterns that represent different valid MAC address formats. These sub-patterns are combined via alternation, and the combined alternation is wrapped in ^( )$ so that the anchors apply to every alternative.

Return Data:

CODE

True
True
True
True
True
False
False
False

Backreferences

A backreference is written as \\#, where # is the number of a capturing group. Groups are numbered from left to right, starting at 1, by the position of their opening parenthesis. The backreference requires a subsequent portion of the input text to match the exact text previously captured by that group.

A group on its own matches any text that satisfies its sub-pattern. A backreference, by contrast, matches only the exact text that its group already captured, so input that would satisfy the sub-pattern again but differs from the captured text is rejected.
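
The Python sketch below illustrates both points: how groups are numbered, and how a backreference repeats captured text instead of re-applying the sub-pattern. It uses Python's re module as an assumed stand-in for the template filters. Python raw strings allow a single backslash (\1), whereas the template examples in this section write \\1.

```python
import re

# Groups are numbered by their opening parenthesis:
# group 1 = ((\w)(\w)), group 2 = first (\w), group 3 = second (\w).
mirrored = r"((\w)(\w))\3\2"

groups = re.search(mirrored, "abba").groups()
no_match = re.search(mirrored, "abab")

# A backreference repeats the captured text, not the sub-pattern.
repeat_digit = bool(re.search(r"(\d)\1", "12"))    # needs the same digit twice
any_two_digits = bool(re.search(r"(\d)\d", "12"))  # any two digits

print(groups)          # ('ab', 'a', 'b')
print(no_match)        # None
print(repeat_digit)    # False
print(any_two_digits)  # True
```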

Examples - Backreferences

EXAMPLE 1

Input:

CODE

{% set text1='D3 D3' %}
{% set text2='D3 Security' %}

{% set pattern1='(\w+) \w+' %}
{% set pattern2='(\w+) \\1' %}

{{ text1 | regex_match(pattern1) }}
{{ text1 | regex_match(pattern2) }}
{{ text2 | regex_match(pattern1) }}
{{ text2 | regex_match(pattern2) }}

Reminder: \w is equivalent to the [A-Za-z0-9_] character class.

Return Data:

CODE

True
True
True
False

EXAMPLE 2

Input:

CODE

{% set html1='<script>console.log("demo");</script>' %}
{% set html2='<div>lorem ipsum</div>' %}
{% set html3='<p>lorem ipsum</a>' %}

{% set pattern='<([A-Za-z]+)>.*</\\1>' %}

{{ html1 | regex_match(pattern) }}
{{ html2 | regex_match(pattern) }}
{{ html3 | regex_match(pattern) }}

Return Data:

CODE

True
True
False

EXAMPLE 3

Input:

CODE

{{ 'low high' | regex_match('(high|low) \\1') }}
{{ 'low low' | regex_match('(high|low) \\1') }}

Return Data:

CODE

False
True

EXAMPLE 4

Input:

CODE

{% set date1='2026-01-31' %}
{% set date2='2026/01/31' %}
{% set date3='2026-01/31' %}

{% set pattern='(\d{4})(-|/)\d{2}\\2\d{2}' %}

{{ date1 | regex_match(pattern) }}
{{ date2 | regex_match(pattern) }}
{{ date3 | regex_match(pattern) }}

Return Data:

CODE

True
True
False

EXAMPLE 5

Input:

CODE

{% set log1='SRC=192.168.200.99 DST=192.168.200.99' %}
{% set log2='SRC=192.168.200.100 DST=192.168.200.100' %}
{% set log3='SRC=192.168.200.200 DST=192.168.200.200' %}
{% set log4='SRC=192.168.200.201 DST=192.168.200.201' %}
{% set log5='SRC=192.168.200.150 DST=192.168.200.151' %}

{% set pattern='SRC=(192\.)(168\.)(200\.)(1[0-9]{2}|200) DST=\\1\\2\\3\\4' %}

{{ log1 | regex_match(pattern) }}
{{ log2 | regex_match(pattern) }}
{{ log3 | regex_match(pattern) }}
{{ log4 | regex_match(pattern) }}
{{ log5 | regex_match(pattern) }}

Return Data:

CODE

False
True
True
False
False

Lookarounds

Lookarounds assert that specific text must or must not appear immediately before or after the current match position. They are zero-width: the surrounding text is checked but not included in the returned match. A match therefore succeeds only when both the main pattern and the context enforced by the lookarounds are satisfied.

Lookaheads

Lookaheads check what appears immediately after the current position. A positive lookahead ((?=pattern)) requires the pattern to follow, while a negative lookahead ((?!pattern)) requires that it does not follow.
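
A lookahead checks the text that follows without consuming it. The Python sketch below (Python's re module, used as an assumed stand-in for the template filters) compares the matched span with and without a lookahead, and shows that the text checked by the lookahead is still available to the rest of the pattern.

```python
import re

text = "D3 v17.5"

with_lookahead = re.search(r"D3(?= v[0-9.]+)", text)
without_lookahead = re.search(r"D3 v[0-9.]+", text)

print(with_lookahead.span())     # (0, 2)
print(without_lookahead.span())  # (0, 8)

# The lookahead consumed nothing, so the space and version can still be
# matched by the part of the pattern that comes after it.
follow_on = re.search(r"D3(?= v[0-9.]+) (v\S+)", text)
print(follow_on.group(1))        # v17.5
```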

Examples - Lookaheads

EXAMPLE 1

Input:

CODE

{{ 'D3' | regex_search('D3(?= v[0-9.]+)') }}
{{ 'D3 v17.5' | regex_search('D3(?= v[0-9.]+)') }}
{{ 'D3 v17.5' | regex_search('D3 v[0-9.]+') }}

Reminder: Inside a character class, the dot (.) means a literal period, not "any character". See regex_search to learn how it works.

Return Data:

JSON

[]
['D3']
['D3 v17.5']

Explanation: (?= v[0-9.]+) requires D3 to be immediately followed by a space and a version number.

EXAMPLE 2

Input:

CODE

{% set arr = ["D3 v17.6", "D3 Security", "D3 v17.6 Demo"] %}
{{ arr | regex_extract_array('\w+(?= v[0-9.]+)(?=.* Demo)') }}

Reminder: \w is equivalent to the [A-Za-z0-9_] character class. See regex_extract_array to learn how it works.

Return Data:

JSON

[
  "D3"
]

Explanation: The lookaheads are evaluated independently at the same position (where \w+ ends), are not dependent on each other, and all must succeed for the match to be returned. In this case, only "D3 v17.6 Demo" satisfies both lookaheads.
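
Because both lookaheads are tested at the same position, their order should not affect the result. The Python sketch below checks this with Python's re module. The helper first_matches is hypothetical and assumes that regex_extract_array returns the first match from each element that has one.

```python
import re

items = ["D3 v17.6", "D3 Security", "D3 v17.6 Demo"]

version_then_demo = r"\w+(?= v[0-9.]+)(?=.* Demo)"
demo_then_version = r"\w+(?=.* Demo)(?= v[0-9.]+)"

def first_matches(pattern, values):
    """Collect the first match from each value that has one."""
    found = (re.search(pattern, v) for v in values)
    return [m.group(0) for m in found if m]

a = first_matches(version_then_demo, items)
b = first_matches(demo_then_version, items)
print(a, b)  # ['D3'] ['D3']
```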

EXAMPLE 3

Input:

CODE

{{ 'D3' | regex_search('D3(?! v[0-9.]+)') }}
{{ 'D3 v17.5' | regex_search('D3(?! v[0-9.]+)') }}
{{ 'D3 v17.5' | regex_search('D3 v[0-9.]+') }}

See regex_search to learn how it works.

Return Data:

JSON

['D3']
[]
['D3 v17.5']

Explanation: (?! v[0-9.]+) requires that D3 is not immediately followed by a space and a version number.

EXAMPLE 4

Input:

CODE

{% set arr = ["D3 Security", "D3 v18.0", "D3 Docs"] %}
{{ arr | regex_extract_array('D3(?! v[0-9.]+)') }}

See regex_extract_array to learn how it works.

Return Data:

JSON

[
  "D3",
  "D3"
]

Explanation: In "D3 v18.0", D3 is immediately followed by a space and a version number, so the negative lookahead fails and that element contributes no match.

Lookbehinds

Lookbehinds check what appears immediately before the current position. A positive lookbehind ((?<=pattern)) requires the pattern to precede it, while a negative lookbehind ((?<!pattern)) requires that it does not precede it.
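
Like lookaheads, lookbehinds are checked without being included in the match. The Python sketch below shows the matched span. It also shows a restriction specific to Python's re module, which accepts only fixed-width lookbehind patterns. Whether the engine behind the template filters has the same restriction is an assumption to verify.

```python
import re

text = "D3 v17.5"

m = re.search(r"(?<=D3 )v[0-9.]+", text)
print(m.group(0), m.span())  # v17.5 (3, 8)

# Python's re rejects variable-width lookbehinds such as "D3 +";
# other regex engines may behave differently.
try:
    re.compile(r"(?<=D3 +)v[0-9.]+")
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print(variable_width_ok)  # False
```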

Examples - Lookbehinds

EXAMPLE 1

Input:

CODE

{{ 'v17.5' | regex_search('(?<=D3 )v[0-9.]+') }}
{{ 'D3 v17.5' | regex_search('(?<=D3 )v[0-9.]+') }}

Reminder: Inside a character class, the dot (.) means a literal period, not "any character". See regex_search to learn how it works.

Return Data:

JSON

[]
['v17.5']

Explanation: (?<=D3 ) requires the version number to be immediately preceded by D3 and a space.

EXAMPLE 2

Input:

CODE

{% set arr = ["Demo D3 v18", "D3 v18.0", "D1 v3 Demo"] %}
{{ arr | regex_extract_array('(?<=D3 )(?<=Demo D3 )v[0-9.]+') }}

See regex_extract_array to learn how it works.

Return Data:

JSON

[
  "v18"
]

Explanation: The lookbehinds are evaluated independently at the same position (where the version string begins), are not dependent on each other, and all must succeed for the match to be returned. In this case, only "Demo D3 v18" satisfies both lookbehinds.

EXAMPLE 3

Input:

CODE

{{ '2026' | regex_search('(?<!D3 )2026') }}
{{ 'D3 2026' | regex_search('(?<!D3 )2026') }}
{{ '2027' | regex_search('(?<!D3 )2026') }}

See regex_search to learn how it works.

Return Data:

JSON

['2026']
[]
[]

EXAMPLE 4

Input:

CODE

{% set arr = ["D3 vSOC", "SOC", "Autonomous SOC"] %}
{{ arr | regex_extract_array('(?<!v)SOC') }}

See regex_extract_array to learn how it works.

Return Data:

JSON

[
  "SOC",
  "SOC"
]

Explanation: (?<!v) requires that SOC is not immediately preceded by the character v.