
s (dotAll) flag for regular expressions • Exploring ES2018 and ES2019

Original link: exploringjs.com



This chapter explains the proposal “s (dotAll) flag for regular expressions” by Mathias Bynens.

Overview

Currently, the dot (.) in regular expressions doesn’t match line terminator characters:

> /^.$/.test('\n')
false

The proposal specifies the regular expression flag /s that changes that:

> /^.$/s.test('\n')
true

Limitations of the dot (.) in regular expressions

The dot (.) in regular expressions has two limitations.

First, it doesn’t match astral (non-BMP) characters such as emoji:

> /^.$/.test('😀')
false

This can be fixed via the /u (unicode) flag:

> /^.$/u.test('😀')
true
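The first limitation has a specific cause. Without /u, the dot matches a single UTF-16 code unit. An astral character such as 😀 (U+1F600) is stored as two code units (a surrogate pair), so one dot cannot cover it. The following minimal sketch makes this visible; the variable name `emoji` is illustrative only:

```javascript
// U+1F600 lies outside the BMP, so JavaScript stores it as a surrogate pair.
const emoji = '😀';

console.log(emoji.length);       // 2 (two UTF-16 code units)
console.log(/^.$/.test(emoji));  // false: without /u, one dot = one code unit
console.log(/^..$/.test(emoji)); // true: two dots cover both code units
console.log(/^.$/u.test(emoji)); // true: with /u, one dot = one code point
```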

Second, the dot does not match line terminator characters:

> /^.$/.test('\n')
false

That can currently only be fixed by replacing the dot with workarounds such as [^] (“all characters except no character”) or [\s\S] (“either whitespace or not whitespace”).

> /^[^]$/.test('\n')
true
> /^[\s\S]$/.test('\n')
true
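The workarounds matter most when a pattern has to span several lines. The sketch below uses a made-up input with hypothetical BEGIN/END markers. It compares a plain dot with the two character-class workarounds:

```javascript
// Illustrative input: BEGIN and END are arbitrary markers.
const text = 'BEGIN\nline 1\nline 2\nEND';

// A plain dot cannot cross '\n', so this pattern finds nothing.
const plainDot = /BEGIN(.*)END/.exec(text);

// [\s\S] and [^] both match any code unit, including line terminators.
const viaClass = /BEGIN([\s\S]*)END/.exec(text);
const viaEmptyNegation = /BEGIN([^]*)END/.exec(text);

console.log(plainDot);                                      // null
console.log(viaClass[1] === '\nline 1\nline 2\n');          // true
console.log(viaEmptyNegation[1] === '\nline 1\nline 2\n');  // true
```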

Line terminators recognized by ECMAScript

Line terminators in ECMAScript affect:

  • The dot, in all regular expressions that don’t have the flag /s.
  • The anchors ^ and $ if the flag /m (multiline) is used.

The following four characters are considered line terminators by ECMAScript:

  • U+000A LINE FEED (LF) (\n)
  • U+000D CARRIAGE RETURN (CR) (\r)
  • U+2028 LINE SEPARATOR
  • U+2029 PARAGRAPH SEPARATOR
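All four characters are treated the same way in both places listed above. The dot rejects each of them. With /m, ^ and $ match next to each of them. Here is a minimal sketch that checks this for every terminator; the names `separators` and `results` are illustrative:

```javascript
// One test string per line terminator, each separating "a" from "b".
const separators = ['\n', '\r', '\u2028', '\u2029'];

const results = separators.map(sep => {
  const text = `a${sep}b`;
  return {
    dotMatches: /a.b/.test(text),    // the dot (no /s) rejects line terminators
    anchorsSplit: /^b$/m.test(text), // with /m, ^ may match right after one
  };
});

console.log(results.every(r => !r.dotMatches && r.anchorsSplit)); // true
```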

There are additionally some newline-ish characters that are not considered line terminators by ECMAScript:

  • U+000B VERTICAL TAB (\v)
  • U+000C FORM FEED (\f)
  • U+0085 NEXT LINE

The dot matches these three characters even without the /s flag:

> /^...$/.test('\v\f\u{0085}')
true
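Both lists can be checked in one pass. Without /s, the dot rejects exactly the line terminators, so testing each character against /^.$/ sorts them. The sketch below does this; the array names are illustrative:

```javascript
const chars = ['\n', '\r', '\u2028', '\u2029', '\v', '\f', '\u0085'];

// Without /s, the dot matches any single code unit except a line terminator.
const classification = chars.map(ch => {
  const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  return `U+${hex}: ${/^.$/.test(ch) ? 'matched by .' : 'line terminator'}`;
});

console.log(classification.join('\n'));
// U+000A: line terminator
// U+000D: line terminator
// U+2028: line terminator
// U+2029: line terminator
// U+000B: matched by .
// U+000C: matched by .
// U+0085: matched by .
```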

The proposal

The proposal introduces the regular expression flag /s (short for “singleline”), which makes the dot match line terminators:

> /^.$/s.test('\n')
true
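In practice, /s replaces the [\s\S] workaround whenever a match has to cross line breaks. The sketch below applies a lazy `.*?` to a small illustrative string. The `<pre>` input is only a convenient multi-line example, not a recommendation to parse HTML with regular expressions:

```javascript
const html = '<pre>\nfirst line\nsecond line\n</pre>';

// The lazy .*? stops at the first closing tag; /s lets it cross '\n'.
const withS = /<pre>(.*?)<\/pre>/s.exec(html);
const withoutS = /<pre>(.*?)<\/pre>/.exec(html);

console.log(withS[1] === '\nfirst line\nsecond line\n'); // true
console.log(withoutS);                                    // null
```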

The long name of /s is dotAll:

> /./s.dotAll
true
> /./s.flags
's'
> new RegExp('.', 's').dotAll
true
> /./.dotAll
false
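Because `dotAll` and `flags` are ordinary properties, code can inspect an existing regular expression and derive a dotAll variant from it. `withDotAll` below is a hypothetical helper, not part of the proposal. It is a minimal sketch that relies only on `source`, `flags`, `dotAll`, and the `RegExp` constructor:

```javascript
// Hypothetical helper: returns `regex` unchanged if it already has /s,
// otherwise a new RegExp with the same source and flags plus s.
function withDotAll(regex) {
  if (regex.dotAll) {
    return regex;
  }
  return new RegExp(regex.source, regex.flags + 's');
}

const re = withDotAll(/a.b/i);
console.log(re.flags);        // is
console.log(re.dotAll);       // true
console.log(re.test('A\nB')); // true (case-insensitive, dot matches '\n')
```

Note that the copy is a new object, so state such as `lastIndex` is not carried over from the original.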

dotAll vs. multiline

  • dotAll only affects the dot.
  • multiline only affects ^ and $.
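A short sketch shows that the two flags are independent and can be combined. The input string is illustrative:

```javascript
const text = 'first\nsecond';

// s changes only the dot ...
console.log(/first.second/s.test(text)); // true
console.log(/^second$/s.test(text));     // false: ^ still means start of input

// ... and m changes only the anchors.
console.log(/^second$/m.test(text));     // true: ^ may now match after '\n'
console.log(/first.second/m.test(text)); // false: the dot still rejects '\n'

// The two flags can be combined.
console.log(/^first.second$/ms.test(text)); // true
```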

FAQ

Why is the flag named /s?

dotAll is a good description of what the flag does, so, arguably, /a or /d would have been better names. However, /s is already the established name for this flag in other regular expression flavors (Perl, Python, Java, C#, …).

