+++
title = "PCRE"
date = "2022-09-13"
showSummary = true
summary = "CGo-free port of PCRE2 to the Go programming language"
weight = 10
+++
About
PCRE is a CGo-free port of the PCRE2 regular expression engine to Go. Being CGo-free means that this is not merely a set of bindings to the PCRE2 C library: it is a pure Go implementation of PCRE2.
How it works
The implementation relies on ccgo, a compiler that translates C code into Go. The PCRE2 source code was converted to Go using ccgo. I only have linux/amd64 and linux/arm64 systems, so for other architectures I cross-compiled with various C toolchains and tested the results under qemu-user-static, which emulates the desired CPU architectures.
For macOS, the process was more intricate as there is no suitable cross-compilation toolchain from Linux. Therefore, a macOS Virtual Machine was created using OSX-KVM and used to build PCRE2 for darwin/amd64 and darwin/arm64.
Once the code was converted, a memory-safe interface was created on top of it. This interface adheres to the conventions used in Go's standard library regexp package, so it can be used as a drop-in replacement.
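One way to take advantage of the shared conventions is to write calling code against a small interface instead of a concrete type. The sketch below is only an illustration: the Matcher interface and the redactDigits and stdlibMatcher helpers are hypothetical names, and only the standard library regexp package is used. It assumes that the PCRE type exposes methods with the same names and signatures as the ones listed, which should be checked against the package documentation.

```go
package main

import (
	"fmt"
	"regexp"
)

// Matcher is a hypothetical interface that lists a few methods of
// *regexp.Regexp. Any type following the same conventions satisfies it.
type Matcher interface {
	MatchString(s string) bool
	FindString(s string) string
	ReplaceAllString(src, repl string) string
}

// stdlibMatcher returns a Matcher backed by the standard library engine.
func stdlibMatcher(pattern string) Matcher {
	return regexp.MustCompile(pattern)
}

// redactDigits replaces every match of m in s with "#".
// It does not care which engine is behind the Matcher.
func redactDigits(m Matcher, s string) string {
	return m.ReplaceAllString(s, "#")
}

func main() {
	m := stdlibMatcher(`\d`)
	fmt.Println(m.MatchString("abc123"))   // true
	fmt.Println(m.FindString("abc123"))    // 1
	fmt.Println(redactDigits(m, "abc123")) // abc###
}
```

Under that assumption, swapping engines would only require a different constructor in place of stdlibMatcher; the rest of the code stays unchanged.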
Motivation
Go's standard library regexp package lacks features like lookaheads and lookbehinds. This omission is intentional: the package guarantees that matching time is linear in the size of the input, so regular expressions cannot be exploited for a regular expression Denial-of-Service (ReDoS) attack. However, there are cases where these features are necessary and the expression is either compiled into the program or provided in a configuration file, so its source is trusted.
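The limitation is easy to observe with the standard library alone. The following minimal sketch uses a hypothetical helper named compiles, and the sample patterns are chosen purely for illustration. It reports whether regexp.Compile accepts a lookahead, a lookbehind, and an ordinary pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether the standard library regexp package
// accepts the given pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`foo(?=bar)`,  // lookahead: rejected by the standard library
		`(?<=foo)bar`, // lookbehind: rejected by the standard library
		`foo(bar)?`,   // ordinary pattern: accepted
	}
	for _, p := range patterns {
		fmt.Printf("%q compiles: %v\n", p, compiles(p))
	}
}
```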
When to avoid
PCRE2 is vulnerable to ReDoS attacks because of its extra features, such as lookaheads and lookbehinds. If the source of the expression cannot be trusted, do not use PCRE2; use the Go standard library regexp package instead.
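As a rough sketch of that advice, the snippet below routes an untrusted expression through the standard library only. The matchUntrusted helper is a hypothetical name, and the pattern and input are a commonly cited example of nested quantifiers, not something taken from this project. Because the standard library matches in time linear in the input, the call returns promptly even on this input, and patterns that need lookarounds are rejected with an error instead of being run.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"time"
)

// matchUntrusted compiles and runs an expression from an untrusted source
// using only the standard library regexp package.
func matchUntrusted(pattern, input string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		// Unsupported syntax (for example lookarounds) ends up here.
		return false, err
	}
	return re.MatchString(input), nil
}

func main() {
	// Nested quantifiers followed by a character that prevents a match.
	input := strings.Repeat("a", 5000) + "!"

	start := time.Now()
	ok, err := matchUntrusted(`^(a+)+$`, input)
	fmt.Println("matched:", ok, "error:", err, "elapsed:", time.Since(start))
}
```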