
Consider a purego-loaded native pg_query backend for faster PostgreSQL parsing #4496


@ewhauser

Summary

Would you be open to adding a native PostgreSQL parser backend loaded via purego, with the existing WASI/Wazero parser path retained as a portability fallback?

The current release binaries are great from a distribution standpoint because they are built with CGO_ENABLED=0, but PostgreSQL analysis-heavy sqlc workloads appear to pay a meaningful runtime cost for the WASI/Wazero parser path. We tested replacing the github.com/wasilibs/go-pgquery dependency with a purego-backed wrapper around the native pg_query library, while keeping the sqlc binary itself cgo-free.

On one large real-world workload, this roughly halved sqlc's analysis/plan generation time (about a 2x speedup).

Motivation

Large projects often split sqlc output into many packages while sharing the same schema and migration universe. In those setups, PostgreSQL parsing and analysis become a visible part of developer and CI wall time.

For our workload, profiling showed parser/backend overhead was large enough that swapping the parser implementation materially changed wall time.

Measurement

Workload:

  • Large PostgreSQL schema/migration set
  • Many query files
  • Analysis/plan generation run once over the shared schema/query universe
  • Same sqlc patch and inputs; only parser backend changed
  • Local macOS arm64 machine
  • Go 1.26.2
  • GOGC=200

Results:

| Parser backend | Wall time |
| --- | --- |
| current WASI/Wazero path | ~4.9 s |
| purego-loaded native pg_query path | ~2.4 s |

That is roughly a 2x improvement for the analysis/plan generation step.

Distribution size impact

The main tradeoff is that a purego-loaded native backend needs a platform-native shared library.

That said, sqlc already publishes release artifacts per OS/arch, for example:

  • sqlc_..._darwin_arm64.tar.gz
  • sqlc_..._linux_arm64.tar.gz
  • sqlc_..._linux_amd64.tar.gz

So each release archive would only need to include the native library for that platform, not every supported platform's library.
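Assuming the library ships next to the sqlc executable inside each archive, locating it at startup is a small stdlib-only step. In this sketch the `libpg_query.dylib`/`libpg_query.so` file names and the `nativeLibName`/`nativeLibPath` helpers are hypothetical. Because each archive is already per-arch, the file name only needs to vary by OS; any platform without a bundled library returns an error, which the caller would treat as a signal to use the WASI/Wazero parser.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"
)

// nativeLibName maps GOOS to a hypothetical file name for the bundled
// pg_query shared library. The "libpg_query" base name is an assumption.
func nativeLibName(goos string) (string, error) {
	switch goos {
	case "darwin":
		return "libpg_query.dylib", nil
	case "linux":
		return "libpg_query.so", nil
	default:
		return "", fmt.Errorf("no bundled native pg_query library for GOOS=%s", goos)
	}
}

// nativeLibPath (hypothetical helper) expects the library to sit next to the
// sqlc executable, matching one library per OS/arch release archive.
func nativeLibPath(exePath, goos string) (string, error) {
	name, err := nativeLibName(goos)
	if err != nil {
		return "", err
	}
	return filepath.Join(filepath.Dir(exePath), name), nil
}

func main() {
	exe, err := os.Executable()
	if err != nil {
		fmt.Println("cannot locate executable:", err)
		return
	}
	// Resolve symlinks (for example a package-manager shim on PATH) so the
	// lookup uses the directory that actually holds the release files.
	if resolved, err := filepath.EvalSymlinks(exe); err == nil {
		exe = resolved
	}
	path, err := nativeLibPath(exe, runtime.GOOS)
	if err != nil {
		fmt.Println("using WASI/Wazero parser:", err)
		return
	}
	if _, err := os.Stat(path); err != nil {
		fmt.Println("using WASI/Wazero parser; library not found at", path)
		return
	}
	fmt.Println("would load native pg_query from", path)
}
```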

In our local branch, the measured native library sizes were:

| Native library | Uncompressed | Gzipped |
| --- | --- | --- |
| darwin arm64 .dylib | ~2.4 MiB | ~549 KiB |
| linux arm64 .so | ~3.0 MiB | ~652 KiB |

For the darwin arm64 release, the compressed size increase was small (about 0.2 MiB):

| Artifact | Compressed size |
| --- | --- |
| upstream darwin arm64 sqlc archive | ~13.8 MiB |
| local sqlc + darwin arm64 native pg_query archive | ~14.0 MiB |

As a stress check, an archive containing both the darwin arm64 and linux arm64 native libraries was ~14.6 MiB. Because sqlc already publishes per-platform release assets, however, a combined-library archive should not be necessary for normal binary releases.

Implementation direction

This does not need to be a return to cgo-only sqlc builds.

A possible shape:

  • Keep the existing WASI/Wazero parser as the universal fallback.
  • Add a native pg_query backend loaded through purego for supported release platforms.
  • Package one native library per OS/arch release asset.
  • Preserve CGO_ENABLED=0 for the sqlc Go binary itself.
  • Add a parser/backend benchmark so regressions are visible.

The local experiment was mostly an import-level substitution behind sqlc's existing PostgreSQL parser boundary:

```diff
- nodes "github.com/wasilibs/go-pgquery"
+ nodes "<purego native pg_query wrapper>"
```

The main API difference we had to handle was fingerprint formatting: the native wrapper exposed the fingerprint as a number, so it had to be formatted into the string form the existing go-pgquery path produces.
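A conversion helper for that difference might look like the following. It assumes the expected string is the 64-bit value rendered as 16 zero-padded lowercase hex digits, matching the format libpg_query uses for its `fingerprint_str`; it is worth checking against existing go-pgquery output before relying on it. `fingerprintString` and `fingerprintUint64` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strconv"
)

// fingerprintString (hypothetical helper) renders a numeric pg_query
// fingerprint as a string. Assumption: the expected form is the 64-bit value
// as 16 zero-padded lowercase hex digits.
func fingerprintString(fp uint64) string {
	return fmt.Sprintf("%016x", fp)
}

// fingerprintUint64 (hypothetical helper) parses the string form back into a
// number, which is handy for round-trip tests.
func fingerprintUint64(s string) (uint64, error) {
	return strconv.ParseUint(s, 16, 64)
}

func main() {
	s := fingerprintString(0xab)
	fmt.Println(s) // prints "00000000000000ab"

	n, err := fingerprintUint64(s)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(n == 0xab) // prints "true"
}
```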

Happy to submit a PR if there is interest.