Some days ago I was fiddling with some Lua code, intending to achieve a splash/boot screen inspired to the one developed by Nintendo for the Game Boy. A single-line text entering from the top of the screen and scrolling down until it reaches the centre of it. Once on the final position, a jingle is played.

Notwithstanding being a very basic task, it offers the opportunity for uncovering some interesting code. The most naïve approach requires very little code and it would certainly work. With some additional effort, we can develop some interesting and general code that can be reused in many other circumstances. The most interesting part is how the text is moved through the screen, that is the path it follows. Rather than coding the path movement inside the Message object we are better off creating a Path object that implements it and that can be used by the former.

What kind of paths do we need to implement? We’d like the path to be a sequence of segments. Segments are going to be linear, but curved ones can be useful. A very straightforward way to implement this is by means of Bézier curves. I’m not going to describe them here (I suggest you follow the link to Wikipedia and read there), let’s just say they are very appealing since by simple playing with the curve order one can go from linear to smooth curves.

Let’s dive in some code and seem them in action.

function linear_bezier(p0, p1, t)
    local x = (1 - t) * p0[1] + t * p1[1]
    local y = (1 - t) * p0[2] + t * p1[2]
    return x, y
end

function quadratic_bezier(p0, p1, p2, t)
    local x = (1 - t) * (1 - t) * p0[1] + 2 * (1 - t) * t * p1[1] + t * t * p2[1]
    local y = (1 - t) * (1 - t) * p0[2] + 2 * (1 - t) * t * p1[2] + t * t * p2[2]
    return x, y
end

function cubic_bezier(p0, p1, p2, p3, t)
    local x = (1 - t) * (1 - t) * p0[1] + 2 * (1 - t) * t * p1[1] + t * t * p2[1]
    local y = (1 - t) * (1 - t) * p0[2] + 2 * (1 - t) * t * p1[2] + t * t * p2[2]
    return x, y
end

Easy, uh? An unpretentious and straightforward translation to Lua of the Bernstein polynomial representing the Bézier curves.

Another way to accomplish this is with the De Casteljau’s algorithm, which is in layman’s word a recursive lerp between pairs of control points.

function lerp(a, b, t)
    return (1 - t) * a + t * b
end

function linear_bezier(p0, p1, t)
    return lerp(p0[1], p1[1], t), lerp(p0[2], p1[2], t)
end

function quadratic_bezier(p0, p1, p2, t)
    local p01 = { linear_bezier(p0, p1, t) }
    local p12 = { linear_bezier(p1, p2, t) }
    return linear_bezier(p01, p12, t)
end

function cubic_bezier(p0, p1, p2, p3, t)
    local p01 = { linear_bezier(p0, p1, t) }
    local p12 = { linear_bezier(p1, p2, t) }
    local p23 = { linear_bezier(p2, p3, t) }
    local p012 = { linear_bezier(p01, p12, t) }
    local p123 = { linear_bezier(p12, p23, t) }
    return linear_bezier(p012, p123, t)
end

The visual representation of the De Casteljau algorithm is what many children learn as string art.

Quadratic Bézier in string art

Refining

Both codes are presumably going to give bad performance: no precalculations, a lot of table accesses, creations, and ample use of function calls. There’s plenty of rooms for optimizations. Wait, wait, wait! We all know that premature optimization is evil, but we can be confident to write some elegant and efficient code. Here’s a brief list of changes we can apply:

  • the control points can be passed as a table (i.e. vector) for a more compact function signature,
  • there’s no need to pass the control points every time,
  • we can avoid the recursion if we limit ourselves to some reasonable Bézier curve order (e.g. cubic) and use the Bernstein polynomial,
  • some repeated math operations can be pre-computed and reused to save time, and
  • closures and table unpacking can be used for faster access to functions/variables.

The final resulting code (warning, spoiler ahead since this is going to be the implementation that we’ll pick as “most performant”).

-- The function *compiles* a Bézier curve evaluator, given the control points
-- (as two-element arrays). The aim of this function is to avoid passing the
-- control-control_points at each evaluation.
--
-- It supports linear, quadratic, and cubic Bézier's curves. The evaluators are
-- the following (with `u = 1 - t`)
--
-- B1(p0, p1, t) = u*p0 + t*p1
-- B2(p0, p1, p2, t) = u*u*p0 + 2*t*u*p1 + t*t*p2
-- B3(p0, p1, p2, p3, t) = u*u*u*p0 + 3*u*u*t*p1 + 3*u*t*t*p2 + t*t*t*p3
local function compile_bezier(control_points)
  local n = #control_points
  if n == 4 then
    local p0, p1, p2, p3 = unpack(control_points)
    local p0x, p0y = unpack(p0)
    local p1x, p1y = unpack(p1)
    local p2x, p2y = unpack(p2)
    local p3x, p3y = unpack(p3)
    return function(t)
        local u = 1 - t
        local uu = u * u
        local tt = t * t
        local a = uu * u
        local b = 3 * uu * t
        local c = 3 * u * tt
        local d = t * tt
        local x = a * p0x + b * p1x + c * p2x + d * p3x
        local y = a * p0y + b * p1y + c * p2y + d * p3y
        return x, y
      end
  elseif n == 3 then
    local p0, p1, p2 = unpack(control_points)
    local p0x, p0y = unpack(p0)
    local p1x, p1y = unpack(p1)
    local p2x, p2y = unpack(p2)
    return function(t)
        local u = 1 - t
        local a = u * u
        local b = 2 * t * u
        local c = t * t
        local x = a * p0x + b * p1x + c * p2x
        local y = a * p0y + b * p1y + c * p2y
        return x, y
      end
  elseif n == 2 then
    local p0, p1 = unpack(control_points)
    local p0x, p0y = unpack(p0)
    local p1x, p1y = unpack(p1)
    return function(t)
        local u = 1 - t
        local x = u * p0x + t * p1x
        local y = u * p0y + t * p1y
        return x, y
      end
  else
    error('Bézier curves supported up to 4th order.')
  end
end

local function compile_bezier_opt(control_points)
  local n = #control_points
  if n == 4 then
    local p0, p1, p2, p3 = unpack(control_points)
    local p0x, p0y = unpack(p0)
    local p1x, p1y = unpack(p1)
    local p2x, p2y = unpack(p2)
    local p3x, p3y = unpack(p3)
    return function(t)
        local tt = t * t
        local ttt = tt * t
        local cx = 3 * (p1x - p0x)
        local bx = 3 * (p2x - p1x) - cx
        local ax = p3x - p0x - cx - bx
        local cy = 3 * (p1y - p0y)
        local by = 3 * (p2y - p1y) - cy
        local ay = p3y - p0y - cy - by
        local x = ax * ttt + bx * tt + cx * t + p0x
        local x = ay * ttt + by * tt + cy * t + p0y
        return x, y
      end
  elseif n == 3 then
    local p0, p1, p2 = unpack(control_points)
    local p0x, p0y = unpack(p0)
    local p1x, p1y = unpack(p1)
    local p2x, p2y = unpack(p2)
    return function(t)
        local u = 1 - t
        local a = u * u
        local b = 2 * t * u
        local c = t * t
        local x = a * p0x + b * p1x + c * p2x
        local y = a * p0y + b * p1y + c * p2y
        return x, y
      end
  elseif n == 2 then
    local p0, p1 = unpack(control_points)
    local p0x, p0y = unpack(p0)
    local p1x, p1y = unpack(p1)
    return function(t)
        local u = 1 - t
        local x = u * p0x + t * p1x
        local y = u * p0y + t * p1y
        return x, y
      end
  else
    error('Bézier curves supported up to 4th order.')
  end
end

Choices

Is this the best (faster and/or most numerically stable) method to evaluate a Bézier curve? Are there any other algorithms available?

Yes, there are. Plenty of them.

For example, we can represent for example the curve in polynomial (non-Bernstein) form and use a modified Horner’s method (see here for the algorithm description and PostScript implementation) with the advantage that coefficients can be pre-computed and reused for many curves. This approach, however, while theoretically faster due to the reduces amount of additions and multiplications required, lacks the numerical stability of the De Casteljau’s algorithm. On the plus side, it doesn’t impose any limit on the degree of the curve (although Bézier curves with order greater than 4 aren’t usually adopted since they are difficult to control, and are better substituted with combinations of quadratic/cubic curves).

local function compile_bezier_horner(control_points)
  local n = #control_points
  return function(t)
      local s = 1 - t
      local C = n * t
      local Px, Py = unpack(control_points[1])
      for k = 1, n do
        local ykx, yky = unpack(control_points[k])
        Px = Px * s + C * ykx
        Py = Py * s + C * yky
        C = C * t * (n - k) / (k + 1)
      end
      return Px, Py
    end
end

Also, LÖVE has a specific math API that enables the evaluation of Bézier curves. It evaluates the curve with the De Casteljau’s algorithm, as implemented in native and compiled C++ code. This should be fast and precise, and can be used as follow.

local function compile_bezier_love2d(control_points)
  local points = {}
  for _, v in ipairs(control_points) do
    points[#points + 1] = v[1]
    points[#points + 1] = v[2]
  end
  local bezier = love.math.newBezierCurve(points)
  return function(t)
      local x, y = bezier:evaluate(t)
      return x, y
    end
end

Performances

So many variants to choose from… which one we should pick? We can’t answer properly to this question without some profiling. I proceeded in coding all the Bézier evaluators along with a simple profiling example. You can check the profiling code here.

Order Variant Cost
2 love 100,00%
2 bernstein 10,64%
2 decasteljau 19,64%
2 iterative-decasteljau 105,47%
2 optimized-decasteljau 10,75%
2 horner 32,26%
2 optimized-horner 43,93%
3 love 100,00%
3 bernstein 16,21%
3 decasteljau 43,24%
3 iterative-decasteljau 182,71%
3 optimized-decasteljau 20,66%
3 horner 46,74%
3 optimized-horner 68,22%
4 love 100,00%
4 bernstein 13,38%
4 decasteljau 48,96%
4 iterative-decasteljau 164,57%
4 optimized-decasteljau 18,21%
4 horner 38,22%
4 optimized-horner 46,05%

The baseline reference is the (internal) LÖVE implementation performance, re-based for each order, as we primarily want to determine if we can do better than what the framework offers. For the De Casteljau algorithm I wrote three different variants: (i) the straightforward lerp based implementation that follows the definition itself, (ii) a generalized iterative implementation similar to the one implemented in LÖVE, and (iii) an optimized implementation where lerp function calls are removed by manual inlining. Also for the Horner’s method I devised two different implementations in which the second one despite being more obscure reduce the operations count.

The fastest algorithm is by far the one based on the Bernstein polynomial form, which is reasonable since it offers the change to inline during the “compiling” (via closures) and pre-compute and reuse computations as much as possible. Interesting enough the native framework implementation is among the slower ones; this is due to the fact that each Bézier curve evaluation requires both a meta-table and FFI access that are quite costly (despite exploiting LuaJIT). It’s far better to remain in Lua’s script domain rather than transitioning to native code. The Horner’s method algorithms, on the contrary, aren’t so slow but the pay their general form due to the repeated table unpack calls.

By increasing the order of the curve we expect the performance to reduce. In fact, the impact on the Bernstein is the following:

Order Speed
2 100,00%
3 144,21%
4 188,28%

In terms of precision we checked the result being equal up to the 6th decimal point (10^-6) when compared with the framework (slower) implementation. As a side-note, the Horner’s method algorithms a pretty imprecise and are impractical for common usages.

Reworking the Equations

Considering that the linear interpolation between a and b can either be written as (t - 1) * b + t * a or a + (b - a) * t we can be temped to rework the evaluator as follows (for example, for a 4th order curve)

function bezier_4th(control_points, t)
  local p0, p1, p2, p3 = unpack(control_points)
  local p0x, p0y = unpack(p0)
  local p1x, p1y = unpack(p1)
  local p2x, p2y = unpack(p2)
  local p3x, p3y = unpack(p3)
  local tt = t * t
  local ttt = tt * t
  local cx = 3 * (p1x - p0x)
  local bx = 3 * (p2x - p1x) - cx
  local ax = p3x - p0x - cx - bx
  local cy = 3 * (p1y - p0y)
  local by = 3 * (p2y - p1y) - cy
  local ay = p3y - p0y - cy - by
  local x = ax * ttt + bx * tt + cx * t + p0x
  local x = ay * ttt + by * tt + cy * t + p0y
  return x, y
end

However, while this is perfectly legit on the paper, numerically on a computer the results may diverge. Of course this new, reworked, algorithm uses less multiplications, but the number of additions and subtractions increases. On modern computers the cost of arithmetic operations is pretty much constant across operations. This wasn’t the case on older CPUs during the ’80s and ’90s, where saving a mul or a div could make the difference especially on non FPU equipped computers. Most notably, this algorithm doesn’t ensure proper results for t = 1. That being said, it’s definitely not worth the effort.

Applications

Other than path-management, as in my case, Bézier curves can be used in a lot of interesting cases. I unconsciously avoided them for far too long, it’s about time to get back on track.

One very interesting use of them is described here where, with a clever use of fragment shaders, a flowing lava river is simulated. Guess that these curves can be very useful for dynamically-modified water rivers, too.