infinitesimal: Nonlinear Function
Created: September 06, 2022
Modified: September 06, 2022


This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

The Leibniz calculus notation using infinitesimal quantities like $dx$ or $dt$ is simultaneously

  1. Very sensible and intuitive, but also
  2. Constantly confusing to me.

A lot of this is probably specific to my own idiosyncratic understanding, but in any case this note is an attempt to work through points that have confused me.

Discrete time. I like that infinitesimals connect intuitively to the finite-sum case where we're summing over some change $\Delta x$ at each timestep. But when considering discrete 'timesteps' we often implicitly take $\Delta t = 1$, which can make it ambiguous which of $\Delta x$ and $\frac{\Delta x}{\Delta t}$ we actually mean. In general, if a change is 'per timestep', it should be written as $\frac{\Delta x}{\Delta t}$ (or similar) to emphasize that two timesteps would give twice as much change, half a timestep half as much, etc.
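A quick numeric sketch of why the distinction matters (my own toy example, with a made-up `total_change` helper): if the per-step update is written as a rate $\frac{\Delta x}{\Delta t}$ times $\Delta t$, the accumulated total is roughly invariant to how finely we discretize time, rather than scaling with the number of steps.

```python
# Toy illustration: accumulate a change that happens 'per timestep'.
# Writing the update as (dx/dt) * dt makes the total approximately
# invariant to the choice of step size dt.

def total_change(rate, t_end, dt):
    """Left-endpoint accumulation of x over [0, t_end], where rate(t) = dx/dt."""
    x, t = 0.0, 0.0
    while t < t_end - 1e-12:
        x += rate(t) * dt   # per-step change is (dx/dt) * dt, not a bare dx
        t += dt
    return x

rate = lambda t: 2.0 * t                 # dx/dt = 2t, so exactly x(1) = 1
coarse = total_change(rate, 1.0, 0.5)    # few big steps
fine = total_change(rate, 1.0, 0.01)     # many small steps
# Halving dt does not halve the answer; both estimates approach x(1) = 1.
```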

Dependence. If $x(t)$ is a function of time, then its derivative $\frac{dx}{dt}$ is also a function of time. But what about $dx$ and $dt$ individually? Are these also functions of time? If so, should we not write them with this explicit dependence?

Let's start with $dt$. Ultimately we can think of whatever derivative or integral we're evaluating as the limit of a ratio or sum as some 'width' variable $\epsilon$ goes to zero. In principle we could define $dt(\epsilon, t)$ as nonuniform in $t$, describing e.g. a sum in which the 'width' of the time buckets is larger in some parts of the domain than in others. But usually it makes sense to identify $\epsilon$ with $dt$, i.e., consider the limit of all timesteps becoming infinitesimally small in a uniform way, and then $dt$ has no time dependence.

On the other hand, $x(t)$ is a function of time, and any small change in time $dt$ will induce a corresponding change $dx(t) = x(t + dt) - x(t)$. This quantity $dx(t)$ is absolutely a function of time. It is also, implicitly, a function of $dt$, in that a larger input change $dt$ will lead to a larger output change $dx$, but more properly we can think of both of these as being some function of the underlying $\epsilon$.
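To make this concrete (a toy numeric check of my own): for $x(t) = t^2$, the induced change $dx(t) = x(t + dt) - x(t)$ visibly depends both on where we are in time and on the step size.

```python
# Toy check: dx(t) = x(t + dt) - x(t) is a function of t AND of dt.

def dx(x, t, dt):
    return x(t + dt) - x(t)

x = lambda t: t ** 2
d1 = dx(x, 1.0, 1e-3)   # near t = 1: dx ≈ 2 * 1e-3
d2 = dx(x, 3.0, 1e-3)   # near t = 3: dx ≈ 6 * 1e-3 -- larger, since dx depends on t
d3 = dx(x, 1.0, 1e-6)   # shrinking dt shrinks dx roughly proportionally
```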

Partial differentials. How do $dx$ and $\partial x$ relate to each other? Do they have the same units? Does $dx - \partial x = 0$, does $\frac{dx}{\partial x} = 1$, or is there a clear principle for why neither of these can occur in a valid expression?

I think this starts to implicate a computational graph view of calculus. For concreteness of illustration, let $x = t^2$, and consider some function $f(x, t)$. The full picture here is a computational graph (image: `Pasted image 20220906200617`) in which everything is downstream of $t$, which functions as an exogenous or 'input' variable. In this picture, the total derivative $\frac{df}{dt}$ is a question about the graph as a whole: how will the value at the output node $f$ change in response to a change at the input node $t$? The partial derivative $\frac{\partial f}{\partial t}$ is a local question about the final node: how will the value $f(x, t)$ of that node change if we vary the second argument while holding the first argument constant? We can relate these using the law of total differentiation:

$$\frac{df}{dt} = \frac{\partial f}{\partial t} + \frac{\partial f}{\partial x} \frac{dx}{dt}$$

This is in terms of ratios of infinitesimals, where a $\partial$ is always in a ratio with another $\partial$ (and similarly for $d$), so the 'units' always cancel out. In general it's not valid to cancel across infinitesimal units, e.g. we can't simplify $\frac{\partial f}{\partial x} \frac{dx}{dt} = \frac{\partial f}{dt}$, and it's not clear what it would mean if we could.
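The law of total differentiation can be checked numerically with finite differences. Here's a toy sketch using $x = t^2$ from above, with my own arbitrary choice of $f(x, t) = x \cdot t$, so that the exact total derivative is $\frac{d}{dt} t^3 = 3t^2$.

```python
# Toy finite-difference check of: df/dt = ∂f/∂t + (∂f/∂x)(dx/dt),
# with x = t**2 and the (arbitrarily chosen) f(x, t) = x * t.

eps = 1e-6
x = lambda t: t ** 2
f = lambda x_, t: x_ * t

t0 = 1.5
# Total derivative: perturb t at the input node and let x follow.
df_dt = (f(x(t0 + eps), t0 + eps) - f(x(t0), t0)) / eps
# Partial wrt t: perturb the second argument only, freezing x.
pf_pt = (f(x(t0), t0 + eps) - f(x(t0), t0)) / eps
# Partial wrt x, and dx/dt.
pf_px = (f(x(t0) + eps, t0) - f(x(t0), t0)) / eps
dx_dt = (x(t0 + eps) - x(t0)) / eps

lhs = df_dt                     # ≈ 3 * t0**2 = 6.75
rhs = pf_pt + pf_px * dx_dt     # law of total differentiation
```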

I think the right way to think about this is that global/total differentials can generally be treated within a single global limit: for some perturbation $\Delta t$ to the input of the computation graph, the perturbations $\Delta x$ and $\Delta f$ fall out naturally, so there is a consistent process relating all of these quantities, in some sense a single coherent thought experiment, which allows us to meaningfully ask about their ratios.

By contrast, the partial derivative $\frac{\partial f}{\partial t}$ is asking about a modified copy of the graph where we freeze all inputs to $f$ except for $t$. (This is vaguely reminiscent of the graph surgery prescribed by do-calculus for causal inference, but only vaguely.) In this graph, $\delta f$ and $\delta t$ are the only changes that are even defined; the $\delta$ indicates units corresponding to this particular thought experiment, so the only way to 'leave the thought experiment' and get an objectively meaningful quantity is to consider the ratio $\frac{\partial f}{\partial t}$ or its inverse. There is no meaningful way for partial differentials to interact with quantities in a different thought experiment.

The view of total differentials as a single coherent thought experiment is complicated when we have multiple independent variables. Suppose we had both $t$ and $r$ as exogenous inputs, took $x(r, t)$ as a function of both of them, and then asked about $dx/dt$. I guess derivatives wrt $t$ have to be interpreted in terms of a limiting process as $dt$ goes to zero. And separately, if we're computing derivatives with respect to $r$, there is a limiting process as $dr$ goes to zero. But these are separate processes, not tied to any common rate. To put a point on it: what is $\frac{\Delta t}{\Delta r}$ in this experiment? It's not a well-defined quantity, because neither $t$ nor $r$ is a function of the other; we have total freedom in the relative rate at which those differentials go to zero. So when we're asking about $\frac{\Delta x}{\Delta r}$, this must be a different $\Delta x$ from that in $\frac{\Delta x}{\Delta t}$: the first exists in a thought experiment where we've changed $r$ while leaving $t$ unchanged, and the second in the opposite thought experiment. TODO resolve this.
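A tiny numeric illustration of the two-thought-experiments point (my own toy example, with the arbitrary choice $x(r, t) = r \cdot t$): the $\Delta x$ induced by perturbing $r$ alone is a genuinely different quantity from the $\Delta x$ induced by perturbing $t$ alone.

```python
# Toy illustration: with two exogenous inputs r and t, the Δx in Δx/Δr
# and the Δx in Δx/Δt come from different perturbation experiments.

x = lambda r, t: r * t   # arbitrary example function of both inputs

r0, t0, eps = 2.0, 3.0, 1e-6
dx_from_r = x(r0 + eps, t0) - x(r0, t0)   # perturb r, hold t fixed: ≈ t0 * eps
dx_from_t = x(r0, t0 + eps) - x(r0, t0)   # perturb t, hold r fixed: ≈ r0 * eps
# The two Δx's differ (≈ 3e-6 vs ≈ 2e-6): there is no single
# perturbation Δx that both ratios refer to.
```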

Differential operators. Why can we write $\frac{d}{dt}$ but not $\frac{dt}{d}$?

Higher-order derivatives. Why is the second derivative written as $\frac{d^2 x}{dt^2}$, and what do the individual terms of this notation mean?
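One partial answer (not worked through fully here): reading $d^2 x$ as $d(dx)$, a difference of differences, and $dt^2$ as $(dt)^2$ at least matches the finite-difference picture, where the second derivative is $\frac{\Delta(\Delta x)}{(\Delta t)^2}$. A toy sketch:

```python
# Toy sketch: the second derivative as a 'difference of differences'
# divided by the square of the step -- Δ(Δx) / (Δt)**2.

def second_derivative(x, t, dt):
    dx1 = x(t + dt) - x(t)            # Δx at t
    dx2 = x(t + 2 * dt) - x(t + dt)   # Δx at t + dt
    return (dx2 - dx1) / dt ** 2      # Δ(Δx) / (Δt)**2

x = lambda t: t ** 3   # exact second derivative is 6t
approx = second_derivative(x, 2.0, 1e-4)
# approx ≈ 12 (= 6 * 2.0)
```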


It might help me to work through Wikipedia's description of differentials as linear maps. For a function $f(x)$, we define the differential $df = f'\, dx$. A multidimensional function $f(x, y)$ has differential

$$df = \frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial y} dy$$

Here the differential is only defined on functions; we finesse this by treating $x$ and $y$ as functions on a 'standard infinitesimal' $p$, so that $x(p) = p$ and $y(p) = p$? I'm confused by the 'standard infinitesimal' setup because it seems like it implies a dependence between $x$ and $y$ that doesn't necessarily exist.
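Whatever the right foundational story is, the linear-map reading can at least be checked numerically: for independent small perturbations $dx$ and $dy$ (no assumed relation between them), $\frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial y}dy$ matches the actual change in $f$ up to second-order error. A toy check with an arbitrary $f$:

```python
# Toy check that df = (∂f/∂x) dx + (∂f/∂y) dy linearly approximates the
# actual change in f, for INDEPENDENT small perturbations dx and dy.
import math

f = lambda x, y: math.sin(x) * y   # arbitrary example function

x0, y0 = 1.0, 2.0
dx, dy = 1e-4, -2e-4   # independent: no assumed relation between them

# Partials by finite differences.
eps = 1e-7
fx = (f(x0 + eps, y0) - f(x0, y0)) / eps   # ≈ cos(x0) * y0
fy = (f(x0, y0 + eps) - f(x0, y0)) / eps   # ≈ sin(x0)

df_linear = fx * dx + fy * dy
df_actual = f(x0 + dx, y0 + dy) - f(x0, y0)
# df_linear and df_actual agree up to second-order terms in (dx, dy).
```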