The algebraic approach to derivatives requires two components. The first is elementary derivatives, the basic building blocks from which we will construct derivatives. The second ingredient is grammar: rules by which we put the elementary derivatives together. In other words, we need to know how to differentiate a function that is created by combining elementary functions using algebraic operations and composition. First we will look at local statements.

Theorem.

Assume that functions *f* and *g* are both differentiable at *a*. Then the functions *f* + *g*, *f* − *g*, and *f* ⋅*g* are differentiable at *a*, and the function *f* /*g* is differentiable at *a* if *g*(*a*) ≠ 0. Moreover, the derivatives satisfy

[ *f* + *g*]′(*a*) = *f* ′(*a*) + *g*′(*a*),
[ *f* − *g*]′(*a*) = *f* ′(*a*) − *g*′(*a*),
[ *f* ⋅*g*]′(*a*) = *f* ′(*a*)⋅*g*(*a*) + *f* (*a*)⋅*g*′(*a*),
[ *f* /*g*]′(*a*) = ( *f* ′(*a*)⋅*g*(*a*) − *f* (*a*)⋅*g*′(*a*))/*g*(*a*)^{2}.

**Example:**
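These rules are easy to check numerically. The sketch below is not from the original text; the helper `deriv` and the sample functions *x*² and sin are my own choices. It compares a central-difference approximation of the derivatives of *f* ⋅*g* and *f* /*g* at a point with the right-hand sides of the formulas.

```python
import math

def deriv(f, x, h=1e-6):
    # central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x**2   # sample differentiable function
g = math.sin         # sample differentiable function with g(a) != 0 below
a = 1.0

# product rule: (f*g)'(a) = f'(a)g(a) + f(a)g'(a)
lhs = deriv(lambda x: f(x) * g(x), a)
rhs = deriv(f, a) * g(a) + f(a) * deriv(g, a)
print(abs(lhs - rhs) < 1e-5)  # True

# quotient rule: (f/g)'(a) = (f'(a)g(a) - f(a)g'(a)) / g(a)**2
lhs = deriv(lambda x: f(x) / g(x), a)
rhs = (deriv(f, a) * g(a) - f(a) * deriv(g, a)) / g(a)**2
print(abs(lhs - rhs) < 1e-5)  # True
```

Any smooth sample functions with *g*(*a*) ≠ 0 would do; the agreement is within the accuracy of the difference quotient.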

Theorem.

Assume that a function *f* is differentiable at *a* and a function *g* is differentiable at *b* = *f* (*a*). Then the composed function *g*○*f* = *g*( *f* ) is differentiable at *a* and

[ *g*○*f* ]′(*a*) = *g*′(*b*)⋅ *f* ′(*a*) = *g*′( *f* (*a*))⋅ *f* ′(*a*).

Note that we substitute *b* into *g*′, which is logical. When you
picture how the functions act, you will see that *a* lives in a space
different from the one where *g* has its domain, so it cannot be
substituted into *g*.

**Example:**
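The chain rule at a point can also be verified numerically. This snippet is my own addition (the helper `deriv` and the sample functions are illustrative choices): it compares a difference quotient of *g*( *f* (*x*)) at *a* with *g*′(*b*)⋅ *f* ′(*a*), where *b* = *f* (*a*).

```python
import math

def deriv(f, x, h=1e-6):
    # central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x**2   # inner function
g = math.sin         # outer function
a = 1.5
b = f(a)             # the point where g' must be evaluated

# chain rule: (g o f)'(a) = g'(b) * f'(a)
lhs = deriv(lambda x: g(f(x)), a)
rhs = deriv(g, b) * deriv(f, a)
print(abs(lhs - rhs) < 1e-5)  # True
```

Note that *g*′ is evaluated at *b* = *f* (*a*), not at *a*, exactly as the theorem says.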

Theorem.

Assume that functions *f* and *g* are both differentiable on an open set *G*. Then the functions *f* + *g*, *f* − *g*, and *f* ⋅*g* are differentiable on *G*, and the function *f* /*g* is differentiable on *G* assuming that *g* ≠ 0 on *G*. Moreover, the derivatives satisfy

[ *f* + *g*]′ = *f* ′ + *g*′,
[ *f* − *g*]′ = *f* ′ − *g*′,
[ *f* ⋅*g*]′ = *f* ′⋅*g* + *f* ⋅*g*′,
[ *f* /*g*]′ = ( *f* ′⋅*g* − *f* ⋅*g*′)/*g*^{2}.

**Example:**

Theorem(The chain rule).

Assume that a function *f* is differentiable on an open set *G* and a function *g* is differentiable on the set *f* [*G*]. Then the composed function *g*○*f* = *g*( *f* ) is differentiable on *G* and

[ *g*○*f* ]′ = *g*′( *f* )⋅ *f* ′.

**Example:** We differentiate sin(2*x*). Taking the inner function *f* (*x*) = 2*x* and the outer function *g*(*y*) = sin(*y*), the chain rule gives

[sin(2*x*)]′ = cos(2*x*)⋅[2*x*]′ = 2cos(2*x*).

We will return to this example below.

The proper use of these formulas is explained in Methods Survey. Here we will look at these rules more closely. We start with a simple corollary.

Theorem(derivative is linear).

Assume that functions *f* and *g* are both differentiable on an open set *G* and *A*, *B* are real numbers. Then

[ *A* *f* + *B* *g*]′ = *A* *f* ′ + *B* *g*′.

This theorem is very important if we want to consider differentiation as a
transformation from one space of functions (namely, all functions
differentiable on a certain set *G* form a linear space) into another
space of functions. Since transformations that are linear are the "best"
transformations, it is nice to know that differentiation satisfies this
condition.
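Linearity can be checked numerically just like the other rules. The snippet below is my own illustration (the helper `deriv` and the sample functions and constants are arbitrary choices): it compares a difference quotient of *A* *f* + *B* *g* with *A* *f* ′ + *B* *g*′ at a point.

```python
def deriv(f, x, h=1e-6):
    # central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x**3
g = lambda x: 1.0 / x
A, B = 3.0, -5.0
a = 2.0

# linearity: [A*f + B*g]'(a) = A*f'(a) + B*g'(a)
lhs = deriv(lambda x: A * f(x) + B * g(x), a)
rhs = A * deriv(f, a) + B * deriv(g, a)
print(abs(lhs - rhs) < 1e-5)  # True
```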

Here we will take a closer look at the chain rule. We will try to present it in several different ways, since it is usually the most difficult part of differentiation for students; on the other hand, it is really quite simple, just a matter of understanding it. If it seems difficult when presented the usual way, perhaps a different point of view will make it clearer.

We start by observing that the chain rule actually tells us what happens to a derivative of a function when we change its variable via substitution. For simplicity we will assume that all functions we mention here are differentiable wherever we need it.

Consider a function *g* that depends on a variable *y*. We can
differentiate it, obtaining *g*′(*y*). What happens if we apply the **substitution**
*y* = *f* (*x*)? The substitution turns *g* into a new function
*h*(*x*) = *g*( *f* (*x*)), and we want to relate *h* ′ to *g*′. The chain rule
says that

*h* ′(*x*) = *g*′(*y*)⋅*f* ′(*x*)
= *g*′( *f* (*x*))⋅*f* ′(*x*).

We can rewrite this as follows:

Fact.

If the variable *y* depends on another variable *x*, then

[ *g*(*y*)]′ = *g*′(*y*)⋅*y*′.

To see how it works we return to the last example above, where we wanted to
calculate the derivative of sin(2*x*). Using the substitution *y* = 2*x* we get sin(*y*), so

[sin(2*x*)]′ = [sin(*y*)]′
= cos(*y*)⋅*y*′
= cos(2*x*)⋅[2*x*]′ = 2cos(2*x*).
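This particular answer is easy to confirm numerically. The check below is my own addition (the helper `deriv` and the sample points are arbitrary): it compares a difference quotient of sin(2*x*) with 2cos(2*x*) at a few points.

```python
import math

def deriv(f, x, h=1e-6):
    # central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (0.0, 0.7, 2.0):
    # chain rule result: [sin(2x)]' = 2 cos(2x)
    assert abs(deriv(lambda t: math.sin(2 * t), x) - 2 * math.cos(2 * x)) < 1e-5
print("ok")  # prints "ok"
```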

In fact, one can think of this as a general rule, and indeed there are textbooks that do not list elementary derivatives as we had them in the previous section; instead, their lists have entries like

[ *y*^{a}]′ = *a* *y*^{a−1}⋅*y*′,
[ *e*^{y}]′ = *e*^{y}⋅*y*′,
[sin(*y*)]′ = cos(*y*)⋅*y*′, etc.

Such a rule also works for "ordinary" functions, since if *y* is the
basic variable, then *y*′ = 1 and, for instance, [ *y*^{ a}]′ = *a* *y*^{ a−1}⋅*y*′ = *a* *y*^{ a−1}.

Another interesting point of view on the chain rule is to look at this substitution business via the Leibniz notation. The rule then reads

d*g*/d*x* = (d*g*/d*y*)⋅(d*y*/d*x*).

It seems as if the chain rule was just cancelling in fractions, since we get
the left hand side by cancelling d*y* on the right. This is just another
advantage of pretending that the differentials exist (cf.
Leibniz notation).
When I learned calculus
informally in high school, it was explained to us via differentials by our
physics teacher (we needed it to solve problems) and it all seemed natural. I
used to cancel d's left and right and it never failed me. Then I started to
study math seriously and I learned that it was wrong :-).

One last remark. Why is it called the chain rule? The next section shows the reason.

How do the above rules work if more functions are involved? Linearity is a rule that readily adapts to more summands, even for general transformations, so it must also work for differentiation:

[ *A*_{1} *f* _{1} + *A*_{2} *f* _{2} + ... + *A*_{n }*f* _{n}]′
=
*A*_{1} *f* _{1}′ + *A*_{2} *f* _{2}′ + ... + *A*_{n }*f* _{n}′.

We also have a rule for the product of more functions. We first show a formula for three functions to show the basic idea and then the general rule:

[ *f* ⋅*g*⋅*h*]′
= *f* ′⋅*g*⋅*h*
+ *f* ⋅*g*′⋅*h*
+ *f* ⋅*g*⋅*h*′,

[ *f* _{1}⋅ *f* _{2}⋅...⋅ *f* _{n}]′
=
*f* _{1}′⋅ *f* _{2}⋅...⋅ *f* _{n}
+ *f* _{1}⋅ *f* _{2}′⋅...⋅ *f* _{n}
+ . . . +
*f* _{1}⋅ *f* _{2}⋅...⋅ *f* _{n}′.
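The three-factor version of the product rule can be confirmed numerically. This sketch is my own addition (the helper `deriv` and the three sample functions are arbitrary choices; the outer factor is named `h_` to avoid clashing with the step size `h`).

```python
import math

def deriv(f, x, h=1e-6):
    # central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x**2
g = math.sin
h_ = math.exp
a = 0.8

# three-factor product rule: (f*g*h)' = f'gh + fg'h + fgh'
lhs = deriv(lambda x: f(x) * g(x) * h_(x), a)
rhs = (deriv(f, a) * g(a) * h_(a)
       + f(a) * deriv(g, a) * h_(a)
       + f(a) * g(a) * deriv(h_, a))
print(abs(lhs - rhs) < 1e-4)  # True
```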

Note that it is not necessary to remember these rules, because they can be easily deduced from the rules for two functions. Indeed, addition and multiplication are associative operations, so we can insert parentheses at suitable places to change the expression into an addition/multiplication of two terms and use the usual rule. Rather than a complicated explanation we prefer to show an example, where we deduce the rule for the product of three functions.

[ *f* ⋅*g*⋅*h*]′
= [ *f* ⋅(*g*⋅*h*)]′
= *f* ′⋅(*g*⋅*h*)
+ *f* ⋅[*g*⋅*h*]′
= *f* ′⋅(*g*⋅*h*)
+ *f* ⋅(*g*′⋅*h* + *g*⋅*h*′)
= *f* ′⋅*g*⋅*h*
+ *f* ⋅*g*′⋅*h*
+ *f* ⋅*g*⋅*h*′.

Similarly, it is not necessary to remember the rule for the composition of more functions. Again, we prefer to show a simpler version for three functions, since the notation for the general case is quite complicated (too many compositions and functions). If you understand the idea, you can easily find the derivative of a composition of a hundred functions.

[ *h*(*g*( *f* ))]′
= *h*′(*g*( *f* ))
⋅*g*′( *f* )⋅ *f* ′.
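The three-function chain rule can be checked numerically as well. This sketch is my own addition (the helper `deriv` and the sample functions are arbitrary; the outermost function is named `h_` to avoid clashing with the step size).

```python
import math

def deriv(fn, x, h=1e-6):
    # central-difference approximation of fn'(x)
    return (fn(x + h) - fn(x - h)) / (2 * h)

f = lambda x: x**2   # innermost function
g = math.sin         # middle function
h_ = math.exp        # outermost function
a = 0.5

# [h(g(f(x)))]' = h'(g(f(x))) * g'(f(x)) * f'(x)
lhs = deriv(lambda x: h_(g(f(x))), a)
rhs = deriv(h_, g(f(a))) * deriv(g, f(a)) * deriv(f, a)
print(abs(lhs - rhs) < 1e-4)  # True
```

Each derivative in the chain is evaluated at "whatever was there before", exactly as the text describes.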

As you can see, when we have a chain of composed functions as in the picture,

we differentiate the one at the end (the one that is done last) and substitute into it whatever was there before. Then we disregard the outer function we just differentiated and apply the same procedure to its inside part and again and again...

**Example:**

Now you see why it is called the chain rule. By the way, an experienced "differentiator" (or "derivator" as in Terminator) would skip all the steps and go right to the answer, creating it term by term.

By the way, since division is not an associative operation, there is no rule for "more ratios". For starters, what would "more ratios" even mean? Since there is no obvious answer to this, it is clear that we cannot get any rule here.

Again, linearity makes it simple to iterate differentiation. For the
*k*^{th} derivative we get

[ *A**f* + *B**g*]^{(k)}
= *A**f* ^{(k)} + *B**g*^{(k)}.

For a product things get a bit more interesting; iterating the product
rule leads to the *Leibniz formula*

[ *f* ⋅*g*]^{(k)} = ∑_{i=0}^{k} C(*k*,*i*)⋅ *f* ^{(i)}⋅*g*^{(k−i)}.

For instance,
[ *f* ⋅*g*]′′ = *f* ′′*g* + 2 *f* ′*g*′ + *f* *g*′′.
Note that the coefficients C(*k*,*i*) are exactly the binomial coefficients from the expansion of the *k*^{th} power:

(*a* + *b*)^{k} = ∑_{i=0}^{k} C(*k*,*i*)⋅*a*^{i}*b*^{k−i}.

An amazing coincidence, isn't it? There are no rules for higher order derivatives of other operations.
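The second-derivative case of the Leibniz formula can be checked numerically. This sketch is my own addition (the helpers `d1`, `d2` and the sample functions are arbitrary choices); `d2` is the standard central-difference approximation of a second derivative, which is less accurate than `d1`, hence the looser tolerance.

```python
import math

def d1(f, x, h=1e-5):
    # central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

def d2(f, x, h=1e-4):
    # central-difference approximation of f''(x)
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

f = math.sin
g = math.exp
a = 0.9

# Leibniz formula for k = 2: (f*g)'' = f''g + 2 f'g' + f g''
lhs = d2(lambda x: f(x) * g(x), a)
rhs = d2(f, a) * g(a) + 2 * d1(f, a) * d1(g, a) + f(a) * d2(g, a)
print(abs(lhs - rhs) < 1e-3)  # True
```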

Although finding an inverse function (if it exists at all) is not an operation per se, differentiation is very nice and allows us to find the derivative of the inverse function using the derivative of the original function.

Theorem.

Assume that *f* is a function that is continuous and strictly monotone on some neighborhood of a point *a*. If *f* is differentiable at *a* and *f* ′(*a*) ≠ 0, then its inverse *f* ^{−1} is differentiable at *b* = *f* (*a*) and

[ *f* ^{−1}]′(*b*) = 1/ *f* ′(*a*) = 1/ *f* ′( *f* ^{−1}(*b*)).

Note that a function that is strictly monotone is automatically invertible. To understand why we substitute what we substitute, we look at a picture of the situation:

Obviously, *b* is the right place to differentiate the inverse function,
since it cannot accept *a* which lives in a different world. On the
other hand, the only reasonable candidate to put into *f* ′ is
*a*. How would the formula look as a rule?

Things get even more interesting if we rewrite this rule using the Leibniz
notation. We have *y* = *f* (*x*), that is, *x* = *f* ^{−1}(*y*), and the rule becomes

d*x*/d*y* = 1/(d*y*/d*x*).

Now isn't that nice? To show how this rule can be applied we derive the
formula for the derivative of the logarithm. We know that
*y* = ln(*x*) means that *x* = *e*^{ y}, so

[ln(*x*)]′ = d*y*/d*x* = 1/(d*x*/d*y*) = 1/ *e*^{ y} = 1/*x*.
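This derivation can be cross-checked numerically. The sketch below is my own addition (the helper `deriv` and the chosen point are arbitrary): it verifies the inverse-function rule with *f* = exp and its inverse ln, and confirms that the derivative of ln at *b* is 1/*b*.

```python
import math

def deriv(f, x, h=1e-6):
    # central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

# inverse rule: (f^{-1})'(b) = 1 / f'(a) with b = f(a)
# take f = exp, so f^{-1} = ln; pick a = 1, b = e
a = 1.0
b = math.exp(a)

lhs = deriv(math.log, b)        # [ln]'(e)
rhs = 1.0 / deriv(math.exp, a)  # 1 / exp'(1)
print(abs(lhs - rhs) < 1e-6)    # True

# and indeed [ln(x)]' = 1/x at x = b:
assert abs(deriv(math.log, b) - 1.0 / b) < 1e-6
```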