MIT OCW Single Variable Calculus - Week 4
Exponents
To recap from last time, we took the limit of
\[
a_k = \left(1 + \frac{1}{k} \right)^k
\]
as \(k\) tends to infinity:
\[
\lim_{k \to \infty} a_k = e
\]
We came to this result by means of the following (abbreviated) process:
\[
\begin{array}{l}
\ln a_k \to 1 \qquad \text{as } k \text{ approaches infinity} \\
e^{\ln a_k} \to e \qquad \text{as } k \text{ approaches infinity} \\
e^{\ln a_k} = a_k \quad \text{because } e^{\ln a} = a
\end{array}
\]
We then used that to come up with a formula for \(e\):
\[
e = \lim_{k \to \infty} \left( 1 + \frac{1}{k} \right)^k
\]
We previously derived the power rule:
\[
\frac{d}{dx} x^r = r x^{r - 1}
\]
for all rational \(r\). Now we will show that it holds for all real \(r\).
Method 1: use base e.
\[
\begin{align}
x^r &= \left(e^{\ln x} \right)^r\\
&= e^{r \ln x} \\
\frac{d}{dx}x^r &= \left( e^{r \ln x} \right)' \\
&= e^{r \ln x}(r \ln x)' \\
&= x^r \frac{r}{x} \\
&= r x^{r - 1}
\end{align}
\]
which shows that the power rule holds for any real \(r\) (with \(x > 0\), so that \(\ln x\) is defined).
Method 2: logarithmic differentiation.
\[
u = x^r, \quad \ln u = r \ln x
\]
\[
\begin{align}
\frac{u'}{u} &= (\ln u)' \\
&= \frac{r}{x} \\
u' &= u \frac{r}{x} \\
&= x^r \frac{r}{x} \\
&= r x^{r - 1}
\end{align}
\]
which again gives the power rule for any real \(r\).
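As a concrete instance of the result above, here is a worked case with an irrational exponent. The choice \(r = \sqrt{2}\) is only an illustration and does not come from the lecture:
\[
\frac{d}{dx} x^{\sqrt{2}} = \frac{d}{dx} e^{\sqrt{2} \ln x} = e^{\sqrt{2} \ln x} \cdot \frac{\sqrt{2}}{x} = \sqrt{2}\,x^{\sqrt{2} - 1}
\]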
Review
General formulas
Practice \((u + v)', (c u)', (u v)', \left(\frac{u}{v}\right)'\), the last two being the product rule and the quotient rule respectively.
Also know the chain rule:
\[
\frac{d}{dx}f(u) = f'(u)\,u'(x) \qquad \text{where} \quad u = u(x)
\]
and implicit differentiation, which covers inverse functions and logarithmic differentiation.
Specific functions
\( x^r, \sin x, \cos x, \tan x, \sec x, e^x, \ln x, \tan^{-1}x, \sin^{-1}x \)
Chain rule
We previously didn’t explain why the chain rule was true. Now we will show it through an example.
Take the function \(y = 10x + b\). \(y\) is changing 10 times as fast as \(x\), therefore \(\frac{dy}{dx} = 10\).
Now consider \(x = 5t + a\); then \(\frac{dx}{dt} = 5\).
The chain rule says that if \(y\) is increasing 10 times as fast as \(x\), and \(x\) is increasing 5 times as fast as \(t\), then \(y\) is increasing 50 times as fast as \(t\).
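One way to check this on the example, assuming we simply substitute the second equation into the first:
\[
\begin{align}
y &= 10(5t + a) + b \\
&= 50t + (10a + b) \\
\frac{dy}{dt} &= 50 = \frac{dy}{dx} \cdot \frac{dx}{dt}
\end{align}
\]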
You can use the chain rule to make other rules easier to derive.
For example, the reciprocal rule, a special case of the quotient rule:
\[
\begin{align}
f &= \frac{1}{v} \\
f' &= \left( \frac{1}{v} \right)' \\
&= ( v^{-1} )' \\
&= \frac{d}{dv}\left( v^{-1} \right) v' \qquad \text{by the chain rule} \\
&= -v^{-2}v'
\end{align}
\]
and the full derivation of the quotient rule, using the product rule:
\[
\begin{align}
f &= \frac{u}{v} \\
f' &= \left( \frac{u}{v} \right)' \\
&= (u\,v^{-1})' \\
&= u'v^{-1} + u(-v^{-2}v') \\
&= \frac{u'v - uv'}{v^2}
\end{align}
\]
Examples
Consider: \(\frac{d}{dx} \sec x\).
\[
\begin{align}
\frac{d}{dx} \sec x &= \frac{d}{dx}(\cos x)^{-1} \\
&= - (\cos x)^{-2} (\cos x)' \\
&= - (\cos x)^{-2}( -\sin x) \\
&= \frac{\sin x}{\cos^2 x} \\
&= \frac{1}{\cos x} \times \frac{\sin x}{\cos x} \\
&= \sec x \tan x
\end{align}
\]
Generally, when differentiating trig functions, express the answer in terms of the original trig function if at all possible, as we did above.
Consider: \(\frac{d}{dx} \ln(\sec x)\).
\[
\begin{align}
\frac{d}{dx}\ln(\sec x) &= \ln'(\sec x)\,(\sec x)' \\
&= \frac{1}{\sec x}(\sec x)' \\
&= \frac{1}{\sec x}(\sec x \tan x) \\
&= \tan x
\end{align}
\]
Consider: \(\frac{d}{dx}(x^{10} + 8x)^6\)
\[
\frac{d}{dx}(x^{10} + 8x)^6 = 6(x^{10} + 8x)^5(10x^9 + 8)
\]
We used the chain rule above because expanding the expression to the 6th power would have taken a lot of space.
Consider: \(\frac{d}{dx}e^{x \tan^{-1}x}\)
\[
\begin{align}
\frac{d}{dx}e^{x \tan^{-1}x} &= e^{x \tan^{-1}x}(x \tan^{-1}x)' \\
&= e^{x \tan^{-1}x}(x(\tan^{-1}x)' + \tan^{-1}x) \\
&= e^{x \tan^{-1}x}\left(\tan^{-1}x + \frac{x}{1 + x^2}\right)
\end{align}
\]
Review Continued
The definition of the derivative:
\[
\frac{d}{dx}f(x) = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}
\]
We found the derivatives of the following functions from first principles, using this definition:
\(\frac{1}{x}\), \(x^n\), \(\sin x\), \(\cos x\) (the last two using their behavior at \(x = 0\)), \(a^x\), \(uv\), and \(\frac{u}{v}\).
We also discussed the meaning of a derivative, that is, the rate of change with respect to some variable.
You can also read the definition of the derivative backwards, and evaluate a limit by recognizing it as a derivative:
\[
\begin{align}
\lim_{u \to 0}\frac{e^u - 1}{u} &= \lim_{u \to 0}\frac{e^{0 + u} - e^0}{u} \\
&= \frac{d}{du}e^u \bigg|_{u = 0} \\
&= 1
\end{align}
\]
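The same trick can be tried on other limits. As a sketch (this particular limit is an illustration chosen here, not one taken from the notes), the limit below is the difference quotient of \(\ln u\) at \(u = 1\):
\[
\begin{align}
\lim_{h \to 0}\frac{\ln(1 + h)}{h} &= \lim_{h \to 0}\frac{\ln(1 + h) - \ln 1}{h} \\
&= \frac{d}{du}\ln u \bigg|_{u = 1} \\
&= 1
\end{align}
\]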
Applications of Differentiation
Linear approximations
The formula for linear approximations:
\[
f(x) \approx f(x_0) + f'(x_0)(x - x_0)
\]
This means that if you have a curve, such as \(y = f(x)\), then near \(x = x_0\) it is approximately the same as its tangent line, \(y = f(x_0) + f'(x_0)(x - x_0)\).
For example, consider the function \(f(x) = \ln x\) and its derivative, \(f'(x) = \frac{1}{x}\), at \(x_0 = 1\):
\[
\begin{align}
f(1) &= \ln 1 = 0 \\
f'(1) &= 1 \\
\ln x &\approx 0 + 1(x - 1) \\
&= x - 1
\end{align}
\]
This shows that near \(x = 1\), the function \(\ln x\) is approximately equal to \(y = x - 1\). When \(x\) moves far from 1, this is no longer true, and the tangent line at a different base point is needed.

Remember that the derivative is defined as the limit, as the change in \(x\) goes to zero, of the ratio of the change in the value of the function to the change in \(x\):
\[
f'(x) = \lim_{\Delta x \to 0} \frac{\Delta f}{\Delta x}
\]
You can also read this backwards: knowing the derivative tells you approximately what the ratio is.
In linear approximation, what we’re essentially saying is that as \(\Delta x\) gets close to 0:
\[
\frac{\Delta f}{\Delta x} \approx f'(x_0)
\]
That is, the average rate of change (\(\frac{\Delta f}{\Delta x}\)) is close to the infinitesimal rate of change (\(f'(x_0)\)) as \(x\) approaches \(x_0\).
The formula above is the same as the formula for linear approximation, where \(\Delta x = x - x_0\) and \(\Delta f = f(x) - f(x_0)\):
\[
\begin{align}
\frac{\Delta f}{\Delta x} &\approx f'(x_0) \\
\Delta f &\approx f'(x_0)\Delta x \\
f(x) - f(x_0) &\approx f'(x_0)(x - x_0) \\
f(x) &\approx f(x_0) + f'(x_0)(x - x_0)
\end{align}
\]
Systematic discussion of linear approximation
When performing a systematic discussion, the convention is to use \(x_0 = 0\). This gives the formula for linear approximation:
\[
f(x) \approx f(0) + f'(0)x
\]
The table below gives the linear approximations of a few notable functions:
\[
\begin{array}{ l | l | l | l | l }
f & f' & f(0) & f'(0) & f(0) + f'(0)x \\ \hline
\sin x & \cos x & 0 & 1 & x \\
\cos x & -\sin x & 1 & 0 & 1 \\
e^x & e^x & 1 & 1 & 1 + x \\
\ln(1 + x) & \frac{1}{1 + x} & 0 & 1 & x \\
(1 + x)^r & r(1 + x)^{r - 1} & 1 & r & 1 + rx \\
\end{array}
\]
And geometrically:
\(sin\;x\)

\(cos\;x\)

\(e^x\)

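The last two examples are important. Because of the convention \(x_0 = 0\), we can’t approximate \(\ln x\) directly: \(\ln x\) tends to \(-\infty\) as \(x \to 0^+\), and its slope \(\frac{1}{x}\) tends to \(\infty\). For many values of \(r\), \(x^r\) or its slope is also unbounded at \(x = 0\). To deal with this, we shift the functions so that the base point is one, and work with \(\ln(1 + x)\) and \((1 + x)^r\) instead.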
Going back to our earlier approximation of \(\ln\), we can take
\[
\ln u \approx u - 1
\]
from above, and just set \(u = 1 + x\) to obtain the approximation at \(x = 0\).
Applications of linear approximation
Consider the following example:
\[
\ln 1.1 \approx \frac{1}{10}
\]
We obtained this by using the rule from above:
\[
\ln(1 + x) \approx x
\]
and just setting \(x = \frac{1}{10}\).
This only holds because \(x\) is sufficiently small. We work with linear approximations of functions because they are easier to calculate than the functions themselves.
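To get a rough sense of how good the estimate is, we can compare it with the value of \(\ln 1.1\) to four decimal places (the numerical value is supplied here for comparison and was not part of the original example):
\[
\ln 1.1 \approx 0.0953, \qquad \frac{1}{10} - \ln 1.1 \approx 0.0047
\]
The error is close to \(\frac{x^2}{2} = 0.005\), which is the size one would expect from the first term that the linear approximation drops.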
You can see this in the following examples:
Example 1
Find the linear approximation “near \(x = 0\)” (\(x \approx 0\)) of:
\[
\frac{e^{-3x}}{\sqrt{1 + x}}
\]
We can find this from the formulas and their approximations above:
\[
\begin{align}
\frac{e^{-3x}}{\sqrt{1 + x}} &= e^{-3x}(1 + x)^{-\frac{1}{2}} \\
&\approx (1 - 3x)\left(1 - \frac{1}{2}x\right) \\
&= 1 - 3x - \frac{1}{2}x + \frac{3}{2}x^2 \\
&\approx 1 - \frac{7}{2}x
\end{align}
\]
Note that we discarded the quadratic term \(\frac{3}{2}x^2\). In replacing each function with its linear approximation we already dropped terms of quadratic and higher order, so keeping this one quadratic term would not make the result any more accurate. Since we only want a linear approximation, we discard all quadratic terms. Also note that when \(x\) is sufficiently small, quadratic terms are much smaller than the linear terms and can safely be ignored.
This is roughly the opposite of big O notation in computer science, where we discard the smaller terms because we are only interested in the behavior for large values.
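The answer to Example 1 can be cross-checked against the general formula \(f(x) \approx f(0) + f'(0)x\) by differentiating directly. This is a sketch of that check, using the product and chain rules:
\[
\begin{align}
f(x) &= e^{-3x}(1 + x)^{-\frac{1}{2}}, \qquad f(0) = 1 \\
f'(x) &= -3e^{-3x}(1 + x)^{-\frac{1}{2}} - \frac{1}{2}e^{-3x}(1 + x)^{-\frac{3}{2}} \\
f'(0) &= -3 - \frac{1}{2} = -\frac{7}{2} \\
f(x) &\approx 1 - \frac{7}{2}x
\end{align}
\]
Multiplying the two known approximations reaches the same result with less work, which is the reason for using them.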
Example 2
This example is intended to model something you might come across in real life.
Consider the surface of the earth, and a satellite in orbit around the earth at velocity \(v\). The satellite is a GPS satellite, and needs to keep time, \(T\). On the earth, in our frame of reference, we have time \(T'\). Special relativity tells us there will be a time dilation between the times \(T\) and \(T'\), given by:
\[
T' = \frac{T}{\sqrt{1 - \frac{v^2}{c^2}}}
\]
where \(v\) is the velocity of the satellite, and \(c\) is the speed of light.
The goal is to determine the difference between the times on the satellite and on the earth, or \(\Delta T\).
We can solve this by using the approximation \((1 + x)^r \approx 1 + rx\) from above, with \(x = -u\), \(r = -\frac{1}{2}\), and \(u = \frac{v^2}{c^2}\):
\[
\begin{align}
T' &= T(1 - u)^{-\frac{1}{2}} \\
&\approx T\left(1 + \frac{1}{2}u\right) \\
&= T\left(1 + \frac{1}{2}\frac{v^2}{c^2}\right)
\end{align}
\]
This is a real-life question: when putting the GPS system into orbit, scientists needed to determine whether or not the time dilation would cause errors. For the satellites as they are now, \(v = 4\ \text{km/s}\) and \(c = 3 \times 10^5\ \text{km/s}\), which means that \(\frac{v^2}{c^2} \approx 10^{-10}\). This works out to an error of only a few millimeters of resolution, so it was determined that the dilation wouldn’t cause significant error.
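Putting the pieces together gives the difference the example set out to find. The following is a sketch that takes \(\Delta T\) to mean \(T' - T\) (an assumption about the sign convention) and uses the rounded values of \(v\) and \(c\) quoted above:
\[
\begin{align}
\Delta T &= T' - T \\
&\approx \frac{1}{2}\frac{v^2}{c^2}\,T \\
&\approx \frac{1}{2} \cdot \frac{4^2}{(3 \times 10^5)^2}\,T \\
&\approx 0.9 \times 10^{-10}\,T
\end{align}
\]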
Quadratic approximations
The quadratic approximation is an extension of linear approximation. The formula for quadratic approximation is:
\[
f(x) \approx f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2}(x - x_0)^2
\]
As you can see, it involves the second derivative.
Now we can extend the table of common functions from linear approximation to quadratic approximation:
\[
\begin{array}{ l | l | l | l | l }
f & f(0) + f'(0)x & f'' & f''(0) & f(0) + f'(0)x + \frac{f''(0)}{2}x^2 \\ \hline
\sin x & x & -\sin x & 0 & x \\
\cos x & 1 & -\cos x & -1 & 1 - \frac{1}{2}x^2 \\
e^x & 1 + x & e^x & 1 & 1 + x + \frac{1}{2}x^2 \\
\end{array}
\]
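The same recipe applies to the remaining two functions from the linear table. The rows below are a sketch worked out from the formula \(f(0) + f'(0)x + \frac{f''(0)}{2}x^2\) rather than copied from the table above:
\[
\begin{array}{ l | l | l | l }
f & f'' & f''(0) & \text{quadratic approximation} \\ \hline
\ln(1 + x) & -\frac{1}{(1 + x)^2} & -1 & x - \frac{1}{2}x^2 \\
(1 + x)^r & r(r - 1)(1 + x)^{r - 2} & r(r - 1) & 1 + rx + \frac{r(r - 1)}{2}x^2 \\
\end{array}
\]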
Geometrically, the significance of the quadratic term can be shown in the following graph:

As you can see in the graph, the quadratic approximation of \(\cos x\) is a parabola lying beneath the linear approximation (the horizontal line \(y = 1\)) that fits the cosine curve more closely near \(x = 0\). It tells us more about the function, such as how it bends away from the tangent line on both sides of \(x = 0\).