Whatever I find interesting

Polynomials by Radicals

Introduction

The solution of equations has been a source of fascination for those interested in mathematics since antiquity. Although our notation has advanced greatly over the last 2000 years, it is remarkable that few of us are ever taught to solve equations whose solution has not been known for at least that long.

As a child with an abiding interest in maths, I soon became interested in understanding how equations of the third degree, i.e. those containing a term in \(x^3\), had first been solved in the 16th century, and subsequently even more so at the discovery that for equations of degree 5 and above, there is no general formula.
It was many years later before I discovered precisely what the ‘insolvability of the quintic’ actually meant, and quite a few years after that before I felt I really understood what was and wasn’t true.

Most explanations of the underlying theory – usually known as Galois theory – build up a complicated edifice of mathematical structure before providing the key result – that some, indeed most, polynomial equations of degree 5 and above cannot be solved by a formula which involves basic arithmetic and the taking of \(n\)th roots. While this approach is eminently suitable for an aspiring mathematician, it has two weaknesses:

  • It disguises the fact that much of this beautiful subject can be appreciated more directly, with no knowledge beyond secondary school maths.
  • It seldom bothers to discuss the construction of the solutions in those cases which can be solved.

This paper is my attempt to provide the explanation I wish I had had as a youngster.

Prerequisites

Any article has to make some assumptions about its audience. In my case, I’m assuming you have a good understanding of advanced secondary school ( high school ) algebra, the kind of stuff a keen 17-year-old should know. In particular, you should be comfortable with:

  • algebraic expressions and how to manipulate them
  • the usual way of solving quadratic equations by manipulation or by using the formula
  • simultaneous linear equations in 2, 3 or 4 variables and how to solve them
  • what a complex number is, and how to do arithmetic with them
  • the fact that complex numbers allow us to take \(n\)th roots of any complex number, and that there are always \(n\) \(n\)th roots of any complex number. ( If you haven’t learned this, you can just take it on trust ).

It will also help if you have some familiarity with the idea of symmetry operations.
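Incidentally, the fact that every non-zero complex number has exactly \(n\) distinct \(n\)th roots is easy to verify numerically. Here is a minimal Python sketch ( the helper name nth_roots is just an illustrative choice ) using the standard cmath module:

```python
import cmath

def nth_roots(z, n):
    """All n complex nth roots of a non-zero complex number z."""
    r = abs(z) ** (1.0 / n)        # every root has this modulus
    theta = cmath.phase(z)         # argument ( angle ) of z
    # the n roots are evenly spaced around a circle of radius r
    return [r * cmath.exp(1j * (theta + 2 * cmath.pi * k) / n)
            for k in range(n)]

# the three cube roots of 8: one is 2, the other two are complex
roots = nth_roots(8, 3)
print(all(abs(w ** 3 - 8) < 1e-9 for w in roots))   # True
```

Each of the three roots, cubed, recovers 8, and they are evenly spaced around a circle in the complex plane.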

Preliminaries


In addition to the prerequisites above, there is one famous mathematical result you should know about the complex numbers, which is so important it’s called the “Fundamental Theorem of Algebra”. Since you may not have come across this, we’ll explain it here. If you are familiar with it, feel free to skip this section.

The fundamental theorem of algebra deals with the roots of polynomial equations and the factors of polynomials, i.e. equations like
\( a x^n + b x^{n-1} + \cdots + p = 0 \), where the coefficients are all complex numbers. What it tells us is that every such polynomial can be
factorised as a constant factor \(a\) times a product of \(n\) linear factors of the form \( x - x_j \), i.e.
$$ a x^n + b x^{n-1} + \cdots + p = a (x - x_1)(x - x_2)\cdots(x - x_n) $$
Of course, some of the \(x_i\) may be equal to each other, but there are always exactly \(n\) factors. Furthermore, the \(x_i\) which appear in the expression are precisely the various roots of the original equation, in other words if we substitute \(x_i\) into the original polynomial, we get \(0\).
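To make the theorem concrete, here is a short Python sketch ( the helper names are my own ) which expands a product of linear factors back into coefficient form and checks that each \(x_i\) is indeed a root of the result:

```python
def poly_from_roots(roots, a=1):
    """Expand a*(x - x_1)*...*(x - x_n); returns coefficients, highest power first."""
    coeffs = [complex(a)]
    for r in roots:
        # multiply the polynomial built so far by the linear factor (x - r)
        new = [0j] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i] += c          # contribution from c * x
            new[i + 1] -= c * r  # contribution from c * (-r)
        coeffs = new
    return coeffs

def eval_poly(coeffs, x):
    """Evaluate a polynomial at x ( Horner's rule )."""
    result = 0j
    for c in coeffs:
        result = result * x + c
    return result

# roots 1, 2, 3 give x^3 - 6x^2 + 11x - 6
coeffs = poly_from_roots([1, 2, 3])
print([c.real for c in coeffs])                                 # [1.0, -6.0, 11.0, -6.0]
print(all(abs(eval_poly(coeffs, r)) < 1e-9 for r in [1, 2, 3]))  # True
```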

I don’t want to go into the detail of the proof – there are plenty of good websites which will explain it to you if you are interested. But it is worth pointing out a couple of things:

  • It’s obvious that if \(x - x_i\) is a factor of the polynomial, then \(x_i\) must be a root, since substituting \(x_i\) makes that factor \(0\), and hence the whole product \(0\).
  • It is also true ( though less obvious ) that if \(x_i\) is a root, then \(x - x_i\) is a factor, and you don’t need complex numbers for that part.
  • But you do need complex numbers to be able to guarantee that there are enough roots to produce the linear factors.
    Consider, for example, \(x^2 + 1\). This harmless-looking polynomial has no real roots ( because we’d need a square root of \(-1\) ), so it doesn’t split into linear factors with real coefficients. Over the complex numbers, though, we have: $$ x^2 + 1 = (x + i)(x - i) $$ where of course \(i = \sqrt{-1}\).

Quadratic equations viewed differently

We’ll start by looking at a well-known problem, the solution to quadratic equations, in a different way from the normal school algebra approach.

Our objective is to find numbers which solve the equation \( ax^2 + b x + c = 0\).
Whenever we have to solve a complex problem, it’s always a good idea to try and solve a simpler one first. In this case, we can simplify our equation without losing anything, by noticing that if we divide through by \(a\), the resulting equation has exactly the same solutions, but makes the subsequent algebra a bit simpler.

From now on, we will deal exclusively with polynomials whose leading coefficient is 1. You may see these referred to elsewhere as “monic”.

Now, we can take advantage of the fundamental theorem of algebra to note that there must be two complex numbers \( x_1 \) and \( x_2 \), possibly equal, such that $$ x^2 + b x + c = (x-x_1)(x-x_2) $$
If we multiply out, we discover that $$ x^2 + b x + c = x^2 - (x_1 + x_2) x + x_1x_2 $$

so that \( b = -(x_1+x_2) \) and \( c = x_1x_2 \)

At first sight, this doesn’t help us much. Instead of one quadratic equation, we have two different equations, one of which involves a product of the roots. But here’s the thing: we know that the sum of the roots is \( -b \). If we could somehow find the difference of the roots, \( d = x_1 - x_2 \), then we could easily find the roots themselves: \( x_1 = (-b + d)/2, \; x_2 = (-b - d)/2 \).

So is there any simple way to find \(d\)? First, notice that the expressions for \(b\) and \(c\) in terms of the roots have an interesting property: if you swap the roots, they remain unchanged. We say that they are “symmetric” in the roots. (In this case, with only two roots, that simply means that replacing \(x_1\) with \(x_2\) and vice versa has no effect on \(b\) or \(c\).)

It should be pretty clear that we can create expressions from \(b\) and \(c\) using addition, subtraction, multiplication and division, and the result will still be symmetric in the roots. But we want \(d\), and \(d\) is not symmetric – if we swap the roots, then \( d \rightarrow -d \).

The trick is to consider \( d^2 \). This is unchanged if we swap the roots, since \( (-d)^2 = d^2 \). So it is symmetric in the roots, and
maybe we can find an expression for it in terms of \(b\) and \(c\). Let’s see:
$$ d^2 = (x_1 - x_2)^2 = x_1^2 + x_2^2 - 2 x_1 x_2 $$
so we’re going to need the squares of the roots. The easiest way to get those is to consider
$$ b^2 = (x_1 + x_2)^2 = x_1^2 + x_2^2 + 2 x_1 x_2 $$
and then notice that $$ b^2 - d^2 = 4x_1x_2 = 4c $$
so $$ d^2 = b^2 - 4 c $$ and we’ve found what we need: \( d = \sqrt{b^2-4c} \).

Putting it all together: \( -b + d = 2 x_1 \), so \( x_1 = \frac{-b + d}{2} = \frac{-b + \sqrt{b^2 - 4c}}{2} \) and similarly \( x_2 = \frac{-b - d}{2} = \frac{-b - \sqrt{b^2 - 4c}}{2} \), and we end up with the familiar formula \( x = \frac{-b \pm \sqrt{b^2-4c}}{2} \)
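The whole derivation boils down to a couple of lines of code. Here is a minimal Python sketch ( the function name is just an illustrative choice; cmath.sqrt is used so that complex roots are handled too ):

```python
import cmath

def solve_monic_quadratic(b, c):
    """Roots of x^2 + b x + c = 0 via the sum and difference of the roots."""
    # x1 + x2 = -b, and d = x1 - x2 satisfies d^2 = b^2 - 4c
    d = cmath.sqrt(b * b - 4 * c)
    return (-b + d) / 2, (-b - d) / 2

# x^2 - 5x + 6 = (x - 2)(x - 3), so b = -5 and c = 6
x1, x2 = solve_monic_quadratic(-5, 6)
print(x1.real, x2.real)   # 3.0 2.0
```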

From quadratic to cubic.

The above derivation of the solution to the general quadratic equation differs significantly from the method usually taught, which essentially expresses \( x^2 + b x + c \) as \( (x + b/2)^2 + c_1 \) for an easy-to-find \( c_1 \); in other words, it splits the constant term up so that part of it “completes the square” of the terms containing \(x\), turning the solution into a square root plus a bit of simple arithmetic.

When we consider cubic equations, the contrast between the approach here and the traditional one is even greater. The traditional approach to solving a cubic equation involves two pieces of apparently implausible manipulation followed by a sudden denouement. Essentially, though, what is going on is a series of manipulations of the equation by substituting for \( x \) cleverly chosen expressions which allow the equation to be simplified.

It’s a matter of opinion whether the above derivation of the solution to the general quadratic equation is easier or harder than the traditional approach. Mechanically it’s about the same amount of work, but it’s probably less intuitive, and it certainly took much longer to be discovered. It does, however, have one key advantage: it offers a fruitful generalisation to cubics and beyond. So let’s tackle the cubic equation. We’ll work with the equation in the carefully chosen, but perfectly general, form
$$ x^3 - s_1x^2 + s_2x - s_3 = 0 $$
Now if we assume that the roots are \( x_1 \), \( x_2 \), and \( x_3 \) then, by multiplying out \( (x-x_1)(x-x_2)(x-x_3) \) as we did for the quadratic, we find that $$ s_1 = x_1+x_2+x_3 $$ $$ s_2 = x_1x_2 + x_2x_3 + x_3x_1 $$
$$ s_3 = x_1x_2x_3 $$ and again we note that these are symmetric functions of the roots.

How are we to get from here to the roots themselves? In this case there are six possible ways of permuting the three roots. Using notation which is standard in the literature of what are known as permutation groups, we have:

  • () – the boring permutation which does nothing and is known as the identity.
  • (1 2) which turns \(x_1 \rightarrow x_2\) and \(x_2 \rightarrow x_1 \) leaving \(x_3\) unchanged.
  • (1 3) which turns \(x_1 \rightarrow x_3\) and \(x_3 \rightarrow x_1 \) leaving \(x_2\) unchanged.
  • (2 3) which turns \(x_2 \rightarrow x_3\) and \(x_3 \rightarrow x_2 \) leaving \(x_1\) unchanged.
  • (1 2 3) which turns \(x_1 \rightarrow x_2\) and \(x_2 \rightarrow x_3 \) and \(x_3 \rightarrow x_1 \)
  • (1 3 2) which turns \(x_1 \rightarrow x_3\) and \(x_2 \rightarrow x_1 \) and \(x_3 \rightarrow x_2 \)

So the idea is to start with expressions in the roots which gradually increase the symmetry until we can eventually use the coefficients to construct what we need. Remember, when we did this for the quadratic, we made liberal use of square roots, and in particular, we looked at the expressions \(x_1 – x_2\) and \( x_1 + x_2 \). Is there anything similar we can do for cubics? One way of thinking about these expressions is that they are different ways of combining the roots with the various square roots of 1, and this is useful, because square roots of 1 get simpler when you square things.

So what about expressions involving the cube roots of 1? We’d better start by working out what those cube roots are. Well, 1 is obviously one of them. So we know that \( x^3 - 1 = 0 \) has a factor of \( x-1 \).
Doing the division we find that \( x^3 - 1 = (x-1)(x^2+x+1) \).
Now the second factor is a quadratic, and we can use our quadratic formula to discover that its two solutions are:
\( \frac{-1 + \sqrt{3}i}{2} \), which we will call \(\omega\), and \( \frac{-1 - \sqrt{3}i}{2} \), which you can easily check is equal to \(\omega^2\). And by the same token \( (\omega^2)^2 = \omega^4 = \omega \cdot \omega^3 = \omega \cdot 1 = \omega \). So each of \( \omega \) and \( \omega^2 \) is the square of the other! Note also that, since \(\omega\) and \(\omega^2\) are the roots of \( x^2 + x + 1 \), we have \( 1 + \omega + \omega^2 = 0 \), a fact we’ll use repeatedly.
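All of these little facts about \(\omega\) are easy to confirm numerically with Python's built-in complex arithmetic:

```python
import cmath

# omega = (-1 + sqrt(3) i) / 2, the first complex cube root of 1
omega = (-1 + cmath.sqrt(3) * 1j) / 2

print(abs(omega ** 3 - 1) < 1e-9)                               # omega^3 = 1
print(abs(omega ** 2 - (-1 - cmath.sqrt(3) * 1j) / 2) < 1e-9)   # omega^2 is the other complex root
print(abs((omega ** 2) ** 2 - omega) < 1e-9)                    # each is the square of the other
print(abs(1 + omega + omega ** 2) < 1e-9)                       # 1 + omega + omega^2 = 0
```

All four checks print True.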

Armed with our cube roots, we can extend our pattern for quadratics, and think about expressions like:
$$ x_1 + x_2 + x_3 $$
$$ x_1 + \omega x_2 + \omega x_3 $$
$$ x_1 + \omega x_2 + \omega^2 x_3 $$

The first of these is just one of our coefficients. The second looks interesting, but if we try to cube it to see what happens, we don’t get anything very useful. The third, on the other hand, is worthy of more investigation, along with the similar expression swapping \(\omega\) and \(\omega^2\). So, let’s give them names:

$$ l_1 = x_1 + \omega x_2 + \omega^2 x_3 $$
$$ l_2 = x_1 + \omega^2 x_2 + \omega x_3 $$

and note that if we know \( l_1, l_2 \) and \( s_1 \) we have three linear equations in three unknowns, and that means we can find our solutions. So for example, we can find \( x_1 \) as follows:
$$ \frac{1}{3} (l_1 + l_2 + s_1) = x_1 + \frac{(1 + \omega + \omega^2)}{3} x_2 + \frac{(1 + \omega^2 + \omega)}{3} x_3 = x_1 $$
using the fact that \(1 + \omega + \omega^2 = 0\) as we discovered above.

By the way, \( l_1, l_2 \) are sometimes known as the “Lagrange Resolvents”.
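Here is a quick numerical sanity check that \( (l_1 + l_2 + s_1)/3 \) really does recover \( x_1 \) ( the roots 5, -2, 7 are an arbitrary choice ):

```python
import cmath

omega = (-1 + cmath.sqrt(3) * 1j) / 2   # primitive cube root of 1

# pick arbitrary roots, form s1 and the two Lagrange resolvents
x1, x2, x3 = 5, -2, 7
s1 = x1 + x2 + x3
l1 = x1 + omega * x2 + omega ** 2 * x3
l2 = x1 + omega ** 2 * x2 + omega * x3

# the omega terms cancel, because 1 + omega + omega^2 = 0
recovered = (l1 + l2 + s1) / 3
print(abs(recovered - x1) < 1e-9)       # True
```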

We’ve made some progress, but we’re not there yet, as we can see by applying our permutations to \( l_1 \) and \( l_2 \).
A rather easy calculation, making liberal use of the fact that \( \omega^3 = 1 \), shows us that:

$$ (1 2) l_1 = x_2 + \omega x_1 + \omega^2 x_3 = \omega ( x_1 + \omega^2 x_2 + \omega x_3) = \omega l_2 $$
and similarly
$$ (1 3) l_1 = \omega^2 l_2 $$
$$ (2 3) l_1 = l_2 $$
$$ (1 2 3) l_1 = \omega^2 l_1 $$
$$ (1 3 2) l_1 = \omega l_1 $$
and of course
$$ () l_1 = l_1 $$
with similar results for \( l_2 \).

This is promising. Let’s try to get rid of those pesky factors of \( \omega \) by taking cubes. Then (1 2), (1 3), and (2 3) swap \( l_1^3 \) with \( l_2^3 \), while the other three permutations leave them unchanged. So, using the same approach as for the quadratic, we can see that \( m = l_1^3 + l_2^3 \) is fully symmetric, as is \( n = (l_1^3 - l_2^3)^2 \).
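We can check the claimed symmetry numerically: for an arbitrary choice of roots, \( m \) and \( n \) come out the same for all six orderings. A Python sketch ( the helper name is my own ):

```python
import cmath
from itertools import permutations

omega = (-1 + cmath.sqrt(3) * 1j) / 2   # primitive cube root of 1

def resolvent_cubes(a, b, c):
    """l1^3 and l2^3 for the roots taken in the order given."""
    l1 = a + omega * b + omega ** 2 * c
    l2 = a + omega ** 2 * b + omega * c
    return l1 ** 3, l2 ** 3

roots = (2, -1, 4)                       # an arbitrary choice of roots
p1, p2 = resolvent_cubes(*roots)
m, n = p1 + p2, (p1 - p2) ** 2

# every permutation of the roots gives the same m and n
symmetric = all(
    abs(sum(resolvent_cubes(*p)) - m) < 1e-6
    and abs((resolvent_cubes(*p)[0] - resolvent_cubes(*p)[1]) ** 2 - n) < 1e-6
    for p in permutations(roots)
)
print(symmetric)   # True
```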

So, we have found values \(m\) and \(n\) which are symmetric in the roots, and which we can therefore construct as polynomials in \(s_1\), \(s_2\) and \(s_3\). We can then use \(m\) and \(\sqrt{n}\) to construct \(l_1\) and \(l_2\), and then solve the simultaneous equations to find the roots. We’ve thus turned the problem of solving a cubic equation into the problem of constructing a given symmetric expression from the symmetric coefficients.

At this stage, the keen reader may wish to stop and read up more on symmetric polynomials, and perhaps even have a shot at constructing the expressions for \(m\) and \(n\) above. In any event, although it’s not that easy to derive, it’s easy enough ( although very messy by hand ) to check that \( n \) can be represented as
\( -27( s_1^2 s_2^2 - 4 s_1^3 s_3 - 4 s_2^3 + 18 s_1 s_2 s_3 - 27 s_3^2 ) \) and \( m = 2s_1^3 - 9 s_1 s_2 + 27 s_3 \)

So, we have our solution:

  • Construct \(m\) and \(n^2\) from the coefficients using the formulas above
  • Find \(n\) and use it and \(m\) to obtain \(l_1\) and \(l_2\)
  • Using \(l_1\) and \(l_2\) and \(s_1\), find the roots.
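The three steps can be put together into a complete cubic solver; here is a sketch in Python. One detail the recipe glosses over is that, of the three possible cube roots giving \( l_1 \), the matching \( l_2 \) must be chosen so that \( l_1 l_2 = s_1^2 - 3 s_2 \) ( a symmetric quantity you can verify by multiplying out ):

```python
import cmath

omega = (-1 + cmath.sqrt(3) * 1j) / 2   # primitive cube root of 1

def solve_monic_cubic(s1, s2, s3):
    """Roots of x^3 - s1 x^2 + s2 x - s3 = 0 via Lagrange resolvents."""
    # step 1: m and n from the coefficients
    m = 2 * s1 ** 3 - 9 * s1 * s2 + 27 * s3
    n = -27 * (s1 ** 2 * s2 ** 2 - 4 * s1 ** 3 * s3 - 4 * s2 ** 3
               + 18 * s1 * s2 * s3 - 27 * s3 ** 2)
    # step 2: l1^3 = (m + sqrt(n))/2; pick a cube root, then pair l2 with it
    l1 = ((m + cmath.sqrt(n)) / 2) ** (1 / 3)
    if abs(l1) > 1e-12:
        l2 = (s1 ** 2 - 3 * s2) / l1    # ensures l1 * l2 = s1^2 - 3 s2
    else:
        l2 = ((m - cmath.sqrt(n)) / 2) ** (1 / 3)
    # step 3: solve the three linear equations for the roots
    return ((s1 + l1 + l2) / 3,
            (s1 + omega ** 2 * l1 + omega * l2) / 3,
            (s1 + omega * l1 + omega ** 2 * l2) / 3)

# x^3 - 6x^2 + 11x - 6 = (x-1)(x-2)(x-3): s1 = 6, s2 = 11, s3 = 6
roots = solve_monic_cubic(6, 11, 6)
print(sorted(round(r.real, 6) for r in roots))   # [1.0, 2.0, 3.0]
```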

The cubic equation has given us a tantalising glimpse of a relationship between expressions involving the roots of the equation and the symmetries of those roots. It seems as though we can construct the roots through a chain of calculations, at each stage of which the key step is to take a root of a more symmetric expression in order to obtain a less symmetric one. For the quadratic, we needed to do this exactly once: the group of symmetries has order two, and a simple square root was all that was needed. For the cubic, we needed both square roots and cube roots, as there were \( 2 \times 3 \) ways of permuting the roots.

A hint of the big picture

Well, it turns out that there is an even deeper correspondence between the symmetries of expressions constructed from the roots, and the values which can be constructed from those expressions.
