Suppose p and q are relatively prime integers, and 0 < p < q. If q has
any prime factors besides 2 and 5, then a decimal expansion of p/q will
eventually start to repeat. How long will the repeating part be?
Looking at the sequence of digits is actually a very difficult way to
approach this problem. It's much simpler to look at the sequence of
remainders. Remember how long division works. At every
step, we have a remainder r, 0 <= r < q. To get the next digit in the
expansion, we figure out how many times q goes into 10*r; this gives
us a digit and a new remainder. In C++ code, I might write the long
division algorithm this way:
/* Compute ndigits digits of p/q by long division. */
int r = p;                /* current remainder, 0 <= r < q */
for (int i = 0; i < ndigits; ++i) {
    d[i] = (10*r) / q;    /* next digit of the expansion */
    r = (10*r) % q;       /* remainder carried into the next step */
}
where those unfamiliar with C should remember that C's integer
division gives an integer result (in C, we have 1/2 == 0, but 1.0/2 ==
0.5). The operator % gives the remainder of the division.
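If you want to see those semantics in action, a minimal standalone
check (nothing here beyond standard C) looks like this:

#include <stdio.h>

int main(void)
{
    printf("1 / 2   = %d\n", 1 / 2);     /* integer division truncates: 0 */
    printf("1.0 / 2 = %g\n", 1.0 / 2);   /* floating-point division: 0.5 */
    printf("10 %% 7  = %d\n", 10 % 7);   /* remainder: 3 */
    return 0;
}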
If we keep computing digits, we will eventually see one of two things:
a zero remainder, or a repeat of a remainder seen in a previous step.
In the former case, p/q can be written as a non-repeating decimal; in
the latter case, we have a repeating fraction, where the length of the
repeating part is the same as the distance between occurrences of the
same remainder in the division algorithm.
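Fleshing that observation out gives a complete period-finder. Here's
a sketch (one of many possible ways to write it) that remembers the
step at which each remainder first showed up, and stops at either a
zero remainder or the first repeat:

#include <stdio.h>
#include <stdlib.h>

/* Length of the repeating part of p/q, or 0 if the expansion      */
/* terminates.  seen_at[r] records the step at which remainder r   */
/* first appeared, so a repeat tells us the period immediately.    */
int period(int p, int q)
{
    int *seen_at = malloc(q * sizeof(int));
    int r = p % q, step = 0, result = 0;
    for (int i = 0; i < q; ++i)
        seen_at[i] = -1;
    while (r != 0 && seen_at[r] == -1) {
        seen_at[r] = step++;
        r = (10*r) % q;
    }
    if (r != 0)                          /* a remainder repeated */
        result = step - seen_at[r];
    free(seen_at);
    return result;
}

int main(void)
{
    printf("period of 1/7  = %d\n", period(1, 7));    /* 0.(142857): 6 */
    printf("period of 1/4  = %d\n", period(1, 4));    /* 0.25: terminates */
    printf("period of 1/12 = %d\n", period(1, 12));   /* 0.08(3): 1 */
    return 0;
}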
The operator % is sometimes called the modulo operator, but that's
actually an odd name. In much of computer science, we speak of the
mod operator, and mean an operator that computes the remainder in a
division.
By contrast, if you're working with abstract algebra or number theory
and you say that two things are equivalent modulo something, you mean
that there is an underlying relation that divides your structure into
equivalence classes, and the two things under investigation happen to
lie in the same equivalence class. The usual example of this is time:
11 pm today and 11 pm yesterday are not the same instant in time, but
they are equivalent modulo the twenty-four-hour cycle we use for
timekeeping. The connection between modulo-as-operation and
modulo-an-equivalence is that when one takes a remainder after
dividing by q, one gets a single canonical representative of a whole
equivalence class of numbers which differ from each other by some
multiple of q. It's like using 11 pm today to refer to 11 pm any day;
they're all equivalent modulo a 24-hour day.
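In code, the canonical-representative view is a one-liner (a tiny
sketch; note that C's % only behaves this nicely for nonnegative
operands, since the sign of the result follows the dividend):

#include <stdio.h>

int main(void)
{
    /* 23, 47, and 71 all differ by multiples of 24, so they share */
    /* the canonical representative 23 modulo a 24-hour day.       */
    printf("%d %d %d\n", 23 % 24, 47 % 24, 71 % 24);   /* 23 23 23 */
    return 0;
}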
Anyhow. The C code for long division computes a sequence of
remainders, which look like r[k] = (p*10^k) % q. If q has prime
factors other than 2 and 5, then at some point we'll encounter a
remainder which is prime relative to q (strictly speaking, when q
itself has factors of 2 or 5, this and everything below holds with q
replaced by what remains of q after dividing those factors out); at
that point, we've started the repeating part of the fraction. Past
that point, the remainders are all units in the ring of integers
modulo q (Z_q), which is just a fancy way of saying that all the
successive remainders are also prime relative to q (and therefore
have inverses in Z_q). For
instance, 4 and 7 are relatively prime, so 4 is a unit in Z_7; the
inverse is 2, since 4*2 = 7+1 = 1 modulo 7. The set of units in Z_q
is an important object in number theory, so important that its size
is given a special name: the number of natural numbers less than q
which are prime relative to q is written phi(q); it is called Euler's
phi function or Euler's totient function.
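A brute-force phi fits on one screen. Here's a sketch that just
counts the residues prime relative to q -- fine for small q, hopeless
for the large moduli that show up later:

#include <stdio.h>

/* Greatest common divisor, by Euclid's algorithm. */
int gcd(int a, int b)
{
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Euler's phi: count the r in 1..q-1 with gcd(r, q) == 1. */
int phi(int q)
{
    int count = 0;
    for (int r = 1; r < q; ++r)
        if (gcd(r, q) == 1)
            ++count;
    return count;
}

int main(void)
{
    printf("phi(7)  = %d\n", phi(7));    /* 6: all of 1..6 are units */
    printf("phi(12) = %d\n", phi(12));   /* 4: just 1, 5, 7, 11 */
    return 0;
}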
Hmm. I assume totient is related to quotient, but I actually don't
know the origin of this word. I'll have to look that up.
One reason why phi(q) is so important is that it connects the
structure of multiplication and addition in Z_q. For example, for any
m which is a unit in Z_q, I have m^phi(q) = 1; for instance, phi(7) =
6, and 4^6 = 4096 = 1 modulo 7. That means, for instance, that I have
an easy way to write m^(-1); it's just m^(phi(q)-1). There are all
sorts of other entertaining things you can do with this sort of
manipulation -- including figuring out something about the question we
originally asked, which is how long the repeating part in the decimal
expansion of p/q will be. Without loss of generality, I can just
consider 1/q; for more general p/q, I will generate a sequence of
remainders which has the same period as for 1/q. Anyhow, if I start
from the remainder 1 and run long division, the remainders will
generate a cyclic subgroup of the group of units in Z_q -- I'll keep
generating new remainders until at some point I get the remainder 1
back. Since I'm always multiplying by 10, the cycle thus formed is
sometimes called the orbit of 10 in the group of units in Z_q. The
order of this cycle -- the number of elements -- is the number of
digits in the repeating part of the decimal expansion. There is a
wonderfully versatile little theorem, due to Lagrange, which says that
the order of a subgroup must divide the order of the group in which it
is contained; in this case, that means that the number of digits in
the repeating part must divide phi(q).
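To see all of this machinery in action, here's another sketch of
mine: modular exponentiation by repeated squaring, plus a naive loop
that walks the orbit of 10 and reports its order -- which, by the
argument above, is the length of the repeating part:

#include <stdio.h>

/* Compute (base^exp) mod q by repeated squaring. */
long modpow(long base, long exp, long q)
{
    long result = 1;
    base %= q;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % q;
        base = (base * base) % q;
        exp >>= 1;
    }
    return result;
}

/* Multiplicative order of b mod q: the smallest k >= 1 with       */
/* b^k = 1 (mod q).  Assumes b is a unit, so the loop terminates.  */
long order(long b, long q)
{
    long r, k = 1;
    b %= q;
    r = b;
    while (r != 1) {
        r = (r * b) % q;
        ++k;
    }
    return k;
}

int main(void)
{
    long q = 7, m = 4, phi_q = 6;
    printf("4^6 mod 7         = %ld\n", modpow(m, phi_q, q));      /* 1 */
    printf("4^5 mod 7         = %ld\n", modpow(m, phi_q - 1, q));  /* 2 = 4^(-1) */
    printf("order of 10 mod 7 = %ld\n", order(10, q));             /* 6 */
    return 0;
}

The last line says the decimal expansion of 1/7 repeats every six
digits, and indeed 1/7 = 0.142857 142857...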
Cool, no? So the number of digits in the repeating part of a decimal
expansion of p/q must divide phi(q). Actually, if you walk through
the arguments I just made, you find that this is true for any
base. So, for example, take q = 47; since phi(q) = 46 = 2*23, I can
take an expansion in any base and find that the length of the
repeating part is either 0, 1, 2, 23, or 46. Base 2, base 10, base 16
-- they all have to generate repetitions with a period that divides
46. Which divisor of 46 you actually get depends on the base in which
you do the expansion, though.
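(If you run the order function from the sketch above with q = 47, you
should see exactly this: order(10, 47) comes out to 46, so the
decimal expansion of 1/47 repeats with the full 46-digit period,
while order(2, 47) comes out to 23, so the binary expansion -- and,
since 16 = 2^4, the hexadecimal one too -- repeats every 23 digits.)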
So while computing the length of the repeating part of p/q is difficult
in general, we can quickly say that it divides phi(q). When q is
prime, phi(q) = q-1, and so we could potentially get a repeating part
as long as q-1 digits. In fact, we often do get a repeating
part as long as q-1 digits... which is why writing numbers as
repeating decimals isn't a very good choice of representation for most
q. If it takes log10(p) + log10(q) < 2*log10(q) digits to represent a
rational number in the form p/q, but as many as q-1 digits to
represent the number as a repeating decimal, then the repeating
decimal representation is exponentially longer than the rational
number representation. Not good! At least, it's not good if you want
to actually do arithmetic; if you want to play with number theory on
the other hand, or with finite state machines, then this is lots of
fun.
As an amusing aside, you can use the development above to write a
wholly impractical scheme for cracking the RSA cryptosystem. You see,
the RSA system is secure because we don't have good algorithms for
computing the Euler totient function for large numbers. Yes, I know
you've heard that the reason that RSA is secure has something to do
with factoring, but the difficulty of factoring large numbers is
really just a proxy for the difficulty of computing Euler's totient.
The actual connection is that if m and n are large prime numbers then
there's a simple formula for the totient of the product: phi(m*n) =
(m-1)*(n-1). But if we do long division in several different bases
and keep the longest period we see, we're almost inevitably going to
find the largest period possible. (One technical wrinkle: for q = m*n
with m and n distinct odd primes, no base actually has period phi(q)
itself; the largest possible period is lcm(m-1, n-1), but recovering
that number compromises the key just as thoroughly.) So the totient
function -- or a stand-in every bit as damaging -- is
actually easy to compute, and any talented grade schooler who knows
about long division and repeating decimals can be taught how to crack
RSA, right? Well, not exactly. The fact that computing phi(m*n) in
this way would take time on the order of m*n -- rather than on the
order of log(m*n) -- means that your precocious grade schooler would
take at least 130 years to crack a 32-bit key, assuming that he never
slept and could produce one correct digit per second.
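For the curious, here's the wholly impractical attack in miniature: a
toy sketch of mine, built around the small textbook modulus 3233 =
61*53 and public exponent 17 -- numbers far too small to be secure,
which is exactly the point:

#include <stdio.h>

/* Multiplicative order of b mod q: the period of 1/q in base b. */
/* Assumes gcd(b, q) == 1 so that the loop terminates.           */
long order(long b, long q)
{
    long r, k = 1;
    b %= q;
    r = b;
    while (r != 1) {
        r = (r * b) % q;
        ++k;
    }
    return k;
}

long gcd(long a, long b) { return b == 0 ? a : gcd(b, a % b); }

int main(void)
{
    long q = 3233;                  /* toy RSA modulus: 61 * 53 */
    long bases[] = { 2, 3, 5, 7 };
    long lambda = 1;
    long d;

    /* The lcm of the observed periods recovers the largest period */
    /* possible mod q, which here is lcm(60, 52) = 780.            */
    for (int i = 0; i < 4; ++i) {
        long k = order(bases[i], q);
        lambda = lambda / gcd(lambda, k) * k;
    }
    printf("largest period mod %ld = %ld\n", q, lambda);   /* 780 */

    /* Any d with 17*d = 1 (mod 780) undoes the public exponent 17. */
    for (d = 1; (17 * d) % lambda != 1; ++d)
        ;
    printf("private exponent d = %ld\n", d);               /* 413 */
    return 0;
}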
A fast computer using a more effective attack algorithm could break a
32-bit RSA key in a matter of minutes (maybe less, I'm not sure). But
those algorithms tend to run into the same explosive cost increases as
the long-division cracking algorithm that I just proposed, so the key
doesn't have to get that much longer to be much more secure.
Currently, most transactions are guarded by keys at least 128 bits
long, and the keys I use for remote login have either 1024 or 2048
bits. So unless you want to go into number theory as a career, you
probably will find that the easiest way to compromise my keys is to
completely go around them -- by finding my office and kicking me in
the shins until I'm willing to hand them over, for instance.