Quadratic Sieve

The quadratic sieve is a general-purpose factoring method invented by Carl Pomerance in 1981. Its invention greatly advanced the science of integer factorization, particularly for integers considered difficult: large integers that are the product of exactly two prime factors of approximately the same size.

The running time of the quadratic sieve depends on the size of the integer to be factored, and the method has been used to set some impressive factorization records.

This page presents some simple examples of the factorization of small integers. Although using the quadratic sieve on small integers is like driving a thumbtack with a sledgehammer, the examples illustrate how the quadratic sieve is implemented.

Introduction
The quadratic sieve attempts to build quadratic congruences of the form $$x^2 \equiv y^2 \pmod{N}$$ where $$N$$ is the number to be factored.

Many congruences are created:

$$x_1^2 \equiv y_1 \pmod{N}$$

$$x_2^2 \equiv y_2 \pmod{N}$$

$$x_3^2 \equiv y_3 \pmod{N}$$

$$\dots\dots\dots$$

$$x_{10000}^2 \equiv y_{10000} \pmod{N}$$

Suitable congruences are combined to produce congruent squares:

$$X^2 \equiv Y^2 \pmod{N}$$ where

$$X^2 =  x_2^2  \cdot  x_3^2  \cdot  x_5^2  \cdots  x_{9999}^2$$

$$Y^2 = y_2 \cdot y_3 \cdot y_5 \cdots y_{9999}$$

Then $$N\mid (X^2 - Y^2)$$ and, with a little luck,

$$\text{factor}_1 = \text{igcd}(X + Y, N)$$ and

$$\text{factor}_2 = \text{igcd}(X - Y, N)$$

are non-trivial factors of $$N,$$ where $$\text{igcd}$$ is the integer greatest common divisor function.
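As a small illustration of this final step, the sketch below combines two congruences modulo $$N = 1649.$$ This modulus and the two values of $$x$$ are chosen here purely for illustration and do not come from this page; Python's math.gcd stands in for $$\text{igcd}.$$

```python
from math import gcd, isqrt

N = 1649                      # illustrative modulus, not taken from this page

# Two congruences x_i^2 = y_i (mod N); neither y_i is a square on its own.
xs = [41, 43]
ys = [x * x % N for x in xs]  # [32, 200]

# Their product is a perfect square: 32 * 200 = 6400 = 80^2.
Y = isqrt(ys[0] * ys[1])
assert Y * Y == ys[0] * ys[1]

X = xs[0] * xs[1]
factor_1 = gcd(X + Y, N)
factor_2 = gcd(X - Y, N)
print(factor_1, factor_2)     # 97 17
```

Here the luck holds: both greatest common divisors are proper factors of 1649.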

Implementation
This exercise calculates the prime factors of $$N = 5137851827.$$

Build Factor base
Let the factor base contain 40 small primes, the first 40 for which $$N$$ is a quadratic residue.
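One way to build such a factor base is sketched below. The function names are hypothetical, and Euler's criterion is used as the quadratic residue test, which is an assumption about method rather than something stated on this page. The sketch assumes $$N$$ is odd, so prime 2 is always accepted.

```python
def is_prime(p):
    """Trial-division primality test, adequate for small p."""
    if p < 2:
        return False
    d = 2
    while d * d <= p:
        if p % d == 0:
            return False
        d += 1
    return True

def build_factor_base(N, count):
    """Return the first `count` primes p for which N is a quadratic residue."""
    base = []
    p = 2
    while len(base) < count:
        if is_prime(p):
            # p = 2: every odd N is a residue. Odd p: Euler's criterion.
            if p == 2 or pow(N, (p - 1) // 2, p) == 1:
                base.append(p)
        p += 1
    return base

N = 5137851827
factor_base = build_factor_base(N, 40)
print(factor_base[:6])   # [2, 11, 13, 17, 19, 23]
```

Primes 3, 5 and 7 are skipped because $$N$$ is not a quadratic residue modulo any of them.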

Dictionary with logarithms of primes
Create a table or dictionary containing primes and logarithms of primes. Logarithms to base 2 are calculated. The reason for choosing base $$2$$ will become apparent later.

Initialize sieve
In a regular sieve, fine material falls through the sieve and coarse material is retained. In this sieve, coarse material falls through and fine material is retained.

The expression "coarse material" refers to integers containing a small number of large primes, the great majority of integers.

The expression "fine material" refers to integers that factor completely into small primes, all of them members of this factor base. Such integers are called "smooth" over this factor base.

The base or root of operations is the smallest value of $$x$$ for which $$y = x^2 - N$$ is a positive number.

$$root = x = 71679,\ y = 27214.$$
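These two values can be reproduced with the integer square root. A minimal sketch, assuming Python 3.8 or later for math.isqrt:

```python
from math import isqrt

N = 5137851827
root = isqrt(N) + 1   # smallest x with x*x > N (N is not a perfect square)
y = root * root - N
print(root, y)        # 71679 27214
```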

Activate sieve
While information in table above is accurate, it is incomplete. Processing of sieve attempts to find values for which information is complete.

Step 1
The code examines location 71679. Product of factors 2,11 = 22, not enough to provide complete factorization of $$y = x^2 - N.$$

Location 71679+2 or 71681 is initialized with [2], and [11] is appended to location 71679+11 or 71690. Location 71679 is deleted.

Step 2
The code examines location 71680. Factor 13 is not enough to provide complete factorization of $$y = x^2 - N.$$

Location 71680+13 or 71693 is initialized with [13]. Location 71680 is deleted.

Step 3
The code examines location 71681. Factor 2 is not enough to provide complete factorization of $$y = x^2 - N.$$

Location 71681+2 or 71683 is initialized with [2]. Location 71681 is deleted.

Step 4
Location 71682 does not exist in sieve. The code examines location 71683. Factor 2 is not enough to provide complete factorization of $$y = x^2 - N.$$

[2] is appended to location 71683+2 or 71685. Location 71683 is deleted.

Processing continues in this way until location 71690 is examined. Factors [17, 23, 373, 11] are the complete factorization of $$y = x^2 - N.$$ Values [ 71690, [17, 23, 373, 11] ] are appended to array "smooth."

Sieving continues until array "smooth" contains 45 members, at which time sieving is complete.

To improve speed
Very few values of $$y$$ are smooth. The code is therefore designed to discard unsuitable values of $$x$$ quickly, and to make other decisions quickly.

Decisions to discard unsuitable candidates are:


 * if x does not exist in sieve.


 * if there is only 1 prime for location x. Complete factorization of y requires at least 2 primes.

Code intended to reduce processing time:

$$y = x^2 - N$$

$$y = (root + a)^2 - N$$

$$y = root^2 + 2\cdot a\cdot root + a^2 - N$$

$$y = root^2 - N + 2\cdot a\cdot root + a^2$$

$$y = \text{rootSquaredMinusN} + \text{a(twoRoot + a)}$$

The multiplication in the last statement above involves smaller numbers than the direct calculation of $$y = x^2 - N.$$ As the size of $$N$$ increases, the last statement above becomes more efficient.
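The derivation above can be checked with a short sketch. The names rootSquaredMinusN and twoRoot follow the last line of the derivation; the helper y_at is hypothetical.

```python
from math import isqrt

N = 5137851827
root = isqrt(N) + 1
rootSquaredMinusN = root * root - N   # computed once
twoRoot = 2 * root                    # computed once

def y_at(a):
    """Value of y = (root + a)^2 - N using the precomputed terms."""
    return rootSquaredMinusN + a * (twoRoot + a)

print(y_at(11))   # y at x = 71690, prints 1604273
```

The result agrees with the factorization found at location 71690, since 11 * 17 * 23 * 373 = 1604273.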

Divisions and multiplications are time consuming. Instead of multiplying primes or dividing $$y = x^2 - N$$ to determine smoothness, logarithms of primes are added and compared to $$\log(y).$$ Calculation of $$\ln(y)$$ or $$\log(y)$$ to base $$10$$ is time consuming. Because $$y$$ is type $$\text{int,}$$ a good approximation of $$\log(y)$$ to base $$2$$ can be calculated quickly. As it happens, in this application "close enough" is good enough.

$$\log_2(37)$$ for example:

As binary, $$37 = '100101'$$

Length of $$'100101'$$ is $$6.$$ Therefore $$\log_2(37) \approx 5.$$

$$\log_2(53)$$ for example:

As binary, $$53 = '110101'$$

Length of $$'110101'$$ is $$6,$$ which suggests $$\log_2(53) \approx 5.$$

However, the second most significant bit is set, so the estimate is rounded up: $$\log_2(53) \approx 5 + 1 = 6.$$
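The two worked examples can be reproduced with the sketch below. The function name approx_log2 is hypothetical; int.bit_length is a standard Python method.

```python
def approx_log2(y):
    """Rough base-2 logarithm of a positive int, read from its bit pattern."""
    bits = y.bit_length()
    result = bits - 1
    # Round up when the second most significant bit is set.
    if bits >= 2 and (y >> (bits - 2)) & 1:
        result += 1
    return result

print(approx_log2(37), approx_log2(53))   # 5 6
```

No floating-point arithmetic is involved, which is what makes the estimate cheap.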

Sieving complete
Python code that accomplishes the sieving is:

Thousands of values of $$x$$ are tested. At any one time the range of values of $$x$$ in sieve is $$< 400$$ and sieve is mostly empty.
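The page's own listing is not reproduced here. As a stand-in, the following is a minimal sketch of a dictionary-based sieve of the kind described in Steps 1 to 4. All function names are hypothetical, and two simplifications are assumptions of this sketch: smoothness is confirmed by multiplying the primes stored at a location and comparing the product with $$y$$ exactly (instead of adding approximate logarithms), and powers of primes are ignored.

```python
from math import isqrt, prod

def is_prime(p):
    """Trial-division primality test, adequate for small p."""
    return p >= 2 and all(p % d for d in range(2, isqrt(p) + 1))

def build_factor_base(N, count):
    """First `count` primes p for which odd N is a quadratic residue."""
    base, p = [], 2
    while len(base) < count:
        if is_prime(p) and (p == 2 or pow(N, (p - 1) // 2, p) == 1):
            base.append(p)
        p += 1
    return base

def sieve(N, factor_base, wanted):
    """Collect `wanted` pairs (x, primes) where the primes multiply to x^2 - N."""
    root = isqrt(N) + 1
    table = {}                              # location x -> primes dividing y
    for p in factor_base:
        for x in range(root, root + p):     # first location(s) for each prime
            if (x * x - N) % p == 0:
                table.setdefault(x, []).append(p)
    smooth = []
    x = root
    while len(smooth) < wanted:
        primes = table.pop(x, None)         # location is deleted once examined
        if primes is not None:
            for p in primes:                # move each prime to its next location
                table.setdefault(x + p, []).append(p)
            # A single prime cannot be a complete factorization of y.
            if len(primes) >= 2 and prod(primes) == x * x - N:
                smooth.append((x, sorted(primes)))
        x += 1
    return smooth

N = 5137851827
first_few = sieve(N, build_factor_base(N, 40), 5)
print(first_few)
```

With these inputs the list should include location 71690 with primes 11, 17, 23 and 373, in agreement with Step 4. Because each examined location is deleted and its primes are pushed forward, the dictionary stays small however many values of $$x$$ are tested.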

Array of smooth values
Array "smooth" contains 45 members:

If you look closely at the contents of array "smooth," you notice that primes $$383, 367, 317, 347, 311$$ are each used exactly once. Therefore, locations $$71849, 72425, 73322, 73576, 74427$$ are not usable.

This section removes members containing these values of $$x$$ from array "smooth." Unfortunately, the removal of these values of $$x$$ causes other members to become unusable. The process continues until array "smooth" is ready, meaning that each prime in array "smooth" appears at least 2 times. The final version of array "smooth" contains 38 members:
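The repeated removal can be sketched as follows. The function name prune and the four sample entries are hypothetical; they only demonstrate how removing one entry can make another unusable.

```python
from collections import Counter

def prune(smooth):
    """Drop entries until every remaining prime occurs at least twice."""
    entries = list(smooth)
    while True:
        counts = Counter(p for _, primes in entries for p in primes)
        kept = [(x, primes) for x, primes in entries
                if all(counts[p] >= 2 for p in primes)]
        if len(kept) == len(entries):       # nothing removed: array is ready
            return kept
        entries = kept

# Made-up entries: 11 occurs once, so entry 4 goes; then 7 occurs once,
# so entry 3 goes as well.
sample = [(1, [2, 3]), (2, [2, 3]), (3, [3, 7]), (4, [7, 11])]
print(prune(sample))   # [(1, [2, 3]), (2, [2, 3])]
```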

Prime factors of y = x^2 - N
Assign a unique bit to each prime number in factor base:

Values of x in smooth array
Assign a unique bit to each value of $$x$$ in smooth array:

Create Matrix
The matrix contains patterns representing values of $$x$$ and patterns representing the corresponding prime factors of $$y = x^2 - N.$$

Process matrix
The last line of the matrix, $$\text{Line 11,}$$ becomes the source of the next operation. $$\text{Line 11}$$ has $$\text{bit 3}$$ set. Targets of the operation are all other lines with $$\text{bit 3}$$ set, lines designated as $$\text{Line 1}$$ through $$\text{Line 10.}$$

$$\text{Line 11}$$ is combined in turn with $$\text{Line 10, Line 9, Line 8,}\ \dots\ \text{Line 2, Line 1,}$$ by means of operation exclusive or, thus:

$$\text{Line 11}$$ is deleted and $$\text{bit 3}$$ has been removed from all values on Right Hand Side of matrix.

The process is repeated. When a zero value is found on Right Hand Side of matrix, it means that this value is a perfect square.
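The elimination can be sketched with Python integers as bit patterns. In this sketch each line of the matrix is a pair (x pattern, prime pattern). The function name find_squares, the choice of the lowest set bit of the source as the bit to remove, and the four sample lines are all assumptions made for illustration.

```python
def find_squares(lines):
    """Return x patterns whose combined right hand side is a perfect square.

    Each line is (x_pattern, prime_pattern). A set bit in prime_pattern means
    that prime occurs an odd number of times on the right hand side.
    """
    lines = list(lines)
    squares = []
    while lines:
        src_x, src_p = lines.pop()          # last line becomes the source
        if src_p == 0:                      # zero right hand side: a square
            squares.append(src_x)
            continue
        bit = src_p & -src_p                # lowest set bit of the source
        # Exclusive-or the source into every other line with that bit set.
        lines = [(x ^ src_x, p ^ src_p) if p & bit else (x, p)
                 for x, p in lines]
    return squares

# Made-up data: prime bits are 2 -> 0b001, 3 -> 0b010, 5 -> 0b100.
lines = [(0b0001, 0b011),   # y = 2 * 3
         (0b0010, 0b110),   # y = 3 * 5
         (0b0100, 0b101),   # y = 2 * 5
         (0b1000, 0b011)]   # y = 2 * 3
print([bin(s) for s in find_squares(lines)])   # ['0b1110', '0b1001']
```

Pattern 0b1110 selects the last three lines, whose right hand sides multiply to $$(2\cdot 3\cdot 5)^2.$$ Pattern 0b1001 selects the first and last lines, whose right hand sides multiply to $$(2\cdot 3)^2.$$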

Produce factors of N
Processing the matrix revealed 7 values of X that produce a perfect square on Right Hand Side of congruence.

Perfect square #1
00000000010000000001001000100000000000

The 4 bits set in this pattern represent values of x : [72268, 72750, 73246, 74544]

X = 72268 * 72750 * 73246 * 74544 = 28706195569530528000

These 4 values of x produce a value on RHS of congruence containing factors:

[11, 11, 13, 13, 17, 17, 19, 19, 23, 23, 59, 59, 113, 113, 127, 127, 163, 163, 241, 241]

Every factor in this list appears an even number of times.

Therefore, Y is product of factors: [11, 13, 17, 19, 23, 59, 113, 127, 163, 241]

Y = 35335010025681509

Using $$\text{p1 = iGCD(X+Y, N)}$$ and $$\text{p2 = iGCD(X-Y, N)}$$

p1,p2 = 1, 5137851827

This congruence produced trivial factors of N.
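The arithmetic for this first perfect square can be replayed with the sketch below, which uses only the four values of $$x$$ listed above. Python's math.gcd stands in for iGCD.

```python
from math import gcd, isqrt, prod

N = 5137851827
xs = [72268, 72750, 73246, 74544]       # values of x from perfect square #1

X = prod(xs)
rhs = prod(x * x - N for x in xs)       # product of the four values of y
Y = isqrt(rhs)
assert Y * Y == rhs                     # the product really is a square

p1, p2 = gcd(X + Y, N), gcd(X - Y, N)
print(Y)        # 35335010025681509
print(p1, p2)
```

The same steps apply to any other pattern found by the matrix: multiply the selected values of $$x,$$ take the exact square root of the product of the corresponding values of $$y,$$ and compute the two greatest common divisors.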

Perfect square #2
01010000010010001000001111010010000000

The 11 bits set in this pattern represent values of x :

[72030, 72237, 72372, 72509, 72659, 72750, 73472, 73968, 74544, 75252, 75433]

X = product of values of x = 331906592471738688342554821695566634067059049758720000

From these 11 values of x:

Y = product of factors: Y = 3343990074727707345948316157765706289327953268

Using $$\text{p1 = iGCD(X+Y, N)}$$ and $$\text{p2 = iGCD(X-Y, N)}$$

p1,p2 = 89123, 57649

This congruence produced non-trivial factors of N.

Example #2
This example presents a more advanced implementation of the quadratic sieve because it includes the following features:


 * Use of powers of primes.


 * Use of primes not in factor base, "large" primes.


 * Participation in a distributed computing effort.

Past attempts to factor large integers were successful only because many computers contributed to the production of "smooth" values of $$y.$$

One way to organize a distributed computing effort is to assign different tasks to many different computers so that no computer duplicates the work of another. For example, different computers can be assigned the following tasks:


 * Smooth values for $$y = x^2 - N$$ with $$y$$ positive.


 * Smooth values for $$y = x^2 - N$$ with $$y$$ negative.


 * Smooth values for $$y = x^2 - 2\cdot N$$ with $$y$$ positive.


 * Smooth values for $$y = x^2 - 2\cdot N$$ with $$y$$ negative.

This example assumes the role of computer assigned:

Smooth values for $$y = x^2 - 4\cdot N$$ with $$y$$ negative.

To ensure that this task does not duplicate other work, values of $$x$$ are all odd.

Let $$n = 147573952589676412927.$$

Then $$N = 4\cdot n.$$

$$\text{base} = 24296004001$$ where $$\text{base}$$ is an odd integer very close to $$\sqrt{N}.$$

Because $$N$$ is even and $$x$$ is always odd, $$y = x^2 - N$$ is always odd.

Consequently, prime number $$2$$ is not in factor base.

Factor base
Create the factor base, a list containing 230 small primes for which $$N$$ is a quadratic residue:

$$\text{[3, 7, 13, 23, 37, 41, 53, 61, 67,} \dots\dots \text{3121, 3137, 3167, 3203, 3217, 3251]}$$

You may notice that primes $$5,11,17,19, \dots$$ do not exist in this factor base.

Tables of logarithms
Create a dictionary of logarithms, called $$\text{logs.}$$ For example, $$\text{log}_2(3251) = \text{logs[3251]} = 11.66667.$$

Create a dictionary of antilogarithms, called $$\text{aLogs.}$$ For example, desired prime $$= \text{aLogs[11.6515]} = 3217.$$

Preparation
The following code adds to sieve information concerning powers of primes.

A closer look at this information will be presented later.

Initialize sieve
Initialized sieve contains 624 valid locations.

Check sieve
It is wise to check entries in sieve because an error in initialized sieve can cause many problems later.

First entry in sieve is :

$$\text{sieve}[24296004819] = [(6166, 11.59012)]$$

Value $$11.59012 = \text{log}_2(3083)$$ and $$6166 = 2*3083.$$

Value $$6166$$ is the decrement, always even.

Decrement must be exactly divisible by associated prime.

Value $$24296004819 = x$$ and $$(x^2 - N) \bmod \text{prime}$$ must be $$0.$$

The other entry containing this prime is:

$$\text{sieve}[24296002923] = [(3902, 10.93), (6166, 11.59012)]$$

There must be exactly 2 instances of each decrement, and difference between associated values of $$x$$ must not be exactly divisible by decrement.

Here is an entry for prime $$3:$$

$$\text{sieve}[24294998077] = [(1062882, 1.58496)].$$ Value $$1.58496 = \text{log}_2(3)$$

Decrement $$= 1062882 = 2\cdot 531441$$ and $$531441 = 3^{12}.$$

This decrement appears in the sieve exactly 2 times, at $$24294998077$$ and $$24295715435,$$ and $$(24294998077 - 24295715435) \bmod 1062882$$ is non-zero, correct.

Check: $$(24294998077^2 - N) \bmod 3^{12}.$$ This calculation $$= 0,$$ which is correct.

In this way, all entries in sieve are checked.
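The checks described in this section can be sketched as two small predicates. The function names are hypothetical, and the layout of a sieve entry is reduced here to the three quantities the text mentions: location $$x,$$ decrement, and the prime or prime power behind the decrement.

```python
n = 2**67 - 1            # 147573952589676412927
N = 4 * n

def entry_is_valid(x, decrement, modulus):
    """Decrement is twice the modulus, and the modulus divides x^2 - N."""
    return decrement == 2 * modulus and (x * x - N) % modulus == 0

def pair_is_valid(x1, x2, decrement):
    """Two locations sharing a decrement must not differ by a multiple of it."""
    return (x1 - x2) % decrement != 0

# Entries quoted in the text above; each call should print True
# if those entries are correct.
print(entry_is_valid(24296004819, 6166, 3083))
print(entry_is_valid(24294998077, 1062882, 3**12))
print(pair_is_valid(24294998077, 24295715435, 1062882))
```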

Activate sieve
The following data are representative of statistics collected during sieving. Range of values in sieve was about 1.75 million, and number of entries in sieve at any time was about 600.

Within this range of values of $$x$$ only odd values are used.


 * 8,238 values of $$x$$ were discarded because they did not exist in sieve.


 * 89,543 values of $$x$$ were discarded because they were not "smooth." Note that decision to discard did not depend on calculation of $$y$$ or $$\text{log}_2(y).$$


 * 2,219 values of $$x$$ were examined for smoothness. Of these, 2 were discarded.


 * Array "smooth" contains 2,217 members.

Process array "smooth."
Array "smooth" contains entries similar to this example:

$$24296003881$$ is value of $$x.$$

Antilogs are replaced by the corresponding prime.


 * decrement $$106 = 2\cdot 53.$$


 * decrement $$386 = 2\cdot 193.$$


 * decrement $$562 = 2\cdot 281.$$


 * decrement $$54 = 2\cdot 27 = 2\cdot 3^3.$$


 * decrement $$18 = 2\cdot 9 = 2\cdot 3^2.$$


 * decrement $$14 = 2\cdot 7.$$


 * decrement $$6 = 2\cdot 3.$$


 * If there is entry for $$3^3,$$ there must be entry for $$3^2.$$


 * If there is entry for $$3^2,$$ there must be entry for $$3.$$

Product $$53\cdot 193\cdot 281\cdot 3\cdot 3\cdot 7\cdot 3$$ divides $$y$$ exactly.

Quotient $$10627$$ is a "large" prime. Without the powers of $$3,$$ this entry may have been missed.
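The quotient can be reproduced directly from $$x$$ and the primes read off the decrements listed above. A minimal sketch, in which the variable names are hypothetical:

```python
n = 2**67 - 1
N = 4 * n
x = 24296003881
y = x * x - N                      # negative for this task

# Primes (with multiplicity) read off the decrements listed above.
found = [53, 193, 281, 3, 3, 7, 3]

cofactor = abs(y)
for p in found:
    assert cofactor % p == 0       # each prime divides what is left
    cofactor //= p
print(cofactor)                    # 10627
```

Leaving out the two extra factors of 3 would give a cofactor nine times larger, which illustrates why the powers of 3 matter for this entry.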

This entry becomes:

When a prime is included here, it was used an odd number of times above.

Here is an example of a smooth value of $$y:$$

There is no large prime. This entry becomes:

Prime $$3$$ is not used. It did not appear an odd number of times.

The smooth array is processed as necessary to eliminate any entry containing a prime used only once. Any prime in smooth is used at least twice.

Here is an example deleted because the large prime was used only once:

Here is an example retained because the large prime was used twice:

This entry becomes:

Recall that the range of values of $$x$$ examined by the sieve was 200,000.

Yet, here we see large prime 408,803 used twice. How is this possible?

Because prime $$1732331$$ was found twice, the same question can be asked for these entries:

Here is an example of a large prime used 3 times:

If you look closely, $$24295964813 - 24295844383 = 120430 = 12043\cdot 10.$$ Why only 3 entries for prime $$12043?$$

The result is array "smooth" containing 286 entries. Of these, 96 are smooth and 190 contain large primes.

The total number of primes is 281. Of these 91 are large primes. You can see that large primes are almost a third of all primes used.

Note that number of entries is more than number of primes. This indicates that the probability of finding perfect squares is good.

Although this array is called "smooth," the numbers show that it is not smooth. Perhaps a name such as "useful" would be better.

$$91\cdot 2 = 182,$$ less than the $$190$$ entries containing large primes. This shows that some large primes must be used at least $$3$$ times.

Produce results
Create and process matrix as in example above.

Remember that all values of $$y$$ are negative. Use last entry of matrix as source and combine this source with all other lines in matrix. This eliminates factor $$-1$$ from matrix.

6 exact squares were produced:

First exact square produced the trivial solution. All others produced the non-trivial solution.

Here are details about solution number 2:

Bits in this solution represent values of $$x:$$

$$x = 776937957\ \dots\dots\ 4216461181640625,$$ integer containing $$\text{1371}$$ decimal digits.

$$y = 128050773\ \dots\dots\ 5131604744727049,$$ integer containing $$\text{1021}$$ decimal digits.

Review
Example #2 above began with 100,000 simple operations on the sieve. Yes, the range of the sieve was about 1.75 million. This means that there were entries in bottom of sieve that were never used.

The 100,000 simple operations produced about 2,000 items of data worth investigating. This investigation produced a matrix of length 286 from which 5 non-trivial congruences were derived.

The original factor base contained 230 primes. Of these, 190 were used, leaving 40 unused.

If it seems that there was "too much data," this is not a problem. If you find yourself with not enough data, this really is a problem.

$$n$$ is in fact $$2^{67} - 1,$$ called $$M_{67},$$ a number made famous by Professor Frank Nelson Cole in 1903.

Before the era of computers he produced the factors of $$n$$ with pencil on paper, a magnificent achievement.

Links to Related Topics
Professor Cole's Paper

Mersenne prime

Carl Pomerance

Quadratic sieve

RSA-129