Thursday, July 9, 2015

Prime Factorization of an Integer

Problem

Given an integer N, find its prime factorization.

For example, $12 = 2 \times 2 \times  3 = 2^2 \times 3$.

It is somewhat difficult to factorize an integer. If $N$ is small enough, then it is possible to factorize it using prime generate by Sieve of Eratosthenes.

Trial Division Method

Suppose we are trying to factorize $980$. How would we attempt to factorize it with pen and paper? We will try to extract the smallest possible prime factors from it and make it smaller. That is we will try to divide the number with prime numbers and keep track which are capable of dividing it.

The first prime we have is $2$. Can we divide $980$ with $2$? Yes. $980=2 \times 490$. Now we need to factorize $490$ which is smaller and thus easier.

Again, let's try to factorize $490$ with $2$. It is possible, so $980=2\times 2\times 245= 2^2 \times 245$. It is not possible to extract another $2$ from $245$ so we move on to next prime.

$3$ fails to divide $245$, so we move on to $5$. By repeating this process, we find out that $980=2^2 \times 5 \times 7^2$.

This process is kind of laborious, but we have computers to do our bidding. So for this trial division to work, all we need is a list of primes so that we can try dividing N one by one from that list. How will we get that list? Using Sieve of Eratosthenes.

But before we start generating prime, we need to answer another question. Sieve of Eratosthenes can generate prime up to a certain number. So up to which value should we generate prime numbers to factorize N?

Limit for Sieve of Eratosthenes

Well, if we are trying to factorize $N$, there is no point in generating primes that are larger than $N$. So we can generate up to $N$ and factorize using the list with primes less than or equal to $N$. But we can do even better by observing a special property of prime factors of $N$.

$N$ can have many prime factors. Some of the prime numbers will be $< \sqrt{N}$ and some $>=\sqrt{N}$. Exactly how many primes factors can $N$ have that are greater than $\sqrt{N}$?

Notice that if $N$ can never have two prime factors $>\sqrt{N}$ since, $if \: A > \sqrt{N}, then \:  A \times A > N$.

So if we generate primes up to $\sqrt{N}$ and use that list to factorize $N$ then at the end, we will either have 1 or a number greater than $\sqrt{N}$ which will be a prime for sure.

For example, $N=42$ and $\sqrt{N}=6.4 \approx 6$. So let's try to factorize using primes less than or equal to $6$ only, that is using only $[2,3,5]$. What do we get? $N = 42 = 2 \times 3 \times 7$. Since $7$ can no longer be divided with any prime from our list $[2,3,5]$ we stop. $7$ is our last missing prime factor.

So, in order to factorize $N$, we need to generate primes up to $\sqrt{N}$ only. 

Code for Sieve of Eratosthenes

vector<int>factors;
void factorize( int n ) {
    int sqrtn = sqrt ( n );
    for ( int i = 0; i < prime.size() && prime[i] <= sqrtn; i++ ) {
        if ( n % prime[i] == 0 ) {
            while ( n % prime[i] == 0 ) {
                n /= prime[i];
                factors.push_back(prime[i]);
            }
            sqrtn = sqrt ( n );
        }
    }
    if ( n != 1 ) {
        factors.push_back(n);
    }
}
Let me explain few things about the code. In line 4, inside the for loop I wrote the following condition, "i < prime.size() && prime[i] <= sqrtn". The first condition ensures that we don't go array over-bound. In order to understand it, imagine what will happen if we try to factorize a prime number $>\sqrt{N}$. Without that condition, you might get RTE.

The second condition ensures we are always trying to factorize using prime less than $\sqrt{N}$.

In line 5, we check if it is possible to divide current $N$ with prime[i]. If so we start a loop in line 6 and keep on extracting until we cannot anymore.

In line 10, we are optimizing our code. Since, after extraction of prime factors, $N$ has become smaller, we now need to only factorize with a smaller amount of prime numbers. Due to this single line, the code converges really quickly.

In line 13, we are checking if the residual number after factorization is 1 or not. If not, it is the only prime factor which is greater than $\sqrt{N}$. So we add it to our prime factor list.

Time Complexity

There are two loops in the code for $factorize()$. The first one is at line $4$. This loop runs once for every prime $\leq \sqrt{N}$. From "Prime Number Theorem", $\pi (N) \approx \frac{N}{ln(N)}$. So there are approximately $\frac{ \sqrt{N} }{ ln ( \sqrt{N} ) }$ primes $\leq \sqrt{N}$.

The second loop at line $6$ runs once for every prime factor $N$ has. So it cannot run more than $log_2(N)$ times.

So the complexity is $O(\frac{ \sqrt{N} }{ ln ( \sqrt{N} ) } + log_2(N) \: )$. In practice, if $N$ is not a prime then due to line $10$, $N$ becomes small quickly. So it is much faster than what we calculated.

Optional Optimization

There is an optional optimization we can perform only when it is possible to generate sieve up to N. Just insert a single statement in line 5.
for ( int i = 0; i < prime.size() && prime[i] <= sqrtn; i++ ) {
        if ( sieve[n] == 0 ) break; /*Checks if n is prime or not*/
        if ( n % prime[i] == 0 ) {
The new line added simply checks if the number we are trying to factorize is a prime or not. If it is prime, then obviously we cannot factorize it anymore. This optimization is highly useful when we already have sieve generated up to $N$.

Reference

  1. forthright48 - Prime Number Theorem - http://forthright48.blogspot.com/2015/07/prime-number-theorem.html

No comments:

Post a Comment

Leave comments for Queries, Bugs and Hugs.