Monday, August 10, 2015

Prime Factorization of Factorial

Problem

Given a positive integer $N$, find the prime factorization of $N!$.

For example, $5! = 5 \times 4 \times 3 \times 2 \times 1 = 120 =  2^3 \times 3 \times 5$.

Brute Force Solution

A possible solution is to calculate the value of $x = N!$ and then prime factorize $x$. But calculating $N!$ is tedious. We cannot fit $N!$ where $N > 20$ in a long long variable. We will need to use Big Integer class and that would make things slow. I will soon write a blog post on Big Integer, until then know that using Big Integer it would take more than $N^2$ steps to calculate $N!$.

Is there a better way?

Limits on Prime

Before we move on to the solution, let us first decide the limit on prime. In order to factorize $x = N!$, we have to generate prime numbers. But up to which value? Should we generate all primes less than $\sqrt{x}$? 

Even for a small value of $N$ like $100$, $x$ can be huge with over $100$ digits in it, thus, $\sqrt{x}$ will also be huge. Generating so many primes is not feasible. Using Sieve of Eratosthenes we could generate primes around $10^8$, which is nowhere near $\sqrt{100!}$.

Note that $N! = N \times (N-1) \times (N-2) \times ... \times 2 \times 1$. That is, $N!$ is a product of numbers less than $N$ only. Now, can there be any prime greater than $N$ that can divide $N!$? 

Suppose there is a number $A$ and we factorized it. It is trivial to realize that all its prime factors will be less than or equal to $A$. So in $N!$, which is the product of numbers less than $N$, if we decompose all those numbers to their prime factors, then they will reduce to primes less than or equal to $N$.

For example, $6! = 6 \times 5 \times 4 \times 3 \times 2 \times 1 = (2 \times 3) \times 5 \times 2^2  \times 3 \times 2 = 2^4 \times 3^2 \times 5$.

So the prime factors of $N!$ will be less than or equal to $N$. Generating primes till $\sqrt{N!}$ is not necessary. We just need to generate all primes less than or equal to $N$.

Prime Factorization

Now that we know the limit for primes, we are ready to begin factorizing the factorial. There is more than one way to achieve this. We will see three of them and discuss which one is best.

First - Linear Loop From $1$ to $N$

We know that $N! = N \times (N-1) \times (N-2) \times ... \times 2 \times 1$. So we could simply factorize every number from $1$ to $N$ and add to a global array that tracks the frequency of primes. Using the code for $factorize()$ from here, we could write a solution like below.
vector<int> prime;
int primeFactor[SIZE]; ///Size should be as big as N

void factorize( int n ) {
    int sqrtn = sqrt ( n );
    for ( int i = 0; i < prime.size() && prime[i] <= sqrtn; i++ ) {
        if ( n % prime[i] == 0 ) {
            while ( n % prime[i] == 0 ) {
                n /= prime[i];
                primeFactor[ prime[i] ]++; ///Increment global primeFactor array
            }
            sqrtn = sqrt ( n );
        }
    }
    if ( n != 1 ) {
        primeFactor[n]++;
    }
}

void factFactorize ( int n ) {
    for ( int i = 2; i <= n; i++ ) {
        factorize( i );
    }

    ///Now We can print the factorization
    for ( int i = 0; i < prime.size(); i++ ) {
        printf ( "%d^%d\n", prime[i], primeFactor[ prime[i] ] );
    }
}
We pass the value of $N$ to $factFactorize()$ in line $20$, and it calculates the frequency of each prime in $N!$. It starts a loop from $2$ to $N$ and factorizes each of them. In line $4$ we have the $factorize()$ function modified a bit in line $10$ and $16 $to suit our need. When those lines find a prime factor, they increase the frequency of that prime in the $primeFactor$ array.

It is simple and straight forward, but takes $O(N \times factorize() ) $ amount of time. We can do better.

Second - Summation of Each Prime Frequency

Instead of factorizing from $1$ to $N$, how about we just find out how many times each prime occurs in $N!$ and list them. If $p_1$ occurs $a_1$ times, $p_2$ occurs $a_2$ times $...$ $p_x$ occurs $a_x$ times, then $N! = p_1^{a_1} \times p_2^{a_2}\times ... \times p_x^{a_x}$.

That sounds nice, but how do we find the frequency of prime factors in $N!$. Let us just focus on one prime factor, for example $2$, and find out how many times it occurs in $N!$. We will extend this idea to other primes.

Let $N = 12$. How many times does $2$ occur in $12!$? We know that $12! = 12 \times 10 \times 9 \times ... \times 1$. How many numbers from $1$ to $12$ has $2$ as their prime factors? $\frac{12}{2} =  6$ numbers do and they are $\{ 2, 4, 6, 8, 10, 12 \}$. So we can say that at least $2^6$ is a factor of $12!$. But is there more?

Yes. Notice that $4 = 2^2$, so it has an extra $2$ in it that we did not count. That means for each number that has $2^2$ in them as a factor, we need to add $1$ to our result. How many number are there which has $2^2$ as their factor? $\frac{12}{4} = 3$ numbers, which are $\{ 4, 8, 12 \}$. So we increase our frequency to $6 + 3 = 9$ and say we have at least $2^9$ in $12!$. But is that it?

No. $8 =  2^3$ and for each number with $2^3$ as factor we add $1$ to result. So our result is $9 + \frac{12}{8} = 9 + 1 = 10$. 

Do we try with $16 = 2^4$ now? No. $12!$ cannot have any number with factor $2^4$ since $\frac{12}{16} = 0$. So we conclude that $12!$ has $2^{10}$ as its factor and no more.

Now, we extend this idea to other primes. What is the frequency of prime factor $3$ in $12!$? $\frac{12}{3} + \frac{12}{9} + \frac{12}{27} = 4 + 1 + 0 = 5$. We repeat this for all primes less than equal to $12$.

Therefore, we can say that for a given prime $p$, $N!$ will have $p^x$ as its prime factor where $x = \frac{N}{p} + \frac{N}{p^2} + \frac{N}{p^3} + ... \text{ Until it becomes 0 }$.

So, using this idea our code will look as the following.
void factFactorize ( int n ) {
    for ( int i = 0; i < prime.size() && prime[i] <= n; i++ ) {
        int p = prime[i];
        int freq = 0;

        while ( n / p ) {
            freq += n / p;
            p *= prime[i];
        }

        printf ( "%d^%d\n", prime[i], freq ); ///Printing prime^freq which is factor of N!
    }
}
This code factorizes $N!$ as long as we can generate all primes less than or equal to $N!$. The loop in line $6$ runs until $\frac{n}{p}$ becomes 0.

This code has 3 advantages over the "First" code. 
  1. We don't have to write $factorize()$ code.
  2. Using this code, we can find how many times a specific prime $p$ occurs in $N!$ in $O(log_p (N))$ time. In the "First" code, we will need to run $O(N)$ loop and add occurrences of $p$ in each number.
  3. It has a better complexity for Factorization. Assuming the loop in line $6$ runs $log_2 (N)$ times, this code has a complexity of $O(N log_2 (N))$. The code runs faster than this since we only loop over primes less than $N$ and at each prime the loop runs only $O(log_p (N))$ times. The "First" code ran with $O(N \times factorize() )$ complexity, where $factorize()$ has complexity of $O(\frac{ \sqrt{N} }{ ln ( \sqrt{N} ) } + log_2(N) \: )$.  
This idea still has a small flaw. So the next one is better than this one.

Three - Better Code than Two

Suppose, we want to find out how many times $1009$ occurs in $9 \times 10^{18}!$. Let us modify the "Second" code to write another function that will count the result for us.
long long factorialPrimePower ( long long n, long long p ) {
    long long freq = 0;
    long long cur = p;

    while ( n / cur ) {
        freq += n / cur;
        cur *= p;
    }

    return freq;
}
If we pass $n = 9 \times 10^{18}$ and $p = 1009$, it will return us $8928571428571439$. But this is wrong. The line $7$ in the code overflows resulting in wrong answer. Try it yourself. Print out the value of $cur$ in each step and see when it overflows.

We could change the condition in line $5$ into something like this to solve the situation:
while ( n / cur > 0 )
But this remedy won't work if $cur \times p$ overflows into a positive number. If we want we could use techniques that avoids multiplying two numbers if it crosses a limit, but there is an simpler way.

Note that $\frac{N}{p^3}$ is same as $\frac{ \frac{N}{p^2} }{p}$. So instead of saying that $res = \frac{N}{p} + \frac{N}{p^2}...$, we could rewrite it as

$x = N: \: res = res + \frac{x}{p}$.
$x = \frac{N}{p} = \frac{x}{p}: \: res = res + \frac{x}{p}$.
$x = \frac{N}{p^2} = \frac{ \frac{N}{p} } {p} = \frac{x}{p}: \: res = res + \frac{x}{p}$.
$...$
$x = 0$

Instead of raising the power of $p$, we divide the value of $N$ by $p$ at each step. This has the same effect.

So our code for finding frequency of specific prime should look like following:
long long factorialPrimePower ( long long n, long long p ) {
    long long freq = 0;
    long long x = n;

    while ( x ) {
        freq += x / p;
        x = x / p;
    }

    return freq;
}
There might still be inputs for which this code will overflow, but chances for that is now lower.. Now if we send in $n = 9 \times 10^{18}$ and $p = 1009$, then this time we get $8928571428571425$ as our result.

If we apply this improvement in our $factFactorize()$ function, then it will become:
void factFactorize ( int n ) {
    for ( int i = 0; i &lt; prime.size() &amp;&amp; prime[i] &lt;= n; i++ ) {
        int x = n;
        int freq = 0;

        while ( x / prime[i] ) {
            freq += x / prime[i];
            x = x / prime[i];
        }

        printf ( "%d^%d\n", prime[i], freq );
    }
}
This code has less chance to overflow so it is better.

Resource

No comments:

Post a Comment

Leave comments for Queries, Bugs and Hugs.