If we list all the natural numbers below $10$ that are multiples of $3$ or $5$, we get $3, 5, 6$ and $9$. The sum of these multiples is $23$.
Find the sum of all the multiples of $3$ or $5$ below $1000$.
Let's consider the problem of summing all multiples of $a$ or $b$ below $L$.
Generator expression
The cleanest solution is to just sum over all integers below $L$ that divide $a$ or $b$ using a generator expression to avoid memory allocations if we used a list comprehension:
function sum_multiples_two_generator(a, b, L)
return sum(n for n in 1:L-1 if n % a == 0 || n % b == 0)
end
Benchmarking sum_multiples_two_generator(3, 5, 1000), it runs in 2.184 μs.
Using the inclusion-exclusion principle
We can do better because we know that the sum of the first $n$ integers is
We can use this to derive a formula for the sum of all multiples of $m$ below $L$, which we'll denote $s(m, L)$. There are $\ell = \lfloor \frac{L-1}{m} \rfloor$ multiples of $m$ below $L$. So
Using this, $s(a, L) + s(b, L)$ would be the sum of all multiple of $a$ or $b$ below $L$, but this double counts shared multiples, whose sum would be $s(\operatorname{lcm}(a, b), L)$. So the actual sum of all multiples of $a$ or $b$ below $L$, which we'll denote $S([a, b], L)$ should be
This is an application of the inclusion-exclusion principle for the case two finite sets, which says that
where $A$ and $B$ are two finite sets and $|S|$ is the cardinality of the set $S$.
We can code this up as
function sum_multiples(m, L)
if m >= L
return 0
end
l = div(L - 1, m)
return m * l * (l + 1) ÷ 2
end
function sum_multiples_two(a, b, limit)
return sum_multiples(a, limit) +
sum_multiples(b, limit) -
sum_multiples(lcm(a, b), limit)
end
and benchmarking sum_multiples_two(3, 5, 1000) I get a median time of 0.941 ns which is roughly 2000x faster. It might even be faster but it's quite difficult to benchmark an operation that takes less than 1 ns as system clocks don't have sub-nanosecond resolution.
Three factors
The generator solution can easily be extended to deal with three factors:
function sum_multiples_three_generator(a, b, c, L)
return sum(n for n in 1:L-1 if n % a == 0 || n % b == 0 || n % c == 0)
end
Let's sum the multiples of 3, 5, and 7 below $10^6$. Benchmarking sum_multiples_three_generator(3, 5, 7, 10^6) we get 3.249 ms. Taking ~1500x longer than the 2 factor case with $L = 10^3$ makes sense since it's now checking 1000 times more numbers and 50% more factors.
The inclusion-exclusion principle extends to any number of finite sets, although it does get more complex. For three finite sets $A$, $B$, and $C$:
So for three factors $a$, $b$, and $c$ we'll have to add all the individual sums, subtract all pairwise interactions to avoid double counting, but then add back in the triple intersection that was subtracted too many times:
We can implement this as:
function sum_multiples_three_ie(a, b, c, L)
return sum_multiples(a, L) +
sum_multiples(b, L) +
sum_multiples(c, L) -
sum_multiples(lcm(a, b), L) -
sum_multiples(lcm(a, c), L) -
sum_multiples(lcm(b, c), L) +
sum_multiples(lcm(a, b, c), L)
end
and if we benchmark sum_multiples_three_ie(3, 5, 7, 10^6) we get 0.932 ns which again is sub-nanosecond.
Complexity Analysis
Let $f$ be the number of factors we're considering.
The generator solutions take $\mathcal{O}(Lf)$ time as they iterate through every number until $L$ and perform $f$ modulo operations per number.
The inclusion-exclusion solutions achieve $\mathcal{O}(2^f \log M)$ time complexity, where $M$ is the maximum factor value. They need to compute $2^f - 1$ non-empty subsets to apply the inclusion-exclusion principle. Each subset requires computing an LCM using the Euclidean algorithm, which takes $\mathcal{O}(\log M)$ time, followed by using the sum formula which takes constant time.
If I have a lot of factors to crunch through, I personally prefer the elegance of the generator expression as $2^f - 1$ terms is a lot. If performance was critical there's probably a nice way to generate the terms of the inclusion-exclusion principle. But then, in the case of many factors and smaller $L$ the generator expression may end up being faster.