023. Non-Abundant Sums

#23 Non-abundant sums - Project Euler. projecteuler.net . Accessed Feb 19, 2023.

Problem Statement

A perfect number is a number for which the sum of its proper divisors is exactly equal to the number. For example, the sum of the proper divisors of \(28\) would be \(1 + 2 + 4 + 7 + 14 = 28\), which means that \(28\) is a perfect number.

A number \(n\) is called deficient if the sum of its proper divisors is less than \(n\), and it is called abundant if this sum exceeds \(n\).

As \(12\) is the smallest abundant number, \(1 + 2 + 3 + 4 + 6 = 16\), the smallest number that can be written as the sum of two abundant numbers is \(24\). By mathematical analysis, it can be shown that all integers greater than \(28,123\) can be written as the sum of two abundant numbers. However, this upper limit cannot be reduced any further by analysis even though it is known that the greatest number that cannot be expressed as the sum of two abundant numbers is less than this limit.

Find the sum of all the positive integers which cannot be written as the sum of two abundant numbers.

My Solution

To work with .py file(s) in the Python REPL, first cd into the directory containing the .py file(s), and then from the REPL:

>>> from brute_force_non_abundant_sums import *

To refresh the contents after editing the file:

>>> import importlib
>>> importlib.reload(brute_force_non_abundant_sums)
>>> from brute_force_non_abundant_sums import *

With sum_of_proper_divisors(n) from PE 021 , a brute-force algorithm is:

#!/usr/bin/env python

import cProfile
from pstats import SortKey

from itertools import combinations_with_replacement

from project_euler.amicable_numbers.amicable_numbers_deluxe import sum_of_proper_divisors

def generate_abundant_nums(lo, hi):
    """
    Generate the abundant numbers in the range [lo, hi], in sorted order.
    """
    for n in range(lo, hi + 1):
        if sum_of_proper_divisors(n) > n: yield n

def pairwise_sums(nums, N):
    """
    Given nums like [1, 4, 6], return unique sums like [2, 5, 7, 8, 10, 12].
    Sums that are greater than N are excluded.
    """
    sums = set([])
    for (a, b) in combinations_with_replacement(nums, 2):
        n = a + b
        if n > N: continue
        else: sums.add(n)

    return sums

def sum_of_non_abundant_sums():
    """
    Solution for https://projecteuler.net/problem=23
    """
    # (28123, inf] can all be written as the sum of two abundant numbers. Start
    # by assuming all numbers in [1, 28123] cannot be expressed as the sum of
    # two abundant numbers.
    N = 28123
    ans = N * (N + 1) / 2

    # Generate all numbers that can be expressed as the sum of two abundant
    # numbers. Stop at 28124 because beyond that is a futile exercise.
    abundant_nums = generate_abundant_nums(1, N - 1)
    for n in pairwise_sums(abundant_nums, N):
        ans -= n

    return ans

if __name__ == "__main__":
    cProfile.run("print(sum_of_non_abundant_sums())", sort=SortKey.TIME)

Performance-wise:

4179871.0
         12183909 function calls in 4.462 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    3.357    3.357    4.460    4.460 brute_force_non_abundant_sums.py:36(pairwise_sums)
 12148815    0.924    0.000    0.924    0.000 {method 'add' of 'set' objects}
    28122    0.172    0.000    0.172    0.000 brute_force_non_abundant_sums.py:8(sum_of_proper_divisors)
     6966    0.006    0.000    0.179    0.000 brute_force_non_abundant_sums.py:29(generate_abundant_nums)
        1    0.002    0.002    4.462    4.462 brute_force_non_abundant_sums.py:49(sum_of_non_abundant_sums)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    4.462    4.462 {built-in method builtins.exec}
        1    0.000    0.000    4.462    4.462 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

How much better can pairwise_sums get? Well, nums is \(O(N)\), and \(_{N}C_{2} = \frac{N \cdot (N-1)}{2 \cdot 1} = O(N^2)\). Although generating the pairwise sums in a monotonically increasing order can avoid generating \(n > N\), the overall runtime will still be \(O(N^2)\). How much additional work are we doing anyway? Logging shows that \(\approx 50\%\) of the sums generated greater than \(N\), and so a better pairwise_sums should shave about half of the total running time. That’s an optimization worth pursuing.

Why logging gave a quick and precise answer, is this hand-waving correct? Assuming that the abundant nums are more or less uniformly spread out in \([1, …, N]\), their pairwise sums should also be uniformly spread out in \([2, …, 2N]\), and therefore \(\approx 50\%\) of the sums will be greater than \(N\).

Generating Monotonically Increasing Pairwise Sums

Given a list \(A = [a_1, a_2, …, a_n]\), how can one generate all \(s\), where \(s = a_i + a_j\) and \(s \le k\)?

The first order of business is sorting \(A\) because without some order in \(A\), we can’t possibly produce monotonically increasing sums.

A sample grid to help visualize the problem:


-	1	4	5	8	100	1000	1001
1	2	5	6	9	101	1001	1002
4	5	8	9	12	104	1004	1005
5	6	9	10	13	105	1005	1006
8	9	12	13	16	108	1008	1009
100	101	104	105	108	200	1100	1101
1000	1001	1004	1005	1008	1100	2000	2001
1001	1002	1005	1006	1009	1101	2001	2002

The grid is reflected across the diagonal because addition is commutative (\( a + b = b + a\)). So as far as unique sums are concerned exploring either the upper or lower triangle should lead to the same result.

For convenience, evaluating the lower triangle. On each row, the max is in the right-most cell, and the min is in the left-most cell. However, given rows \(r_i\) and \(r_{i+1}\), it’s not always the case that all of \(r_i\)’s cells have lower values than those in \(r_{i+1}\)’s cells.

While I can’t devise a monotonically increasing traversal path in the grid, there are perf improvements to be made from limiting what we evaluate on row of the lower triangle.

The usage of generators tripped me up:

def unsorted_gen():
  for x in (1, 4, 3, 2, 5):
    yield x

def sorted_gen():
  for x in range(1, 6):
    yield x

def pairs_with_replacement(iterable):
  for a in iterable:
    for b in iterable:
      yield (a, b)

if __name__ == "__main__":
  print(len(list(pairs_with_replacement(sorted_gen())))) # 6
  print(len(list(pairs_with_replacement(unsorted_gen())))) # 4
  print(len(list(pairs_with_replacement(sorted(unsorted_gen()))))) # 25

In hindsight, this makes sense because generators visit any given value at most once. However, given a generator, pairs_with_replacement tries to loop over it twice, which is impossible. This code avoids the bug:

def pairs_with_replacement(iterable):
  iterable = sorted(iterable)
  for a in iterable:
    for b in iterable:
      yield (a, b)

The additional memory usage is necessary for correctness.

TIL that range returns a list, while xrange returns a generator, and therefore xrange is more memory-efficient.

#!/usr/bin/env python

from non_abundant_sums import generate_abundant_nums

import cProfile
from pstats import SortKey

def pairwise_sums(nums, K):
    """
    Given a sequence of numbers like [1, 4, 6], return unique sums like
    [2, 5, 7, 8, 10, 12]. Sums that are greater than N are excluded from the
    result.
    """
    # Ensure that the iterable is sorted. This also avoids a bug if `nums` is
    # a generator, as iterating over it more than once is incorrect.
    nums = sorted(nums)

    sums = set([])
    for r in nums:
        for c in nums:
            # Only consider the lower triangle of the grid. Fine to compare
            # values and not indices because `nums` is sorted and used in both
            # loops.
            if c > r: break

            s = r + c

            # Any other sum on this row will be greater than K
            if s > K: break

            sums.add(s)

    return sums

def sum_of_non_abundant_sums():
    """
    Solution for https://projecteuler.net/problem=23
    """
    # (28123, inf] can all be written as the sum of two abundant numbers. Start
    # by assuming all numbers in [1, 28123] cannot be expressed as the sum of
    # two abundant numbers.
    K = 28123
    ans = K * (K + 1) / 2

    # Generate all numbers that can be expressed as the sum of two abundant
    # numbers. Stop at 28124 because beyond that is a futile exercise.
    abundant_nums = generate_abundant_nums(1, K - 1)
    for n in pairwise_sums(abundant_nums, K):
        ans -= n

    return ans

if __name__ == "__main__":
    cProfile.run("print(sum_of_non_abundant_sums())", sort=SortKey.TIME)

The perf profile shows that the total time in pairwise_sums decreased by \(.915s \approx 27\%\). The additional sort call was not even expensive.

4179871.0
         12183910 function calls in 3.554 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    2.442    2.442    3.552    3.552 non_abundant_sums_deluxe.py:8(pairwise_sums)
 12148815    0.931    0.000    0.931    0.000 {method 'add' of 'set' objects}
    28122    0.172    0.000    0.172    0.000 non_abundant_sums.py:8(sum_of_proper_divisors)
     6966    0.006    0.000    0.178    0.000 non_abundant_sums.py:29(generate_abundant_nums)
        1    0.002    0.002    3.554    3.554 non_abundant_sums_deluxe.py:33(sum_of_non_abundant_sums)
        1    0.001    0.001    0.179    0.179 {built-in method builtins.sorted}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    3.554    3.554 {built-in method builtins.exec}
        1    0.000    0.000    3.554    3.554 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Not the \(50\%\) boost I anticipated, but \(27\%\) is good enough? I wasn’t expecting the code to be an order of magnitude faster.

Learning from Others

has a variation of this problem, where given a list of numbers, print YES if the number can be written as the sum of two abundant numbers, and NO otherwise. Caching abundant_sums_less_than_K is important to avoid repeated work.

if __name__ == "__main__":
  K = 28123
  abundant_sums_less_than_K = pairwise_sums(generate_abundant_nums(1, K - 1), K)

  num_test_cases = int(input().strip())
  for _ in range(num_test_cases):
      N = int(input().strip())
      if N > K or N in abundant_sums_less_than_K: print("YES")
      else: print("NO")

notes that every number greater than \(20,161\) can be expressed as a sum of two abundant numbers. This improves on ’s floor of \(28,123\). With this info, we go down to \(1.833s\) and \(6,282,406\) function calls.

’s Python solution runs in \(0.512s\) (\(-85.59\%\)), and with \(140,563\) function calls (\(-98.85\%\)). The special sauce is in avoiding pairwise_sums:

#!/usr/bin/env python

from non_abundant_sums import generate_abundant_nums

import cProfile
from pstats import SortKey

def sum_of_non_abundant_sums():
    """
    Solution for Solution for https://projecteuler.net/problem=23 inspired by
    Therryka's https://projecteuler.net/thread=23;page=8#411899.
    """
    # (28123, inf] can all be written as the sum of two abundant numbers.
    K = 28123

    abundant_nums = sorted(generate_abundant_nums(1, K - 1))
    abundant_nums_set = set(abundant_nums)

    ans = K * (K + 1) / 2
    for n in range(1, K + 1):
        # Test if there is an `a + b = n` where `a` and `b` are abundant.
        for a in abundant_nums:
            b = n - a

            # Abundant numbers need to be positive. Can break because
            # `abundant_nums` is sorted, and subsequent `b`s will be -ve.
            if b <= 0: break

            # We only need one such `a + b`. No need to search further.
            if b in abundant_nums_set:
                ans -= n
                break

    return ans

if __name__ == "__main__":
    cProfile.run("print(sum_of_non_abundant_sums())", sort=SortKey.TIME)

The perf is comparable to :

4179871.0
         35094 function calls in 0.578 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.395    0.395    0.578    0.578 non_abundant_sums_deluxe_therryka.py:8(sum_of_non_abundant_sums)
    28122    0.176    0.000    0.176    0.000 non_abundant_sums.py:8(sum_of_proper_divisors)
     6966    0.006    0.000    0.182    0.000 non_abundant_sums.py:29(generate_abundant_nums)
        1    0.001    0.001    0.183    0.183 {built-in method builtins.sorted}
        1    0.000    0.000    0.578    0.578 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    0.578    0.578 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

did not cross my mind at all. Probably because I tunnel-visioned myself into thinking about monotonically increasing grid traversals? That sounded math-like, and given I’m on Project Euler, the more math the better?

References

importlib — The implementation of import — Python 3.11.2 documentation. docs.python.org . Accessed Feb 19, 2023.
Generators - Python Wiki. wiki.python.org . Accessed Feb 20, 2023.
Project Euler #23: Non-abundant sums | HackerRank. www.hackerrank.com . www.hackerrank.com . Accessed Feb 20, 2023.
Abundant Number -- from Wolfram MathWorld. mathworld.wolfram.com . Accessed Feb 20, 2023.
Thread 23 - Project Euler. projecteuler.net . Accessed Feb 20, 2023.