How to count each digit in a range of integers?

Posted by Carlos Gutiérrez on Stack Overflow
Published on 2010-01-13T19:42:42Z Indexed on 2010/04/11 2:23 UTC

Imagine you sell those metallic digits used to number houses, locker doors, hotel rooms, etc. You need to find how many of each digit to ship when your customer needs to number doors/houses:

  • 1 to 100
  • 51 to 300
  • 1 to 2,000 with zeros to the left

The obvious solution is to do a loop from the first to the last number, convert the counter to a string with or without zeros to the left, extract each digit and use it as an index to increment an array of 10 integers.
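A minimal Python sketch of this obvious approach, for reference. The function name and the `width` parameter are my own illustrative choices, not part of any answer; `width` is the number of digits to pad to with zeros on the left (0 means no padding).

```python
def count_digits_brute_force(first, last, width=0):
    """Count how many of each digit 0-9 are needed to write first..last.

    `width` is a hypothetical parameter: when greater than 0, every number
    is left-padded with zeros to that many digits.
    """
    digits = [0] * 10                      # digits[d] = pieces of digit d
    for number in range(first, last + 1):
        for ch in str(number).zfill(width):
            digits[int(ch)] += 1
    return digits


if __name__ == "__main__":
    print(count_digits_brute_force(1, 100))
    print(count_digits_brute_force(51, 300))
    print(count_digits_brute_force(1, 2000, width=4))
```

For 1 to 100 this gives 11 zeros, 21 ones and 20 of every other digit. The cost grows with the size of the range, which is what the question is trying to avoid.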

I wonder if there is a better way to solve this, without having to loop through the entire range of integers.

Solutions in any language or pseudocode are welcome.


Edit:

Answers review
John at CashCommons and Wayne Conrad comment that my current approach is good and fast enough. Let me use a silly analogy: If you were given the task of counting the squares in a chess board in less than 1 minute, you could finish the task by counting the squares one by one, but a better solution is to count the sides and do a multiplication, because you later may be asked to count the tiles in a building.
Alex Reisner points to a very interesting mathematical law that, unfortunately, doesn’t seem to be relevant to this problem.
Andres suggests the same algorithm I’m using, but extracting digits with %10 operations instead of substrings.
John at CashCommons and phord propose pre-calculating the digits required and storing them in a lookup table or, for raw speed, an array. This could be a good solution if we had an absolute, unmovable, set in stone, maximum integer value. I’ve never seen one of those.
High-Performance Mark and strainer computed the needed digits for various ranges. The result for one million seems to indicate there is a proportion, but the results for other numbers show different proportions.
strainer found some formulas that may be used to count digits for numbers which are a power of ten. Robert Harvey had a very interesting experience posting the question at MathOverflow. One of the math guys wrote a solution using mathematical notation.
Aaronaught developed and tested a solution using mathematics. After posting it, he reviewed the formulas that originated from MathOverflow and found a flaw in them (point to Stack Overflow :).
noahlavine developed an algorithm and presented it in pseudocode.

A new solution
After reading all the answers and doing some experiments, I found that for a range of integers from 1 to 10^n - 1:

  • For digits 1 to 9, n*10^(n-1) pieces are needed
  • For digit 0, if not using leading zeros, n*10^(n-1) - ((10^n - 1) / 9) are needed
  • For digit 0, if using leading zeros, n*10^(n-1) - n are needed

The first formula was found by strainer (and probably by others), and I found the other two by trial and error (but they may be included in other answers).

For example, if n = 6, range is 1 to 999,999:

  • For digits 1 to 9 we need 6*10^5 = 600,000 of each one
  • For digit 0, without leading zeros, we need 6*10^5 - (10^6 - 1)/9 = 600,000 - 111,111 = 488,889
  • For digit 0, with leading zeros, we need 6*10^5 - 6 = 599,994

These numbers can be checked using High-Performance Mark's results.
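The three formulas can be written down directly. The following is only a sketch: the function name and the `leading_zeros` flag are illustrative, and it assumes n >= 1.

```python
def digits_up_to_all_nines(n, leading_zeros=False):
    """Digit counts for the range 1 .. 10**n - 1, from the closed forms.

    Assumes n >= 1. Returns a list where index d holds the count of digit d.
    """
    per_digit = n * 10 ** (n - 1)          # digits 1 to 9
    digits = [per_digit] * 10
    if leading_zeros:
        # Every number padded to n digits; the all-zero number is excluded.
        digits[0] = per_digit - n
    else:
        # (10**n - 1) // 9 is the repunit 11...1 with n ones.
        digits[0] = per_digit - (10 ** n - 1) // 9
    return digits


if __name__ == "__main__":
    print(digits_up_to_all_nines(6))
    print(digits_up_to_all_nines(6, leading_zeros=True))
```

For n = 6 this reproduces the numbers in the example: 600,000 of each digit from 1 to 9, and 488,889 or 599,994 zeros depending on padding.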

Using these formulas, I improved the original algorithm. It still loops from the first to the last number in the range, but when it reaches a number that ends in one or more zeros (a multiple of a power of ten), it uses the formulas to add the counts for a full block of 1 to 9, 1 to 99, 1 to 999, etc., and then skips over that block. Here's the algorithm in pseudocode:

integer First,Last //First and last number in the range
integer Number     //Current number in the loop
integer Power      //Power is the n in 10^n in the formulas
integer Nines      //Nines is the result of 10^n - 1, e.g. 10^5 - 1 = 99999
integer Prefix     //First digits in a number. For 14,200, prefix is 142
array 0..9  Digits //Will hold the count for all the digits

FOR Number = First TO Last
  CALL TallyDigitsForOneNumber WITH Number,1  //Tally the count of each digit 
                                              //in the number, increment by 1
  //Start of optimization. Comments are for Number = 1,000 and Last = 8,000.
  Power = Zeros at the end of number //For 1,000, Power = 3
  IF Power > 0                       //The number ends in 0 00 000 etc 
    Nines = 10^Power-1                 //Nines = 10^3 - 1 = 1000 - 1 = 999
    IF Number+Nines <= Last            //If 1,000+999 <= 8,000, add a full set
      Digits[0-9] += Power*10^(Power-1)  //Add 3*10^(3-1) = 300 to digits 0 to 9
      Digits[0]   -= Power               //Adjust digit 0 (leading zeros formula)
      Prefix = First digits of Number    //For 1000, prefix is 1
      CALL TallyDigitsForOneNumber WITH Prefix,Nines //Tally the count of each 
                                                     //digit in prefix,
                                                     //increment by 999
      Number += Nines                    //Increment the loop counter 999 cycles
    ENDIF
  ENDIF 
  //End of optimization
ENDFOR  

SUBROUTINE TallyDigitsForOneNumber PARAMS Number,Count
  REPEAT
    Digits [ Number % 10 ] += Count
    Number = Number / 10
  UNTIL Number = 0
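
A possible Python translation of the pseudocode above, as a sketch rather than the benchmarked source code. The function names are my own, it assumes first >= 1 (a range starting at 0 would need special handling), and it subtracts Power from the zero count because the all-zero ending was already tallied with the number itself.

```python
def tally_digits(digits, number, count):
    """Add `count` to the tally of every digit of `number`."""
    while True:
        digits[number % 10] += count
        number //= 10
        if number == 0:
            break


def count_digits_fast(first, last):
    """Digit counts for first..last without leading zeros (first >= 1)."""
    digits = [0] * 10
    number = first
    while number <= last:
        tally_digits(digits, number, 1)
        # power = how many zeros the number ends in (3 for 1,000)
        power = 0
        while number % 10 ** (power + 1) == 0:
            power += 1
        if power > 0:
            nines = 10 ** power - 1
            if number + nines <= last:
                # The last `power` digits run through every combination:
                # each digit appears power * 10^(power-1) times there.
                full_set = power * 10 ** (power - 1)
                for d in range(10):
                    digits[d] += full_set
                digits[0] -= power         # all-zero ending already tallied
                # The leading digits repeat once per skipped number.
                tally_digits(digits, number // 10 ** power, nines)
                number += nines            # skip the block
        number += 1
    return digits


if __name__ == "__main__":
    print(count_digits_fast(786, 3021))
```

Comparing its output with the plain one-by-one loop on a few ranges is a cheap way to check the optimization.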

For example, for range 786 to 3,021, the counter will be incremented:

  • By 1 from 786 to 790 (5 cycles)
  • By 9 from 790 to 799 (1 cycle)
  • By 1 from 799 to 800
  • By 99 from 800 to 899
  • By 1 from 899 to 900
  • By 99 from 900 to 999
  • By 1 from 999 to 1000
  • By 999 from 1000 to 1999
  • By 1 from 1999 to 2000
  • By 999 from 2000 to 2999
  • By 1 from 2999 to 3000
  • By 1 from 3000 to 3010 (10 cycles)
  • By 9 from 3010 to 3019 (1 cycle)
  • By 1 from 3019 to 3021 (2 cycles)

Total: 28 cycles
Without optimization: 2,235 cycles

Note that this algorithm solves the problem without leading zeros. To use it with leading zeros, I used a hack:

If range 700 to 1,000 with leading zeros is needed, use the algorithm for 10,700 to 11,000 and then subtract 1,000 - 700 + 1 = 301 (one for each number in the range) from the count of digit 1.
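
The hack can be sketched in Python as follows. Both function names are illustrative; `count_digits` is a brute-force stand-in for any counter that works without leading zeros, and the padding width is assumed to be the number of digits in `last`. One artificial leading 1 is removed per number in the range, that is, last - first + 1 of them.

```python
def count_digits(first, last):
    """Stand-in counter without leading zeros (brute force, for illustration)."""
    digits = [0] * 10
    for number in range(first, last + 1):
        for ch in str(number):
            digits[int(ch)] += 1
    return digits


def count_digits_padded(first, last):
    """Digit counts for first..last, zero-padded to the width of `last`."""
    offset = 10 ** len(str(last))          # 10,000 when last = 1,000
    # Shifting by `offset` puts a 1 in front of every zero-padded number.
    digits = count_digits(first + offset, last + offset)
    digits[1] -= last - first + 1          # drop the artificial leading 1s
    return digits


if __name__ == "__main__":
    print(count_digits_padded(700, 1000))
```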

Benchmark and Source code

I tested the original approach, the same approach using %10 and the new solution for some large ranges, with these results:

Original             104.78 seconds
With %10              83.66 seconds
With Powers of Ten     0.07 seconds


If you would like to see the full source code or run the benchmark, use these links:

Accepted answer

noahlavine's solution may be correct, but I just couldn't follow the pseudocode; I think there are some details missing or not completely explained.

Aaronaught's solution seems to be correct, but the code is just too complex for my taste.

I accepted strainer’s answer, because his line of thought guided me to develop this new solution.

© Stack Overflow or respective owner
