A pixelated picture of me...

ProductMetrics

Learning about the metrics that drive successful products

Using Python to forecast a simple business case

Forecasting potential sales revenues using Python

Joshua

9 minutes read

Every now and then as a Product Manager you get to do something amazing. You get to create something, to start a new product from scratch and turn a little idea into something big. Part of the work required when establishing if the idea has legs is to work up a business case, including a forecast of expected sales revenue - put simply, you want to know if the product will make money, how much and how fast.

Traditionally, here enters the multi-sheet sprawling Excel spreadsheet with row upon row of assumptions and numbers. As an alternative, using all the Python we’ve learned, it is of course possible to create a business case forecast using Python - so today, let’s have a look at what we can do.

A simple forecast

As the basis for our forecast, we need some assumptions. So for this example, I’d like to introduce you to a new fictional subscription box service - “Bicky Box - Biscuits, in a Box delivered straight to your door”. Yum.

For the forecast, we’re going to assume:

  • We can sell 25 normal Bicky Boxes @ £9.99 per month
  • We can sell 10 Choccy Bicky Boxes @ £15 per month
  • A subscriber will probably use the service for 2 years (possibly linked to early death from extreme biscuit consumption)

To keep things simple, we’ll just forecast the incoming licence revenue, rather than any costs and profit. But the approach we’ll look at could easily be extended to do more.

We can use a Pandas DataFrame to help track the monthly sales revenue coming in from each contract that we sell.

A datastructure for a contract

To represent a contract, we’ll have a simple class holding only the information about the contract - hopefully fairly self explanatory. We’ll then make a forecast based on contracts.

class contract:
    
  def __init__(self, start_month, payment_frequency, value, term, contract_name, increase_period=1, increase_percentage=0):
    self.start_month = start_month # The month (zero based int) at which the contract starts 
    self.payment_frequency = payment_frequency # The frequency of payments (e.g. 1 for monthly, 3 for quarterly, 12 for annually)
    self.value = value # The money recieved from the contract
    self.term = term # The duration of the contract in months
    self.contract_name = contract_name # An identifier for the contract
    self.increase_period = increase_period # Optional frequency of price increases (e.g. 12 for an annual inflation)
    self.increase_percentage = increase_percentage # Optional percentage increase (e.g. 2.5 for a 2.5% price increase)

Using the above, we can start to model a potential contract for our forecast.

c1 = contract(0, 1, 9.99, 24, "Bicky Box")

That gives us a contract for a single Bicky Box subscriber paying £9.99 each month for 2 years. Now, we need a forecast to run!

Putting the forecast together

Again, we’ll use a class to represent the forecast that we’re running. When creating a new forecast, we will specify the starting month (e.g. ‘2020-01’ for Jan 2020) and the number of months that we wish to forecast (e.g. 60 for a three year forecast).

The forecast class holds a DataFrame within which we’ll add the revenue from contracts and calculate the running total. We’ll create the DataFrame with a time series index, effectively one row per month for the duration of the forecast. We’ll hold on to the list of months, as that will be useful later.

– Note: this forecast assumes we’re forecasting on a monthly basis. You could adapt it to work on a daily, quarterly or annual basis easily enough by passing in a different frequency to the freq parameter of the pd.date_range function - see here for more info.

class forecast:
    
  def __init__(self, start_date, number_of_months):
      self.start_date = start_date
      self.number_of_months = number_of_months
      self.months = pd.date_range('2020-01', periods=60, freq='M')
      
      self.contracts = []
      
      self.df = pd.DataFrame(index=self.months)
      self.df['forecast_licences'] = np.zeros(number_of_months)
      self.df['live_contracts'] = np.zeros(number_of_months)
      self.df['total_sales'] = np.zeros(number_of_months)

Within the results dataframe, we’ll have some calculated columns:

  • forecast_licences - the value of contracts paid in each month
  • live_contracts - the number of live / active contracts each month
  • total_sales - the total revenue earned from all contracts up to and including each month

We’ll look at how we calculate these in a bit - they’re all initalised with zeros. First, we need to add some contracts!

Adding a contract to the forecast

Now we have a contract class, we can represent the money earned from it. To do this, we’ll add a new method to our forecast class add_contract. To hold the earned revenue from each contract we will add a new column to the forecast dataframe and then set the value earned in each month (otherwise, the value is £0).

def add_contract(self, contract):
    # Adds a new contract to the forecast
    
    self.df[contract.contract_name] = np.zeros(self.number_of_months)
    self.contracts.append(contract)
    
    for m in range(contract.start_month, contract.start_month + contract.term, contract.payment_frequency):
        try:
            increase = (contract.increase_percentage * int(m / contract.increase_period) / 100) * contract.value 
            self.df.loc[self.months[m]][contract.contract_name] = contract.value + increase
            self.df.loc[self.months[m]]['live_contracts'] += 1
        except IndexError:
            pass

The above code also deals with the potential percentage based increases. We’ve added the value of the contract into a new column, so eventually we’ll end up with a wide dataframe with a column per contract. That’s fine (and makes debugging and visualising whats going on easier) - but to see the result of the forecast we still need to calculate those result columns. We’ll do this in another method so that we can add several contracts to the forecast and calculate the results once.

def result(self):
  # Ensure all calculations are done and return the dataframe
  
  # First, sum all licence columns to get the total earned in that month from all live contracts
  self.df['forecast_licences'] = self.df[[c.contract_name for c in self.contracts]].sum(axis=1)
  
  # Secondly, calculate a cumulative total licence sales
  self.df['total_sales'] = self.df.forecast_licences.cumsum()
  
  return self.df

The result method returns the forecasts dataframe. Now, we can see some output at last!

# Create a new three year forecast
f = forecast('2020-01',60)

# Create a contract
c1 = contract(0, 1, 9.99, 24, "Bicky Box")

# Add the contract
f.add_contract(c1)

# View the first few lines of result
f.result().head()
forecast_licenceslive_contractstotal_salesBicky Box
2020-01-319.991.09.999.99
2020-02-299.991.019.989.99
2020-03-319.991.029.979.99
2020-04-309.991.039.969.99
2020-05-319.991.049.959.99

So, you can see the shape of the forecast now - one row per month, one column per contract with some extra columns for results. After 24 rows, the Bicky Box column drops to zero as the contract ends. Looking at the end of the forecast shows this:

forecast_licenceslive_contractstotal_salesBicky Box
2024-08-310.00.0239.760.0
2024-09-300.00.0239.760.0
2024-10-310.00.0239.760.0
2024-11-300.00.0239.760.0
2024-12-310.00.0239.760.0

Generating contracts as part of the forecast

So, we’ve seen how we can model a single forecast. We can model two (or three, or more) in the same way:

f = forecast('2020-01',60)

c1 = contract(0, 1, 9.99, 24, "Bicky Box")
c2 = contract(2, 1, 15, 24, 'Choccy Bicky Box')

f.add_contract(c1)
f.add_contract(c2)

f.result().head()
forecast_licenceslive_contractstotal_salesBicky BoxChoccy Bicky Box
2020-01-319.991.09.999.990.0
2020-02-299.991.019.989.990.0
2020-03-3124.992.044.979.9915.0
2020-04-3024.992.069.969.9915.0
2020-05-3124.992.094.959.9915.0

But, that’s not a very efficient way of modelling our assumption of selling 25 Bicky Boxes a month! So, let’s add another method to the forecast to generate the contracts for us.

def project(self, start_month, contract_frequency, payment_frequency, number_of_contracts, value, term, name, increase_period=1, increase_percentage=0):
  
  for m in range(start_month, self.number_of_months - start_month, contract_frequency):
    c = contract(m, payment_frequency, value, term, name + str(m), increase_period, increase_percentage)
    self.add_contract(c)

So, by specifying the frequency of new contracts, the number of new contracts to sell and then all the details of the contract, we can simply loop through the months and create the contracts on the fly. To avoid column naming clashes, we append the contract number to the name given (although, this isn’t foolproof, should probably do more to ensure a unique name).

Now, we can finally model our base assumptions in just a few lines.

f = forecast('2020-01',60)

f.project(0, 1, 1, 25, 9.99, 24, 'Bicky Box')
f.project(0, 1, 1, 10, 15, 24, 'Cboccy Bicky Box')

f.result().head()
forecast_licenceslive_contractstotal_salesCboccy Bicky Box57Cboccy Bicky Box58Cboccy Bicky Box59
2020-01-3124.992.024.990.00.00.0
2020-02-2949.984.074.970.00.00.0
2020-03-3174.976.0149.940.00.00.0
2020-04-3099.968.0249.900.00.00.0
2020-05-31124.9510.0374.850.00.00.0

Of course, a bunch of columns missing there, but you get the idea.

Visualising the result

Now, as the result of the forecast is a standard Pandas DataFrame, it’s easy to do a quick plot to see the forecast output. Here’s a plot of the total sales revenue over the three year forecast.

ax = f.result().total_sales.plot()

fmt = '£{x:,.0f}'
tick = mtick.StrMethodFormatter(fmt)
ax.yaxis.set_major_formatter(tick) 

plt.title('Bicky Box Licence Revenue Forecast - Three Years')
Three year licence revenue forecast for Bicky Boxes

Some tasty revenues there!

Modelling alternative scenarios

Now, hopefully you can see that we’ve made something useful and powerful that gives you an easy way of quickly modelling different licence structures and testing out your assumptions. When putting a business case together, it’s always prudent to think “what if things don’t go so well” and do a more pessimistic forecast. For us, that’s easy! Let’s see what would happen if we can’t charge as much, don’t sell as many boxes and subscribers only stick around for 12 months.

# Our main assumptions        
good_forecast = forecast('2020-01',60)

good_forecast.project(0, 1, 1, 25, 9.99, 24, 'Bicky Box')
good_forecast.project(0, 1, 1, 10, 15, 24, 'Cboccy Bicky Box')


# What if things don't go so well?
bad_forecast = forecast('2020-01',60)

bad_forecast.project(0, 1, 1, 15, 7.50, 12, 'Bicky Box')
bad_forecast.project(0, 1, 1, 5, 12.50, 12, 'Cboccy Bicky Box')

# Plot the result on the same plot
ax = good_forecast.result().total_sales.plot()

bad_forecast.result().total_sales.plot(ax=ax)

fmt = '£{x:,.0f}'
tick = mtick.StrMethodFormatter(fmt)
ax.yaxis.set_major_formatter(tick) 

plt.title('Bicky Box Licence Revenue Forecast - Three Years inc. pessamistic outcome')
Three year licence revenue forecast with a more pessimistic view

So, whilst the biscuits might not be flying off the shelves, at least we can quickly ascertain what the lower sales volumes would mean for us. And, you can churn out these alternative scenarios by changing just one line of code, hopefully much clearer than tweaking cell after cell in a big spreadsheet.

Of course, if you need a spreadsheet you can have one.

good_forecast.result().to_excel('Good forecast.xlsx')

Next steps

I’ve put the complete code into a gist here if you’re interested. Feel free to comment and let me know if you think anything could be done better, or any other ideas you have for improvements.

Next up, we’ll look at a better way of coming up with some pessimistic (and optimistic) results by applying a Monte Carlo method to the forecast to see what a range of outcomes may look like.

Read part two

comments powered by Disqus

Python for Product Managers

Check out all the posts in the Python for Product Managers series:

Get new posts in your inbox

Want to keep up with new posts? Subscribe to our mailing list to get new post notification in your inbox. No spam!

Subscribe

Recent posts

See more

About

Hey, I'm Joshua, a Product Manager and data fan. This is my blog and a place to post random musings and tutorials.