A pixelated picture of me...

ProductMetrics

Learning about the metrics that drive successful products

Using Python for a Monte Carlo business case forecast

Aplying a Monte Carlo method to business case forecasts using Python

Joshua

6 minutes read

In my last post we looked at a simple business case forecast written in Python. We saw how we could quickly and easily forecast different potential outcomes, based on the underlying assumptions passed in to the forecast.

Forecasting a good and bad outcome is helpful to stress test a business case. But, with a simple deterministic forecast, we are not helping understand the likelihood of any particular outcome.

A Monte Carlo forecast helps provide a view on the likelihood of various outomes by adding an element of randomness into the assumptions and viewing the average outcomes after many simulations. Let’s see how we can adapt our simple forecast and apply the Monte Carlo method to get a better view on our business case.

We still need some assumptions…

Of course, we still need some assumptions. But, we can now think of our assumptions as a distribution of possible outcomes, rather than single fixed expectation.

Keeping with our fictitious biscuit subscription box service “Bicky Boxes” - let’s have a look at how we can generate a distribution of outcomes.

In our simple forecast, we assumed:

  • We can sell 25 normal Bicky Boxes @ £9.99 per month
  • We can sell 10 Choccy Bicky Boxes @ £15 per month
  • A subscriber will probably use the service for 2 years (possibly linked to early death from extreme biscuit consumption)

To run a Monte Carlo simulation however, we want to turn these singular assumptions into a range of possible outcomes. To do this, we have a few options:

  • Make assumptions about the shape of the distribution and perform some random sampling
  • Make some assumptions about the liklihood of certain outcomes and create a bespoke list of outcomes
  • Sample some historic data to understand the distribution of prior results

Let’s start with the number of contracts that we’re going to sell. We assumed 25 per month, but in reality there will be good months and bad months. In the absense of any historic data, let’s assume that our sales will be normally distributed, with a mean of 25. We can use numpys random.normal function to generate a list of sample values:

contracts = [int(x) for x in np.random.normal(20, 5, 200)]

To see what that looks like, let’s generate a quick histogram:

plt.hist(contracts)
A random distribution of of contract lengths

Nice and normal looking! We now have an array of normally distributed sales figures. For our Monte Carlo foreast, we can randomly select from this list each month to add some variability into our results.

The Monte Carlo forecast

The class below implements a crude Monte Carlo forecast, utilising the forecast class that we created last time. In summary, it works like this:

  • The forecast loops for a specified number of runs - the higher the number, the smoother the output
  • For each run, we loop through each month of the forecast and create a number of contracts
  • For each contract, we select values randomly to determin the value, term and other properties
  • Once we have completed all the forecasts, we aggregate the results from each simulation
  • We can then look at various percentile outcomes to understand the likely outcomes
class monte_carlo_forecast():
    
    def __init__(self, start_date, number_of_months):
        self.start_date = start_date # When the foreacst starts
        self.number_of_months = number_of_months # Number of months to forecast
        self.months = pd.date_range('2020-01', periods=60, freq='M')
        
        # Create a dataframe to store the results of all the sumulations
        self.df = pd.DataFrame(index=self.months)
        
        # Store each simulation in this array so they can be explored later if needed
        self.forecasts = []
        
        # These arrays will hold the names of generated columns for convienience later
        self.total_sales_cols = []
        self.forecast_licences_cols = []
        
    def forecast(self, number_of_runs, contracts, payment_frequencies, values, terms, increase_period, increase_percentages, quantiles=[0.1, 0.5, 0.9]):
        # Loop for each simulation we need to run
        for r in range(0, number_of_runs):
            cnum = 0 # A counter to give each contract a unique name
            f = forecast(self.start_date, self.number_of_months) # Generate a new forecast
            
            # Loop through each month
            for m in range(0, self.number_of_months):
                
                # Loop through each contract we need to create this month
                for c in range(0, random.choice(contracts)):
                    
                    # Create a contract with random properties from the passed in distributions
                    ct = contract(m, np.random.choice(payment_frequencies), np.random.choice(values), np.random.choice(terms), "contract" + str(cnum), increase_period, np.random.choice(increase_percentages))
                    f.add_contract(ct)
                    cnum += 1
            
            self.forecasts.append(f)
            
            # Add a columns to record the result of the simulation
            self.df['total_sales-run-' + str(r)] = f.result().total_sales
            self.total_sales_cols.append('total_sales-run-' + str(r))
            
            self.df['forecast_licences-run-' + str(r)] = f.result().forecast_licences
            self.forecast_licences_cols.append('forecast_licences-run-' + str(r))
            
        # Calculate percentile outcomes for total sales and forecasted licences each period
        for q in quantiles:
            self.df['total_sales_' + str(q) + '_quantile'] = self.df[self.total_sales_cols].quantile(q, axis=1)
            self.df['forecast_licences_' + str(q) + '_quantile'] = self.df[self.foreca

Now, we can run our forecast! I created a set of assumptions as follows:

contracts = [int(x) for x in np.random.normal(20, 5, 200)]

terms = [int(x) for x in np.random.normal(12, 4, 200)]

values = [9.99, 9.99, 9.99, 15]

increase_periods = [12]

increase_percentages = [0]

As you can see, some are more random than others! To run the forecast, we simply call the forecast method of our monte carlo class.

mc = monte_carlo_forecast('2020-01', 60)

mc.forecast(1000, contracts=contracts,  payment_frequencies=[1], values=values, terms=terms, increase_period=12, increase_percentages=[0])

Now, depending on the assumptions you pass in and the number of runs, it may take a while to churn through the simulations. I’m open to suggestions to improve the speed!

After 1,000 runs, we can look at the output of a few of the runs like this:

mc.df[mc.total_sales_cols[:100]].plot(legend=False)

plt.title("Sample licence revenue forecast simulations")

ax.set_xlabel("Date")
ax.set_ylabel("Revenue")

fmt = '£{x:,.0f}'
tick = mtick.StrMethodFormatter(fmt)
ax.yaxis.set_major_formatter(tick)
Three year licence revenue forecast for Bicky Boxes

As you can see, the element of randomness has caused each run to be a little bit different. So, we now have 1,000 different answers to how well the Bicky Box business model will do - that is great, but not very helpful.

To get a more helpful answer, we can look at the quantile outputs - the aggregated outcomes of all of the runs. Each month, the outcome of all the simulations are sorted and then specific quantiles are extracted. By default the code looks at the 10th, 50th and 90th quantile.

So, these now give us a sense of liklihood for a particular outcome. We can say that there is a 10% chance of seeing results worse than the 10th quantile, or conversely that there is a 90% chance of exceeding it. The 50th quantile shows the average outcome. And the 90th quantile shows the upper boundary of what we could reasonably expect.

We can plot the quantiles as follows, with a few of the simulations behind:

ax = mc.df[mc.total_sales_cols[:200]].plot(legend=False, color='#cccccc',linewidth=0.25, alpha=1)

mc.df['total_sales_0.5_quantile'].plot(ax=ax, linewidth=2)
mc.df['total_sales_0.1_quantile'].plot(ax=ax, linewidth=2)
mc.df['total_sales_0.9_quantile'].plot(ax=ax, linewidth=2)

plt.title("Licence revenue forecast")

ax.set_xlabel("Date")
ax.set_ylabel("Revenue")


fmt = '£{x:,.0f}'
tick = mtick.StrMethodFormatter(fmt)
ax.yaxis.set_major_formatter(tick) 
Three year licence revenue forecast for Bicky Boxes

So, there you go. By running lots of simulations each with an element of randomness set arround our assumptions we’ve managed to generate some good expectations for how the Bicky Box sales may pan out.

If you’ve got any suggestions or questions, please leave a comment!

You can find the complete code here.

comments powered by Disqus

Python for Product Managers

Check out all the posts in the Python for Product Managers series:

Get new posts in your inbox

Want to keep up with new posts? Subscribe to our mailing list to get new post notification in your inbox. No spam!

Subscribe

Recent posts

See more

About

Hey, I'm Joshua, a Product Manager and data fan. This is my blog and a place to post random musings and tutorials.