Python for Product Managers

How learning a little Python can go a long way towards speeding up your product decisions

Joshua

7 minute read

Product Management can be a challenging role, with demands pulling you in many different directions. So anything we can do to minimise time spent on repetitive tasks, or to maximise the value we get from data, should be capitalised on to the full.

Python is a popular open-source programming language which is relatively easy to learn. It can help you automate the boring stuff!

In this series of posts, I’ll show a couple of examples of how mastering Python can free up valuable time and provide deeper insights to you as a Product Manager.

Step one - Install Python

How you install Python depends on your operating system and your requirements.

Anaconda is a distribution which contains Python and many of the most common and popular additional packages that you might need. It also gives you a clear and sane way of adding new packages and keeping them up to date. It’s probably the simplest way to get started, and it’s widely used, so there’s lots of help out there.

Installation instructions can be found online and are easy to follow.

If you use Anaconda, you’ll want to familiarise yourself with the conda command for package management. Again, there are good instructions available online.

You can search for packages:

conda search <package>

You can install a package:

conda install <package>

and you can update a package:

conda update <package>

Full details here…

Also good but maybe a bit more complicated - install from Python.org

Python.org is the home of the Python programming language. There are downloads available for most operating systems and the installation process is still relatively straightforward.

If you’re using Python without the Anaconda distribution, then pip is the package manager you’ll use.
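Whichever route you took, a quick check like the one below can confirm what ended up installed. This is a minimal sketch rather than an official setup step: the `is_installed` helper is a name I've made up for illustration, built on the standard library's `importlib.util.find_spec`.

```python
import sys
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if Python can find a top-level module with this name."""
    return find_spec(module_name) is not None


# Show which Python version is running
print(sys.version)

# Check for the packages used later in this post
for name in ["pandas", "matplotlib"]:
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If anything shows up as missing, `conda install <package>` (or the pip equivalent) should sort it out.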

Step two - Jupyter Notebook

There are several ways to run a Python program. Most simply, a text file with a .py extension can be saved and then run with the Python interpreter. Python is an interpreted language, so rather than compiling your code to an executable file (e.g. MyApp.exe) the scripts are interpreted at run time. This makes it good for experimenting!
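As a small sketch of that workflow, the script below could be saved as `hello.py` (a made-up file name) and run from a terminal with `python hello.py`. The `greeting` function is purely illustrative.

```python
# hello.py - a tiny script to run with: python hello.py


def greeting(role):
    """Build a greeting for the given role."""
    return f"Hello {role}s"


# This block only runs when the file is executed directly,
# not when it is imported from another script
if __name__ == "__main__":
    print(greeting("Product Manager"))
```

Change the text, save the file and run it again: there's no separate build step in between.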

To make the whole process even more immediate, a web-based interactive “notebook” is available through the Jupyter project. This allows you to write and execute Python scripts via your browser in an intuitive way which lends itself to nicely documented, repeatable experiments.

If you’ve used Anaconda, you should have Jupyter installed already. If not, installation instructions are available.

To start the browser based notebook, run the following:

jupyter notebook

All being well, a web browser will open up showing you a file / directory browser.

Add a new notebook for one of the Python versions you have available (e.g. Python 3 for me in the above screenshot).

Just to prove it’s all working, let’s run some Python code. Type or paste the following into the cell:

print("Hello Product Managers")

Now, whilst the cell is selected, press ctrl-enter and you should see the output of the code appear below the cell.
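A notebook also displays the value of the last expression in a cell, so you often don't need print() at all. Here's a quick sketch to try in a new cell; the numbers are made up purely for illustration.

```python
# Hypothetical numbers, purely for illustration
signups = 1250
trials_started = 310

# Work out a simple conversion rate
conversion_rate = trials_started / signups

# In a notebook, the value of this last line is shown below the cell
round(conversion_rate * 100, 1)
```

The data examples later in this post rely on the same behaviour to show tables and counts.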

Notebook keyboard shortcuts

There is, of course, a lovely toolbar and menu structure. But for speed, there are a number of keyboard shortcuts available which make life easier. Here are a few to get you started:

  • ctrl-enter - Run current cell
  • shift-enter - Run current cell and create a new empty cell below
  • esc - Enter command mode. This is where you run commands on the cell itself, rather than typing text into it. The left-hand border of the cell is green in edit mode, and blue in command mode.
  • a - New cell above current cell (when in command mode)
  • b - New cell below current cell (when in command mode)

That’s probably enough to get started! A few shortcuts go a long way! There are many more shortcuts to learn though, once you’re up and running.

Step three - learn Python

Now, I’m not best placed to teach you Python and its basic syntax. Many people are better equipped to do that, and there is a wealth of information, courses and tutorials out there which will give you a good grounding.

Some pointers to great resources for learning Python:

Or, you can grab a book!

Step four - Python modules for data analysis

So, I kind of shirked my responsibilities in the last step, didn’t I! Well, to make up for it, assuming you know some basic Python, here’s some stuff that will make it all seem worthwhile. And if you haven’t learned yet, hopefully this will inspire you to get stuck in!

Pandas is a great module that helps you load, transform and analyse data quickly in Python. It’s definitely a module to explore if, like me, you spend a lot of time looking at data and asking questions.

Specifically, Pandas is a useful tool in your workflow for the following common data analysis tasks:

  • Loading data from CSV files or directly from a database
  • Viewing, filtering and sorting the data
  • Looking at common statistics and aggregations of the data
  • Joining and merging data together
  • Grouping, pivoting and transforming the data
  • Plotting and visualising the data

We’ll work through examples of each of these in the next instalment - but to make sure you’re hooked, here’s a little example. Imagine we’ve obtained a tab-delimited text file with data about sessions for our SaaS application. Each row represents one session, with columns for:

  • A unique ID
  • Session start time
  • Session end time
  • A user ID
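If you don't have an export like this to hand, a stand-in file can be generated with the standard library. This is only a sketch: the `write_sample_sessions` function, its parameters and the random data are all made up for illustration, with column names chosen to match the ones used in the code later in this post.

```python
import csv
import random
import uuid
from datetime import datetime, timedelta


def write_sample_sessions(path, n_sessions=1000, n_users=50, seed=42):
    """Write a tab-delimited file of made-up sessions and return the row count."""
    rng = random.Random(seed)
    # Random but repeatable user IDs in UUID format
    users = [str(uuid.UUID(int=rng.getrandbits(128))) for _ in range(n_users)]
    first_day = datetime(2015, 11, 30)

    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["id", "end_datetime", "start_datetime", "user_id"])
        for session_id in range(n_sessions):
            # Start at a random second within a roughly two-year window
            start = first_day + timedelta(seconds=rng.randrange(730 * 24 * 3600))
            # Each session lasts at least 5 minutes and less than 60
            end = start + timedelta(seconds=rng.randrange(5 * 60, 60 * 60))
            writer.writerow([
                session_id,
                end.strftime("%Y-%m-%d %H:%M:%S"),
                start.strftime("%Y-%m-%d %H:%M:%S"),
                rng.choice(users),
            ])
    return n_sessions


write_sample_sessions("monthly_sessions.csv")
```

The default 1,000 rows are plenty for experimenting; increase `n_sessions` for something closer to a real export.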

My sample file has several years’ worth of data and is about 100 MB in size. Let’s have a look at what we can do with that!

# Show graphs and charts inline in the notebook
%matplotlib inline

# Import some libraries
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

# Set some default plot styles to make things look nice
matplotlib.rcParams['figure.figsize'] = (20.0, 10.0)
plt.style.use('bmh')

# Import our session data from a tab delimited text file,
# making sure the start and end dates are loaded as dates
df = pd.read_csv('monthly_sessions.csv', sep='\t', parse_dates=['end_datetime', 'start_datetime'])

# Show the top few rows of data
df.head()
   id         end_datetime       start_datetime                               user_id
0   0  2015-11-30 01:49:04  2015-11-30 01:12:18  e3472cfd-e249-43f5-badf-8130c3e93f51
1   1  2015-11-30 01:53:50  2015-11-30 01:11:14  69ddcdb5-4635-43c4-94a1-a10e701082cf
2   2  2015-11-30 02:05:56  2015-11-30 01:15:58  21dc4ed3-b8b9-4913-8b3b-ed843f78e8af
3   3  2015-11-30 04:30:17  2015-11-30 04:19:02  2cda4b97-71d3-4f32-8b89-f39a7a41a75f
4   4  2015-11-30 06:02:42  2015-11-30 05:50:51  f39984e0-4c08-4e29-aa41-d6dddfdbd7f5

# Let's see how many sessions we've loaded from the file
len(df.index)
1293267

# Let's count sessions per month and plot to see the trend
df.groupby([df.start_datetime.dt.year, df.start_datetime.dt.month]).agg('count').id.plot()
plt.title('Sessions per month')
[Figure: line plot of sessions per month, titled "Sessions per month"]

# Let's calculate the duration of each session
df['session_duration'] = df.end_datetime - df.start_datetime

# And then look at some stats for session_duration
df.describe()
                 id         session_duration
count  1.293267e+06                  1293267
mean   6.466330e+05   0 days 00:33:29.360544
std    3.733342e+05   0 days 00:16:27.508079
min    0.000000e+00          0 days 00:05:01
25%    3.233165e+05          0 days 00:19:14
50%    6.466330e+05          0 days 00:33:28
75%    9.699495e+05          0 days 00:47:45
max    1.293266e+06          0 days 01:01:58

# Let's look at the number of sessions by day of week
df.start_datetime.dt.weekday.hist(bins=[0,1,2,3,4,5,6,7])
[Figure: histogram of the number of sessions by day of week]
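When reading the day-of-week histogram, it helps to know that pandas numbers weekdays from 0 (Monday) to 6 (Sunday), the same convention as Python's built-in `datetime.weekday()`. Here's a quick standard-library sketch, using the start time of the first session in the sample rows above:

```python
import calendar
from datetime import datetime

# Start time of the first session in the sample rows
start = datetime(2015, 11, 30, 1, 12, 18)

# weekday() returns 0 for Monday through 6 for Sunday
day_number = start.weekday()
day_name = calendar.day_name[day_number]

print(day_number, day_name)  # prints "0 Monday"
```

So the first bar of the histogram is Monday and the last is Sunday.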

Summary

Hopefully that gives you a glimpse into how you can load, manipulate and visualise data.

But I can do that in Excel!

Somebody, somewhere

Now, you could do that in Excel, but Python has two big advantages. First, depending on your PC specs, it can handle larger files more quickly: an Excel worksheet tops out at 1,048,576 rows, so the 1,293,267 sessions in my sample file wouldn’t even fit on a single sheet. Second, it’s easy to repeat your analysis and even automate it. Imagine a slightly more complex version of the above analysis pulling data from several databases or files - you can re-run it at the click of a button, obtaining new data and presenting new outputs.
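One way to make an analysis repeatable is to wrap the steps in a function and call it whenever a new export arrives. The sketch below assumes the same column names as my sample file; the `summarise_sessions` name, the `user-a` style IDs and the shape of the returned dictionary are made up for illustration.

```python
import io

import pandas as pd


def summarise_sessions(source):
    """Load a tab-delimited session export and return a few headline numbers.

    `source` can be a file path or any file-like object pandas can read.
    """
    df = pd.read_csv(
        source,
        sep="\t",
        parse_dates=["end_datetime", "start_datetime"],
    )
    df["session_duration"] = df["end_datetime"] - df["start_datetime"]
    return {
        "sessions": len(df),
        "unique_users": df["user_id"].nunique(),
        "median_duration": df["session_duration"].median(),
    }


# A three-row stand-in for a real export, so the example runs on its own
sample = io.StringIO(
    "id\tend_datetime\tstart_datetime\tuser_id\n"
    "0\t2015-11-30 01:49:04\t2015-11-30 01:12:18\tuser-a\n"
    "1\t2015-11-30 01:53:50\t2015-11-30 01:11:14\tuser-b\n"
    "2\t2015-11-30 02:05:56\t2015-11-30 01:15:58\tuser-a\n"
)
result = summarise_sessions(sample)
print(result)
```

With a real file you would call `summarise_sessions('monthly_sessions.csv')` instead, and re-run the same cell each time the data is refreshed.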

In the next instalment, we’ll look in more detail at some common tasks and how to achieve them with Python & Pandas.

If you’ve got any comments, suggestions or requests then please let me know in the comments.

Read the rest of the series

Follow the full series of posts to master Python!

  • Part 1 : Installing and setting up Python, Pandas and Jupyter
  • Part 2 : Loading and viewing data
  • Part 3 : Grouping, aggregating and summarising data

To be notified of new posts subscribe to the mailing list.


