British MP Voting Similarity Using Neo4J Graph Database

Using Neo4j Graph Database Similarity Algorithms to Look at MP Voting Records

Joshua

8 minute read

Finding interesting data to practice different data analysis and processing approaches is always fun. Recently in the UK, there have been a lot of political shenanigans as the country prepares (or not) to leave the European Union (or not). Anyway, I’ll try to keep this about the data and not the politics! MP voting records are (thankfully) a matter of public record, so detailed data is available about how each Member of Parliament (MP) has voted on each bill put to the House (in what is called a division).

You can download voting data from The Public Whip project, which collects the data and makes it available in a consistent, machine-readable format.

As a mildly interesting topic, let’s explore how we can use a graph database to look at the similarity of different MPs’ voting records.

Getting set up

For this project, we’re going to use a few bits of software:

  • Neo4J - an open source graph database
  • Python - programming language to get stuff into our graph database
  • Py2Neo - Python library for connecting to a Neo4J database (pip install py2neo)
  • Pandas - Python data processing library (pip install pandas)

So, to follow along, go and install all of that!

I’ve based my analysis on the votes cast in 2019, for which you will need two files from The Public Whip:

  • Details of the MPs and vote data values from 2017 onwards (votematrix-2017.txt)
  • Details of each division in the House of Commons from 2017 onwards (votematrix-2017.dat)

Both files are tab-delimited, so they are easy to load into a Pandas dataframe.

What is a graph database anyway?

Databases come in several flavours. First up is the relational database: tables of data linked together by primary and foreign keys, which keep the data consistent and ensure its integrity. With a relational database, you use SQL to query the data and get it out in whatever shape you want.

A graph database is different. Rather than modelling data as tables, we’re modelling data as a graph. Now, when I say graph, think of a series of interconnected nodes, rather than bar charts or pie charts.

Graph databases are a great choice for modelling networks and relationships between different entities. And once you have your data represented as a graph, there are many algorithms available for analysis - finding the shortest path between nodes, the similarity of nodes or making recommendations based on the connections.
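Neo4J provides these algorithms for you (Cypher even has a built-in shortestPath function), so the following is not Neo4J code. It is a minimal plain-Python sketch of the idea behind the simplest one: a breadth-first search that finds a path with the fewest hops between two nodes. The network, stored as an adjacency dictionary, is made up for illustration.

```python
from collections import deque
from typing import Optional


def shortest_path(graph: dict, start: str, goal: str) -> Optional[list]:
    """Breadth-first search for a path with the fewest hops from start to goal."""
    previous = {start: None}  # node -> the node we first reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through `previous` to rebuild the path, then reverse it
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # goal is unreachable from start


# A small, made-up directed network: each node lists the nodes it links to
network = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
}

print(shortest_path(network, "A", "F"))  # ['A', 'B', 'D', 'F']
```

Breadth-first search explores nodes in order of their distance from the start, so the first time it reaches the goal it has found a shortest path. When several exist (A, C, E, F is equally short here), it returns whichever it discovers first.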

Neo4J is one of the most popular graph databases available. It’s open source and free to download and try. Within Neo4J you can build up nodes with various properties, and the connections between the nodes. Even the connections (edges) can have properties.
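To make "properties on both nodes and edges" concrete, here is a toy, in-memory property graph in plain Python. The ToyNode and ToyRelationship classes are hypothetical and exist only for illustration; they are not Neo4J’s or py2neo’s API, and the names and vote value are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    label: str                                      # e.g. "MP", "Party", "Bill"
    properties: dict = field(default_factory=dict)  # key/value data on the node


@dataclass
class ToyRelationship:
    start: ToyNode
    rel_type: str                                   # e.g. "MEMBER_OF", "VOTED"
    end: ToyNode
    properties: dict = field(default_factory=dict)  # edges can carry data too


party = ToyNode("Party", {"name": "Example Party"})
mp = ToyNode("MP", {"name": "Example MP"})
bill = ToyNode("Bill", {"voteno": 1, "title": "Example Bill"})

edges = [
    ToyRelationship(mp, "MEMBER_OF", party),
    ToyRelationship(mp, "VOTED", bill, {"vote": 1}),  # property on the edge
]

for rel in edges:
    print(rel.start.properties["name"], rel.rel_type, rel.end.properties, rel.properties)
```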

Loading data using Python

Loading the data is easy using Pandas. The files are tab-delimited, so read_csv(filename, sep='\t') will load them.

import pandas as pd

# Load the MP data into a dataframe, skipping the first 19 rows to get to the data
mp_df = pd.read_csv('votematrix-2017.txt', sep='\t', skiprows=19)
mp_df.head()
   mpid   firstname  surname   party  PublicWhip URL
0  41707  Diane      Abbott    Lab    https://www.publicwhip.org.uk/mp.php?mpid=41707
1  41713  Debbie     Abrahams  Lab    https://www.publicwhip.org.uk/mp.php?mpid=41713
2  41699  Nigel      Adams     Con    https://www.publicwhip.org.uk/mp.php?mpid=41699
3  41725  Bim        Afolami   Con    https://www.publicwhip.org.uk/mp.php?mpid=41725
4  41732  Adam       Afriyie   Con    https://www.publicwhip.org.uk/mp.php?mpid=41732
# Load the voting data into a dataframe...
df = pd.read_csv('votematrix-2017.dat', sep='\t', parse_dates=['date'])
df.head()
   rowid  date        voteno  Bill                                               mpid41371  mpid41372  mpid41373  ...
0  34103  2019-09-26  446     Adjournment (Conference)                           4          4          4          ...
1  34102  2019-09-09  445     Early Parliamentary General Election (No. 2)       -9         -9         -9         ...
2  34101  2019-09-09  444     Prorogation (Disclosure of Communications)         2          2          2          ...
3  34081  2019-09-04  443     Deferred Divisions - Early Parliamentary Gener...  -9         -9         -9         ...
4  34080  2019-09-04  442     European Union (Withdrawal) (No. 6) Bill - Dut...  2          2          2          ...
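The vote matrix is "wide": one row per division and one mpidNNNNN column per MP. The loading code in this post works with that layout directly, but for other analyses it can be handier to reshape it into one row per (division, MP) pair. Here is a minimal sketch using pandas’ melt, on a made-up two-division frame in the same shape (the values are invented, not real votes):

```python
import pandas as pd

# Made-up frame with the same wide layout as votematrix-2017.dat:
# one row per division and one "mpid<N>" column per MP.
wide = pd.DataFrame({
    "rowid": [1, 2],
    "date": pd.to_datetime(["2019-09-04", "2019-09-09"]),
    "voteno": [10, 11],
    "Bill": ["Example Bill A", "Example Bill B"],
    "mpid1": [2, -9],
    "mpid2": [4, 2],
})

# Unpivot the MP columns: one row per (division, MP) pair
votes_long = wide.melt(
    id_vars=["rowid", "date", "voteno", "Bill"],
    var_name="mpid",
    value_name="vote",
)
# "mpid1" -> 1, matching the integer mpid column in the MP file
votes_long["mpid"] = votes_long["mpid"].str.replace("mpid", "", regex=False).astype(int)
# Drop missing votes (code -9)
votes_long = votes_long[votes_long["vote"] != -9]

print(votes_long[["voteno", "mpid", "vote"]])
```

In this shape, filtering by MP or by date is a plain boolean mask, and each remaining row is one MP’s vote on one division.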

Modelling MPs, Parties and Votes

Now that we have our data loaded, we need to think about how to represent it in the graph. There are three main entities:

  • A Member of Parliament (MP)
  • The Party they belong to
  • The Bills they have voted on

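So, within our graph we will have the following nodes and relationships:

  • MP, Party and Bill nodes
  • A MEMBER_OF relationship from each MP to their Party
  • A VOTED relationship from an MP to each Bill they voted on, with a vote property of 1 (aye), -1 (no) or 0 (both)
  • An ABSTAINED relationship from an MP to each Bill where no vote was recorded for them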

To load the data into the database, we use the following code to loop through the various dataframes and create the nodes and relationships.

from py2neo import Graph, Node, Relationship

# Connect to the graph database (localhost by default) and start a transaction
g = Graph(auth=("username", "password"))
tx = g.begin()

parties = {}
mps = {}
bills = {}

# Create the party nodes and remember them
for party in mp_df.party.unique():
    parties[party] = Node("Party", name=party)
    tx.create(parties[party])

# Create the bill nodes and remember them, keyed by vote number.
# int() turns the NumPy integer into a plain Python int, to be safe with the driver.
for _, r in df.iterrows():
    bills[r['voteno']] = Node("Bill", voteno=int(r['voteno']), title=r['Bill'])
    tx.create(bills[r['voteno']])

# Only looking at votes in 2019 (filter once, rather than once per MP)
votes_2019 = df[df['date'].dt.year == 2019]

# Create the MPs and add relationships to the party and bills
for _, r in mp_df.iterrows():
    mp = Node("MP", name=r['firstname'] + " " + r['surname'])
    mps[r['mpid']] = mp
    tx.create(mp)
    tx.create(Relationship(mp, "MEMBER_OF", parties[r['party']]))

    for _, vr in votes_2019.iterrows():
        mp_vote = vr['mpid' + str(r['mpid'])]

        # Vote codes are described in the votematrix-2017.txt file:
        #   missing: -9, tellaye: 1, aye: 2, both: 3, no: 4, tellno: 5
        if mp_vote == -9:
            # No vote recorded (the MP may simply have been absent)
            tx.create(Relationship(mp, "ABSTAINED", bills[vr['voteno']]))
            continue

        # Simplify to 1 (aye or tellaye), -1 (no or tellno), 0 (anything else)
        if mp_vote in (1, 2):
            mp_vote_simple = 1
        elif mp_vote in (4, 5):
            mp_vote_simple = -1
        else:
            mp_vote_simple = 0
        tx.create(Relationship(mp, "VOTED", bills[vr['voteno']], vote=mp_vote_simple))

tx.commit()

Processing that all took a few minutes on my little old laptop. But once we’ve loaded the data, we can explore!
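Before exploring, note that the if/elif chain in the loading loop boils down to a lookup table. As a minimal standalone sketch (simplify_vote, SIMPLE_VOTE and MISSING are my own hypothetical names, not part of py2neo or The Public Whip), the same mapping can be written and checked in isolation:

```python
from typing import Optional

MISSING = -9  # code for "no vote recorded"

# Public Whip vote codes (as listed in votematrix-2017.txt) -> simplified vote
SIMPLE_VOTE = {
    1: 1,    # tellaye counts as aye
    2: 1,    # aye
    3: 0,    # both
    4: -1,   # no
    5: -1,   # tellno counts as no
}


def simplify_vote(code: int) -> Optional[int]:
    """Return +1 (aye), -1 (no), 0 (both or any other code) or None (missing)."""
    if code == MISSING:
        return None
    # Like the loader's final else-branch, fall back to 0 for unexpected codes
    return SIMPLE_VOTE.get(code, 0)


print([simplify_vote(c) for c in [2, 4, -9, 3, 1]])  # [1, -1, None, 0, 1]
```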

Exploring the data

Neo4J has a built-in web interface (the Neo4J Browser) which lets you run queries and visualise the results. To query the database, we use a query language called Cypher. If you know SQL, you’ll see some common aspects, but Cypher is built specifically for working with graphs.

So, first off we can see all of the MPs:

MATCH (mp1:MP)
RETURN mp1

But that’s not really showing off the graph we’ve created, so let’s look at MPs and their party. For this, we need a Cypher query that looks not just at the nodes but at the relationships between them. Remember, when we loaded the data we named the party membership relationship “MEMBER_OF”. Cypher has a neat, arrow-like syntax that describes the relationship between the nodes:

MATCH (mp1:MP)-[:MEMBER_OF]->(party:Party)
RETURN mp1, party

So, we’re looking for an MP (which we’ve called mp1) who is a MEMBER_OF a Party (which we’ve called party). We can also filter to a particular MP, so let’s take a look at Labour leader Jeremy Corbyn, his party and his votes:

MATCH (mp1:MP {name: 'Jeremy Corbyn'})-[:MEMBER_OF]->(party:Party)
MATCH (mp1)-[:VOTED]->(bill:Bill)
RETURN mp1, party, bill

Similarity algorithms

Now that we’ve seen how to look at an MP’s voting record, we can think about how we might find similar MPs based on their voting records. Several experimental similarity algorithms are available for Neo4J in the Graph Algorithms library (the algo.* functions used below), a plugin that needs to be installed separately. Once installed, you can use them as part of a Cypher query.

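For this exercise, we’ll use the Pearson similarity algorithm as a measure of how similar the MPs’ voting patterns are. The output is a score from -1 (the MPs consistently vote in opposite ways) through 0 (no linear relationship between their votes) to 1 (their votes are perfectly correlated).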
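To see what the score measures, here is a plain-Python sketch of Pearson correlation over two MPs’ simplified votes (+1 aye, -1 no, 0 both), stored as dictionaries keyed by vote number. Comparing only the bills both MPs voted on is my assumption for handling missing votes; it is not necessarily how the Neo4J plugin treats them, and the voting records are made up.

```python
import math
from typing import Optional


def pearson(votes_a: dict, votes_b: dict) -> Optional[float]:
    """Pearson correlation of two MPs' simplified votes ({vote number: +1/0/-1}).

    Assumption: only bills that both MPs voted on are compared. Returns None
    when fewer than two bills are shared or either record is constant, since
    the correlation is undefined in those cases.
    """
    shared = sorted(votes_a.keys() & votes_b.keys())
    if len(shared) < 2:
        return None
    xs = [votes_a[bill] for bill in shared]
    ys = [votes_b[bill] for bill in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, without the common 1/n factor (it cancels out)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


# Made-up voting records keyed by vote number
mp_a = {1: 1, 2: -1, 3: 1, 4: 1}
mp_b = {1: 1, 2: -1, 3: 1, 4: 1}      # votes exactly like mp_a
mp_c = {1: -1, 2: 1, 3: -1, 4: -1}    # votes exactly the opposite way
mp_d = {1: 1, 2: 1}                   # only two shared bills, both aye

print(pearson(mp_a, mp_b))  # 1.0
print(pearson(mp_a, mp_c))  # -1.0
print(pearson(mp_a, mp_d))  # None
```

Because Pearson subtracts each MP’s average vote first, it measures whether two MPs move together from bill to bill rather than how often they literally cast the same vote.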

Similarity algorithms are useful for many reasons such as making recommendations or suggestions based on prior data. For example, once we have the similarity function working, we could collect votes from an individual and then make a recommendation as to which MP or Party best matches their views.

The Neo4J docs give a good example of using the Pearson similarity algorithm. Here’s a query that will return the top 10 MPs in a different party, ranked by similarity to Jeremy Corbyn; essentially, the non-Labour MPs whose votes most closely match Jeremy’s:

MATCH (mp1:MP {name: 'Jeremy Corbyn'})-[:MEMBER_OF]->(party1:Party)
MATCH (mp1)-[voted:VOTED]->(bill:Bill)
WITH mp1, party1, algo.similarity.asVector(bill, voted.vote) AS mp1Vector
MATCH (mp2:MP)-[:MEMBER_OF]->(party2:Party)
WHERE party2 <> party1
MATCH (mp2)-[voted:VOTED]->(bill:Bill)
WITH mp1, party1, mp2, party2, mp1Vector, algo.similarity.asVector(bill, voted.vote) AS mp2Vector
RETURN mp1.name AS from,
       party1.name AS party,
       mp2.name AS to,
       party2.name AS party2,
       algo.similarity.pearson(mp1Vector, mp2Vector, {vectorType: "maps"}) AS similarity
ORDER BY similarity DESC
LIMIT 10

This finds Jeremy, his party and his votes, then finds every MP in a different party along with their votes, and passes each pair of vote vectors into the Pearson similarity function. Here are the results:

from           party  to               party2  similarity
Jeremy Corbyn  Lab    Hannah Bardell   SNP     1.0
Jeremy Corbyn  Lab    Pete Wishart     SNP     0.9999940177165727
Jeremy Corbyn  Lab    Drew Hendry      SNP     0.9999885547994686
Jeremy Corbyn  Lab    Tommy Sheppard   SNP     0.9999659053208858
Jeremy Corbyn  Lab    Angus MacNeil    SNP     0.9999628411763035
Jeremy Corbyn  Lab    Layla Moran      LDem    0.9999579152438386
Jeremy Corbyn  Lab    Kirsty Blackman  SNP     0.9999407406601323
Jeremy Corbyn  Lab    David Linden     SNP     0.9999279802089858
Jeremy Corbyn  Lab    Alan Brown       SNP     0.9998816973817485
Jeremy Corbyn  Lab    Ronnie Cowan     SNP     0.9998816568329648

And so, the winner is Hannah Bardell, SNP MP for Livingston. To eliminate any possible political bias, here’s the same query but showing the MPs most similar to Prime Minister Boris Johnson:

from           party  to                    party2       similarity
Boris Johnson  Con    Caroline Nokes        Independent  0.9985309761072293
Boris Johnson  Con    David Simpson         DUP          0.9833497272283307
Boris Johnson  Con    Charlie Elphicke      Independent  0.9801722292108446
Boris Johnson  Con    Ian Paisley Jnr       DUP          0.9800244149755563
Boris Johnson  Con    Gavin Robinson        DUP          0.9677928029570132
Boris Johnson  Con    Paul Girvan           DUP          0.9676697712131352
Boris Johnson  Con    Gregory Campbell      DUP          0.9674505226183879
Boris Johnson  Con    Jeffrey M. Donaldson  DUP          0.9670389671903977
Boris Johnson  Con    Sammy Wilson          DUP          0.9639440612151435
Boris Johnson  Con    Emma Little Pengelly  DUP          0.9635267665387321

Summary

Hopefully that’s been a useful overview of how to use Neo4J to compare the similarity of items in a structured data set. You can probably think of other interesting ways to apply this algorithm, too. If you do, let me know!


