Chief Scientist - RStudio.
When you’re 15 and you get so frustrated with the chaotic state of your mother’s handwritten recipes that you build a website to catalogue them properly, it probably offers a good clue about where your future might be leading.
For Hadley Wickham, it was the start of a journey that has taken him into the very heart of data science, though just a decade ago the term had barely been coined.
An outlier at school, Hadley took statistics, calculus, chemistry, and biology papers; the original plan was to embark on a medical career with a view to studying genetic engineering.
Gaining a Bachelor of Human Biology with First Class Honours, Hadley is the first to admit that one paper in his degree had an unlikely impact on him.
“One class that I thought at the time was utterly useless was Communications and Active Listening. I just couldn’t see the point in it and felt it was a complete waste of time. As it turned out, it’s probably the course that benefited me the most in terms of the way it taught me to think about how to communicate with others.”
Three years later he embarked on a second degree, this time a Bachelor of Science in Statistics and Computer Science, once again gaining First Class Honours. Hadley says the degree opened up a whole new world of possibilities.
“While I did the absolute minimum to get by, I began to realise how data was going to play an increasingly important role in the future of decision making; though few others at the time seemed to be joining the dots together in the same way I was.”
Going on to complete his doctorate, he says the pieces finally started to come together thanks to his two supervisors, Di Cook and Heike Hofmann.
“They were very good at keeping me focused and got me excited about the programme I was embarking on. I started reading more widely and experimenting with code.”
Hadley joined software developer RStudio in 2013 as Chief Scientist. His best-known work is ggplot2, an R package for producing statistical graphics that lets users turn data into visualisations and has been downloaded almost 5 million times.
But given the huge amount of data that is being collected these days, how much of it is actually useful?
“Much of it is bad or even useless. Most businesses know that data is valuable but often they’re not sure why. This is the really big challenge for data science: to help users of data make more informed decisions. But one thing’s for certain, if you’re not making the best use of your data you can be sure your competitors will.”
New Zealand’s leadership in livestock improvement, Hadley says, is a good example of the effectiveness of proactive data management.
“NZ has been at the forefront of livestock improvement globally, which has allowed us to gain the competitive advantage we have as a dairy exporting nation. There are many other areas where we could be leading the way if we made better use of data.”
Another emerging area is data journalism. The ability of data analysts and journalists to work together to exploit data has the potential to create a whole new type of journalism.
“ProPublica, an American non-profit organization based in New York that produces investigative journalism in the public interest, is doing a fantastic job in this area.”
An active promoter of open source software development, Hadley says making software tools open to everyone is very important to him.
“It’s the essence of what allows software to develop. I get excited when I see my work being translated into Spanish, for instance.”
And a final piece of advice for those wanting to follow in his footsteps...
“Don’t limit yourself to just learning in class. Otherwise you end up like everyone else. Be an outlier.”