Episodes
Monday Nov 26, 2018
Episode 5: Dealing with Failure
Failure. It happens. It happens a lot. And we can't avoid it. But we can learn to anticipate some of it, and to deal with it better as it happens to us in our data science projects. In this episode, I discuss how failure happens in projects, how to avoid some of it, how to deal with some of it, examples of where I've really failed, and how to cope with it all. Our whole discipline is about trying models and learning to make better models from failed ones, so we should be able to deal with it at a meta level...
Friday Nov 02, 2018
Seeing how to solve a problem is one thing. Solving that problem in a way that doesn't hamstring future you takes a lot of work. In this episode, we talk about technical debt: what it is, what it looks like to a data scientist, and some resources to help avoid it.
People I mention who are worth following on Twitter:
- Jenny Bryan (@JennyBryan)
- Mara Averick (@dataandme)
- Julia Evans (@b0rk)
- John D Cook (@JohnDCook)
Saturday Oct 27, 2018
Episode 3: Knowing your business: How not to solve the wrong problem
In the first two episodes, I talked about knowing your data, then knowing your tools. In this episode, I talk about knowing your business. I tell a story about a time when I didn't know my business and it burned me. How can knowing your business help you improve even your data science skills? Listen and find out!
Saturday Oct 27, 2018
Episode 2: Data Wrangling: Why you gotta do what you gotta do
The common complaint about data science is that 90% of your time is spent data wrangling. In this episode, I talk about some history that leads to this current state of data science work, and why you should embrace this. I also give some resources that will help you with your data wrangling at the raw level.
R Packages and Tools mentioned in this episode:
R:
Package | Description |
lubridate | Handling dates, datetimes, intervals, durations |
readr | Reading in CSV and related textual files |
readxl | Reading in Excel files |
jsonlite | Reading, writing and manipulating JSON structures |
httr | Reading HTML and extracting parts programmatically |
dplyr + purrr | Simple grammar for common data manipulations |
Command line tools:
Utility | Description |
head | Show first few lines of a text file |
less [-S] | Pager to make sure data you look at doesn't scroll off the screen |
wc | Count lines, words, and characters in a file |
csvlook | Part of the Python csvkit package; renders CSV files as readable tables on the command line |
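As a rough illustration of how the command line tools above fit together for a first look at a raw file, here is a minimal sketch. The file name and its contents are made up for the example, and the csvlook step only runs if csvkit happens to be installed.

```shell
# Work in a throwaway directory so nothing real gets overwritten.
tmpdir=$(mktemp -d)
csv="$tmpdir/sample.csv"

# A tiny, hypothetical CSV file (illustrative data only).
cat > "$csv" <<'EOF'
id,date,amount
1,2018-10-27,10.50
2,2018-11-02,7.25
3,2018-11-26,3.00
EOF

# head: show the header plus the first two data rows.
head -n 3 "$csv"

# wc: count the lines (header included); tr strips padding some systems add.
line_count=$(wc -l < "$csv" | tr -d ' ')
echo "lines: $line_count"

# less -S is interactive, so it is left commented out here.
# -S chops long lines instead of wrapping them.
# less -S "$csv"

# csvlook: pretty-print as a table, only if csvkit is available.
if command -v csvlook >/dev/null 2>&1; then
  csvlook "$csv"
fi
```

Checking the line count and the first few rows before loading a file into R is a cheap way to catch surprises such as a missing header or an unexpected number of records.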
Saturday Oct 27, 2018
Episode 1: Knowing Your Data Like Scottie Knows the Enterprise
In this introductory episode, I introduce myself (Matt Pettis), what I want to do with the podcast, and some examples and stories about knowing your data. I talk about examples of where not knowing my data burned me, how knowing it saved me, and some simple tactics you can use to familiarize yourself with the data of your project.
I originally wanted to get ABBA to modify one of their songs to "Knowing me, knowing my data," but I have no money to do that.