Nov 20 Data Science Talklet: Incorporating Text Data into Your Feature Set

As promised, here are the slides and notes from my DSDC talklet on strategies for incorporating text data into the feature set of a predictive model. Links: slides, notes, GitHub. Thanks to Harlan for asking, and to Dan and David for...

Visualizing Baltimore with R and ggplot2: Crime Data

The advent of municipal open data initiatives has been both a blessing and a curse for my particular brand of data nerd. On one hand, it has opened up the possibility of developing deep and useful knowledge about the places we...

Data Science Presentation Slides

Thanks to everyone for coming out. Slides can be found here. They don't work well in mobile or touchscreen browsers. Code for the simulation can be found here. Code for the polling data example can be found here. I learned a lot from...

Topic Modeling 1: Simulated LDA Corpus

Because I am self-taught in many of the areas of computer science, advanced statistics, and probability theory that interest me most, and because I have a deep aversion both to looking foolish and to being full of it...

A Monty Hall Monte Carlo, Part 1? (Oh God)

While I dig into conjugacy and the calculation of Bayesian credible intervals, I figured it'd be good to put some of my other little rabbit holes up here on the off chance they're interesting to someone. For some reason I...