The Training Set

Data analysis coming at you live from the train between Baltimore and DC.
  • A Tour of the Oldest Vacants in Baltimore

    April 7, 2016

    Vacant buildings in Baltimore have been in the news here lately, first with Governor Hogan earmarking some cash to start knocking some of these things down, and then this past week with wind toppling some of the structures and in one case killing a man. It came to light that some of these things are so unstable that the city inspects them every 10 days. The Brew even pointed out that these buildings are too close for comfort to schools, a point I made quite a while ago though not necessarily because of wind. Quick reminder: there are about 17,000 of these things in Baltimore.

    Below is a look at the 25 of them with the oldest vacant building notice dates. Some of these have notice dates of 20 years or more, with the oldest being from 1990 (although I had trouble locating this structure - see below). For more of this stuff, be sure to check out Baltimore Slumlord Watch.

  • Changes in Baltimore Neighborhood Policing

    February 4, 2016

    With the massive change in Baltimore Police Department arrests in 2015 I thought it might be worth investigating how each neighborhood was affected. Thankfully it appears the BPD arrest dataset on Open Baltimore and the Baltimore Neighborhood Indicators Alliance “Vital Signs” composite dataset use the same Census neighborhood definitions, so we can look at the arrest rates in each neighborhood as they relate to the population and demographics. I’ve mapped two indicators by neighborhood that you can toggle between: the percent change in the number of arrests from 2014 to 2015 and the change in the arrest rate per 1,000 people in each neighborhood from 2014 to 2015. I’ve only included neighborhoods with more than 200 people in the population to try and remove some of the (numerically) noisier neighborhoods. Like I mentioned in the last post, these changes are not all directly tied to the change in marijuana policy; indeed it would be hard to imagine 10,000 fewer arrests in Baltimore, a drop of about 29%, due to marijuana policy alone.

    Select from the “Visible Layers” in the upper right corner which layer you want to view.
    Hover over each neighborhood to get the arrest counts from 2013, 2014, and 2015.
    Click on a neighborhood to get additional statistics about the arrests.
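
    The two mapped indicators come down to simple arithmetic. Here is a minimal sketch, assuming a single population figure is used for both years and using made-up counts rather than values from the BPD or Vital Signs datasets. The function name and the sample numbers are hypothetical; the 200-person cutoff is the one mentioned above.

```python
MIN_POPULATION = 200  # neighborhoods at or below this size are skipped


def arrest_indicators(arrests_2014, arrests_2015, population):
    """Return (percent change in arrests, change in arrests per 1,000 people).

    Returns None for neighborhoods that are too small or had no 2014 arrests,
    since the percent change is undefined or too noisy in those cases.
    """
    if population <= MIN_POPULATION or arrests_2014 == 0:
        return None
    pct_change = 100.0 * (arrests_2015 - arrests_2014) / arrests_2014
    rate_2014 = 1000.0 * arrests_2014 / population
    rate_2015 = 1000.0 * arrests_2015 / population
    return pct_change, rate_2015 - rate_2014


# Hypothetical neighborhood: 400 arrests in 2014, 300 in 2015, 5,000 residents.
print(arrest_indicators(arrests_2014=400, arrests_2015=300, population=5000))
```

    The two numbers can tell different stories: a small neighborhood can show a large percent change from only a handful of arrests, which is why the per-1,000 rate is mapped alongside it.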

  • Maryland Marijuana Policy Only Accounts for 17% of Drop in Baltimore Arrests

    February 3, 2016

    When looking at Baltimore Police Department (BPD) arrest statistics it’s important to note that in Maryland possession of less than 10 grams of marijuana is no longer grounds for arrest, effective October 1, 2014. When looking at the drop in arrests from 2014 to 2015 we can at least expect to attribute a portion of the drop to the change in marijuana policy. But exactly how much?

  • Two Way Streets: Parking on St. Paul and Calvert

    December 22, 2015

    The Baltimore Department of Transportation is toying with the idea of converting St. Paul and Calvert streets to two-way traffic. These streets are components of a north-south corridor that bring northern Baltimore and Baltimore County residents directly into the heart of the city.

  • Fresh Style

    December 12, 2015

    I decided it was time to freshen up things around here. My new look is a mashup of the Tufte and Kasper themes for Jekyll. I’m a fan of Edward Tufte’s work and I think the layout gives more flexibility in presenting data and analysis. It took quite a bit of time and tweaking and I’m sure I’ve broken some things but I’m happy with how it turned out.

    Hope you all dig the new look! Let me know what you think or if you found something broken with a tweet or an email.

  • Baltimore Policing: Six Months After Freddie Gray Events

    November 12, 2015

    We’re about six months out now from the Freddie Gray incident and subsequent unrest in Baltimore. There have been a few dominant narratives about crime in Baltimore since, namely that we’ve had almost 300 homicides, the supposed police “slowdown” after the protests, and that enough prescription drugs had flooded the streets to “keep Baltimore high for a year.”

  • What I'm Up To

    July 14, 2015

    Just a quick note to highlight some recent publicity.

    In June, was kind enough to again shine a light on some of my analysis, this time on vacant buildings and lots in Baltimore. also used my iPython notebook for my Baltimore Vital Signs analysis as an example of how to use the tool.

    This past Friday I participated in Baltimore Data Day hosted by Baltimore Neighborhood Indicators Alliance at University of Baltimore. I was invited to contribute to a panel on civic technology and I sincerely hope someone, anyone, in the room gained something from it. has a quick blurb about it as well as my quick glance at Baltimore police arrest data post-Freddie Gray. Also be sure to check out Ryan Smith’s map work here and stay tuned for his new crime map.

    Coming up July 28, I’ll be at the DC Transportation Techies meetup which is making a cameo in Baltimore at the Hyatt Regency.

    I’ve added all this stuff to The Work page (upper left corner) as well as my slide deck from my “Apps Are Not The Answer” talk at Hack Baltimore.

  • Arrests in Baltimore Drop in May

    June 12, 2015

    There’s been quite a bit of discussion about whether or not protests in Baltimore have resulted in a policing “slowdown.”

  • B'More Schools Surrounded by Blight and Danger

    June 2, 2015

    Many small and midsize American cities have experienced decades of population shrinkage, but as you can imagine, it’s not so easy to shrink the physical footprint of a city. Compound this with the housing bubble burst a few years ago and significant wealth inequality and it’s no surprise that so many cities are wrestling with blight and abandonment. Even though the mayor wants to bring in 10,000 new families, there’s little chance the vacant property issue in Baltimore is going away anytime soon.

    In addition to the safety hazard neglected structures can pose, vacant lots and buildings are magnets for serious crime, including gang activity, drugs, murder, assault, and rape.

    In order to get a look at how blight is affecting Baltimore one pretty straightforward exercise is to map all of these abandoned buildings and parcels to see what the city looks like. I did this in the last post, and this work has been done before. It should come as no surprise to anyone living in Baltimore where the neighborhoods hardest hit with blight are.

    But context is everything, and the story these data tell us about what it’s like to be a school-aged kid in Baltimore is harrowing. The analysis suggests that about 68% of all vacant buildings in Baltimore are within a quarter mile of a city school, while about 27% of all vacant parcel area is within the same distance.

    Below you’ll see just how rampant this issue is for children in Baltimore. The map shows a quarter-mile radius around each city school and the vacant parcels (blue) and vacant buildings (orange) within those buffers (data from Open Baltimore).
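
    For a feel of how a "within a quarter mile" share can be computed, here is a rough sketch that treats every school and vacant building as a single latitude/longitude point and uses the haversine great-circle distance. This is a simplification and an assumption: the analysis described above works with buffers and parcel geometry, which a GIS tool handles more faithfully. The coordinates below are invented for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def share_near_any_school(buildings, schools, radius_miles=0.25):
    """Fraction of buildings that lie within radius_miles of at least one school."""
    near = sum(
        1
        for b_lat, b_lon in buildings
        if any(haversine_miles(b_lat, b_lon, s_lat, s_lon) <= radius_miles
               for s_lat, s_lon in schools)
    )
    return near / len(buildings)


# Invented points: one school, one building about 0.07 miles away, one over a mile away.
schools = [(39.3000, -76.6100)]
buildings = [(39.3010, -76.6100), (39.3200, -76.6100)]
print(share_near_any_school(buildings, schools))
```

    Checking every building against every school is fine for a sketch; with roughly 17,000 vacant buildings a spatial index or a GIS buffer join would be the more practical route.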

  • Baltimore Vacant Housing on The Atlantic's Citylab

    June 1, 2015

    Citylab from The Atlantic just bounced some work done by an economist named Amine Ouazad that maps the density of vacant housing in Baltimore. The data Ouazad used are from the American Community Survey from the U.S. Census Bureau and this is the resulting map. Coincidentally I had already been working with Open Baltimore’s vacant building and lot data for a different analysis but I figured I could quickly respond to Ouazad’s map.

    Perhaps it’s an artifact of the data used and the census tract samples, but a similar plot using data from Open Baltimore shows more clearly the divides within the city. One commenter named “Jack” on the Citylab post indicated that the map showed high concentrations basically throughout the city, including wealthier and more business-centric neighborhoods like Mt. Vernon/Belvedere, Inner Harbor, Canton, and Federal Hill - comprising what we in Baltimore like to call “The White L.” On first glance at the Ouazad map my intuition agreed with Jack, and here’s what the Open Baltimore data actually look like. There are two layers - a heat map and the actual vacant building footprints (in black) so when you zoom in you can see the actual vacant buildings.

  • Verizon Wireless Cost Estimation

    May 21, 2015

    A few weeks ago, my 2-year contract came due for a phone upgrade. Seeing as how I was on an iPhone 4 and Apple’s recent OS upgrades have sucked the life clean out of it, I was looking forward to purchasing a new phone with the hope that the new one would last beyond the 2-year cycle. I was going to purchase an iPhone 6, and I knew (or so I thought) what it would cost up front if I were to sign on to a new 2-year contract. Simple, straightforward. No sales pitches necessary.

    Then I entered the Verizon Wireless store.

    Within a few moments of arrival and signing in, my name was called and I was paired with a sales rep. Whilst hovering over me, he quickly moved the conversation from the phone itself to something called the Edge plan. I hadn’t bothered looking at the Edge plan before coming to the store - from the marketing material it seemed that it was targeted at customers interested in upgrading their phone more rapidly than every two years. I do not count myself in that category, so I didn’t bother looking into it.

    The salesman threw so many numbers at me that my head was spinning. He was claiming that I could get some accessory package essentially for free by switching to the Edge program. I couldn’t follow any of his logic, and I was only marginally clearer on the details when the manager attempted to explain the costs to me. I walked out of the store frustrated and not wanting to commit there and then.

    I figure I can’t have been the only Verizon customer to have experienced this recently. In order to figure out what the actual costs were and whether I’d actually save money I made a quick calculator for my family and me to use. I’m sharing it here in case it helps others.

    It’s pretty simple for now, and assumes you aren’t taking the bait and trying to upgrade your phone more frequently than two years. The white boxes are entry cells: select the phone you want, enter your typical monthly service costs, and enter the percent down payment on the phone you’d like to make if you’re considering the Edge program. The box to the right gives your estimated monthly bill (bold black text) and the total cost over two years for your options (in bold red text).
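
    The arithmetic behind a calculator like this can be sketched in a few lines. Everything specific here is an assumption for illustration: that the installment option spreads the remaining phone price evenly over 24 months with no interest, that it may come with a monthly service discount, and all of the prices used. Verizon's actual Edge terms may differ, and the function names are hypothetical.

```python
MONTHS = 24  # compare both options over a two-year window


def contract_cost(upfront_price, monthly_service):
    """Two-year contract: pay the subsidized phone price up front, then service only."""
    monthly_bill = monthly_service
    total = upfront_price + MONTHS * monthly_bill
    return monthly_bill, total


def installment_cost(retail_price, monthly_service, down_payment_pct, monthly_discount=0.0):
    """Installment plan: a percent down, with the rest split evenly across the months."""
    down_payment = retail_price * down_payment_pct / 100.0
    monthly_bill = monthly_service - monthly_discount + (retail_price - down_payment) / MONTHS
    total = down_payment + MONTHS * monthly_bill
    return monthly_bill, total


# Made-up inputs: $200 subsidized vs. $650 retail, $100/month service,
# 20% down and a $15/month discount on the installment plan.
print(contract_cost(200.0, 100.0))
print(installment_cost(650.0, 100.0, 20.0, monthly_discount=15.0))
```

    Putting the monthly bill and the two-year total side by side is the whole point: a lower up-front payment can still mean a higher total.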

  • Upcoming Events You Should Check Out

    May 5, 2015

    I’ll be giving a couple of talks in Baltimore in the next two weeks. Would love for you to attend if you’re in the area! Click on the title of the talk below to RSVP through Meetup.

    “Apps Are Not The Answer” on Thursday May 14 @ DreamIt Health Baltimore. I’ll describe how I’ve tried to engage city officials and journalists in order to address policy, problem-solving, and decision-making. I’ll be sharing examples of my work, as well as some reminders of the principles that should guide problem-solving, applied science, and engineering. Hosted by Hack Baltimore.

    “Bad Ass(umptions)” on Wednesday May 20 @ Social Growth Technologies in Columbia, MD. I’ll examine the false premises and assumptions that I’ve made and subsequently obliterated through my data analysis work on The Training Set. Hosted by the Baltimore Python crew.

  • Thanks Baltimore!

    April 23, 2015

    Stephen Babcock over at Baltimore was kind enough to give The Training Set a bit of publicity with a write-up about the work. Check it out!

  • Baltimore Police Department Overtime & Bonuses

    April 22, 2015

    As if the police haven’t been in the public eye enough lately, Baltimore Brew has been beating the drum regarding police overtime budget.

  • ShowYourWork Election Campaign

    April 14, 2015

    Here’s a small idea that I think would make the upcoming election cycle incrementally more bearable.

    Politicians seemingly pick numbers, figures, statistics, and trends out of thin air. This can be infuriating no matter where you fall on the political spectrum. Why don’t we push politicians to post annotated transcripts of their speeches and debates that provide citations? Surely they have an intern or two that they can promote from sticking signs on front lawns to transcribing and footnoting. This would be a big step in the direction of transparency and a substantive way to keep politicians honest.

    I’d propose we call it the #ShowYourWork campaign. If elementary school kids can show their work on math problems and high schoolers can cite their references in English papers, surely our politicians can follow their lead. Moreover, it would signal a commitment to and appreciation for the STEM fields that politicians like to pay lip service to when it comes to job growth.

  • A City Divided (in N dimensions)

    April 12, 2015

    It’s no secret that most American cities are segregated.

    When I tell people I live in Baltimore, that I love living there and that it’s a great American city, what I really mean is that the neighborhoods I typically experience are great. Baltimore of course has its many ills, and it’s very easy for white, college-educated individuals like myself to find themselves experiencing a very different reality than most in the city.

  • Know More, B'more.

    March 30, 2015

    I took a quick look at the Vital Signs dataset from the Baltimore Neighborhoods Indicators Alliance - Jacob France Institute (BNIA) at the University of Baltimore (data also available at Open Baltimore) and created some Hans Rosling-style plots.

  • City Paper Bounces The Training Set Parking Analysis

    March 19, 2015

    I was able to get in contact with Edward Ericson Jr. over at Baltimore’s City Paper concerning the parking investigation at 4200 Wickford Rd. Ed was kind enough to bounce my findings in a quick update blurb. Full text is below.

  • Which citations do people not care about paying?

    March 16, 2015

    The Baltimore citation open data includes a field for open balance, meaning the amount still owed to the city for each citation. With this information we can look at the rates of unpaid citations by citation type.
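
    As a rough illustration of that calculation, here is a sketch that counts a citation as unpaid when its open balance is greater than zero and then takes the unpaid share within each citation type. The field names `type` and `open_balance` and the sample records are hypothetical stand-ins, not the dataset's actual column names or values.

```python
from collections import defaultdict


def unpaid_rate_by_type(citations):
    """Map each citation type to the fraction of its citations with a balance still owed."""
    tallies = defaultdict(lambda: [0, 0])  # type -> [unpaid count, total count]
    for citation in citations:
        tally = tallies[citation["type"]]
        tally[1] += 1
        if citation["open_balance"] > 0:
            tally[0] += 1
    return {ctype: unpaid / total for ctype, (unpaid, total) in tallies.items()}


# Invented sample records.
sample = [
    {"type": "Expired Meter", "open_balance": 0.0},
    {"type": "Expired Meter", "open_balance": 32.0},
    {"type": "No Stopping", "open_balance": 0.0},
]
print(unpaid_rate_by_type(sample))
```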

  • DOT Needs More Citation Categories

    March 12, 2015

    Here’s a suggestion directed at the Department of Transportation for improving their citation system.

  • When Not To Park On N. Charles St.

    March 7, 2015

    In a recent post I’ve shown particular blocks that became targets for parking enforcement. The short five block stretch on North Charles Street right around the corner from my apartment is one of those target locations.

  • Investigating City Paper's Claims About 4200 Wickford

    March 3, 2015

    Baltimore’s City Paper recently took a look at a unique parking situation in Baltimore at the 4200 block of Wickford Road. Unfortunately, they didn’t take a peek at the same open data for parking citations I’ve been investigating lately.

    This road is a small two-way street in a quiet, wealthy neighborhood adjacent to the Johns Hopkins University campus called Keswick. The street is narrow, and so vehicles typically park on either side with their tires past the curb which is illegal. Apparently they also occasionally park on the left facing the wrong direction.

    Start here in Google Street View and work your way north - the street is indeed quite narrow and most cars (save for maybe the Smart car) are parked with their two right tires over the curb. The Smart car and several others are parked on the left side facing the wrong direction.

  • Biasing Citations Towards Those Who Pay

    February 27, 2015

    Unless this dataset is somehow still screwed up, there was a stark shift in behavior that coincided with the explosion of parking citations issued in Baltimore.

  • Rapid Increase in Citations

    February 24, 2015

    The total number of parking citations in Baltimore exploded between September 2013 and March 2014.

    Total monthly citations in Baltimore, Maryland.
  • Introducing "My Electricity Use" Visualization Page

    February 16, 2015

    This is just a quick blurb to say I’m building a new page for my website for visualizing my electricity usage data. You can find it in the upper left menu panel by clicking on “My Electricity Use.” Right now I’ve just got an interactive heat map up for the hourly data (based on this post). I’m admittedly slow to the Plotly game, but I’ll be using it a lot more regularly going forward.

  • My Block is a Target.

    February 10, 2015

    Now that we can be sure of having a complete data set for the immediately preceding two years, we can take a closer look at parking citation policies and compliance.

    Total monthly citations for top 10 types of citation fines issued.
  • Quick Response from Baltimore Chief Data Officer

    January 23, 2015

    Within hours of publishing my last post on Baltimore’s parking (etc.) citation dataset, the issues were corrected thanks to Sharon Paley at Hack Baltimore and Heather Hudson, Baltimore’s Chief Data Officer in the Mayor’s Office of Information Technology.

  • The Lost Revenue of the Baltimore Traffic Cameras

    January 21, 2015

    The title of the dataset I’m working with on Baltimore’s open data portal is “Parking Citations.” I downloaded a snapshot of the dataset which ended up being 2,689,647 total citations through January 12, 2015.

  • Importing Lots of CSV's

    January 8, 2015

    I find myself importing a slew of comma-separated value (csv) files quite often so I wrote up a quick bit of code to handle it.

    It’s uber-simple - just loops through all the csv’s in a directory, appends them to one Pandas dataframe, and saves the new csv. It also tacks on the filenames of each file if separating into the original files is needed later. This also assumes all the fields are named exactly the same in all the csv files. Sure beats reading each individually and merging after.

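    import glob

    import pandas as pd

    path = 'data_csv'
    allFiles = sorted(glob.glob(path + '/*.csv'))  # sorted so file_ID is reproducible

    # Read each csv, tagging its rows with the source filename and an integer ID
    frames = []
    for i, filename in enumerate(allFiles):
        df_file = pd.read_csv(filename)
        df_file['filename'] = filename
        df_file['file_ID'] = i
        frames.append(df_file)

    # DataFrame.append was removed in pandas 2.0; concatenate once at the end instead
    df = pd.concat(frames, ignore_index=True)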
  • Excessive Data Visualization

    December 25, 2014

    Christmas Day. I’m bored.

  • Assessing Demand-Side Storage System Performance

    November 14, 2014

    By varying parameters of the demand-side storage model we can determine the economic plausibility and benefits of home grid-tied batteries.

  • Demand Heat Map

    November 11, 2014

    Opower posted a heat map of hourly electricity demand that I wanted to replicate for our data.

    This makes changes in hourly trends over the course of the year readily apparent, and even highlights differences day to day. The 6am/7am peak is pretty clear, as are the evening hours in the summer. Between midnight and 4am, the usage is clearly higher (deeper yellow) in the summer as well due to AC load.

  • Storage Model Details

    October 9, 2014

    This post fills in some more details of the demand-side storage model. In the last post I presented a flow chart of how the logic works but didn’t give a diagram of what the physical system might look like. That’s below.

    Demand-side storage physical system.
  • Intro to Storage Model

    September 23, 2014

    Buying lead-acid batteries and switching to one of BGE’s time-of-use pricing plans may be a very good idea indeed.

  • Summer Updates

    September 22, 2014

    It’s been a bit. Work got busy, I got stalled and needed an IDE, and I developed a new model. I didn’t have much time over the summer for this stuff. My real job got pretty busy. You don’t care about that though.

  • Should I Switch to Time-of-Use Pricing?

    August 12, 2014

    Now that I’ve got BGE’s available residential time-of-use plans, I can evaluate whether I would save money by switching to one.

  • BGE Residential Pricing Plans

    August 3, 2014

    A quick blurb on a couple of pieces of the demand-side storage puzzle I’m working on.

  • Mo Apartment, Mo Problems

    August 1, 2014

    Here’s a quick comparison of average weekdays and weekends in the new and old apartments.

  • The New Pad

    July 31, 2014

    Steph and I moved on May 23. I didn’t manage to grab the data for the last three weeks of May before changing apartments (and therefore BGE accounts), so I missed out on those three weeks of data. However, we’ve now been there just over two months and so we’ve got some A/C load data to work with.

    The differences in the data are pretty staggering. One issue is that we moved at the beginning of summer so our previous apartment with electric resistance heating has all the heating loads and the new apartment with central A/C has all the cooling loads. The rest of the differences are described below.

  • Feeding the Data Monster

    July 31, 2014

    Scouring the interwebz for some data.

  • Closer Look at the Electricity SVR Forecast

    June 18, 2014

    First a disclaimer on cross-validation:

    In the previous post I trained a support vector machine model using electricity demand data. I should state that this is not the end of the analysis nor a robust model ready for release into the wild. The model would need to undergo cross-validation to forge it into something more robust and generalized.

  • Forecasting with Machine Learning

    May 8, 2014

    Taking a big jump into machine learning now.

    As part of my research in grad school I had started getting interested in the use of machine learning tools for predicting energy consumption. I went so far as to enroll in a machine learning class, but lacking some of the statistical/mathematical knowledge for what they were teaching I ended up dropping about three-quarters of the way through. I’ve since gone back on my own time and learned more statistical inference through online coursework, and am taking a stab at some analysis and coding.

  • Electricity Usage Autocorrelation

    May 5, 2014

    Here’s a quick look at using historical data to predict the future. If the goal is to predict the next hour’s electricity consumption, a good predictor might be the consumption of the hour (or several hours) immediately prior. One way to assess whether or not this is the case is through the autocorrelation given by:
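
    One common sample estimate of the autocorrelation at lag k divides the covariance between the series and a copy of itself shifted by k hours by the variance of the series. The sketch below uses that estimator as an assumption; it is not necessarily the exact form used in the post, and the hourly readings are invented.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a sequence at the given lag (0 <= lag < len(series))."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(series) - 1")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


# Invented hourly readings (kWh); lag=1 compares each hour with the one before it.
hourly_kwh = [1.0, 2.0, 3.0, 4.0, 5.0]
print(autocorrelation(hourly_kwh, lag=1))
```

    A value near 1 at lag 1 would suggest the previous hour is a useful predictor of the next; values near 0 would suggest it carries little information.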

  • Energy Use and Weather

    April 18, 2014

    Stepping back, let’s take a look at what the actual time series of the smart meter electricity data looks like for my apartment.

    I’ll bring in the hourly outdoor temperature as measured at Baltimore-Washington International Airport (less than 10 miles away as the crow flies). The data are obtained via the API at Weather Underground. Again, you can follow along my line (curve, circle, some other obscure geometry) of thinking by taking a look at the iPython notebook.

    Time series of hourly electricity consumption and outdoor temperature.
  • More Smart Meter Data...

    April 17, 2014

    In an earlier post, I showed initial descriptive plots for my electricity use. Now that we have some more data, we can break these up to see differences as the spring season progresses.

  • Guess What Time Justin Wakes Up

    April 16, 2014

    First, a little bit on my tools.

  • My Electricity Use

    March 16, 2014

    Previously, I laid out what BGE’s smart meters are doing in Baltimore. Below is a quick snapshot of the downloaded data for my apartment electricity usage.

  • BGE Smart Meters

    March 15, 2014

    Citizens of Baltimore and captive customers of Baltimore Gas & Electric!

  • Goalkeeper Performance Stats

    March 5, 2014

    Starting off by looking at some footy data.

  • The Training Set - Justin Elszasz