
May 19th, 2009: The Zack Hample Regression Model

Who is Zack Hample? He is a really cool guy known for his ability to catch balls at Major League Baseball stadiums. He has caught 3,998 balls as of this post and is expected to catch number 4,000 at his next game, at Dodger Stadium in Los Angeles. His ball-hawking skills are so good that he even wrote a book on the subject, which I read when I was about 15, back when I used to read dozens of books over the summer. I found his tactics amazing at the time, and I still do. I recall him noting little things, such as pre-bending the perforation on the ticket stub so it is easier for the gatekeeper to rip it off and you can get to the seats first. This does not matter now that all MLB stadiums use electronic readers, but it was his attention to these kinds of details that amazed me at the time. His best-known tactic is what he calls the glove trick, a contraption he invented using a baseball glove, string, and a Sharpie to retrieve balls on the field that are otherwise out of reach.
Last year, I found that Hample has a blog (which I now read religiously) where he recounts all of his games, and a personal website with info on him and his baseball collection. This site includes a list of every game he has gone to, how many balls he caught at that game (including game balls), whether there was batting practice, what stadium the game was played at, and the announced attendance.
As a statistician, I decided to make a model that predicts how many balls Hample will get at a given game. I took all of his stats, moved them into Microsoft Excel, and cleaned them up a bit so I could import them into Minitab, a statistical software package. I was able to import the necessary info on 649 of the 722 games for which Hample has been counting balls. He has no data for the first games he went to from 1990-92, and there is no official attendance number for the rainout games he attended, so those got thrown out of the model.
Here is the Regression Model:
Balls = 3.59 - 0.000098 Attendance + 2.00 BPCode + 0.369 Experience - 0.825 NY Stadium
Balls is the number of balls Hample catches at a given game. Here are some basic statistics on how many balls he gets in a given game:
Descriptive Statistics: Balls
Variable   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
Balls     5.835    0.142  3.654    0.000  3.000   5.000  8.000   28.000
So over his career, he averages just under 6 balls a game. Let me go through the variables in the model and explain them:
Attendance- This is the paid attendance at the game. The average paid attendance at a Hample game is just a hair under 30,000 people. For about every 10k people at a game, Hample is going to lose a ball. In reality the effect is probably not constant: as attendance approaches zero, the number of balls Hample gets should go up a lot faster. For example, I am sure that the difference between a 10k crowd and a 20k crowd is a lot bigger than the difference between a 30k crowd and a 40k crowd. This is a plain linear regression model (technically a multiple linear regression, since it has four predictors), so it treats every extra fan the same; I did not want to try fitting a different model because I have a real thesis to work on.
BPCode- Hample gets the majority of his balls from batting practice, so it helps when the guys are hitting them into the seats. If a team has batting practice, the model predicts Hample grabbing 2 additional balls. I thought it would be more, but that is not the case. I assume the difference is that small because when BP is not happening, he is able to focus more on getting balls in ways other than off the bat.
Experience- If you look at the sum of Hample’s work, it is clear that he is much better at hawking balls now than he was in 1990. For every additional year of experience, he averages .369 more balls a game. This is his 19th year of hawking balls (remember real mathematicians start counting at 0, so Hample is 19 this year), and that nets him just over 7 more balls a game (19 × 0.369 ≈ 7.01) than what he pulled in 1990 (his 0 year) under the same conditions.
NY Stadium- Hample lives in New York, so naturally he goes to more NY parks. I created this variable by grouping “yankee”, “shea” and “Citi” as one set and all other parks as another. I was not sure there would be, but there is a statistically significant difference at a New York park that is not accounted for by the attendance number alone. Hample can expect almost 1 fewer ball a game with a 40k crowd at Citi Field than he can under the same conditions at a park outside of the Empire State.
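To make the four effects above concrete, here is a quick Python sketch that turns each coefficient into a "change in balls" number. The coefficients are the rounded ones from the regression equation in this post; the dictionary keys and the function name `marginal_effects` are just names I made up for illustration, and the 0/1 coding of BPCode and NY Stadium is my reading of how the variables were set up.

```python
# Rounded coefficients copied from the regression equation in this post.
# The key names are illustrative, not Minitab's.
COEF = {
    "constant": 3.59,
    "attendance": -0.000098,  # per paid attendee
    "bp": 2.00,               # assumed 1 if batting practice was held, else 0
    "experience": 0.369,      # per year since 1990 (1990 = year 0)
    "ny_stadium": -0.825,     # assumed 1 for Yankee, Shea, or Citi, else 0
}


def marginal_effects(extra_attendance=10_000, years=19):
    """Return the change in predicted balls from each variable on its own."""
    return {
        "attendance": COEF["attendance"] * extra_attendance,
        "bp": COEF["bp"],
        "experience": COEF["experience"] * years,
        "ny_stadium": COEF["ny_stadium"],
    }


effects = marginal_effects()
for name, change in effects.items():
    print(f"{name:>10}: {change:+.3f} balls")
```

With the defaults, this gives roughly -0.98 balls per extra 10k fans and about +7.01 balls for 19 years of experience, which matches the "lose a ball per 10k" and "just over 7 more balls" figures above.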
So, what does this prove? Well, we already knew that Hample is amazing at catching balls, but we now see how he is affected by attendance, batting practice, experience, and where the stadium is. I just know I am happy I got my 1st game ball on the fly at Nationals Park last night, off the Pirates catcher Cruz in the top of the 9th inning. It was a great game, where I got 2 balls during BP, 1 thrown by Ian Snell and another off the bend in the wall down the 3rd base line. But hey, at a game with only 14,549 paid attendance, with BP, outside of NY, Zack Hample would have gotten on average 11.18 balls at Nationals Park. But Zack knows that he got his record, 28 balls, in his one appearance at Nationals Park, last year. I guess the model doesn’t factor in things like “perfect for the glove trick”.
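For anyone who wants to check the 11.18 figure, here is a small Python sketch of the prediction for that game. The function name `predict_balls` and the tuple layout are my own invention for illustration. The inputs are the ones stated above: 14,549 paid attendance, batting practice held, not a New York park, and Experience = 19 for the 2009 season.

```python
def predict_balls(coef, attendance, bp, experience, ny_stadium):
    """Evaluate the regression equation for a single game.

    coef is (constant, attendance, BPCode, Experience, NY Stadium).
    bp and ny_stadium are treated as 0/1 indicators.
    """
    const, b_att, b_bp, b_exp, b_ny = coef
    return (
        const
        + b_att * attendance
        + b_bp * int(bp)
        + b_exp * experience
        + b_ny * int(ny_stadium)
    )


# Rounded coefficients from the regression equation in the post.
ROUNDED = (3.59, -0.000098, 2.00, 0.369, -0.825)
# Full-precision coefficients from the Minitab predictor table.
FULL = (3.5948, -0.00009789, 2.0027, 0.36942, -0.8248)

game = dict(attendance=14_549, bp=True, experience=19, ny_stadium=False)

rounded_pred = predict_balls(ROUNDED, **game)
full_pred = predict_balls(FULL, **game)

print(f"rounded coefficients: {rounded_pred:.2f}")
print(f"full coefficients:    {full_pred:.2f}")
```

The rounded equation reproduces 11.18. Plugging in the full-precision coefficients from the Minitab table gives about 11.19, so the small gap is just rounding.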

For you real stat nerds, here is some additional regression model info:

Regression Analysis: Balls versus Attendance, BPCode, ...
The regression equation is
Balls = 3.59 - 0.000098 Attendance + 2.00 BPCode + 0.369 Experience
        - 0.825 NY Stadium
649 cases used, 11 cases contain missing values
Predictor          Coef     SE Coef      T      P
Constant         3.5948      0.4826   7.45  0.000
Attendance  -0.00009789  0.00001131  -8.65  0.000
BPCode           2.0027      0.4085   4.90  0.000
Experience      0.36942     0.02550  14.48  0.000
NY Stadium      -0.8248      0.2511  -3.29  0.001
S = 3.07974   R-Sq = 29.9%   R-Sq(adj) = 29.4%
Analysis of Variance
Source           DF       SS      MS      F      P
Regression        4  2602.67  650.67  68.60  0.000
Residual Error  644  6108.22    9.48
Total           648  8710.89
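
If you want to see where S, R-Sq, R-Sq(adj), and F come from, they can all be rebuilt from the Analysis of Variance table using the standard textbook formulas. This is a sketch of that arithmetic in Python, not anything Minitab exports; the variable names are mine, and the inputs are the sums of squares and degrees of freedom printed above.

```python
import math

# Values copied from the Analysis of Variance table above.
ss_regression = 2602.67
ss_error = 6108.22
df_regression = 4
df_error = 644

ss_total = ss_regression + ss_error
df_total = df_regression + df_error

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

f_stat = ms_regression / ms_error               # overall F test
s = math.sqrt(ms_error)                         # residual standard error
r_sq = ss_regression / ss_total                 # share of variance explained
r_sq_adj = 1 - ms_error / (ss_total / df_total) # penalized for 4 predictors

print(f"F         = {f_stat:.2f}")
print(f"S         = {s:.5f}")
print(f"R-Sq      = {r_sq:.1%}")
print(f"R-Sq(adj) = {r_sq_adj:.1%}")
```

These work out to the same numbers Minitab reports: F of about 68.60, S of about 3.08, R-Sq of 29.9%, and R-Sq(adj) of 29.4%. In other words, the four variables explain roughly 30% of the game-to-game variation in Hample's ball count.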
Administered By:
Nic Shayko
Site Created By:Greg Shayko
Banner Created By:Annie Kwon