I have a database of play-by-play data scraped from the MLB Gameday XML feed.
I want to be able to pull a list of batting averages and other statistics, but I am stuck.
I have an events table (one row per at-bat), which is set up as follows:
CREATE TABLE `events` (
  `event_id` int(15) NOT NULL AUTO_INCREMENT,
  `game_id` int(8) NOT NULL,
  `batter` int(8) NOT NULL,
  `pitcher` int(8) NOT NULL,
  `inning` int(3) NOT NULL,
  `event` varchar(25) NOT NULL,
  `outs` int(1) NOT NULL,
  `rbi` int(1) NOT NULL,
  `home_runs` int(3) NOT NULL,
  `away_runs` int(3) NOT NULL,
  `batting_team` int(3) NOT NULL,
  `b1` int(6) NOT NULL,
  `b2` int(6) NOT NULL,
  `b3` int(6) NOT NULL,
  `pitch_detail` varchar(200) NOT NULL,
  PRIMARY KEY (`event_id`)
);
I also have a players table that simply holds player_id, first_name, last_name, and team_id for each player.
My SQL is pretty limited, and this is as far as I have got:
SELECT players.first_name, players.last_name, COUNT(events.event), events.event
FROM mlb.events
JOIN mlb.players
ON mlb.players.player_id = mlb.events.batter
WHERE players.team_id = %s
GROUP BY events.batter, events.event
ORDER BY events.batter, events.event
This returns rows like these, one per player and event type:
('Henry', 'Blanco', 3L, 'Flyout')
('Henry', 'Blanco', 4L, 'Groundout')
('Henry', 'Blanco', 4L, 'Single')
('Henry', 'Blanco', 5L, 'Strikeout')
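For comparison, the Python route would look something like the sketch below: a hypothetical `batting_averages` helper that folds the (first_name, last_name, count, event) rows returned by the query above into one hits / (hits + outs) figure per player. It assumes the simplified hit and out categories described in the question, skips every other event type, and keys on the player's name only because that is all the query returns. With just the four sample rows shown here it yields 0.25 (4 hits in 16 at-bats).

```python
from collections import defaultdict

# Simplified event categories, as defined in the question (an assumption
# about the exact strings stored in events.event).
HITS = {"Single", "Double", "Triple", "Home Run"}
OUTS = {"Flyout", "Groundout", "Strikeout"}


def batting_averages(rows):
    """Fold (first_name, last_name, count, event) rows into
    (first_name, last_name, hits / (hits + outs)) tuples, rounded to 3 places."""
    hits = defaultdict(int)
    outs = defaultdict(int)
    for first, last, count, event in rows:
        if event in HITS:
            hits[(first, last)] += count
        elif event in OUTS:
            outs[(first, last)] += count
        # Any other event type (walks, etc.) is skipped.
    averages = []
    for first, last in sorted(set(hits) | set(outs)):
        at_bats = hits[(first, last)] + outs[(first, last)]
        average = round(hits[(first, last)] / at_bats, 3) if at_bats else 0.0
        averages.append((first, last, average))
    return averages


rows = [
    ("Henry", "Blanco", 3, "Flyout"),
    ("Henry", "Blanco", 4, "Groundout"),
    ("Henry", "Blanco", 4, "Single"),
    ("Henry", "Blanco", 5, "Strikeout"),
]
print(batting_averages(rows))  # [('Henry', 'Blanco', 0.25)]
```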
I could pull this into Python and make it usable there, but I figure there must be some SQL I can write that does this for me. I need to calculate hits / (hits + outs) to get each player's batting average. What I want out is something like this:
('Henry', 'Blanco', 0.285)
For the sake of the example, let's just say the only outs are Flyout, Groundout and Strikeout, and the hits are Single, Double, Triple and Home Run.
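Staying in SQL, one common technique is conditional aggregation: group by player only (not by event), count hits and outs with SUM(CASE ...) expressions, and divide. The sketch below runs such a query through Python's built-in sqlite3 module against an in-memory database so it is self-contained; the cut-down tables, the sample rows and the team_id value are invented for the demo, and the event strings are assumed to match the names above exactly. Against the MySQL tables, the same SELECT should work with the mlb.-qualified table names and %s in place of SQLite's ? placeholder.

```python
import sqlite3

# Hypothetical, cut-down versions of the players and events tables:
# only the columns the query touches are created.
SCHEMA = """
CREATE TABLE players (player_id INTEGER PRIMARY KEY, first_name TEXT,
                      last_name TEXT, team_id INTEGER);
CREATE TABLE events (event_id INTEGER PRIMARY KEY, batter INTEGER, event TEXT);
"""

# Conditional aggregation: one row per batter, hits and at-bats counted
# with SUM(CASE ...), NULLIF guarding against division by zero.
AVERAGE_QUERY = """
SELECT p.first_name,
       p.last_name,
       ROUND(
         SUM(CASE WHEN e.event IN ('Single', 'Double', 'Triple', 'Home Run')
                  THEN 1 ELSE 0 END) * 1.0
         / NULLIF(SUM(CASE WHEN e.event IN ('Single', 'Double', 'Triple', 'Home Run',
                                            'Flyout', 'Groundout', 'Strikeout')
                           THEN 1 ELSE 0 END), 0),
         3) AS batting_average
FROM events AS e
JOIN players AS p ON p.player_id = e.batter
WHERE p.team_id = ?
GROUP BY p.player_id, p.first_name, p.last_name
ORDER BY p.last_name, p.first_name
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Invented sample data: player 1 on team 1, with the same per-event counts
# as the grouped output above, plus a walk that should be ignored.
conn.execute("INSERT INTO players VALUES (1, 'Henry', 'Blanco', 1)")
sample = ["Flyout"] * 3 + ["Groundout"] * 4 + ["Single"] * 4 + ["Strikeout"] * 5 + ["Walk"]
conn.executemany("INSERT INTO events (batter, event) VALUES (1, ?)",
                 [(event,) for event in sample])

rows = conn.execute(AVERAGE_QUERY, (1,)).fetchall()
print(rows)  # [('Henry', 'Blanco', 0.25)]
```

NULLIF makes the average NULL rather than an error for a player with no qualifying at-bats, and the `* 1.0` forces non-integer division in SQLite; in MySQL, `/` already returns a decimal, so it is harmless there. Adding more statistics is a matter of adding more SUM(CASE ...) columns to the same SELECT.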
Any help or direction greatly appreciated.