- Sports data
Sports data are typically published online and in newspapers as box scores. Box scores contain a numerical view of a sporting event and are of interest for sports betting and fantasy sports. While box scores contain a wealth of information, they are impractical for performing research. The sports data query language (SDQL), which is based on the open-source programming language Python and is in use at the free web sites [http://KillerSports.com] and [http://SportsDataBase.com], allows users to ask questions such as "How do the Cubs do after their starter got knocked out in the third?" and "How do teams do after playing extra innings the night before?"
The sports data query language (SDQL) has the same basic format for all sports:
GAME REFERENCE:PARAMETER
Game references include
p for the team's previous game
P for the previous match up
o for the team's opponent
n for the next game
N for the next match up
Baseball
Because of the importance of pitching, baseball has the additional game references
s for starter's last start
S for starter's last start against the current opponent.
For baseball, common parameters include:
at bats, attendance, biggest lead, conference, date, day, division, double header, earned runs, errors, fly balls, ground balls, hits, hitters faced, home runs, inning runs, left on base, line, losses, margin, matchup losses, matchup wins, month, over, pitchers used, pitches, playoffs, rbi, rest, run line, run line runs, runs, season, series game, series games, site, start time, starter, starter earned runs, starter era, starter fly balls, starter ground balls, starter hits, starter hitters faced, starter home runs, starter innings pitched, starter losses, starter matchup losses, starter matchup wins, starter pitches, starter rest, starter runs, starter strike outs, starter strikes, starter throws, starter walks, starter wins, starters previous game, starters previous match up, streak, strike outs, strikes, team, team left on base, temperature, time zone, total, umpires, under, walks, wins
MLB parameter descriptions and sample queries
attendance - This is the reported attendance. It can be useful to see how teams perform as a function of the attendance.
Sample SDQL query: attendance<10000
biggest lead - This is the biggest lead for a team in the game. It can be useful to see how teams perform after a loss in which they held, say, a three-run lead, or off a win in which they never trailed.
Sample SDQL query: biggest lead = 0
conference - This is the league: either NL or AL. It can be useful, for example, to see how teams perform in interleague games.
Sample SDQL query: conference = NL
date - This is the date in eight-digit format. This is useful for setting a search-from date for a recently emerging trend or system. June 10th 2006 is represented as 20060610.
Sample SDQL query: date>20050804
day - This is the day of the week. The day must be spelled out completely with the first letter capitalized. This is useful to uncover how teams perform on a particular day of the week.
Sample SDQL query: day=Sunday
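The eight-digit date format and the spelled-out day names described above can be produced from an ordinary calendar date. The following is a minimal Python sketch; the helper names sdql_date and sdql_day are hypothetical and are not part of SDQL itself.

```python
from datetime import date

# Day names spelled out in full with a capital first letter,
# in the order used by date.weekday() (Monday is 0).
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def sdql_date(d: date) -> int:
    """Return the date as an eight-digit number, e.g. 20060610."""
    return d.year * 10000 + d.month * 100 + d.day


def sdql_day(d: date) -> str:
    """Return the fully spelled-out, capitalized day of the week."""
    return DAY_NAMES[d.weekday()]


game_day = date(2006, 6, 10)
print(sdql_date(game_day))  # 20060610
print(sdql_day(game_day))   # Saturday
```

Because the year occupies the leading digits, comparing two such numbers orders the dates chronologically, which is why a condition such as date>20050804 behaves as a search-from date.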
hits - This is the number of hits by the team. It can be useful to see how teams perform, for example, when they have had double digit hits two games in a row.
Sample SDQL query: hits>=12
inning runs - This powerful parameter can be used to investigate how a team performs based on the number of runs they scored in a particular inning or range of innings. For example, how a team performs after a loss in which they scored at least 3 runs in the first inning or after a win in which they were shut out in the last six innings.
Sample SDQL query: inning runs [:3] >=5
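The bracketed range in the inning runs sample query resembles Python slice notation. The sketch below shows how such a slice behaves in plain Python. It assumes that the slice selects a range of innings and that the comparison applies to the total runs in that range; the source does not spell this out, and the line score and helper names are hypothetical.

```python
# Hypothetical line score: runs scored by one team in innings 1 through 9.
inning_runs = [2, 0, 3, 0, 0, 1, 0, 0, 0]


def runs_in_first(innings, count):
    """Total runs over the first `count` innings (Python slice [:count])."""
    return sum(innings[:count])


def shut_out_over_last(innings, count):
    """True if no runs were scored over the last `count` innings."""
    return sum(innings[-count:]) == 0


print(runs_in_first(inning_runs, 3) >= 5)   # True: 2 + 0 + 3 = 5
print(shut_out_over_last(inning_runs, 6))   # False: a run scored in the 6th
```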
left on base - This is the number of runners left on base by the hitters. This is not to be confused with 'team left on base'. The difference can be demonstrated with two examples. In one inning the first three batters walk and the next three strike out. The 'left on base' for this inning is nine: three for each batter who struck out. If the first two batters strike out, the next three walk and the last batter strikes out, the 'left on base' is only three. In both cases, the 'team left on base' is three. This parameter is useful, for example, to see how a team performs after a one-run loss in which their hitters stranded at least five more runners than the opponent.
Sample SDQL query: left on base < 5
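The two innings described above can be worked through in code. This is a simplified sketch that only understands walks and strikeouts, which is all the two examples need; the function name and event labels are hypothetical.

```python
def left_on_base(plate_appearances):
    """Return (left_on_base, team_left_on_base) for one inning.

    Simplified model: each event is either "walk" or "strikeout".
    """
    runners = 0      # runners currently on base
    outs = 0
    batter_lob = 0   # runners stranded, charged to each batter making an out
    for event in plate_appearances:
        if event == "walk":
            # A bases-loaded walk forces a run in, so at most 3 remain on base.
            runners = min(runners + 1, 3)
        elif event == "strikeout":
            batter_lob += runners
            outs += 1
            if outs == 3:
                break
        else:
            raise ValueError(f"unsupported event: {event!r}")
    # Runners still on base when the inning ends count once for the team.
    return batter_lob, runners


# Three walks, then three strikeouts.
print(left_on_base(["walk"] * 3 + ["strikeout"] * 3))           # (9, 3)
# Two strikeouts, three walks, then a strikeout.
print(left_on_base(["strikeout"] * 2 + ["walk"] * 3 + ["strikeout"]))  # (3, 3)
```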
line - This is the Vegas line. The lines are stored in five cent increments and ten cent lines are used. For example, a pick game is when both teams are minus 105. To query on favorites, set the line to less than minus 105 and to query on underdogs, set the line to greater than zero.
Sample SDQL query: line < -199
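The thresholds given above for favorites, underdogs and pick games can be written as a small classifier. This is only a sketch of the rule as stated in the text; the function name is hypothetical, and values the text does not cover are returned as "unclassified" rather than guessed.

```python
def classify_line(line):
    """Classify a team's line using the thresholds stated in the text."""
    if line < -105:
        return "favorite"
    if line > 0:
        return "underdog"
    if line == -105:
        return "pick"
    # Assumption: values between -105 and 0 are not described in the text.
    return "unclassified"


print(classify_line(-200))  # favorite
print(classify_line(150))   # underdog
print(classify_line(-105))  # pick
```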
margin - This is the margin by which a team won or lost. It is in units of runs. This is a very commonly used parameter and can be used, for example, to uncover how a team performs in the third game of a series when they lost the first two by a single run.
Sample SDQL query: margin = 1
month - This is the month. Months are numbered rather than named. This is to facilitate queries involving, say, after April, which would be greater than 4 because April is the fourth month.
Sample SDQL query: month =4
pitchers used - This is the number of pitchers used by the team. It can be useful to see how teams perform when they are off a win in which they used at least five pitchers or when they used at least five pitchers for two straight games.
Sample SDQL query: pitchers used > 5
playoffs - This allows searching only regular season results or only playoff results. To search only playoff results, set playoffs=1 and to search only regular season results, set playoffs=0. The KillerSports.com default is to search on both playoff and regular season results.
Sample SDQL query: playoffs=0
rest - This is the number of days rest a team has between games. Usually it is one or zero, but rainouts can expand this number to two, three or even four.
Sample SDQL query: rest > 0
runs - This is the total number of runs a team scored in a game. It can be used, for example, to see how a team performs after being shut out, after scoring at least six runs and losing, or after scoring three runs or fewer in three straight games.
Sample SDQL query: p:runs=0
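The sample query p:runs=0 pairs the game reference p (the team's previous game) with the parameter runs, following the GAME REFERENCE:PARAMETER format. The sketch below splits a simple condition of this kind into its parts. It is an illustration only, not the parser used by the SDQL sites: the function name and regular expression are hypothetical, and bracketed ranges or combined conditions are deliberately not handled.

```python
import re

# Game references listed in the text, including the baseball-only s and S.
GAME_REFERENCES = {"p", "P", "o", "n", "N", "s", "S"}

# Optional "reference:" prefix, a lowercase parameter name, an operator, a value.
_CONDITION = re.compile(
    r"^\s*(?:(?P<ref>[A-Za-z]):)?\s*(?P<param>[a-z ]+?)\s*"
    r"(?P<op>>=|<=|=|<|>)\s*(?P<value>.+?)\s*$"
)


def parse_condition(text):
    """Split a simple condition into (reference, parameter, operator, value)."""
    match = _CONDITION.match(text)
    if match is None:
        raise ValueError(f"not a simple condition: {text!r}")
    ref = match.group("ref")
    if ref is not None and ref not in GAME_REFERENCES:
        raise ValueError(f"unknown game reference: {ref!r}")
    return ref, match.group("param"), match.group("op"), match.group("value")


print(parse_condition("p:runs=0"))          # ('p', 'runs', '=', '0')
print(parse_condition("hits>=12"))          # (None, 'hits', '>=', '12')
print(parse_condition("team = Blue Jays"))  # (None, 'team', '=', 'Blue Jays')
```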
season - This is simply the season. It is very useful for seeing how trends and systems have been evolving over the seasons. For example, let's say you uncovered a system in which underdogs have a record of 202-199 over the past four seasons, earning a $100 player $4650. This is a very strong system, but how has it done recently? How has it done in the current season? Simply adding 'and season' to the end of the query gives the results for each season individually.
Sample SDQL query: season > 2005
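A near-even record can still be profitable for underdogs because a winning underdog bet pays more than the amount staked. The sketch below computes flat-stake profit under the standard American odds convention. The function name and the lines in the example are hypothetical, it assumes a flat $100 is risked on every game, and it does not attempt to reproduce the $4650 figure, which depends on the actual lines.

```python
def flat_stake_profit(results, stake=100.0):
    """Profit from staking `stake` on each game.

    `results` is an iterable of (line, won) pairs, where `line` is an
    American-odds line such as 150 or -120 and `won` is True or False.
    """
    profit = 0.0
    for line, won in results:
        if line == 0:
            raise ValueError("a line of 0 is not a valid American-odds line")
        if not won:
            profit -= stake                    # a losing bet forfeits the stake
        elif line > 0:
            profit += stake * line / 100       # +150 wins 150 per 100 staked
        else:
            profit += stake * 100 / abs(line)  # -200 wins 50 per 100 staked
    return profit


# Hypothetical underdog bets: two wins and two losses.
bets = [(150, True), (130, False), (120, True), (140, False)]
print(flat_stake_profit(bets))  # 70.0
```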
series game - This tremendously useful parameter is the number of the game in the series. This can be used, for example, to determine how a team performs in the second game of a home series when they lost the first as a favorite or in the third game of a three-game series when they lost the first two.
Sample SDQL query: series game = 3
series games - This is the total number of games in the series. For example, if the team is playing the first game of a three-game series, the series games is 3 while the series game is 1. This can be used, for example, to determine how a team performs in the second game of a two-game home series when they won the first by at least five runs as an underdog.
Sample SDQL query: series games = 4
site - This commonly used parameter is simply the site of the game. In rare instances when the site is neutral, the home team is the team that bats second.
Sample SDQL query: site = home
start time - This is the start time of the game. All times are local and military time is used, with all four digits in a single string. For example, 7:05 pm is 1905.
Sample SDQL query: start time<1400
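The four-digit military time described above can be derived from a clock time as follows. This is a minimal sketch; the helper name sdql_start_time is hypothetical.

```python
from datetime import time


def sdql_start_time(t: time) -> int:
    """Return a local start time as a four-digit military-time number."""
    return t.hour * 100 + t.minute


evening_game = time(19, 5)   # 7:05 pm
print(sdql_start_time(evening_game))          # 1905
print(sdql_start_time(evening_game) < 1400)   # False: not an early start
```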
starter - This is the starting pitcher. The pitcher's first and last name must be given in single quotes. This allows a complete investigation of how a starting pitcher performs in various situations. For example, as a 150+ road dog, when his team lost his last two starts, when the team is on a three-game losing streak or when he got fewer than three runs of support for two straight starts.
Sample SDQL query: starter= 'Roger Clemens'
starter innings pitched - This is the number of innings pitched by the starter. This can be used to determine how a pitcher performs when he went eight-plus innings in his last start, when he is off a start in which he went less than four innings or when he went 7+ innings and his team lost his last start.
Sample SDQL query: starter innings pitched < 5
starter throws - This is the handedness of the starting pitcher. It can be used, for example, to see how a team performs against a lefty when they faced righties for three straight games.
Sample SDQL query: starter throws = left
strike outs - This is the number of times a team struck out in a game.
Sample SDQL query: strike outs >= 10
team - This is the name of the team. The database at KillerSports.com uses the nickname of the team. This parameter can be used to see how a team performs vs another or how a team performs after a series vs a particular opponent.
Sample SDQL query: team = Blue Jays
team left on base - This is the total number of runners stranded at the end of each inning. This parameter can be used to uncover how a team performs as a function of the number of runners that go from the basepaths to the field.
Sample SDQL query: team left on base >12
temperature - This is the reported temperature at the start of the game. It can be used, for example, to see how teams perform in cold temperatures or how starters perform in hot temperatures.
Sample SDQL query: temperature < 60
total - This is the consensus Vegas OU line for the game. It can be used, for example, to see how a team performs when the OU line is high or how a starting pitcher performs on the road when the OU line is low.
Sample SDQL query: total < 7.5
walks - This is the number of walks the team drew, not the number of walks their pitchers allowed. It can be used, for example, to see how teams perform in games in which they did not draw a walk, or after a game in which they drew at least five walks.
Sample SDQL query: walks = 0
Wikimedia Foundation. 2010.