Source lines of code
Source lines of code (SLOC) is a software metric used to measure the size of a software program by counting the number of lines in the text of the program's source code. SLOC is typically used to predict the amount of effort that will be required to develop a program, as well as to estimate programming productivity or effort once the software is produced.
Measuring SLOC
Many useful comparisons involve only the order of magnitude of lines of code in a project. Software projects can vary from 10 to 100,000,000 or more lines of code. Using lines of code to compare a 10,000-line project to a 100,000-line project is far more useful than comparing a 20,000-line project with a 21,000-line project. While it is debatable exactly how to measure lines of code, discrepancies of an order of magnitude can be clear indicators of software complexity or man-hours.
There are two major types of SLOC measures: physical SLOC and logical SLOC. Specific definitions of these two measures vary, but the most common definition of physical SLOC is a count of lines in the text of the program's source code, including comment lines. Blank lines are also included unless more than 25% of the lines in a section are blank; in that case, blank lines in excess of 25% are not counted toward lines of code.
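The physical SLOC definition above, including the 25% blank-line rule, can be sketched in Python. The text does not say what the 25% is measured against, so this sketch assumes it is the raw line count of the section, rounded down; the function name `physical_sloc` and the sample sections are hypothetical.

```python
def physical_sloc(section: str, blank_cap: float = 0.25) -> int:
    """Count physical SLOC for one section of source text.

    Comment lines count like any other line. Blank lines also count,
    but only up to `blank_cap` of the section's raw line count
    (an assumed reading of the 25% rule).
    """
    lines = section.splitlines()
    blank = sum(1 for line in lines if not line.strip())
    non_blank = len(lines) - blank
    allowed_blank = int(len(lines) * blank_cap)  # rounds down
    return non_blank + min(blank, allowed_blank)


# 4 lines, 1 blank: the blank line is within the cap and is counted.
dense = "a = 1\n\nb = 2\nc = 3\n"
# 5 lines, 3 blank: only 1 blank line (25% of 5, rounded down) is counted.
sparse = "a = 1\n\n\n\nb = 2\n"

dense_count = physical_sloc(dense)
sparse_count = physical_sloc(sparse)
```

Under a different reading of the rule (for example, capping blanks at 25% of the counted total rather than the raw total) the result for sparse sections would differ, which is one reason a reported SLOC figure should state its definition.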
Logical SLOC measures attempt to measure the number of "statements", but their specific definitions are tied to specific computer languages (one simple logical SLOC measure for C-like programming languages is the number of statement-terminating semicolons). It is much easier to create tools that measure physical SLOC, and physical SLOC definitions are easier to explain. However, physical SLOC measures are sensitive to logically irrelevant formatting and style conventions, while logical SLOC is less sensitive to them. Unfortunately, SLOC measures are often stated without giving their definition, and logical SLOC can often be significantly different from physical SLOC.
Consider this snippet of C code as an example of the ambiguity encountered when determining SLOC:
In this example we have:
* 1 physical line of code (LOC)
* 2 logical lines of code (LLOC): the for statement and the printf statement
* 1 comment line
Depending on the programmer and/or coding standards, the above "line of code" could be, and usually is, written on many separate lines:
In this example we have:
* 4 physical lines of code (LOC) (Is placing braces work to be estimated?)
* 2 logical lines of code (LLOC) (What about all the work writing non-statement lines?)
* 1 comment line (Tools must account for all code and comments regardless of comment placement.)
Even the "logical" and "physical" SLOC values can have a large number of varying definitions. Robert E. Park (while at the Software Engineering Institute) and others developed a framework for defining SLOC values, to enable people to carefully explain and define the SLOC measure used in a project. For example, most software systems reuse code, and determining which (if any) reused code to include is important when reporting a measure.
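Returning to the two layouts of the loop discussed above, the following Python sketch shows how a simple counter could arrive at the same breakdown. The original C snippets are not reproduced in this text, so the two fragments below are assumed stand-ins chosen to match the listed counts. The counting rules are also assumptions: a line counts as physical if it contains code after comments are removed, comment lines are reported separately, and a logical line is either a control keyword or a semicolon outside parentheses. The parser is deliberately naive (single-line comments only, no handling of comment markers inside string literals).

```python
import re

COMMENT = re.compile(r"/\*.*?\*/|//.*")       # comments that fit on one line
STRING = re.compile(r'"(?:\\.|[^"\\])*"')     # double-quoted string literals
KEYWORD = re.compile(r"\b(?:for|while|if|switch)\b")


def count_sloc(source: str) -> dict:
    """Return physical, logical, and comment line counts for C-like text."""
    physical = logical = comment = 0
    for line in source.splitlines():
        if COMMENT.search(line):
            comment += 1
        # Drop comments, then blank out string contents so that any
        # semicolons or keywords inside literals are not counted.
        code = STRING.sub('""', COMMENT.sub("", line)).strip()
        if not code:
            continue
        physical += 1
        logical += len(KEYWORD.findall(code))
        depth = 0
        for ch in code:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == ";" and depth == 0:
                logical += 1  # statement terminator, not a for-header separator
    return {"physical": physical, "logical": logical, "comment": comment}


# Assumed stand-ins for the two layouts (not the original snippets).
one_line = 'for (i = 0; i < 100; i += 1) printf("hello"); /* How many lines of code is this? */'
multi_line = """/* Now how many lines of code is this? */
for (i = 0; i < 100; i += 1)
{
    printf("hello");
}"""

one_line_counts = count_sloc(one_line)
multi_line_counts = count_sloc(multi_line)
```

Note that counting every semicolon, as in the simple logical measure mentioned earlier, would report 3 rather than 2 for this loop because of the two separators in the for header. The choice of rule changes the number, which illustrates why the definition in use needs to be stated.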
Origins of SLOC
At the time that people began using SLOC as a metric, the most commonly used languages, such as FORTRAN and assembler, were line-oriented languages. These languages were developed at the time when punch cards were the main form of data entry for programming. One punch card usually represented one line of code. It was one discrete object that was easily counted. It was the visible output of the programmer, so it made sense to managers to count lines of code as a measurement of a programmer's productivity. Today, the most commonly used computer languages allow a lot more leeway for formatting. One line of text no longer necessarily corresponds to one line of code.
Usage of SLOC measures
SLOC measures are somewhat controversial, particularly in the way that they are sometimes misused. Experiments have repeatedly confirmed that effort is highly correlated with SLOC; that is, programs with larger SLOC values take more time to develop. Thus, SLOC can be very effective in estimating effort. However, functionality is less well correlated with SLOC: skilled developers may be able to develop the same functionality with far less code, so one program with fewer SLOC may exhibit more functionality than another similar program. In particular, SLOC is a poor productivity measure of individuals, since a developer can write only a few lines and yet be far more productive in terms of functionality than a developer who ends up creating more lines (and generally spending more effort). Good developers may merge multiple code modules into a single module, improving the system yet appearing to have negative productivity because they remove code. Also, especially skilled developers tend to be assigned the most difficult tasks, and thus may sometimes appear less "productive" than other developers by this measure. Furthermore, inexperienced developers often resort to code duplication, which is highly discouraged because it is more bug-prone and costly to maintain, yet it results in higher SLOC.
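The point about negative productivity can be made concrete with a small Python sketch. The `Change` record, the author labels, and all line counts are hypothetical illustration data, not measurements from any project.

```python
from dataclasses import dataclass


@dataclass
class Change:
    author: str
    lines_added: int
    lines_removed: int


def net_sloc_by_author(changes):
    """Sum added-minus-removed lines per author (a naive 'productivity' score)."""
    totals = {}
    for change in changes:
        delta = change.lines_added - change.lines_removed
        totals[change.author] = totals.get(change.author, 0) + delta
    return totals


# Hypothetical data: one developer merges two modules into one,
# another copies and pastes existing code.
changes = [
    Change("refactorer", lines_added=120, lines_removed=400),
    Change("duplicator", lines_added=600, lines_removed=0),
]

totals = net_sloc_by_author(changes)
# The refactorer scores -280 and the duplicator +600, even though the
# refactoring improved the system and the duplication made it costlier to maintain.
```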
SLOC is particularly ineffective at comparing programs written in different languages unless adjustment factors are applied to normalize languages. Various computer languages balance brevity and clarity in different ways; as an extreme example, most assembly languages would require hundreds of lines of code to perform the same task as a few characters in APL. The following example shows a comparison of a "hello world" program written in C, and the same program written in COBOL, a language known for being particularly verbose.
In comparison, below are figures for various graphics applications.
LOC and relation to security faults
A number of experts have claimed a relationship between the number of lines of code in a program and the number of bugs that it contains. This relationship is not simple, since the number of errors per line of code varies greatly according to the language used, the type of quality assurance processes, and level of testing, but it does appear to exist. More importantly, the number of bugs in a program has been directly related to the number of security faults that are likely to be found in the program.
This has had a number of important implications for system security, and these can be seen reflected in operating system design. Firstly, more complex systems are likely to be less secure simply due to the greater number of lines of code needed to develop them. For this reason, security-focused systems such as OpenBSD grow much more slowly than other systems such as Windows and Linux. A second idea, taken up in both OpenBSD and many Linux variants, is that separating code into different sections which run with different security environments (with or without special privileges, for example) ensures that the most security-critical segments are small and carefully audited.
Utility
Advantages
#Scope for Automation of Counting: Since a line of code is a physical entity, manual counting effort can be easily eliminated by automating the counting process. Small utilities may be developed for counting the LOC in a program. However, a code counting utility developed for a specific language cannot be used for other languages due to the syntactical and structural differences among languages.
#An Intuitive Metric: Line of code serves as an intuitive metric for measuring the size of software because it can be seen and its effect can be visualized. Function points are a more objective metric, but they cannot be imagined as a physical entity; they exist only in the logical space. In this way, LOC comes in handy to express the size of software among programmers with low levels of experience.
Disadvantages
#Lack of Accountability: The lines-of-code measure suffers from some fundamental problems. Some think it isn't useful to measure the productivity of a project using only results from the coding phase, which usually accounts for only 30% to 35% of the overall effort.
#Lack of Cohesion with Functionality: Though experiments have repeatedly confirmed that effort is highly correlated with LOC, functionality is less well correlated with LOC. That is, skilled developers may be able to develop the same functionality with far less code, so one program with fewer LOC may exhibit more functionality than another similar program. In particular, LOC is a poor productivity measure of individuals, since a developer can write only a few lines and still be more productive than a developer creating more lines of code.
#Adverse Impact on Estimation: Because of the fact presented under the first point (the coding phase accounts for only part of the overall effort), estimates based on lines of code can easily go wrong.
#Developer’s Experience: The implementation of a specific piece of logic differs based on the level of experience of the developer. Hence, the number of lines of code differs from person to person. An experienced developer may implement certain functionality in fewer lines of code than a developer with relatively less experience, even though they use the same language.
#Difference in Languages: Consider two applications that provide the same functionality (screens, reports, databases). One of the applications is written in C++ and the other in a language like COBOL. The number of function points would be exactly the same, but aspects of the application would be different. The lines of code needed to develop the application would certainly not be the same. As a consequence, the amount of effort required to develop the application would be different (hours per function point). Unlike lines of code, the number of function points will remain constant.
#Advent of GUI Tools: With the advent of GUI-based programming languages and tools such as Visual Basic, programmers can write relatively little code and achieve high levels of functionality. For example, instead of writing a program to create a window and draw a button, a user with a GUI tool can use drag-and-drop and other mouse operations to place components on a workspace. Code that is automatically generated by a GUI tool is not usually taken into consideration when using LOC methods of measurement. This results in variation between languages; the same task that can be done in a single line of code (or no code at all) in one language may require several lines of code in another.
#Problems with Multiple Languages: Software today is often developed in more than one language. Very often, a number of languages are employed depending on the complexity and requirements. Tracking and reporting productivity and defect rates pose a serious problem in this case, since defects cannot be attributed to a particular language after the system has been integrated. Function points stand out as the best measure of size in this case.
#Lack of Counting Standards: There is no standard definition of what a line of code is. Do comments count? Are data declarations included? What happens if a statement extends over several lines? These are the questions that often arise. Though organizations like SEI and IEEE have published some guidelines in an attempt to standardize counting, it is difficult to put these into practice, especially as new languages are introduced every year.
#Psychology: A programmer whose productivity is being measured in lines of code will be rewarded for generating more lines of code, even when the same functionality could be written in fewer lines. The more management focuses on lines of code, the more incentive the programmer has to expand the code with unneeded complexity. This is harmful, since the later cost of fixing bugs and maintaining the program grows with the number of lines of code. It's an example of the business proverb: "What you measure is what you get."
In the PBS documentary Triumph of the Nerds, Microsoft executive Steve Ballmer criticized the use of counting lines of code:
"In IBM there's a religion in software that says you have to count K-LOCs, and a K-LOC is a thousand line of code. How big a project is it? Oh, it's sort of a 10K-LOC project. This is a 20K-LOCer. And this is 50K-LOCs. And IBM wanted to sort of make it the religion about how we got paid. How much money we made off OS/2, how much they did. How many K-LOCs did you do? And we kept trying to convince them - hey, if we have - a developer's got a good idea and he can get something done in 4K-LOCs instead of 20K-LOCs, should we make less money? Because he's made something smaller and faster, less K-LOC. K-LOCs, K-LOCs, that's the methodology. Ugh! Anyway, that always makes my back just crinkle up at the thought of the whole thing."
Related terms
KLOC: 1,000 lines of code
KDLOC: 1,000 delivered lines of code
KSLOC: 1,000 source lines of code
MLOC: 1,000,000 lines of code
GLOC: 1,000,000,000 lines of code
TLOC: 1,000,000,000,000 lines of code
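As a small illustration of how the units above relate, the following Python sketch converts a raw line count into the largest fitting unit. The function name `format_loc` and the output style are assumptions made for this example, not a standard notation.

```python
# Multipliers taken from the list of related terms, largest first.
UNITS = [
    (10**12, "TLOC"),
    (10**9, "GLOC"),
    (10**6, "MLOC"),
    (10**3, "KLOC"),
]


def format_loc(lines: int) -> str:
    """Express a line count in the largest unit that does not exceed it."""
    for factor, name in UNITS:
        if lines >= factor:
            return f"{lines / factor:g} {name}"
    return f"{lines} LOC"


small = format_loc(999)
medium = format_loc(20_000)
large = format_loc(2_500_000)
```

With these inputs, `small` is "999 LOC", `medium` is "20 KLOC", and `large` is "2.5 MLOC".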
References
Additional reading
* Li, Luo; Herbsleb, Jim; Shaw, Mary (May 2005). [http://reports-archive.adm.cs.cmu.edu/anon/isri2005/CMU-ISRI-05-125.ps Forecasting Field Defect Rates Using a Combined Time-based and Metric–based Approach a Case Study of OpenBSD (CMU-ISRI-05-125)]. Carnegie-Mellon University.
* McGraw, Gary (March/April 2003). [ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2003/2003-13.ps.gz From the Ground Up: The DIMACS Software Security Workshop]. IEEE Security & Privacy 1 (2): 59-66.
* Park, Robert E., et al. [http://www.sei.cmu.edu/publications/documents/92.reports/92.tr.020.html Software Size Measurement: A Framework for Counting Source Statements]. Technical Report CMU/SEI-92-TR-20.
External links
* [http://msquaredtechnologies.com/m2rsm/docs/rsm_metrics_narration.htm Definitions of Practical Source Lines of Code] Resource Standard Metrics (RSM) defines "effective lines of code" as a realistic code metric independent of programming style.
* [http://msquaredtechnologies.com/m2rsm/rsm_software_project_metrics.htm Effective Lines of Code eLOC Metrics for popular Open Source Software] Linux Kernel 2.6.17, Firefox, Apache HTTPD, MySQL, PHP using RSM.
* Wheeler, David A. [http://www.dwheeler.com/sloccount SLOCCount]. Retrieved 2003-08-12.
* Wheeler, David A. (June 2001). [http://www.dwheeler.com/sloc More than a Gigabuck: Estimating GNU/Linux's Size]. Retrieved 2003-08-12.
* Tanenbaum, Andrew S. "Modern Operating Systems" (2nd ed.). Prentice Hall. ISBN 0-13-092641-8.
* Dahdah, Howard (2007-01-24). [http://www.computerworld.com.au/index.php/id;1942598204;pp;1 Tanenbaum outlines his vision for a grandma-proof OS]. Retrieved 2007-01-29.
* [http://www.chris-lott.org/resources/cmetrics/ C. M. Lott: Metrics collection tools for C and C++ Source Code]
Wikimedia Foundation. 2010.