CLOC
Count Lines of Code

Overview ^

cloc counts blank lines, comment lines, and physical lines of source code in many programming languages. It is written entirely in Perl with no dependencies outside the standard distribution of Perl v5.6 and higher (code from some external modules is embedded within cloc) and so is quite portable. cloc is known to run on many flavors of Linux, Mac OS X, AIX, Solaris, IRIX, z/OS, and Windows. (To run the Perl source version of cloc on Windows one needs ActiveState Perl 5.6.1 or higher, or Cygwin installed. Alternatively one can use the Windows binary of cloc generated with perl2exe to run on Windows computers that have neither Perl nor Cygwin.)

cloc contains code from David Wheeler's SLOCCount, Damian Conway and Abigail's Perl module Regexp::Common, and Sean M. Burke's Perl module Win32::Autoglob.

License^

cloc is licensed under the GNU General Public License, v2, excluding portions which are copied from other sources. Code copied from the Regexp::Common and Win32::Autoglob Perl modules is subject to the Artistic License.

Why Use cloc? ^

cloc has many features that make it easy to use, thorough, extensible, and portable:

  1. Exists as a single, self-contained file that requires minimal installation effort---just download the file and run it.
  2. Can read language comment definitions from a file and thus potentially work with computer languages that do not yet exist.
  3. Allows results from multiple runs to be summed together by language and by project.
  4. Can produce results in a variety of formats: plain text, XML, YAML, comma separated values.
  5. Can count code within compressed archives (tar balls, Zip files, Java .ear files).
  6. Has numerous troubleshooting options.
  7. Handles file and directory names with spaces and other unusual characters.
  8. Has no dependencies outside the standard Perl distribution.
  9. Runs on Linux, FreeBSD, NetBSD, Mac OS X, AIX, HP-UX, Solaris, IRIX, and z/OS systems that have Perl 5.6 or higher. The source version runs on Windows with either ActiveState Perl or Cygwin. Alternatively, on Windows one can run the Windows binary, which has no dependencies.

Other Counters ^

If cloc does not suit your needs here are other freely available counters to consider:

Other references:

Regexp::Common, Digest::MD5, Win32::Autoglob

Although cloc does not need Perl modules outside those found in the standard distribution, cloc does rely on a few external modules. Code from two of these external modules--Regexp::Common and Win32::Autoglob--is embedded within cloc. A third module, Digest::MD5, is used only if it is available. If cloc finds Regexp::Common installed locally it will use that installation. If it doesn't, cloc will install the parts of Regexp::Common it needs to a temporary directory that is created at the start of a cloc run then removed when the run is complete. The necessary code from Regexp::Common v2.120 is embedded within the cloc source code (see subroutine Install_Regexp_Common() ). Only three lines are needed from Win32::Autoglob and these are included directly in cloc.

Additionally, cloc will use Digest::MD5 to validate uniqueness among input files if Digest::MD5 is installed locally. If Digest::MD5 is not found the file uniqueness check is skipped.
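
The size-then-hash idea behind the uniqueness check can be illustrated with a short Python sketch. This is not cloc's Perl code: the function names md5_of and unique_files are hypothetical, and the sketch assumes that only files of exactly equal size need to be hashed.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unique_files(paths):
    """Keep one path per distinct file content (hypothetical helper)."""
    # Cheap first pass: group the candidates by file size.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    unique = []
    for same_size in by_size.values():
        if len(same_size) == 1:
            # No other file has this size, so the content must be unique.
            unique.append(same_size[0])
            continue
        # Expensive second pass: hash only the files that share a size.
        seen = set()
        for path in same_size:
            digest = md5_of(path)
            if digest not in seen:
                seen.add(digest)
                unique.append(path)
    return unique
```

Grouping by size first means most files are never hashed at all, which is why skipping the check only matters for performance when many files share a size.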

The Windows binary is built on a computer that has both Regexp::Common and Digest::MD5 installed locally.

Basic Use ^

cloc is a command line program that takes file, directory, and/or archive names as inputs. Here's an example of running cloc against the Perl v5.10.0 source distribution:

  
prompt> cloc perl-5.10.0.tar.gz
    4076 text files.
    3883 unique files.                                          
    1521 files ignored.

http://cloc.sourceforge.net v 1.07  T=10.0 s (251.0 files/s, 84566.5 lines/s)
-------------------------------------------------------------------------------
Language          files     blank   comment      code    scale   3rd gen. equiv
-------------------------------------------------------------------------------
Perl               2052    110356    112521    309778 x   4.00 =     1239112.00
C                   135     18718     22862    140483 x   0.77 =      108171.91
C/C++ Header        147      7650     12093     44042 x   1.00 =       44042.00
Bourne Shell        116      3402      5789     36882 x   3.81 =      140520.42
Lisp                  1       684      2242      7515 x   1.25 =        9393.75
make                  7       498       473      2044 x   2.50 =        5110.00
C++                  10       312       277      2000 x   1.51 =        3020.00
XML                  26       231         0      1972 x   1.90 =        3746.80
yacc                  2       128        97      1549 x   1.51 =        2338.99
YAML                  2         2         0       489 x   0.90 =         440.10
DOS Batch            11        85        50       322 x   0.63 =         202.86
HTML                  1        19         2        98 x   1.90 =         186.20
-------------------------------------------------------------------------------
SUM:               2510    142085    156406    547174 x   2.84 =     1556285.03
-------------------------------------------------------------------------------

To run cloc on Windows computers one must first open up a command (aka DOS) window and invoke cloc.exe from the command line there.

Options ^

  
prompt> cloc

Usage: cloc [options] <file(s)/dir(s)> | <report files>

 Count physical lines of source code in the given files and/or
 recursively below the given directories.

 Input Options
   --extract-with=<cmd>      Use <cmd> to extract binary archive files (e.g.:
                             .tar.gz, .zip, .Z).  Use the literal '>FILE<' as 
                             a stand-in for the actual file(s) to be
                             extracted.  For example, to count lines of code
                             in the input files 
                                gcc-4.2.tar.gz  perl-5.8.8.tar.gz  
                             on Unix use  
                               --extract-with='gzip -dc >FILE< | tar xf -'
                             or, if you have GNU tar,
                               --extract-with='tar zxf >FILE<' 
                             and on Windows use: 
                               --extract-with="\"c:\Program Files\WinZip\WinZip32.exe\" -e -o >FILE< ."
                             (if WinZip is installed there).
   --list-file=<file>        Take the list of file and/or directory names to 
                             process from <file> which has one file/directory 
                             name per line.  See also --exclude-list-file.
   --unicode                 Check binary files to see if they contain Unicode
                             expanded ASCII text.  This causes performance to
                             drop noticably.

 Processing Options
   --by-file                 Report results for every source file encountered.
   --by-file-by-lang         Report results for every source file encountered
                             in addition to reporting by language.
   --force-lang=<lang>[,<ext>]
                             Process all files that have a <ext> extension 
                             with the counter for language <lang>.  For 
                             example, to count all .f files with the 
                             Fortran 90 counter (which expects files to 
                             end with .f90) instead of the default Fortran 77 
                             counter, use
                               --force-lang="Fortran 90",f
                             If <ext> is omitted, every file will be counted
                             with the <lang> counter.  This option can be 
                             specified multiple times (but that is only
                             useful when <ext> is given each time).  
                             See also --script-lang.
   --read-lang-def=<file>    Load from <file> the language processing filters.
                             (see also --write-lang-def) then use these filters
                             instead of the built-in filters.
   --script-lang=<lang>,<s>  Process all files that invoke <s> as a #!
                             scripting language with the counter for language
                             <lang>.  For example, files that begin with
                                #!/usr/local/bin/perl5.8.8
                             will be counted with the Perl counter by using
                                --script-lang=Perl,perl5.8.8
                             The language name is case insensitive but the
                             name of the script language executable, <s>,
                             must have the right case.  This option can be 
                             specified multiple times.  See also --force-lang.
   --sdir=<dir>              Use <dir> as the scratch directory instead of
                             letting File::Temp chose the location.  Files
                             written to this location are not removed at
                             the end of the run (as they are with File::Temp).
   --skip-uniqueness         Skip the file uniqueness check.  This will give
                             a performance boost at the expense of counting
                             files with identical contents multiple times
                             (if such duplicates exist).
   --strip-comments=<ext>    For each file processed, write to the current
                             directory a version of the file which has blank
                             lines and comments removed.  The name of each
                             stripped file is the original file name with 
                             .<ext> appended to it.  It is written to the
                             current directory unless --original-dir is on.
   --original-dir            [Only effective in combination with 
                             --strip-comments]  Write the stripped files 
                             to the same directory as the original files.
   --sum-reports             Input arguments are report files previously
                             created with the --report-file option.  Makes
                             a cumulative set of results containing the
                             sum of data from the individual report files.

 Filter Options
   --exclude-dir=<D1>[,D2,]  Exclude the given comma separated directories
                             D1, D2, D3, et cetera, from being scanned.  For
                             example  --exclude-dir=.cache,test  will skip
                             all files that have /.cache/ or /test/ as part
                             of their path.
                             Directories named .cvs and .svn are always
                             excluded.
   --exclude-lang=<L1>[,L2,] Exclude the given comma separated languages
                             L1, L2, L3, et cetera, from being counted.
   --exclude-list-file=<file>  Ignore files whose names appear in <file>.
                             <file> should have one entry per line.  Relative
                             path names will be resolved starting from the
                             directory where cloc is invoked.  See also
                             --list-file.
   --match-f=<regex>         Only count files whose basenames match the Perl 
                             regex.  For example
                               --match-f=^[Ww]idget
                             only counts files that start with Widget or widget.
   --not-match-f=<regex>     Count all files except those whose basenames
                             match the Perl regex.

 Debug Options
   --categorized=<file>      Save names of categorized files to <file>.
   --counted=<file>          Save names of processed source files to <file>.
   --help                    Print this usage information and exit.
   --found=<file>            Save names of every file found to <file>.
   --ignored=<file>          Save names of ignored files and the reason they
                             were ignored to <file>.
   --print-filter-stages     Print to STDOUT processed source code before and 
                             after each filter is applied.
   --show-ext[=<ext>]        Print information about all known (or just the
                             given) file extensions and exit.
   --show-lang[=<lang>]      Print information about all known (or just the
                             given) languages and exit.
   -v[=<n>]                  Verbose switch (optional numeric value).
   --version                 Print the version of this program and exit.

   --write-lang-def=<file>   Writes to <file> the language processing filters
                             then exits.  Useful as a first step to creating
                             custom language definitions (see --read-lang-def).

 Output Options
   --no3                     Suppress third-generation language output.
                             (This option can cause report summation to fail
                             if some reports were produced with this option
                             while others were produced without it.)
   --progress-rate=<n>       Show progress update after every <n> files are
                             processed (default <n>=100).  Set <n> to 0 to
                             suppress progress output (useful when redirecting
                             output to STDOUT).
   --quiet                   Suppress all information messages except for
                             the final report.
   --report-file=<file>      Write the results to <file> instead of STDOUT.
   --out=<file>              Synonym for --report-file=<file>.
   --csv                     Write the results as comma separated values.
   --xml                     Write the results in XML.
   --xsl[=<file>]            Reference <file> as an XSL stylesheet within
                             the XML output.  If <file> is not given, writes
                             a default stylesheet, cloc.xsl.  This switch
                             forces --xml to be on.
   --yaml                    Write the results in YAML.

Recognized Languages ^

prompt> cloc  --show-lang

ABAP                       (abap)
ActionScript               (as)
Ada                        (ada, adb, ads, pad)
ADSO/IDSM                  (adso)
AMPLE                      (ample, dofile, startup)
ASP                        (asa, asp)
ASP.Net                    (asax, ascx, asmx, aspx, config, master, sitemap, webinfo)
Assembly                   (asm, S, s)
awk                        (awk)
Bourne Again Shell         (bash)
Bourne Shell               (sh)
C                          (c, ec, pgc)
C Shell                    (csh, tcsh)
C#                         (cs)
C++                        (C, cc, cpp, cxx, pcc)
C/C++ Header               (H, h, hh, hpp)
CCS                        (ccs)
COBOL                      (cbl, CBL, cob, COB)
ColdFusion                 (cfm)
CSS                        (css)
D                          (d)
DAL                        (da)
DOS Batch                  (bat, BAT)
DTD                        (dtd)
Expect                     (exp)
Focus                      (focexec)
Fortran 77                 (F, f, f77, F77, pfo)
Fortran 90                 (F90, f90)
Fortran 95                 (F95, f95)
Haskell                    (hs, lhs)
HTML                       (htm, html)
IDL                        (idl)
inc                        (inc)
Java                       (java)
Javascript                 (js)
JCL                        (jcl)
JSP                        (jsp)
Korn Shell                 (ksh)
lex                        (l)
Lisp                       (cl, el, jl, lsp, sc, scm)
LiveLink OScript           (oscript)
Lua                        (lua)
m4                         (ac, m4)
make                       (am, gnumakefile, Gnumakefile, Makefile, makefile)
MATLAB                     (m)
ML                         (ml, mli)
Modula3                    (i3, ig, m3, mg)
MSBuild scripts            (csproj, wdproj)
MUMPS                      (mps, m)
NAnt scripts               (build)
NASTRAN DMAP               (dmap)
Objective C                (m)
Oracle Forms               (fmt)
Oracle Reports             (rex)
Pascal                     (dpr, p, pas, pp)
Patran Command Language    (pcl, ses)
Perl                       (perl, PL, pl, plh, plx, pm)
PHP                        (php, php3, php4, php5)
Python                     (py)
Rexx                       (rexx)
Ruby                       (rb)
sed                        (sed)
SKILL                      (il)
SKILL++                    (ils)
Softbridge Basic           (sbl, SBL)
SQL                        (psql, SQL, sql)
Tcl/Tk                     (itk, tcl, tk)
Teamcenter def             (def)
Teamcenter met             (met)
Teamcenter mth             (mth)
VHDL                       (vhd, VHD, VHDL, vhdl)
vim script                 (vim)
Visual Basic               (bas, cls, frm, vb, VB, vba, VBA, vbs, VBS)
XML                        (XML, xml)
XSD                        (xsd, XSD)
XSLT                       (xsl, XSL, xslt, XSLT)
yacc                       (y)
YAML                       (yaml, yml)

MATLAB, MUMPS, and Objective C are the only recognized languages which map to the same file extension, .m. cloc has a subroutine which attempts to identify the right language based on the file's contents.

The above list can be customized by reading language definitions from a file with the --read-lang-def option.

How it Works ^

cloc's method of operation resembles SLOCCount's: First, create a list of files to consider. Next, attempt to determine whether or not found files contain recognized computer language source code. Finally, for files identified as source files, invoke language-specific routines to count the number of source lines.

A more detailed description:

  1. If the input file is an archive (such as a .tar.gz or .zip file), create a temporary directory and expand the archive there using a system call to an appropriate underlying utility (tar, bzip2, unzip, etc.), then add this temporary directory as one of the inputs. (This works more reliably on Unix than on Windows.)
  2. Use File::Find to recursively descend the input directories and make a list of candidate file names. Ignore binary and zero-sized files.
  3. Make sure the files in the candidate list have unique contents (first by comparing file sizes, then, for similarly sized files, by comparing MD5 hashes of the file contents with Digest::MD5).
  4. Scan the candidate file list for file extensions which cloc associates with programming languages (see the --show-lang and --show-ext options). Files which match are classified as containing source code for that language. Each file without an extension is opened and its first line read to see if it is a Unix shell script (anything that begins with #!). If it is a shell script, the file is classified by that scripting language (if the language is recognized). If the file does not have a recognized extension or is not written in a recognized scripting language, the file is ignored.
  5. All remaining files in the candidate list should now be source files for known programming languages. For each of these files:
    1. Read the entire file into memory.
    2. Count the number of lines (= L_original).
    3. Remove blank lines, then count again (= L_non_blank).
    4. Loop over the comment filters defined for this language. (For example, C++ has two filters: (1) remove lines that start with optional whitespace followed by // and (2) remove text between /* and */.) Apply each filter to the code to remove comments. Count the leftover lines (= L_code).
    5. Save the counts for this language:
      blank lines = L_original - L_non_blank
      comment lines = L_non_blank - L_code
      code lines = L_code
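
The counting in step 5 can be sketched in Python as follows. This is an illustration only, not cloc's Perl implementation: the helper names are hypothetical, the two filters are naive stand-ins for the C++ filters mentioned above (the block-comment regex ignores string literals), and the sketch assumes that empty lines left behind by a filter are dropped before counting.

```python
import re


def remove_line_comments(text):
    """Drop lines that start with optional whitespace followed by //."""
    return "\n".join(
        line for line in text.split("\n") if not re.match(r"\s*//", line)
    )


def remove_block_comments(text):
    """Remove text between /* and */ (naive: ignores string literals)."""
    return re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)


def drop_blank_lines(text):
    """Remove lines that are empty or contain only whitespace."""
    return "\n".join(line for line in text.split("\n") if line.strip())


def count_lines(source, filters):
    """Return blank, comment, and code line counts for one source file."""
    l_original = len(source.splitlines())
    text = drop_blank_lines(source)
    l_non_blank = len(text.splitlines())
    for comment_filter in filters:
        # A filter can leave empty lines behind, so drop blanks again.
        text = drop_blank_lines(comment_filter(text))
    l_code = len(text.splitlines())
    return {
        "blank": l_original - l_non_blank,
        "comment": l_non_blank - l_code,
        "code": l_code,
    }


example = (
    "// header\n"
    "\n"
    "int main() {\n"
    "    /* multi\n"
    "       line */\n"
    "    return 0; // trailing\n"
    "}\n"
)
counts = count_lines(example, [remove_line_comments, remove_block_comments])
print(counts)  # {'blank': 1, 'comment': 3, 'code': 3}
```

Note that the line with the trailing // comment still counts as code, because the first filter only removes lines that begin with //. The three counts always add up to the original line count.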

The options modify the algorithm slightly. The --read-lang-def option for example allows the user to read definitions of comment filters, known file extensions, and known scripting languages from a file. The code for this option is processed between Steps 2 and 3.
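
The classification in step 4 (extension first, then the #! line for files without an extension) might look like the Python sketch below. The two lookup tables are tiny illustrative samples, not cloc's tables, and the function names are hypothetical. The interpreter name is matched exactly and case-sensitively, which mirrors the --script-lang description above.

```python
import os

# Tiny illustrative tables; cloc's real tables are far larger.
EXTENSION_TO_LANGUAGE = {"c": "C", "py": "Python", "pl": "Perl", "sh": "Bourne Shell"}
SCRIPT_TO_LANGUAGE = {"perl": "Perl", "python": "Python", "sh": "Bourne Shell"}


def language_from_shebang(first_line):
    """Map a '#!/path/to/interpreter' line to a language name, or None."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    # Only the executable's base name is looked up, e.g. 'perl'.
    interpreter = os.path.basename(parts[0])
    return SCRIPT_TO_LANGUAGE.get(interpreter)


def classify(path):
    """Return a language name, or None if the file should be ignored."""
    extension = os.path.splitext(path)[1].lstrip(".")
    if extension:
        # Files with an extension are classified by extension alone.
        return EXTENSION_TO_LANGUAGE.get(extension)
    # No extension: read the first line and look for a #! interpreter.
    with open(path, "r", errors="replace") as handle:
        return language_from_shebang(handle.readline())
```

In this sketch a script starting with #!/usr/local/bin/perl5.8.8 is ignored because perl5.8.8 is not in the table, which is the situation the --script-lang option addresses.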

Advanced Use ^

Remove Comments from Source Code^

How can you tell if cloc correctly identifies comments? One way to convince yourself cloc is doing the right thing is to use its --strip-comments option to remove comments and blank lines from files, then compare the stripped-down files to originals.

Let's try this out with the SQLite amalgamation, a C file containing all code needed to build the SQLite library along with a header file:
prompt> tar zxf sqlite-amalgamation-3.5.6.tar.gz 
prompt> cd sqlite-3.5.6/
prompt> cloc --strip-comments=nc sqlite3.c
       1 text file.
       1 unique file.                              
Wrote sqlite3.c.nc
       0 files ignored.

http://cloc.sourceforge.net v 1.03  T=1.0 s (1.0 files/s, 82895.0 lines/s)
-------------------------------------------------------------------------------
Language          files     blank   comment      code    scale   3rd gen. equiv
-------------------------------------------------------------------------------
C                     1      5167     26827     50901 x   0.77 =       39193.77
-------------------------------------------------------------------------------

The extension argument given to --strip-comments is arbitrary; here nc was used as an abbreviation for "no comments".

cloc removed over 31,000 lines from the file:

prompt> wc -l sqlite3.c sqlite3.c.nc 
  82895 sqlite3.c
  50901 sqlite3.c.nc
 133796 total
prompt> echo "82895 - 50901" | bc
31994

We can now compare the original file, sqlite3.c, and the one stripped of comments, sqlite3.c.nc, with tools like diff or vimdiff and see exactly what cloc considered comments and blank lines. A rigorous proof that the stripped-down file contains the same C code as the original is to compile these files and compare checksums of the resulting object files.

First, the original source file:

prompt> gcc -c sqlite3.c
prompt> md5sum sqlite3.o
cce5f1a2ea27c7e44b2e1047e2588b49  sqlite3.o

Next, the version without comments:

prompt> mv sqlite3.c.nc sqlite3.c
prompt> gcc -c sqlite3.c
prompt> md5sum sqlite3.o
cce5f1a2ea27c7e44b2e1047e2588b49  sqlite3.o

cloc removed over 31,000 lines of comments and blanks but did not modify the source code in any significant way since the resulting object file matches the original.

Work with Compressed Archives ^

Versions of cloc before v1.07 required an --extract-with=<cmd> option to tell cloc how to expand an archive file. Beginning with v1.07 this extraction is attempted automatically. At the moment the automatic extraction method works reasonably well on Unix-type OS's for the following file types: .tar.gz, .tar.bz2, .tgz, .zip, .ear. Some of these extensions work on Windows if one has WinZip installed in the default location (C:\Program Files\WinZip\WinZip32.exe). Support is planned for .src.rpm files in a future release.

In situations where the automatic extraction fails one can try the --extract-with=<cmd> option, which allows one to count lines of code within tar files, Zip files, or other compressed archives for which one has an extraction tool. cloc takes the user-provided extraction command and expands the archive to a temporary directory (created with File::Temp), counts the lines of code in the temporary directory, then removes that directory. While not especially helpful when dealing with a single compressed archive (after all, if you're going to type the extraction command anyway why not just manually expand the archive?) this option is handy for working with several archives at once.
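
To make the role of the >FILE< placeholder concrete, here is a minimal Python sketch of the idea: substitute the archive name into the command template, then run the command inside a fresh temporary directory. It is not cloc's implementation. The function names are hypothetical, and the quoting via shlex.quote assumes a POSIX shell, so the sketch applies to the Unix examples only.

```python
import os
import shlex
import subprocess
import tempfile

PLACEHOLDER = ">FILE<"


def build_extract_command(template, archive_path):
    """Substitute the shell-quoted archive path for every >FILE<."""
    if PLACEHOLDER not in template:
        raise ValueError("template must contain the literal " + PLACEHOLDER)
    return template.replace(PLACEHOLDER, shlex.quote(archive_path))


def extract_to_temp_dir(template, archive_path):
    """Run the extraction in a new temporary directory and return its path.

    The caller is responsible for removing the directory afterwards.
    """
    work_dir = tempfile.mkdtemp()
    # Use an absolute path because the command runs inside work_dir.
    command = build_extract_command(template, os.path.abspath(archive_path))
    subprocess.run(command, shell=True, cwd=work_dir, check=True)
    return work_dir


print(build_extract_command("gzip -dc >FILE< | tar xf -", "/tmp/perl-5.8.5.tar.gz"))
# gzip -dc /tmp/perl-5.8.5.tar.gz | tar xf -
```

Because the same template is reused for each archive, one command string can serve several input files, which is the case where the option saves typing.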

For example, say you have the following source tarballs on a Unix machine
     perl-5.8.5.tar.gz
     Python-2.4.2.tar.gz
and you want to count all the code within them. The command would be

cloc --extract-with='gzip -dc >FILE< | tar xf -' perl-5.8.5.tar.gz Python-2.4.2.tar.gz

If that Unix machine has GNU tar (which can uncompress and extract in one step) the command can be shortened to

cloc --extract-with='tar zxf >FILE<' perl-5.8.5.tar.gz Python-2.4.2.tar.gz

On a Windows computer with WinZip installed in c:\Program Files\WinZip the command would look like

cloc.exe --extract-with="\"c:\Program Files\WinZip\WinZip32.exe\" -e -o >FILE< ." perl-5.8.5.tar.gz Python-2.4.2.tar.gz

Java .ear files are Zip files that contain additional Zip files. cloc can handle nested compressed archives without difficulty, provided all such files are compressed and archived in the same way. Examples of counting a Java .ear file in Unix and Windows:

Unix> cloc --extract-with="unzip -d . >FILE< " Project.ear

DOS> cloc.exe --extract-with="\"c:\Program Files\WinZip\WinZip32.exe\" -e -o >FILE< ." Project.ear

Create Custom Language Definitions ^

cloc can write its language comment definitions to a file or can read comment definitions from a file, overriding the built-in definitions. This can be useful when you want to use cloc to count lines of a language not yet included, to change association of file extensions to languages, or to modify the way existing languages are counted.

The easiest way to create a custom language definition file is to make cloc write its definitions to a file, then modify that file:

Unix> cloc --write-lang-def=my_definitions.txt

creates the file my_definitions.txt, which can be modified and then read back in with

Unix> cloc --read-lang-def=my_definitions.txt  file1 file2 dir1 ...

Each language entry has four parts:

  1. The language name starting in column 1.
  2. One or more comment filters starting in column 5.
  3. One or more filename extensions starting in column 5.
  4. A 3rd generation scale factor starting in column 5. This entry must be provided but its value is not important unless you want to compare your language to a hypothetical third generation programming language.

A filter defines a method to remove comment text from the source file. For example, the entry for C++ looks like this:

C++
    filter remove_matches ^\s*//
    filter call_regexp_common C
    extension C
    extension cc
    extension cpp
    extension cxx
    extension pcc
    3rd_gen_scale 1.51
C++ has two filters: first, remove lines that start with optional whitespace and are followed by //. Next, remove all C comments. C comments are difficult to express as regular expressions so a call is made to Regexp::Common to get the appropriate regular expression to match C comments which are then removed.
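
As an illustration of the four-part entry layout, the following Python sketch parses text shaped like the C++ entry above into a dictionary. It is a reading aid, not cloc's parser: the function name is hypothetical, and the sketch assumes that any unindented non-blank line starts a new language and that only the three keywords shown in the example (filter, extension, 3rd_gen_scale) appear.

```python
def parse_language_definitions(text):
    """Parse entries shaped like the C++ example into a dict keyed by language."""
    languages = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.rstrip()
        if not line:
            continue
        if not line[0].isspace():
            # A name starting in column 1 begins a new language entry.
            current = {"filters": [], "extensions": [], "3rd_gen_scale": None}
            languages[line] = current
            continue
        if current is None:
            raise ValueError("indented line before any language name: " + raw_line)
        keyword, _, rest = line.strip().partition(" ")
        if keyword == "filter":
            # A filter has a name and an optional argument (regex or language).
            name, _, argument = rest.partition(" ")
            current["filters"].append((name, argument))
        elif keyword == "extension":
            current["extensions"].append(rest)
        elif keyword == "3rd_gen_scale":
            current["3rd_gen_scale"] = float(rest)
        # Keywords not shown in the example are silently ignored here.
    return languages
```

Applied to the C++ entry, this yields two filters, (remove_matches, ^\s*//) and (call_regexp_common, C), in the order they are to be applied, plus five extensions and a scale factor of 1.51.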

A more complete discussion of the different filter options may appear here in the future. The output of cloc's --write-lang-def option should provide enough examples for motivated individuals to modify or extend cloc's language definitions.

Combine Reports ^

If you manage multiple software projects you might be interested in seeing line counts by project, not just by language. Say you manage three software projects called MySQL, PostgreSQL, and SQLite. The teams responsible for each of these projects run cloc on their source code and provide you with the output. For example, the MySQL team runs

cloc --report-file=mysql-5.0.24a.txt --extract-with='tar zxf >FILE<' mysql-5.0.24a.tar.gz

and provides you with the file mysql-5.0.24a.txt. The contents of the three files you get are:

Unix> cat mysql-5.0.24a.txt
http://cloc.sourceforge.net v 0.72  T=300.0 s (10.3 files/s, 5785.6 lines/s)
-------------------------------------------------------------------------------
Language          files     blank   comment      code    scale   3rd gen. equiv
-------------------------------------------------------------------------------
C++                 636     87248    108470    536619 x   1.51 =      810294.69
C                   790     74025     85474    412662 x   0.77 =      317749.74
C/C++ Header        924     26969     53128    122434 x   1.00 =      122434.00
Bourne Shell        212     16081     18048    113940 x   3.81 =      434111.40
Tcl/Tk              235      5276      7497     30484 x   1.25 =       38105.00
Perl                 31      1731      1512      7931 x   4.00 =       31724.00
Java                131      1374      1358      7686 x   1.36 =       10452.96
XML                  25       540        22      3914 x   1.90 =        7436.60
SQL                   8       173        56      2673 x   2.29 =        6121.17
HTML                 13       244        22      2097 x   1.90 =        3984.30
awk                  13       176       337      1967 x   3.81 =        7494.27
Assembler            14       169         0      1357 x   0.25 =         339.25
sed                   1         0         0       772 x   4.00 =        3088.00
Teamcenter def       30        90       117       722 x   1.00 =         722.00
Make                 10        40        19       203 x   2.50 =         507.50
DOS Batch             3        12         3        17 x   0.63 =          10.71
-------------------------------------------------------------------------------
SUM:               3076    214148    276063   1245478 x   1.44 =     1794575.59
-------------------------------------------------------------------------------

Unix> cat sqlite-3.3.7.txt
http://cloc.sourceforge.net v 0.72  T=49.0 s (3.0 files/s, 2733.8 lines/s)
-------------------------------------------------------------------------------
Language          files     blank   comment      code    scale   3rd gen. equiv
-------------------------------------------------------------------------------
C                    65      4603     20237     49674 x   0.77 =       38248.98
Bourne Shell          8      3050      4218     24223 x   3.81 =       92289.63
Tcl/Tk               51      2609       911     18017 x   1.25 =       22521.25
C/C++ Header         10       234      1402      2194 x   1.00 =        2194.00
yacc                  1       108        41       933 x   1.51 =        1408.83
HTML                  2       128         0       873 x   1.90 =        1658.70
awk                   6         6        82       180 x   3.81 =         685.80
Teamcenter def        1         0         0       101 x   1.00 =         101.00
Make                  1        19        89        22 x   2.50 =          55.00
-------------------------------------------------------------------------------
SUM:                145     10757     26980     96217 x   1.65 =      159163.19
-------------------------------------------------------------------------------

Unix> cat postgresql-8.1.4.txt
http://cloc.sourceforge.net v 0.72  T=211.0 s (11.8 files/s, 5676.6 lines/s)
-------------------------------------------------------------------------------
Language          files     blank   comment      code    scale   3rd gen. equiv
-------------------------------------------------------------------------------
HTML                693      4703        19    412348 x   1.90 =      783461.20
C                   743     75657    126646    407089 x   0.77 =      313458.53
C/C++ Header        476      7822     19293     32771 x   1.00 =       32771.00
Bourne Shell         48      2933      2897     27396 x   3.81 =      104378.76
SQL                 185      5564      4216     17864 x   2.29 =       40908.56
lex                 118       978      1346     15799 x   1.00 =       15799.00
yacc                  6      1958      2399     14178 x   1.51 =       21408.78
Perl                 30      1262       883      4356 x   4.00 =       17424.00
Make                172      1425      1349      3678 x   2.50 =        9195.00
Teamcenter def        4         1         0       525 x   1.00 =         525.00
XSL                   2        49        30       137 x   1.90 =         260.30
Assembler             3         9         0       102 x   0.25 =          25.50
awk                   1         3        30        20 x   3.81 =          76.20
Python                1         5         1        12 x   4.20 =          50.40
-------------------------------------------------------------------------------
SUM:               2482    102369    159109    936275 x   1.43 =     1339742.23
-------------------------------------------------------------------------------

While these three files are interesting, you may also want to see the combined counts from all projects. That can be done with cloc's --sum-reports option:

Unix> cloc --sum-reports --report_file=databases mysql-5.0.24a.txt sqlite-3.3.7.txt postgresql-8.1.4.txt
Wrote databases.lang
Wrote databases.file
The report combination produces two output files, one for sums by programming language (databases.lang) and one by project (databases.file). Their contents are:
Unix> cat databases.lang
http://cloc.sourceforge.net v 0.72
-------------------------------------------------------------------------------
Language          files     blank   comment      code    scale   3rd gen. equiv
-------------------------------------------------------------------------------
C                  1598    154285    232357    869425 x   0.77 =      669457.25
C++                 636     87248    108470    536619 x   1.51 =      810294.69
HTML                708      5075        41    415318 x   1.90 =      789104.20
Bourne Shell        268     22064     25163    165559 x   3.81 =      630779.79
C/C++ Header       1410     35025     73823    157399 x   1.00 =      157399.00
Tcl/Tk              286      7885      8408     48501 x   1.25 =       60626.25
SQL                 193      5737      4272     20537 x   2.29 =       47029.73
lex                 118       978      1346     15799 x   1.00 =       15799.00
yacc                  7      2066      2440     15111 x   1.51 =       22817.61
Perl                 61      2993      2395     12287 x   4.00 =       49148.00
Java                131      1374      1358      7686 x   1.36 =       10452.96
XML                  25       540        22      3914 x   1.90 =        7436.60
Make                183      1484      1457      3903 x   2.50 =        9757.50
awk                  20       185       449      2167 x   3.81 =        8256.27
Assembler            17       178         0      1459 x   0.25 =         364.75
Teamcenter def       35        91       117      1348 x   1.00 =        1348.00
sed                   1         0         0       772 x   4.00 =        3088.00
XSL                   2        49        30       137 x   1.90 =         260.30
DOS Batch             3        12         3        17 x   0.63 =          10.71
Python                1         5         1        12 x   4.20 =          50.40
-------------------------------------------------------------------------------
SUM:               5703    327274    462152   2277970 x   1.45 =     3293481.01
-------------------------------------------------------------------------------

Unix> cat databases.file
http://cloc.sourceforge.net v 0.72
-------------------------------------------------------------------------------
Report File          files     blank   comment      code    scale   3rd gen. equiv
-------------------------------------------------------------------------------
mysql-5.0.24a.txt     3076    214148    276063   1245478 x   1.44 =     1794575.59
postgresql-8.1.4.txt  2482    102369    159109    936275 x   1.43 =     1339742.23
sqlite-3.3.7.txt       145     10757     26980     96217 x   1.65 =      159163.19
-------------------------------------------------------------------------------
SUM:                  5703    327274    462152   2277970 x   1.45 =     3293481.01
-------------------------------------------------------------------------------
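
The SUM row of databases.file can be checked by hand: each count is the plain sum of the per-project rows, and the overall scale is consistent with the summed third-generation equivalent divided by the summed code count. The Python sketch below illustrates that arithmetic with the figures shown above. The names reports and sum_reports are made up for this illustration; it is not a description of how cloc itself is implemented.

```python
from decimal import Decimal

# Per-project SUM rows copied from databases.file above:
# (files, blank, comment, code, 3rd gen. equiv)
reports = {
    "mysql-5.0.24a.txt":    (3076, 214148, 276063, 1245478, Decimal("1794575.59")),
    "postgresql-8.1.4.txt": (2482, 102369, 159109, 936275, Decimal("1339742.23")),
    "sqlite-3.3.7.txt":     (145, 10757, 26980, 96217, Decimal("159163.19")),
}

def sum_reports(rows):
    """Add the rows column by column and derive the overall scale."""
    files, blank, comment, code, equiv = (sum(col) for col in zip(*rows))
    scale = equiv / code  # assumption: overall scale = total equiv / total code
    return files, blank, comment, code, scale, equiv

files, blank, comment, code, scale, equiv = sum_reports(reports.values())
print(f"SUM: {files} {blank} {comment} {code} x {scale:.2f} = {equiv:.2f}")
```

The printed totals can be compared against the SUM row of the table above.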

Report files themselves can be summed together. Say you also manage development of Perl and Python and you want to keep track of those line counts separately from your database projects. First create reports for Perl and Python separately:

cloc --report-file=perl-5.8.8.txt   --extract-with='tar zxf >FILE<' perl-5.8.8.tar.gz
cloc --report-file=python-2.4.2.txt --extract-with='tar jxf >FILE<' Python-2.4.2.tar.bz2
Then sum these together with:
Unix> cloc --sum-reports --report_file=script_lang perl-5.8.8.txt python-2.4.2.txt
Wrote script_lang.lang
Wrote script_lang.file

Unix> cat script_lang.lang
http://cloc.sourceforge.net v 0.72
-------------------------------------------------------------------------------
Language          files     blank   comment      code    scale   3rd gen. equiv
-------------------------------------------------------------------------------
C                   409     46920     35958    383652 x   0.77 =      295412.04
Python             1605     55998     31886    309549 x   4.20 =     1300105.80
Perl               1576     74568     89136    220919 x   4.00 =      883676.00
C/C++ Header        280     12169     26366     88089 x   1.00 =       88089.00
Bourne Shell        146      5201      7428     52115 x   3.81 =      198558.15
Lisp                  4      1120      2291      9799 x   1.25 =       12248.75
Make                 17      1092       939      5348 x   2.50 =       13370.00
Teamcenter def       10       144        88      3163 x   1.00 =        3163.00
HTML                 22       516         2      2769 x   1.90 =        5261.10
yacc                  2       125        72      1047 x   1.51 =        1580.97
XML                   2       103        32       894 x   1.90 =        1698.60
Objective C           6       102        19       704 x   2.96 =        2083.84
C++                   4       104       215       451 x   1.51 =         681.01
DOS Batch            14        93        73       387 x   0.63 =         243.81
Expect                1         0         0        60 x   2.00 =         120.00
Java                  2         6         1        23 x   1.36 =          31.28
sed                   1         0         1         2 x   4.00 =           8.00
-------------------------------------------------------------------------------
SUM:               4101    198261    194507   1078971 x   2.60 =     2806331.35
-------------------------------------------------------------------------------

Unix> cat script_lang.file
http://cloc.sourceforge.net v 0.72
-------------------------------------------------------------------------------
Report File       files     blank   comment      code    scale   3rd gen. equiv
-------------------------------------------------------------------------------
python-2.4.2.txt   2149     96618     60365    651118 x   2.54 =     1656782.96
perl-5.8.8.txt     1952    101643    134142    427853 x   2.69 =     1149548.39
-------------------------------------------------------------------------------
SUM:               4101    198261    194507   1078971 x   2.60 =     2806331.35
-------------------------------------------------------------------------------

Finally, combine the combination files:
Unix> cloc --sum-reports --report_file=everything databases.lang script_lang.lang
Wrote everything.lang
Wrote everything.file

Unix> cat everything.lang
http://cloc.sourceforge.net v 0.72
-------------------------------------------------------------------------------
Language          files     blank   comment      code    scale   3rd gen. equiv
-------------------------------------------------------------------------------
C                  2007    201205    268315   1253077 x   0.77 =      964869.29
C++                 640     87352    108685    537070 x   1.51 =      810975.70
HTML                730      5591        43    418087 x   1.90 =      794365.30
Python             1606     56003     31887    309561 x   4.20 =     1300156.20
C/C++ Header       1690     47194    100189    245488 x   1.00 =      245488.00
Perl               1637     77561     91531    233206 x   4.00 =      932824.00
Bourne Shell        414     27265     32591    217674 x   3.81 =      829337.94
Tcl/Tk              286      7885      8408     48501 x   1.25 =       60626.25
SQL                 193      5737      4272     20537 x   2.29 =       47029.73
yacc                  9      2191      2512     16158 x   1.51 =       24398.58
lex                 118       978      1346     15799 x   1.00 =       15799.00
Lisp                  4      1120      2291      9799 x   1.25 =       12248.75
Make                200      2576      2396      9251 x   2.50 =       23127.50
Java                133      1380      1359      7709 x   1.36 =       10484.24
XML                  27       643        54      4808 x   1.90 =        9135.20
Teamcenter def       45       235       205      4511 x   1.00 =        4511.00
awk                  20       185       449      2167 x   3.81 =        8256.27
Assembler            17       178         0      1459 x   0.25 =         364.75
sed                   2         0         1       774 x   4.00 =        3096.00
Objective C           6       102        19       704 x   2.96 =        2083.84
DOS Batch            17       105        76       404 x   0.63 =         254.52
XSL                   2        49        30       137 x   1.90 =         260.30
Expect                1         0         0        60 x   2.00 =         120.00
-------------------------------------------------------------------------------
SUM:               9804    525535    656659   3356941 x   1.82 =     6099812.36
-------------------------------------------------------------------------------

Unix> cat everything.file
http://cloc.sourceforge.net v 0.72
-------------------------------------------------------------------------------
Report File       files     blank   comment      code    scale   3rd gen. equiv
-------------------------------------------------------------------------------
databases.lang     5703    327274    462152   2277970 x   1.45 =     3293481.01
script_lang.lang   4101    198261    194507   1078971 x   2.60 =     2806331.35
-------------------------------------------------------------------------------
SUM:               9804    525535    656659   3356941 x   1.82 =     6099812.36
-------------------------------------------------------------------------------
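
Combining language reports amounts to adding the counts of every language that appears in more than one input and carrying over the languages that appear in only one. The following Python sketch shows the idea on a handful of rows taken from databases.lang and script_lang.lang above. The function name merge_by_language and the dictionary layout are assumptions made for the illustration, not cloc's internal data structures.

```python
from collections import defaultdict

# A few (files, blank, comment, code) rows copied from the two reports above.
databases_lang = {
    "C":      (1598, 154285, 232357, 869425),
    "Perl":   (61, 2993, 2395, 12287),
    "Tcl/Tk": (286, 7885, 8408, 48501),
}
script_lang = {
    "C":    (409, 46920, 35958, 383652),
    "Perl": (1576, 74568, 89136, 220919),
    "Lisp": (4, 1120, 2291, 9799),
}

def merge_by_language(*reports):
    """Add counts for languages that appear in more than one report."""
    merged = defaultdict(lambda: [0, 0, 0, 0])
    for report in reports:
        for language, counts in report.items():
            for i, value in enumerate(counts):
                merged[language][i] += value
    return {language: tuple(counts) for language, counts in merged.items()}

everything = merge_by_language(databases_lang, script_lang)
# Print in descending order of code lines, like the reports shown above.
for language, counts in sorted(everything.items(), key=lambda kv: -kv[1][3]):
    print(f"{language:<10}{counts[0]:>8}{counts[1]:>10}{counts[2]:>10}{counts[3]:>10}")
```

The C, Perl, Tcl/Tk, and Lisp rows produced this way can be compared with the corresponding rows of everything.lang.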

Suppress Third Generation Language Output^

The last two columns of cloc's output, "scale" and "3rd gen. equiv", are rough estimates of how many lines of code would be needed to write the same program in a hypothetical third-generation computer language. Treat these values with a large grain of salt. They can be suppressed entirely with the --no3 option to produce cleaner output. Here's what the output looks like for the same Perl 5.8.8 count shown above:

prompt> cloc --no3 --extract-with='tar zxf >FILE<' perl-5.8.8.tar.gz
tar zxf perl-5.8.8.tar.gz
    3106 text files.
    2975 unique files.
    1132 files ignored.

http://cloc.sourceforge.net v 0.90  T=70.0 s (27.9 files/s, 9480.5 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Perl                          1564          73960          88294         217162
C                              115          14872          17107         120583
C/C++ Header                   132           8426          21237          45229
Bourne Shell                   111           2987           5346          32954
Lisp                             1            583           1772           6121
Make                             8            479            459           2113
Teamcenter def                   2              0              0           1345
yacc                             2            125             72           1047
C++                              3            101            214            444
DOS Batch                       11             85             50            322
HTML                             1             19              2             98
Java                             2              6              1             23
-------------------------------------------------------------------------------
SUM:                          1952         101643         134554         427441
-------------------------------------------------------------------------------

If you use the report summation feature, make sure all inputs were produced the same way, either all with the --no3 option or all without.
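
For readers wondering what the two suppressed columns contain: in the tables shown earlier, the final column of each language row is the code count multiplied by that language's scale factor. A small Python sketch of that arithmetic follows, using three rows from script_lang.lang. The helper name third_gen_equiv is hypothetical and exists only for this example.

```python
from decimal import Decimal

def third_gen_equiv(code_lines, scale):
    """Return code_lines * scale, rounded to two decimals as in the reports."""
    return (Decimal(code_lines) * Decimal(scale)).quantize(Decimal("0.01"))

# (language, code, scale) taken from script_lang.lang above.
rows = [("C", 383652, "0.77"), ("Python", 309549, "4.20"), ("Perl", 220919, "4.00")]
for language, code, scale in rows:
    print(f"{language:<8}{code:>8} x {scale} = {third_gen_equiv(code, scale)}")
```

Decimal is used here instead of binary floating point so that the two-decimal results match the report figures exactly.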

Limitations ^

Identifying comments within source code is trickier than one might expect. Many languages would need a complete parser to be counted correctly. cloc does not attempt to parse any of the languages it aims to count and therefore is an imperfect tool. The following are known problems:

  1. Lines containing both source code and comments are counted as lines of code.
  2. Comment markers within strings or here-documents are treated as actual comment markers rather than as part of a string literal. For example, the following lines of C code
    printf(" /* ");
    for (i = 0; i < 100; i++) {
        a += i;
    }
    printf(" */ ");
    
    appear to cloc as two lines of C code (parts of the two printf() lines) and three lines of comments (the entire for loop).
  3. Lua long comments are not recognized.
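
The first two limitations can be reproduced with a deliberately naive counter that removes everything between /* and */ without looking at string boundaries. The Python sketch below is only an illustration of why parser-free counting miscounts such input; the function naive_count is invented for this example and is not cloc's actual filter code.

```python
import re

C_SNIPPET = '''printf(" /* ");
for (i = 0; i < 100; i++) {
    a += i;
}
printf(" */ ");
'''

def naive_count(source):
    """Return (code_lines, comment_lines), ignoring string literals entirely."""
    non_blank = [line for line in source.splitlines() if line.strip()]
    # Drop comment text but keep its newlines so line positions are preserved.
    stripped = re.sub(
        r"/\*.*?\*/",
        lambda match: "\n" * match.group(0).count("\n"),
        source,
        flags=re.S,
    )
    code = [line for line in stripped.splitlines() if line.strip()]
    return len(code), len(non_blank) - len(code)

print(naive_count(C_SNIPPET))              # the snippet from item 2
print(naive_count("x = 1; /* note */\n"))  # mixed line, as in item 1
```

For the snippet from item 2 this counter reports two code lines and three comment lines, and the mixed line from item 1 is counted as code.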

Author ^

Al Danial

Acknowledgements ^

Wolfram Rösler provided most of the code examples in the test suite. These examples come from his Hello World Collection.

Ismet Kursunoglu found errors with the MUMPS counter and provided access to a computer with a large body of MUMPS code to test cloc.

Tod Huggins gave helpful suggestions for the Visual Basic filters.

Anton Demichev found a flaw with the JSP counter in cloc v0.76 and wrote the XML output generator for the --xml option.

Reuben Thomas pointed out that ISO C99 allows // as a comment marker, provided code for the --no3 option and for counting the m4 language, and suggested several user-interface enhancements.

Michael Bello provided code for the --opt-match-f and --opt-not-match-f options.

Mahboob Hussain inspired the --original-dir and --skip-uniqueness options, found a bug in the duplicate file detection logic and improved the JSP filter.

Randy Sharo found and fixed an uninitialized variable bug for shell scripts having only one line.

The development of cloc was partially funded by the Northrop Grumman Corporation.

Copyright ^

Copyright (c) 2006-2009, Northrop Grumman Corporation / Information Technology / IT Solutions

License ^

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

