CLOC
Count Lines of Code

Overview ^

cloc counts blank lines, comment lines, and physical lines of source code in many programming languages. Given two versions of a code base, cloc can compute differences in blank, comment, and source lines. It is written entirely in Perl with no dependencies outside the standard distribution of Perl v5.6 and higher (code from some external modules is embedded within cloc) and so is quite portable. cloc is known to run on many flavors of Linux, FreeBSD, NetBSD, OpenBSD, Mac OS X, AIX, HP-UX, Solaris, IRIX, z/OS, and Windows. (To run the Perl source version of cloc on Windows one needs ActiveState Perl 5.6.1 or higher, Strawberry Perl, Cygwin, or MobaXTerm with the Perl plug-in installed. Alternatively one can use the Windows binary of cloc generated with perl2exe to run on Windows computers that have neither Perl nor Cygwin.)

cloc contains code from David Wheeler's SLOCCount, Damian Conway and Abigail's Perl module Regexp::Common, Sean M. Burke's Perl module Win32::Autoglob, and Tye McQueen's Perl module Algorithm::Diff. Language scale factors were derived from the Mayes Consulting, LLC web site, http://softwareestimator.com/IndustryData2.htm.

License^

cloc is licensed under the GNU General Public License, v2, excluding portions which are copied from other sources. Code copied from the Regexp::Common, Win32::Autoglob, and Algorithm::Diff Perl modules is subject to the Artistic License.

Why Use cloc? ^

cloc has many features that make it easy to use, thorough, extensible, and portable:

  1. Exists as a single, self-contained file that requires minimal installation effort: just download the file and run it.
  2. Can read language comment definitions from a file and thus potentially work with computer languages that do not yet exist.
  3. Allows results from multiple runs to be summed together by language and by project.
  4. Can produce results in a variety of formats: plain text, SQL, XML, YAML, comma separated values.
  5. Can count code within compressed archives (tar balls, Zip files, Java .ear files).
  6. Has numerous troubleshooting options.
  7. Handles file and directory names with spaces and other unusual characters.
  8. Has no dependencies outside the standard Perl distribution.
  9. Runs on Linux, FreeBSD, NetBSD, OpenBSD, Mac OS X, AIX, HP-UX, Solaris, IRIX, and z/OS systems that have Perl 5.6 or higher. The source version runs on Windows with either ActiveState Perl, Strawberry Perl, Cygwin, or MobaXTerm+Perl plugin. Alternatively on Windows one can run the Windows binary which has no dependencies.

Other Counters ^

If cloc does not suit your needs here are other freely available counters to consider:

Other references:

Regexp::Common, Digest::MD5, Win32::Autoglob, Algorithm::Diff

Although cloc does not need Perl modules outside those found in the standard distribution, cloc does rely on a few external modules. Code from three of these external modules--Regexp::Common, Win32::Autoglob, and Algorithm::Diff--is embedded within cloc. A fourth module, Digest::MD5, is used only if it is available. If cloc finds Regexp::Common or Algorithm::Diff installed locally it will use those installations. If it doesn't, cloc will install the parts of Regexp::Common and/or Algorithm::Diff it needs to temporary directories that are created at the start of a cloc run then removed when the run is complete. The necessary code from Regexp::Common v2.120 and Algorithm::Diff v1.1902 is embedded within the cloc source code (see subroutines Install_Regexp_Common() and Install_Algorithm_Diff() ). Only three lines are needed from Win32::Autoglob and these are included directly in cloc.

Additionally, cloc will use Digest::MD5 to validate uniqueness among input files if Digest::MD5 is installed locally. If Digest::MD5 is not found the file uniqueness check is skipped.
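
A minimal Python sketch of a uniqueness check of this kind, assuming the approach of comparing file sizes first and hashing only the files whose sizes collide. cloc itself is written in Perl and uses Digest::MD5; the function name unique_files is hypothetical and this is not cloc's actual code.

```python
import hashlib
import os
from collections import Counter


def unique_files(paths):
    """Return paths in order, skipping files whose contents were already seen.

    Illustrative only: sizes are compared first, and MD5 digests are
    computed only for files that share their size with another file.
    """
    sizes = [os.path.getsize(p) for p in paths]
    size_counts = Counter(sizes)

    seen = set()      # (size, md5) pairs already accepted
    unique = []
    for path, size in zip(paths, sizes):
        if size_counts[size] > 1:
            # Same size as some other file: hash to decide.
            with open(path, "rb") as fh:
                key = (size, hashlib.md5(fh.read()).hexdigest())
            if key in seen:
                continue          # duplicate contents, skip it
            seen.add(key)
        # A file with a size no other file has cannot be a duplicate.
        unique.append(path)
    return unique
```

Hashing only on size collisions keeps the check cheap when most files differ in size, which is the usual case in a source tree.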

The Windows binary is built on a computer that has both Regexp::Common and Digest::MD5 installed locally.

Building a Windows Executable ^

The SourceForge download area contains two Windows executables built from the Perl source code. The default Windows download was built with perl2exe on a 32 bit Windows XP computer. A small modification was made to the cloc source code before passing it to perl2exe; lines 87 and 88 were uncommented:

85  # Uncomment next two lines when building Windows executable with perl2exe
86  # or if running on a system that already has Regexp::Common. 
87  #use Regexp::Common;
88  #$HAVE_Rexexp_Common = 1;

An alternative to creating a Windows executable with perl2exe is the free Strawberry Perl distribution plus the PAR::Packer module. First, install Strawberry Perl following their instructions. Next, open a command prompt (also known as a DOS window) and install the PAR::Packer module. Finally, invoke the newly installed pp command with the cloc source code to create an .exe file:

C:> perl -MCPAN -e shell
cpan> install PAR::Packer
cpan> exit
C:> pp cloc-1.60.pl

If you installed the portable version of Strawberry Perl, run portableshell.bat first to set up your environment properly. The Strawberry Perl derived executable on the SourceForge download area was created with the portable version on a 32 bit Windows XP computer.

Basic Use ^

cloc is a command line program that takes file, directory, and/or archive names as inputs. Here's an example of running cloc against the Perl v5.10.0 source distribution:

  
prompt> cloc perl-5.10.0.tar.gz
    4076 text files.
    3883 unique files.                                          
    1521 files ignored.

http://cloc.sourceforge.net v 1.50  T=12.0 s (209.2 files/s, 70472.1 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Perl                          2052         110356         130018         292281
C                              135          18718          22862         140483
C/C++ Header                   147           7650          12093          44042
Bourne Shell                   116           3402           5789          36882
Lisp                             1            684           2242           7515
make                             7            498            473           2044
C++                             10            312            277           2000
XML                             26            231              0           1972
yacc                             2            128             97           1549
YAML                             2              2              0            489
DOS Batch                       11             85             50            322
HTML                             1             19              2             98
-------------------------------------------------------------------------------
SUM:                          2510         142085         173903         529677
-------------------------------------------------------------------------------

To run cloc on Windows computers, one must first open up a command (aka DOS) window and invoke cloc.exe from the command line there.

Options ^

  
prompt> cloc

Usage: cloc [options] <file(s)/dir(s)> | <set 1> <set 2> | <report files>

 Count, or compute differences of, physical lines of source code in the
 given files (may be archives such as compressed tarballs or zip files)
 and/or recursively below the given directories.

 Input Options
   --extract-with=<cmd>      This option is only needed if cloc is unable
                             to figure out how to extract the contents of
                             the input file(s) by itself.
                             Use <cmd> to extract binary archive files (e.g.:
                             .tar.gz, .zip, .Z).  Use the literal '>FILE<' as
                             a stand-in for the actual file(s) to be
                             extracted.  For example, to count lines of code
                             in the input files
                                gcc-4.2.tar.gz  perl-5.8.8.tar.gz
                             on Unix use
                               --extract-with='gzip -dc >FILE< | tar xf -'
                             or, if you have GNU tar,
                               --extract-with='tar zxf >FILE<'
                             and on Windows use, for example:
                               --extract-with="\"c:\Program Files\WinZip\WinZip32.exe\" -e -o >FILE< ."
                             (if WinZip is installed there).
   --list-file=<file>        Take the list of file and/or directory names to
                             process from <file> which has one file/directory
                             name per line.  See also --exclude-list-file.
   --unicode                 Check binary files to see if they contain Unicode
                             expanded ASCII text.  This causes performance to
                             drop noticably.

 Processing Options
   --autoconf                Count .in files (as processed by GNU autoconf) of
                             recognized languages.
   --by-file                 Report results for every source file encountered.
   --by-file-by-lang         Report results for every source file encountered
                             in addition to reporting by language.
   --diff <set1> <set2>      Compute differences in code and comments between
                             source file(s) of <set1> and <set2>.  The inputs
                             may be pairs of files, directories, or archives.
                             Use --diff-alignment to generate a list showing
                             which file pairs where compared.  See also
                             --ignore-case, --ignore-whitespace.
   --diff-timeout <N>        Ignore files which take more than <N> seconds
                             to process.  Default is 10 seconds.
                             (Large files with many repeated lines can cause 
                             Algorithm::Diff::sdiff() to take hours.)
   --follow-links            [Unix only] Follow symbolic links to directories
                             (sym links to files are always followed).
   --force-lang=<lang>[,<ext>]
                             Process all files that have a <ext> extension
                             with the counter for language <lang>.  For
                             example, to count all .f files with the
                             Fortran 90 counter (which expects files to
                             end with .f90) instead of the default Fortran 77
                             counter, use
                               --force-lang="Fortran 90",f
                             If <ext> is omitted, every file will be counted
                             with the <lang> counter.  This option can be
                             specified multiple times (but that is only
                             useful when <ext> is given each time).
                             See also --script-lang, --lang-no-ext.
   --force-lang-def=<file>   Load language processing filters from <file>,
                             then use these filters instead of the built-in
                             filters.  Note:  languages which map to the same 
                             file extension (for example:
                             MATLAB/Objective C/MUMPS;  Pascal/PHP; 
                             Lisp/OpenCL) will be ignored as these require 
                             additional processing that is not expressed in 
                             language definition files.  Use --read-lang-def 
                             to define new language filters without replacing 
                             built-in filters (see also --write-lang-def).
   --ignore-whitespace       Ignore horizontal white space when comparing files
                             with --diff.  See also --ignore-case.
   --ignore-case             Ignore changes in case; consider upper- and lower-
                             case letters equivalent when comparing files with
                             --diff.  See also --ignore-whitespace.
   --lang-no-ext=<lang>      Count files without extensions using the <lang>
                             counter.  This option overrides internal logic
                             for files without extensions (where such files
                             are checked against known scripting languages
                             by examining the first line for #!).  See also
                             --force-lang, --script-lang.
   --max-file-size=<MB>      Skip files larger than <MB> megabytes when
                             traversing directories.  By default, <MB>=100.
                             cloc's memory requirement is roughly twenty times 
                             larger than the largest file so running with 
                             files larger than 100 MB on a computer with less 
                             than 2 GB of memory will cause problems.  
                             Note:  this check does not apply to files 
                             explicitly passed as command line arguments.
   --read-binary-files       Process binary files in addition to text files.
                             This is usually a bad idea and should only be
                             attempted with text files that have embedded
                             binary data.
   --read-lang-def=<file>    Load new language processing filters from <file>
                             and merge them with those already known to cloc.  
                             If <file> defines a language cloc already knows 
                             about, cloc's definition will take precedence.  
                             Use --force-lang-def to over-ride cloc's 
                             definitions (see also --write-lang-def ).
   --script-lang=<lang>,<s>  Process all files that invoke <s> as a #!
                             scripting language with the counter for language
                             <lang>.  For example, files that begin with
                                #!/usr/local/bin/perl5.8.8
                             will be counted with the Perl counter by using
                                --script-lang=Perl,perl5.8.8
                             The language name is case insensitive but the
                             name of the script language executable, <s>,
                             must have the right case.  This option can be
                             specified multiple times.  See also --force-lang,
                             --lang-no-ext.
   --sdir=<dir>              Use <dir> as the scratch directory instead of
                             letting File::Temp chose the location.  Files
                             written to this location are not removed at
                             the end of the run (as they are with File::Temp).
   --skip-uniqueness         Skip the file uniqueness check.  This will give
                             a performance boost at the expense of counting
                             files with identical contents multiple times
                             (if such duplicates exist).
   --stdin-name=<file>       Give a file name to use to determine the language
                             for standard input.
   --strip-comments=<ext>    For each file processed, write to the current
                             directory a version of the file which has blank
                             lines and comments removed.  The name of each
                             stripped file is the original file name with
                             .<ext> appended to it.  It is written to the
                             current directory unless --original-dir is on.
   --original-dir            [Only effective in combination with
                             --strip-comments]  Write the stripped files
                             to the same directory as the original files.
   --sum-reports             Input arguments are report files previously
                             created with the --report-file option.  Makes
                             a cumulative set of results containing the
                             sum of data from the individual report files.
   --unix                    Override the operating system autodetection
                             logic and run in UNIX mode.  See also
                             --windows, --show-os.
   --windows                 Override the operating system autodetection
                             logic and run in Microsoft Windows mode.
                             See also --unix, --show-os.

 Filter Options
   --exclude-dir=<D1>[,D2,]  Exclude the given comma separated directories
                             D1, D2, D3, et cetera, from being scanned.  For
                             example  --exclude-dir=.cache,test  will skip
                             all files that have /.cache/ or /test/ as part
                             of their path.
                             Directories named .bzr, .cvs, .hg, .git, and
                             .svn are always excluded.
   --exclude-ext=<ext1>[,<ext2>[...]]
                             Do not count files having the given file name
                             extensions.
   --exclude-lang=<L1>[,L2,] Exclude the given comma separated languages
                             L1, L2, L3, et cetera, from being counted.
   --exclude-list-file=<file>  Ignore files and/or directories whose names
                             appear in <file>.  <file> should have one entry
                             per line.  Relative path names will be resolved
                             starting from the directory where cloc is
                             invoked.  See also --list-file.
   --match-d=<regex>         Only count files in directories matching the Perl
                             regex.  For example
                               --match-d='/(src|include)/'
                             only counts files in directories containing
                             /src/ or /include/.
   --not-match-d=<regex>     Count all files except those in directories
                             matching the Perl regex.
   --match-f=<regex>         Only count files whose basenames match the Perl
                             regex.  For example
                               --match-f='^[Ww]idget'
                             only counts files that start with Widget or widget.
   --not-match-f=<regex>     Count all files except those whose basenames
                             match the Perl regex.
   --skip-archive=<regex>    Ignore files that end with the given Perl regular
                             expression.  For example, if given
                               --skip-archive='(zip|tar(.(gz|Z|bz2|xz|7z))?)'
                             the code will skip files that end with .zip,
                             .tar, .tar.gz, .tar.Z, .tar.bz2, .tar.xz, and
                             .tar.7z.
   --skip-win-hidden         On Windows, ignore hidden files.

 Debug Options
   --categorized=<file>      Save names of categorized files to <file>.
   --counted=<file>          Save names of processed source files to <file>.
   --diff-alignment=<file>   Write to <file> a list of files and file pairs
                             showing which files were added, removed, and/or
                             compared during a run with --diff.  This switch
                             forces the --diff mode on.
   --help                    Print this usage information and exit.
   --found=<file>            Save names of every file found to <file>.
   --ignored=<file>          Save names of ignored files and the reason they
                             were ignored to <file>.
   --print-filter-stages     Print to STDOUT processed source code before and
                             after each filter is applied.
   --show-ext[=<ext>]        Print information about all known (or just the
                             given) file extensions and exit.
   --show-lang[=<lang>]      Print information about all known (or just the
                             given) languages and exit.
   --show-os                 Print the value of the operating system mode
                             and exit.  See also --unix, --windows.
   -v[=<n>]                  Verbose switch (optional numeric value).
   --version                 Print the version of this program and exit.
   --write-lang-def=<file>   Writes to <file> the language processing filters
                             then exits.  Useful as a first step to creating
                             custom language definitions (see also
                             --force-lang-def, --read-lang-def).

 Output Options
   --3                       Print third-generation language output.
                             (This option can cause report summation to fail
                             if some reports were produced with this option
                             while others were produced without it.)
   --progress-rate=<n>       Show progress update after every <n> files are
                             processed (default <n>=100).  Set <n> to 0 to
                             suppress progress output (useful when redirecting
                             output to STDOUT).
   --quiet                   Suppress all information messages except for
                             the final report.
   --report-file=<file>      Write the results to <file> instead of STDOUT.
   --out=<file>              Synonym for --report-file=<file>.
   --csv                     Write the results as comma separated values.
   --csv-delimiter=<C>       Use the character <C> as the delimiter for comma
                             separated files instead of ,.  This switch forces
                             --csv to be on.
   --sql=<file>              Write results as SQL create and insert statements
                             which can be read by a database program such as
                             SQLite.  If <file> is -, output is sent to STDOUT.
   --sql-project=<name>      Use <name> as the project identifier for the
                             current run.  Only valid with the --sql option.
   --sql-append              Append SQL insert statements to the file specified
                             by --sql and do not generate table creation
                             statements.  Only valid with the --sql option.
   --sum-one                 For plain text reports, show the SUM: output line
                             even if only one input file is processed.
   --xml                     Write the results in XML.
   --xsl=<file>              Reference <file> as an XSL stylesheet within
                             the XML output.  If <file> is 1 (numeric one),
                             writes a default stylesheet, cloc.xsl (or
                             cloc-diff.xsl if --diff is also given).
                             This switch forces --xml on.
   --yaml                    Write the results in YAML.

Recognized Languages ^

prompt> cloc --show-lang

ABAP                       (abap)
ActionScript               (as)
Ada                        (ada, adb, ads, pad)
ADSO/IDSM                  (adso)
AMPLE                      (ample, dofile, startup)
Ant                        (build.xml)
Apex Trigger               (trigger)
Arduino Sketch             (ino, pde)
ASP                        (asa, asp)
ASP.Net                    (asax, ascx, asmx, aspx, config, master, sitemap, webinfo)
Assembly                   (asm, S, s)
AutoHotkey                 (ahk)
awk                        (awk)
Bourne Again Shell         (bash)
Bourne Shell               (sh)
C                          (c, ec, pgc)
C Shell                    (csh, tcsh)
C#                         (cs)
C++                        (C, cc, cpp, cxx, pcc)
C/C++ Header               (H, h, hh, hpp)
CCS                        (ccs)
Clojure                    (clj)
ClojureScript              (cljs)
CMake                      (cmake, CMakeLists.txt)
COBOL                      (cbl, CBL, cob, COB)
CoffeeScript               (coffee)
ColdFusion                 (cfm)
ColdFusion CFScript        (cfc)
CSS                        (css)
Cython                     (pyx)
D                          (d)
DAL                        (da)
Dart                       (dart)
DOS Batch                  (bat, BAT)
DTD                        (dtd)
Erlang                     (erl, hrl)
Expect                     (exp)
Focus                      (focexec)
Fortran 77                 (F, f, f77, F77, pfo)
Fortran 90                 (F90, f90)
Fortran 95                 (F95, f95)
Go                         (go)
Groovy                     (gant, groovy)
Haskell                    (hs, lhs)
HTML                       (htm, html)
IDL                        (idl, pro)
InstallShield              (ism)
Java                       (java)
Javascript                 (js)
JavaServer Faces           (jsf, xhtml)
JCL                        (jcl)
JSP                        (jsp)
Kermit                     (ksc)
Korn Shell                 (ksh)
LESS                       (less)
lex                        (l)
Lisp                       (el, jl, lisp, lsp, sc, scm)
Lisp/OpenCL                (cl)
LiveLink OScript           (oscript)
Lua                        (lua)
m4                         (ac, m4)
make                       (am, gnumakefile, Gnumakefile, Makefile, makefile)
MATLAB                     (m)
Maven                      (pom, pom.xml)
Modula3                    (i3, ig, m3, mg)
MSBuild scripts            (csproj, wdproj)
MUMPS                      (mps, m)
MXML                       (mxml)
NAnt scripts               (build)
NASTRAN DMAP               (dmap)
Objective C                (m)
Objective C++              (mm)
OCaml                      (ml, mli, mll, mly)
Oracle Forms               (fmt)
Oracle Reports             (rex)
Pascal                     (dpr, p, pas, pp)
Patran Command Language    (pcl, ses)
Perl                       (perl, PL, pl, plh, plx, pm)
PHP                        (php, php3, php4, php5)
PHP/Pascal                 (inc)
Pig Latin                  (pig)
PowerShell                 (ps1)
Python                     (py)
QML                        (qml)
Razor                      (cshtml)
Rexx                       (rexx)
Ruby                       (rb)
Ruby HTML                  (rhtml)
Rust                       (rs)
SASS                       (sass, scss)
Scala                      (scala)
sed                        (sed)
SKILL                      (il)
SKILL++                    (ils)
Smarty                     (smarty, tpl)
Softbridge Basic           (sbl, SBL)
SQL                        (psql, SQL, sql)
SQL Data                   (data.sql)
SQL Stored Procedure       (spc.sql, spoc.sql, sproc.sql, udf.sql)
Tcl/Tk                     (itk, tcl, tk)
Teamcenter def             (def)
Teamcenter met             (met)
Teamcenter mth             (mth)
Vala                       (vala)
Vala Header                (vapi)
Verilog-SystemVerilog      (sv, svh, v)
VHDL                       (vhd, VHD, VHDL, vhdl)
vim script                 (vim)
Visual Basic               (bas, cls, ctl, dsr, frm, vb, VB, vba, VBA, vbs, VBS)
Visualforce Component      (component)
Visualforce Page           (page)
XAML                       (xaml)
XML                        (XML, xml)
XSD                        (xsd, XSD)
XSLT                       (xsl, XSL, xslt, XSLT)
yacc                       (y)
YAML                       (yaml, yml)

The above list can be customized by reading language definitions from a file with the --read-lang-def or --force-lang-def options.

Three file extensions have multiple language mappings: .cl (Lisp or OpenCL), .inc (PHP or Pascal), and .m (MATLAB, MUMPS, or Objective C).

cloc has subroutines that attempt to identify the correct language from the file's contents in these special cases. Identification accuracy depends on how much code the file contains; .m files with just one or two lines, for example, seldom have enough information to distinguish among MATLAB, MUMPS, and Objective C.
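
To illustrate why short files are hard to classify, here is a toy content-based guesser in Python. It is not cloc's logic: the marker patterns, the function name guess_m_language, and the min_votes threshold are all assumptions made up for this sketch. It counts lines that look typical of each language and refuses to answer when the evidence is thin or tied.

```python
import re

# Hypothetical per-language markers, chosen only for illustration.
MARKERS = {
    "Objective C": re.compile(r"^\s*(@interface|@implementation|@end|#import)\b"),
    "MATLAB": re.compile(r"^\s*(%|function\b|end\s*$)"),
    "MUMPS": re.compile(r"^\s*;"),
}


def guess_m_language(source, min_votes=2):
    """Return the most likely language for a .m file, or None if unsure."""
    votes = {lang: 0 for lang in MARKERS}
    for line in source.splitlines():
        for lang, pattern in MARKERS.items():
            if pattern.search(line):
                votes[lang] += 1

    ranked = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_votes), (_, second_votes) = ranked[0], ranked[1]
    # Too little evidence, or a tie between languages: give up.
    if best_votes < min_votes or best_votes == second_votes:
        return None
    return best
```

A file containing only a line such as x = 1 matches none of the markers, so the sketch returns None rather than guessing.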

How it Works ^

cloc's method of operation resembles SLOCCount's: First, create a list of files to consider. Next, attempt to determine whether or not found files contain recognized computer language source code. Finally, for files identified as source files, invoke language-specific routines to count the number of source lines.

A more detailed description:

  1. If the input file is an archive (such as a .tar.gz or .zip file), create a temporary directory and expand the archive there using a system call to an appropriate underlying utility (tar, bzip2, unzip, etc.), then add this temporary directory as one of the inputs. (This works more reliably on Unix than on Windows.)
  2. Use File::Find to recursively descend the input directories and make a list of candidate file names. Ignore binary and zero-sized files.
  3. Make sure the files in the candidate list have unique contents (first by comparing file sizes, then, for similarly sized files, by comparing MD5 hashes of the file contents with Digest::MD5).
  4. Scan the candidate file list for file extensions which cloc associates with programming languages (see the --show-lang and --show-ext options). Files which match are classified as containing source code for that language. Each file without an extension is opened and its first line read to see if it is a Unix shell script (anything that begins with #!). If it is a shell script, the file is classified by that scripting language (if the language is recognized). If the file does not have a recognized extension or is not a recognized scripting language, the file is ignored.
  5. All remaining files in the candidate list should now be source files for known programming languages. For each of these files:
    1. Read the entire file into memory.
    2. Count the number of lines (= L_original).
    3. Remove blank lines, then count again (= L_non_blank).
    4. Loop over the comment filters defined for this language. (For example, C++ has two filters: (1) remove lines that start with optional whitespace followed by // and (2) remove text between /* and */.) Apply each filter to the code to remove comments. Count the leftover lines (= L_code).
    5. Save the counts for this language:
      blank lines = L_original - L_non_blank
      comment lines = L_non_blank - L_code
      code lines = L_code
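
The per-file counting steps can be sketched in a few lines of Python. This is a simplified illustration for the two C++ filters mentioned above, not cloc's Perl implementation: the function name count_cpp is hypothetical, comment markers inside string literals are not handled, and comment lines are taken as the non-blank count minus the code count so that the three categories add up to the original line count.

```python
import re


def count_cpp(source):
    """Count blank, comment, and code lines in C++ source text."""
    lines = source.splitlines()
    l_original = len(lines)

    # Remove blank lines, then count again.
    non_blank = [ln for ln in lines if ln.strip()]
    l_non_blank = len(non_blank)

    # Filter 1: drop lines that start with optional whitespace and //.
    kept = [ln for ln in non_blank if not re.match(r"\s*//", ln)]

    # Filter 2: remove text between /* and */ (possibly spanning lines).
    text = re.sub(r"/\*.*?\*/", "", "\n".join(kept), flags=re.DOTALL)

    # Lines left empty by the filters no longer count as code.
    l_code = len([ln for ln in text.splitlines() if ln.strip()])

    return {
        "blank": l_original - l_non_blank,
        "comment": l_non_blank - l_code,
        "code": l_code,
    }
```

Note that a line holding both code and a trailing /* ... */ comment still counts as a code line, since only the comment text is removed.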

The options modify the algorithm slightly. The --read-lang-def option for example allows the user to read definitions of comment filters, known file extensions, and known scripting languages from a file. The code for this option is processed between Steps 2 and 3.

Advanced Use ^

Remove Comments from Source Code^

How can you tell if cloc correctly identifies comments? One way to convince yourself cloc is doing the right thing is to use its --strip-comments option to remove comments and blank lines from files, then compare the stripped-down files to the originals.

Let's try this out with the SQLite amalgamation, a C file containing all code needed to build the SQLite library along with a header file:

prompt> tar zxf sqlite-amalgamation-3.5.6.tar.gz 
prompt> cd sqlite-3.5.6/
prompt> cloc --strip-comments=nc sqlite3.c
       1 text file.
       1 unique file.                              
Wrote sqlite3.c.nc
       0 files ignored.

http://cloc.sourceforge.net v 1.03  T=1.0 s (1.0 files/s, 82895.0 lines/s)
-------------------------------------------------------------------------------
Language          files     blank   comment      code    scale   3rd gen. equiv
-------------------------------------------------------------------------------
C                     1      5167     26827     50901 x   0.77 =       39193.77
-------------------------------------------------------------------------------

The extension argument given to --strip-comments is arbitrary; here nc was used as an abbreviation for "no comments".

cloc removed over 31,000 lines from the file:

prompt> wc -l sqlite3.c sqlite3.c.nc 
  82895 sqlite3.c
  50901 sqlite3.c.nc
 133796 total
prompt> echo "82895 - 50901" | bc
31994

We can now compare the original file, sqlite3.c, and the one stripped of comments, sqlite3.c.nc, with tools like diff or vimdiff and see exactly what cloc considered comments and blank lines. A rigorous proof that the stripped-down file contains the same C code as the original is to compile these files and compare checksums of the resulting object files.

First, the original source file:

prompt> gcc -c sqlite3.c
prompt> md5sum sqlite3.o
cce5f1a2ea27c7e44b2e1047e2588b49  sqlite3.o

Next, the version without comments:

prompt> mv sqlite3.c.nc sqlite3.c
prompt> gcc -c sqlite3.c
prompt> md5sum sqlite3.o
cce5f1a2ea27c7e44b2e1047e2588b49  sqlite3.o
cloc removed over 31,000 lines of comments and blanks but did not modify the source code in any significant way since the resulting object file matches the original.
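
The checksum comparison can also be scripted. The following is a minimal sketch using Python's standard hashlib module; the function names md5_of_file and same_object_code are hypothetical, and the object files are assumed to have been produced beforehand with a compiler, as in the gcc commands above.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_object_code(path_a, path_b):
    """Return True when the two files have identical MD5 digests."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

To use it, keep the object file built from the original source under a different name (for example by copying sqlite3.o aside before recompiling) and pass both paths to same_object_code.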

Work with Compressed Archives ^

Versions of cloc before v1.07 required an --extract-with=<cmd> option to tell cloc how to expand an archive file. Beginning with v1.07 this extraction is attempted automatically. At the moment the automatic extraction method works reasonably well on Unix-type OS's for the following file types: .tar.gz, .tar.bz2, .tgz, .zip, .ear. Some of these extensions work on Windows if one has WinZip installed in the default location (C:\Program Files\WinZip\WinZip32.exe). Additionally, with newer versions of WinZip, the command line add-on is needed for correct operation; in this case one would invoke cloc with something like
--extract-with="\"c:\Program Files\WinZip\wzunzip\" -e -o >FILE< ." (ref. forum post).

In situations where the automatic extraction fails, one can try the --extract-with=<cmd> option to count lines of code within tar files, Zip files, or other compressed archives for which one has an extraction tool. cloc takes the user-provided extraction command and expands the archive to a temporary directory (created with File::Temp), counts the lines of code in the temporary directory, then removes that directory. While not especially helpful when dealing with a single compressed archive (after all, if you're going to type the extraction command anyway why not just manually expand the archive?) this option is handy for working with several archives at once.
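
The mechanism described above (substitute the archive name into the user's command, expand into a temporary directory, inspect the contents, remove the directory) can be sketched in Python. This is not cloc's implementation: the function names are hypothetical, Python's tempfile stands in for Perl's File::Temp, and quoting the archive path with shlex.quote is an assumption of this sketch rather than documented cloc behavior.

```python
import os
import shlex
import subprocess
import tempfile


def build_extract_command(template, archive_path):
    """Replace the >FILE< placeholder with the quoted absolute archive path."""
    return template.replace(">FILE<", shlex.quote(os.path.abspath(archive_path)))


def list_extracted_files(template, archive_path):
    """Expand an archive in a temporary directory and list what it held.

    The directory is deleted on return, so only relative paths are reported.
    A real counter would process the files before the directory goes away.
    """
    command = build_extract_command(template, archive_path)
    with tempfile.TemporaryDirectory() as workdir:
        # Run the user's extraction command with the temp dir as cwd.
        subprocess.run(command, shell=True, cwd=workdir, check=True)
        found = []
        for root, _dirs, files in os.walk(workdir):
            for name in files:
                found.append(os.path.relpath(os.path.join(root, name), workdir))
        return sorted(found)
```

Calling list_extracted_files once per archive shows why the option pays off with several archives: the same command template is reused for each one.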

For example, say you have the following source tarballs on a Unix machine
     perl-5.8.5.tar.gz
     Python-2.4.2.tar.gz
and you want to count all the code within them. The command would be

cloc --extract-with='gzip -dc >FILE< | tar xf -' perl-5.8.5.tar.gz Python-2.4.2.tar.gz
If that Unix machine has GNU tar (which can uncompress and extract in one step) the command can be shortened to
cloc --extract-with='tar zxf >FILE<' perl-5.8.5.tar.gz Python-2.4.2.tar.gz
On a Windows computer with WinZip installed in c:\Program Files\WinZip the command would look like
cloc.exe --extract-with="\"c:\Program Files\WinZip\WinZip32.exe\" -e -o >FILE< ." perl-5.8.5.tar.gz Python-2.4.2.tar.gz
Java .ear files are Zip files that contain additional Zip files. cloc can handle nested compressed archives without difficulty--provided all such files are compressed and archived in the same way. Examples of counting a Java .ear file in Unix and Windows:
Unix> cloc --extract-with="unzip -d . >FILE< " Project.ear

DOS> cloc.exe --extract-with="\"c:\Program Files\WinZip\WinZip32.exe\" -e -o >FILE< ." Project.ear

Differences^

The --diff switch allows one to measure the relative change in source code and comments between two versions of a file, directory, or archive. Differences reveal much more than absolute code counts of two file versions. For example, say a source file has 100 lines and its developer delivers a newer version with 102 lines. Did he add two comment lines, or delete seventeen source lines and add fourteen source lines and five comment lines, or did he do a complete rewrite, discarding all 100 original lines and adding 102 lines of all new source? The diff option tells how many lines of source were added, removed, modified or stayed the same, and how many lines of comments were added, removed, modified or stayed the same.
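
To make the four categories concrete, here is a small sketch built on Python's difflib. cloc itself relies on the Perl module Algorithm::Diff, so this is a stand-in technique, and the rule used for replaced regions (pair old and new lines one to one as "modified", count any surplus as added or removed) is an assumption of this sketch. The name diff_counts is hypothetical.

```python
from difflib import SequenceMatcher


def diff_counts(old_lines, new_lines):
    """Classify lines as same, modified, added, or removed."""
    counts = {"same": 0, "modified": 0, "added": 0, "removed": 0}
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        n_old, n_new = i2 - i1, j2 - j1
        if tag == "equal":
            counts["same"] += n_old
        elif tag == "delete":
            counts["removed"] += n_old
        elif tag == "insert":
            counts["added"] += n_new
        else:
            # "replace": pair lines one to one, then count the surplus.
            paired = min(n_old, n_new)
            counts["modified"] += paired
            counts["removed"] += n_old - paired
            counts["added"] += n_new - paired
    return counts


if __name__ == "__main__":
    old = ["a", "b", "c"]
    new = ["a", "x", "c", "d"]
    print(diff_counts(old, new))
```

Whatever the input, same + modified + removed equals the number of old lines and same + modified + added equals the number of new lines, which is a convenient sanity check on any such report.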

In addition to file pairs, one can give cloc pairs of directories, or pairs of file archives, or a file archive and a directory. cloc will try to align file pairs within the directories or archives and compare diffs for each pair. For example, to see what changed between GCC 4.4.0 and 4.5.0 one could do

  cloc --diff gcc-4.4.0.tar.bz2  gcc-4.5.0.tar.bz2
Be prepared to wait a while for the results though; the --diff option runs much more slowly than an absolute code count.

To see how cloc aligns files between the two archives, use the --diff-alignment option

  cloc --diff-alignment=align.txt gcc-4.4.0.tar.bz2  gcc-4.5.0.tar.bz2
to produce the file align.txt which shows the file pairs as well as files added and deleted. The symbols == and != before each file pair indicate if the files are identical (==) or if they have different content (!=).
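
The idea of aligning files can be illustrated with a short Python sketch that pairs files strictly by identical relative path. How cloc actually aligns files is not described here, so the matching rule, the function names, and the returned structure are all assumptions made for illustration.

```python
import filecmp
import os


def relative_files(top):
    """Return the set of file paths under `top`, relative to `top`."""
    found = set()
    for root, _dirs, files in os.walk(top):
        for name in files:
            found.add(os.path.relpath(os.path.join(root, name), top))
    return found


def align(old_dir, new_dir):
    """Pair files by relative path; report pairs, additions, and removals."""
    old_files = relative_files(old_dir)
    new_files = relative_files(new_dir)
    pairs = []
    for rel in sorted(old_files & new_files):
        # shallow=False forces a byte-by-byte content comparison.
        identical = filecmp.cmp(
            os.path.join(old_dir, rel), os.path.join(new_dir, rel), shallow=False
        )
        pairs.append(("==" if identical else "!=", rel))
    return {
        "pairs": pairs,
        "added": sorted(new_files - old_files),
        "removed": sorted(old_files - new_files),
    }
```

Run on two unpacked source trees, the "pairs" list carries the same == and != markers described above, while "added" and "removed" list files present in only one tree.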

Here's sample output showing the difference between the Python 2.6.6 and 2.7 releases:

prompt> cloc --diff  Python-2.6.6.tar.bz2 Python-2.7.tar.bz2
      3870 text files.
      4130 text files.
      2177 files ignored.                                         
  
  2 errors:
  Diff error (quoted comments?):  /tmp/4QAqkrHN7Z/Python-2.6.6/Mac/Modules/qd/qdsupport.py
  Diff error (quoted comments?):  /tmp/LvStB1lQxd/Python-2.7/Mac/Modules/qd/qdsupport.py
  
  http://cloc.sourceforge.net v 1.52  T=422.0 s (0.0 files/s, 0.0 lines/s)
  -------------------------------------------------------------------------------
  Language                     files          blank        comment           code
  -------------------------------------------------------------------------------
  vim script
   same                            0              0              7             85
   modified                        1              0              0             20
   added                           0              0              0              1
   removed                         0              0              0              0
  Expect
   same                            1              0              0             60
   modified                        0              0              0              0
   added                           6              0              0              0
   removed                         0              0              0              0
  CSS
   same                            1              0             19            318
   modified                        0              0              0              0
   added                           0              0              0              0
   removed                         0              0              0              0
  XML
   same                            1              0              0              4
   modified                        0              0              0              0
   added                           3              0              0              0
   removed                         1              0              0              0
  m4
   same                            1              0             19           1089
   modified                        2              0              0            130
   added                           5              6              5            150
   removed                         0            660             15           5905
  Visual Basic
   same                            2              0              1             12
   modified                        0              0              0              0
   added                           0              0              0              0
   removed                         0              0              0              0
  Lisp
   same                            1              0            503           2933
   modified                        0              0              0              0
   added                           0              0              0              0
   removed                         0              0              0              0
  NAnt scripts
   same                            2              0              0             30
   modified                        0              0              0              0
   added                           0              0              0              0
   removed                         0              0              0              0
  HTML
   same                           12              0             11           2329
   modified                        2              0              0              2
   added                           0              0              0              0
   removed                         9              0              0              0
  make
   same                            3              0            353           2888
   modified                        7              0              3             11
   added                           2              1              0             14
   removed                         0              2              0              8
  Objective C
   same                            6              0             70            633
   modified                        1              0              0              2
   added                           0              0              0              0
   removed                         0              0              0              0
  Assembly
   same                           22              0           1575           9156
   modified                       14              0             78            174
   added                           3            171            111            998
   removed                         2              1              0            189
  Bourne Shell
   same                           26              0           2828          20114
   modified                        7              0            255           2179
   added                           5            163           1103           4770
   removed                         0            550           2444          11660
  (unknown)
   same                            0              0              0              0
   modified                        0              0              0              0
   added                          32              0              0              0
   removed                        26              0              0              0
  C++
   same                            0              0              0              0
   modified                        0              0              0              0
   added                           2              0              0              0
   removed                         0              0              0              0
  Teamcenter def
   same                            6              0            158            885
   modified                        2              0              0              0
   added                           1              2              4             17
   removed                         1              0              4              2
  DOS Batch
   same                           26              0            101            416
   modified                        5              0              1              8
   added                           1              0              0              0
   removed                         0              0              0              0
  C/C++ Header
   same                          143              0           9016          37452
   modified                       90              0            157          15564
   added                          12            181            341          10247
   removed                         1            101            129           5219
  C
   same                          222              0          28753         322642
   modified                      157              0            542           5023
   added                         141           1485           1730          12440
   removed                         4            223            619           4519
  Python
   same                         1211              0          92289         348923
   modified                      740              0           1238          11589
   added                         114           2845           4645          17251
   removed                        23           1409           2617           6385
  -------------------------------------------------------------------------------
  SUM:
   same                         1686              0         135703         749969
   modified                     1028              0           2274          34702
   added                         327           4854           7939          45888
   removed                        67           2946           5828          33887
  -------------------------------------------------------------------------------
Note the two errors for the file Python-X/Mac/Modules/qd/qdsupport.py. This file has Python docstrings (text between pairs of triple quotes) that contain C comments. cloc treats docstrings as comments and handles them by first converting them to C comments, then using the C comment removing regular expression. Nested C comments yield erroneous results however.
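
The failure mode can be reproduced with a toy version of the approach. The sketch below is an assumption-laden illustration, not cloc's code: it handles only double-quoted docstrings and uses a simple non-greedy pattern in place of the Regexp::Common expression. All names are hypothetical.

```python
import re

DOCSTRING = re.compile(r'"""(.*?)"""', re.DOTALL)
C_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

# Hypothetical inputs: one ordinary docstring, one that contains a C comment.
PLAIN = 'def f():\n    """Say hi."""\n    return 1\n'
NESTED = 'def g():\n    """Emit /* C */ code."""\n    return 2\n'


def strip_docstrings_via_c_comments(source):
    """Rewrite docstrings as C comments, then remove all C comments."""
    as_c = DOCSTRING.sub(lambda m: "/*" + m.group(1) + "*/", source)
    return C_COMMENT.sub("", as_c)


if __name__ == "__main__":
    print(repr(strip_docstrings_via_c_comments(PLAIN)))
    print(repr(strip_docstrings_via_c_comments(NESTED)))
```

For PLAIN the docstring disappears cleanly. For NESTED the converted text contains a comment inside a comment, the pattern stops at the first closing */, and the fragment " code.*/" is left behind as if it were source.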

There's also output for the language "(unknown)". Files in this category are non-source files and therefore not counted; their presence is merely noted as having been removed, added, or modified.

Create Custom Language Definitions ^

cloc can write its language comment definitions to a file or can read comment definitions from a file, overriding the built-in definitions. This can be useful when you want to use cloc to count lines of a language not yet included, to change association of file extensions to languages, or to modify the way existing languages are counted.

The easiest way to create a custom language definition file is to make cloc write its definitions to a file, then modify that file:

Unix> cloc --write-lang-def=my_definitions.txt
creates the file my_definitions.txt, which can be modified and then read back in with either the --read-lang-def or --force-lang-def option. The difference between the options is that the former merges language definitions from the given file with cloc's internal definitions, with cloc's taking precedence if there are overlaps. The --force-lang-def option, on the other hand, replaces cloc's definitions completely. This option has the disadvantage of preventing cloc from counting languages whose extensions map to multiple languages, as these languages require additional logic that is not easily expressed in a definitions file.
Unix> cloc --read-lang-def=my_definitions.txt  file1 file2 dir1 ...
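
The difference between the two options amounts to merge versus replace. A minimal sketch with plain dictionaries, assuming definitions are keyed by language name (the function names and the "MyLang" entry are hypothetical):

```python
def read_lang_def(builtin, from_file):
    """Merge both sets; built-in entries win when a language is in both."""
    merged = dict(from_file)
    merged.update(builtin)
    return merged


def force_lang_def(builtin, from_file):
    """Discard the built-in definitions and use only those from the file."""
    return dict(from_file)


if __name__ == "__main__":
    builtin = {"C++": ["C", "cc", "cpp", "cxx", "pcc"]}
    from_file = {"C++": ["cpp"], "MyLang": ["ml2"]}  # hypothetical file contents
    print(read_lang_def(builtin, from_file))
    print(force_lang_def(builtin, from_file))
```

With these inputs the merged result keeps the built-in C++ entry and gains MyLang, while the forced result contains only what the file defined.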

Each language entry has four parts:

  1. The language name starting in column 1.
  2. One or more comment filters starting in column 5.
  3. One or more filename extensions starting in column 5.
  4. A 3rd generation scale factor starting in column 5. This entry must be provided but its value is not important unless you want to compare your language to a hypothetical third generation programming language.
A filter defines a method to remove comment text from the source file. For example the entry for C++ looks like this
C++
    filter remove_matches ^\s*//
    filter call_regexp_common C
    extension C
    extension cc
    extension cpp
    extension cxx
    extension pcc
    3rd_gen_scale 1.51
C++ has two filters: first, remove lines that start with optional whitespace and are followed by //. Next, remove all C comments. C comments are difficult to express as regular expressions so a call is made to Regexp::Common to get the appropriate regular expression to match C comments which are then removed.
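
As an illustration of the entry layout, the sketch below parses the C++ entry shown above and applies its remove_matches filter. It assumes only what the four-part description states (a name in column 1, indented keyword lines beneath it); the parser, its return structure, and the function names are hypothetical, and the call_regexp_common filter is recorded but not implemented.

```python
import re

CPP_ENTRY = r"""C++
    filter remove_matches ^\s*//
    filter call_regexp_common C
    extension C
    extension cc
    extension cpp
    extension cxx
    extension pcc
    3rd_gen_scale 1.51
"""


def parse_definitions(text):
    """Parse entries into {language: {"filters", "extensions", "scale"}}."""
    languages, current = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # A name starting in column 1 opens a new language entry.
            current = {"filters": [], "extensions": [], "scale": None}
            languages[line.strip()] = current
            continue
        if current is None:
            raise ValueError("indented line before any language name")
        keyword, _, rest = line.strip().partition(" ")
        if keyword == "filter":
            name, _, argument = rest.partition(" ")
            current["filters"].append((name, argument))
        elif keyword == "extension":
            current["extensions"].append(rest)
        elif keyword == "3rd_gen_scale":
            current["scale"] = float(rest)
    return languages


def remove_matches(lines, pattern):
    """Drop every line that matches the given regular expression."""
    regex = re.compile(pattern)
    return [ln for ln in lines if not regex.search(ln)]
```

Feeding the parsed pattern for C++ into remove_matches drops lines that begin with //, while lines with a trailing // after code are kept because the pattern is anchored at the start of the line.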

A more complete discussion of the different filter options may appear here in the future. The output of cloc's --write-lang-def option should provide enough examples for motivated individuals to modify or extend cloc's language definitions.

Combine Reports ^

If you manage multiple software projects you might be interested in seeing line counts by project, not just by language. Say you manage three software projects called MySQL, PostgreSQL, and SQLite. The teams responsible for each of these projects run cloc on their source code and provide you with the output. For example, the MySQL team runs

cloc --report-file=mysql-5.1.42.txt mysql-5.1.42.tar.gz
and provides you with the file mysql-5.1.42.txt. The contents of the three files you get are
Unix> cat mysql-5.1.42.txt
http://cloc.sourceforge.net v 1.50  T=26.0 s (108.1 files/s, 65774.5 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
C++                             615          93609         110909         521041
C                               642          83179          82424         393602
C/C++ Header                   1065          33980          77633         142779
Bourne Shell                    178          14892          11437          74525
Perl                             60           7634           4667          22703
m4                               13           1220            394          10497
make                            119            914           1855           4447
XML                              27            564             23           4107
SQL                              18            517            209           3433
Assembly                         12            161              0           1304
yacc                              2            167             40           1048
lex                               2            332            113            879
Teamcenter def                   43             85            219            701
Javascript                        3             70            140            427
Pascal                            2              0            436            377
HTML                              1              7              0            250
Bourne Again Shell                1              6              1             48
DOS Batch                         8             23             73             36
--------------------------------------------------------------------------------
SUM:                           2811         237360         290573        1182204
--------------------------------------------------------------------------------
Unix> cat sqlite-3.6.22.txt
http://cloc.sourceforge.net v 1.50  T=3.0 s (4.7 files/s, 53833.7 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                                2           7459          37993          68944
Bourne Shell                     7           3344           4522          25849
m4                               2            754             20           6557
C/C++ Header                     2            155           4808           1077
make                             1              6              0             13
-------------------------------------------------------------------------------
SUM:                            14          11718          47343         102440
-------------------------------------------------------------------------------

Unix> cat postgresql-8.4.2.txt
http://cloc.sourceforge.net v 1.50  T=16.0 s (129.1 files/s, 64474.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                              923         102324         167390         563865
C/C++ Header                   556           9180          22723          40990
Bourne Shell                    51           3692           3245          28486
SQL                            260           8246           5645          25862
yacc                             6           2667           2126          22825
Perl                            36            782            696           4894
lex                              8            708           1525           3638
make                           180           1215           1385           3453
m4                              12            199             25           1431
Teamcenter def                  13              4              0           1104
HTML                             2             94              1            410
DOS Batch                        7             53             22            188
XSLT                             5             41             30            111
Assembly                         3             17              0            105
D                                1             14             14             65
CSS                              1             16              7             44
sed                              1              1              7             15
Python                           1              5              1             12
-------------------------------------------------------------------------------
SUM:                          2066         129258         204842         697498
-------------------------------------------------------------------------------

While these three files are interesting, you also want to see the combined counts from all projects. That can be done with cloc's --sum-reports option:

Unix> cloc --sum-reports --report_file=databases mysql-5.1.42.txt  postgresql-8.4.2.txt  sqlite-3.6.22.txt
Wrote databases.lang
Wrote databases.file
The report combination produces two output files, one for sums by programming language (databases.lang) and one by project (databases.file). Their contents are
Unix> cat databases.lang
http://cloc.sourceforge.net v 1.50
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
C                              1567         192962         287807        1026411
C++                             615          93609         110909         521041
C/C++ Header                   1623          43315         105164         184846
Bourne Shell                    236          21928          19204         128860
SQL                             278           8763           5854          29295
Perl                             96           8416           5363          27597
yacc                              8           2834           2166          23873
m4                               27           2173            439          18485
make                            300           2135           3240           7913
lex                              10           1040           1638           4517
XML                              27            564             23           4107
Teamcenter def                   56             89            219           1805
Assembly                         15            178              0           1409
HTML                              3            101              1            660
Javascript                        3             70            140            427
Pascal                            2              0            436            377
DOS Batch                        15             76             95            224
XSLT                              5             41             30            111
D                                 1             14             14             65
Bourne Again Shell                1              6              1             48
CSS                               1             16              7             44
sed                               1              1              7             15
Python                            1              5              1             12
--------------------------------------------------------------------------------
SUM:                           4891         378336         542758        1982142
--------------------------------------------------------------------------------

Unix> cat databases.file
----------------------------------------------------------------------------------
Report File                     files          blank        comment           code
----------------------------------------------------------------------------------
mysql-5.1.42.txt                 2811         237360         290573        1182204
postgresql-8.4.2.txt             2066         129258         204842         697498
sqlite-3.6.22.txt                  14          11718          47343         102440
----------------------------------------------------------------------------------
SUM:                             4891         378336         542758        1982142
----------------------------------------------------------------------------------
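
The two summations can be sketched in a few lines of Python. This is an illustrative reimplementation, not cloc's code: it parses the plain-text report layout shown above with a regular expression and accumulates counts by language and by report file. The function names are hypothetical. SQLITE_REPORT reproduces the SQLite report from this section; POSTGRES_EXCERPT contains only three of the PostgreSQL rows to keep the example short.

```python
import re

# A data row: a name followed by four integer columns.
ROW = re.compile(r"^(.+?)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")

SQLITE_REPORT = """\
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                                2           7459          37993          68944
Bourne Shell                     7           3344           4522          25849
m4                               2            754             20           6557
C/C++ Header                     2            155           4808           1077
make                             1              6              0             13
-------------------------------------------------------------------------------
SUM:                            14          11718          47343         102440
"""

POSTGRES_EXCERPT = """\
C                              923         102324         167390         563865
make                           180           1215           1385           3453
m4                              12            199             25           1431
"""


def parse_report(text):
    """Return {language: [files, blank, comment, code]}, skipping SUM rows."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match and not match.group(1).startswith("SUM"):
            rows[match.group(1)] = [int(n) for n in match.groups()[1:]]
    return rows


def sum_reports(reports):
    """Sum {report name: report text} by language and by report file."""
    by_lang, by_file = {}, {}
    for name, text in reports.items():
        file_total = by_file.setdefault(name, [0, 0, 0, 0])
        for language, counts in parse_report(text).items():
            lang_total = by_lang.setdefault(language, [0, 0, 0, 0])
            for i, n in enumerate(counts):
                lang_total[i] += n
                file_total[i] += n
    return by_lang, by_file
```

Because each row contributes to exactly one language total and one file total, the grand totals of the two outputs always agree, as they do in databases.lang and databases.file.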

Report files themselves can be summed together. Say you also manage development of Perl and Python and you want to keep track of those line counts separately from your database projects. First create reports for Perl and Python separately:

cloc --report-file=perl-5.10.0.txt perl-5.10.0.tar.gz
cloc --report-file=python-2.6.4.txt Python-2.6.4.tar.bz2
then sum these together with
Unix> cloc --sum-reports --report_file=script_lang perl-5.10.0.txt python-2.6.4.txt
Wrote script_lang.lang
Wrote script_lang.file

Unix> cat script_lang.lang
http://cloc.sourceforge.net v 1.50
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                              518          61871          52705         473034
Python                        1965          76022          95289         365716
Perl                          2052         110356         130018         292281
C/C++ Header                   381          13762          21402         102276
Bourne Shell                   149           9376          11665          81508
Lisp                             2           1154           2745          10448
Assembly                        38           1616           1712           9755
m4                               3            825             34           7124
make                            16            954            804           4829
HTML                            25            516             13           3010
Teamcenter def                   9            170            162           2075
XML                             28            288              0           2034
C++                             10            312            277           2000
yacc                             2            128             97           1549
DOS Batch                       42            175            152            746
Objective C                      7            102             70            635
YAML                             2              2              0            489
CSS                              1             94             19            308
vim script                       1             36              7            105
Expect                           1              0              0             60
NAnt scripts                     2              1              0             30
Visual Basic                     2              1              1             12
-------------------------------------------------------------------------------
SUM:                          5256         277761         317172        1360024
-------------------------------------------------------------------------------

Unix> cat script_lang.file
-------------------------------------------------------------------------------
Report File                  files          blank        comment           code
-------------------------------------------------------------------------------
python-2.6.4.txt              2746         135676         143269         830347
perl-5.10.0.txt               2510         142085         173903         529677
-------------------------------------------------------------------------------
SUM:                          5256         277761         317172        1360024
-------------------------------------------------------------------------------

Finally, combine the combination files:
Unix> cloc --sum-reports --report_file=everything databases.lang script_lang.lang
Wrote everything.lang
Wrote everything.file

Unix> cat everything.lang
http://cloc.sourceforge.net v 1.50
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
C                              2085         254833         340512        1499445
C++                             625          93921         111186         523041
Python                         1966          76027          95290         365728
Perl                           2148         118772         135381         319878
C/C++ Header                   2004          57077         126566         287122
Bourne Shell                    385          31304          30869         210368
SQL                             278           8763           5854          29295
m4                               30           2998            473          25609
yacc                             10           2962           2263          25422
make                            316           3089           4044          12742
Assembly                         53           1794           1712          11164
Lisp                              2           1154           2745          10448
XML                              55            852             23           6141
lex                              10           1040           1638           4517
Teamcenter def                   65            259            381           3880
HTML                             28            617             14           3670
DOS Batch                        57            251            247            970
Objective C                       7            102             70            635
YAML                              2              2              0            489
Javascript                        3             70            140            427
Pascal                            2              0            436            377
CSS                               2            110             26            352
XSLT                              5             41             30            111
vim script                        1             36              7            105
D                                 1             14             14             65
Expect                            1              0              0             60
Bourne Again Shell                1              6              1             48
NAnt scripts                      2              1              0             30
sed                               1              1              7             15
Visual Basic                      2              1              1             12
--------------------------------------------------------------------------------
SUM:                          10147         656097         859930        3342166
--------------------------------------------------------------------------------

Unix> cat everything.file
-------------------------------------------------------------------------------
Report File                  files          blank        comment           code
-------------------------------------------------------------------------------
databases.lang                4891         378336         542758        1982142
script_lang.lang              5256         277761         317172        1360024
-------------------------------------------------------------------------------
SUM:                         10147         656097         859930        3342166
-------------------------------------------------------------------------------

SQL^

cloc can write results in the form of SQL table create and insert statements for use with relational database programs such as SQLite, MySQL, PostgreSQL, Oracle, or Microsoft SQL Server. Once the code count information is in a database, the information can be queried and displayed in interesting ways.

A database created from cloc SQL output has two tables, metadata and t:

metadata

  Field       Type
  timestamp   text
  project     text
  elapsed_s   real

t

  Field       Type
  project     text
  language    text
  file        text
  nBlank      integer
  nComment    integer
  nCode       integer
  nScaled     real

The metadata table contains information about when the cloc run was made. The --sql-append switch allows one to combine many runs in a single database; each run adds a row to the metadata table. The code count information resides in table t.
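
To make the two-table layout concrete, here is a minimal Python sketch that builds the same structure in an in-memory SQLite database. The DDL is reconstructed from the field lists above and is an assumption; the statements cloc actually writes may differ in detail. The helper names make_db and record_run and the sample rows are made up for illustration.

```python
import sqlite3

# Assumed DDL, reconstructed from the field lists above; the statements
# cloc actually emits may differ in detail.
SCHEMA = """
create table metadata (
    timestamp text,
    project   text,
    elapsed_s real
);
create table t (
    project  text,
    language text,
    file     text,
    nBlank   integer,
    nComment integer,
    nCode    integer,
    nScaled  real
);
"""

def make_db():
    """Return an in-memory database with empty metadata and t tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db

def record_run(db, project, timestamp, elapsed_s, rows):
    """Add one run: a metadata row plus one row in t per counted file."""
    db.execute("insert into metadata values (?, ?, ?)",
               (timestamp, project, elapsed_s))
    db.executemany("insert into t values (?, ?, ?, ?, ?, ?, ?)",
                   [(project,) + tuple(r) for r in rows])
    db.commit()

if __name__ == "__main__":
    db = make_db()
    # Made-up counts, for illustration only.
    record_run(db, "demo", "2013-01-01 00:00:00", 0.5,
               [("C", "demo/main.c", 10, 5, 100, 100.0)])
    record_run(db, "demo2", "2013-01-02 00:00:00", 0.7,
               [("Perl", "demo2/run.pl", 3, 2, 40, 160.0)])
    print(db.execute("select count(*) from metadata").fetchone()[0])
    print(db.execute("select sum(nCode) from t").fetchone()[0])
```

Each call to record_run plays the role of one appended cloc run: one new row in metadata, and as many rows in t as there are counted files.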

Let's repeat the code count examples of Perl, Python, SQLite, MySQL and PostgreSQL tarballs shown in the combine reports example above, this time using the SQL output options and the SQLite database engine.

The --sql switch tells cloc to generate output in the form of SQL table create and insert commands. The switch takes an argument of a file name to write these SQL statements into, or, if the argument is 1 (numeric one), streams output to STDOUT. Since the SQLite command line program, sqlite3, can read commands from STDIN, we can dispense with storing SQL statements to a file and use --sql 1 to pipe data directly into the SQLite executable:

cloc --sql 1 --sql-project mysql mysql-5.1.42.tar.gz    | sqlite3 code.db

The --sql-project mysql part is optional; there's no need to specify a project name when working with just one code base. However, since we'll be adding code counts from four other tarballs, we'll only be able to identify data by input source if we supply a project name for each run.

Now that we have a database we will need to pass in the --sql-append switch to tell cloc not to wipe out this database but instead add more data:

cloc --sql 1 --sql-project postgresql --sql-append postgresql-8.4.2.tar.bz2          | sqlite3 code.db
cloc --sql 1 --sql-project sqlite     --sql-append sqlite-amalgamation-3.6.22.tar.gz | sqlite3 code.db
cloc --sql 1 --sql-project python     --sql-append Python-2.6.4.tar.bz2              | sqlite3 code.db
cloc --sql 1 --sql-project perl       --sql-append perl-5.10.0.tar.gz                | sqlite3 code.db
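
The five runs follow one pattern: the first creates the database and every later one adds --sql-append. A small Python sketch that generates the same pipelines from a list of (project, tarball) pairs is shown below. The function names cloc_command and pipeline are illustrative, and the sketch only builds the command strings; it does not run cloc or sqlite3.

```python
import shlex

# (project name, tarball) pairs taken from the commands above.
RUNS = [
    ("mysql", "mysql-5.1.42.tar.gz"),
    ("postgresql", "postgresql-8.4.2.tar.bz2"),
    ("sqlite", "sqlite-amalgamation-3.6.22.tar.gz"),
    ("python", "Python-2.6.4.tar.bz2"),
    ("perl", "perl-5.10.0.tar.gz"),
]

def cloc_command(project, tarball, append):
    """Build the cloc argument list for one run."""
    argv = ["cloc", "--sql", "1", "--sql-project", project]
    if append:
        argv.append("--sql-append")
    argv.append(tarball)
    return argv

def pipeline(runs, database="code.db"):
    """Return one shell pipeline per run; only the first omits --sql-append."""
    lines = []
    for i, (project, tarball) in enumerate(runs):
        argv = cloc_command(project, tarball, append=(i > 0))
        lines.append(shlex.join(argv) + " | sqlite3 " + shlex.quote(database))
    return lines

if __name__ == "__main__":
    for line in pipeline(RUNS):
        print(line)
```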

Now the fun begins--we have a database, code.db, with lots of information about the five projects and can begin querying it for all manner of interesting facts.

Which is the longest file over all projects?

>  sqlite3 code.db 'select project,file,nBlank+nComment+nCode as nL from t where nL = (select max(nBlank+nComment+nCode) from t)'

sqlite|sqlite-3.6.22/sqlite3.c|110860

sqlite3's default output format leaves a bit to be desired. We can add an option to the program's rc file, ~/.sqliterc, to show column headers:

  .header on

One might be tempted to also include

  .mode column

in ~/.sqliterc, but this causes problems when the output has more than one row, since the widths of entries in the first row govern the maximum width for all subsequent rows. Often this leads to truncated output, which is not at all desirable. One option is to write a custom SQLite output formatter such as sqlite_formatter. It is used like so:

>  sqlite3 code.db 'select project,file,nBlank+nComment+nCode as nL from t where nL = (select max(nBlank+nComment+nCode) from t)' | sqlite_formatter

Project File                    nL     
_______ _______________________ ______ 
sqlite  sqlite-3.6.22/sqlite3.c 110860
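
For readers who want to roll their own, the core idea of such a formatter fits in a few lines: read every row first, then size each column to its widest entry. The Python sketch below is not the sqlite_formatter program itself, only an illustration of the approach. It assumes sqlite3's default "|" separator and that ".header on" is in effect, so the first row holds the column names; format_rows is a made-up name.

```python
def format_rows(text, sep="|"):
    """Align separator-delimited rows in columns sized to the widest entry."""
    rows = [line.split(sep) for line in text.splitlines() if line]
    if not rows:
        return ""
    ncols = max(len(r) for r in rows)
    # Pad short rows so every row has the same number of fields.
    rows = [r + [""] * (ncols - len(r)) for r in rows]
    # Widths come from all rows, not just the first one.
    widths = [max(len(r[i]) for r in rows) for i in range(ncols)]
    header, body = rows[0], rows[1:]
    out = [" ".join(h.ljust(w) for h, w in zip(header, widths))]
    out.append(" ".join("_" * w for w in widths))
    for r in body:
        out.append(" ".join(c.ljust(w) for c, w in zip(r, widths)))
    return "\n".join(line.rstrip() for line in out)

if __name__ == "__main__":
    sample = "project|file|nL\nsqlite|sqlite-3.6.22/sqlite3.c|110860\n"
    print(format_rows(sample))
```

Because the widths are computed over the whole result set, a long entry in a later row can no longer be truncated by a short entry in the first row. To use it as a filter, the same function could be fed the text read from standard input.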

Note also that sqlite3 has an HTML output option, --html, that might also prove useful.

Which is the longest file in each project?

> sqlite3 code.db 'select project,file,max(nBlank+nComment+nCode) as nL from t group by project order by nL;' | sqlite_formatter

Project    File                                          nL     
__________ _____________________________________________ ______ 
perl       perl-5.10.0/t/op/mkdir.t                       22658 
python     Python-2.6.4/Lib/email/quoprimime.py           28091 
postgresql postgresql-8.4.2/contrib/pgcrypto/pgp-pgsql.c  40041 
mysql      mysql-5.1.42/netware/mysqldump.def             51841 
sqlite     sqlite-3.6.22/config.sub                      110860 
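
The query above pairs max() with the non-aggregated file column. Which row supplies the file value in that situation depends on the SQLite version, so the file name shown is not guaranteed to be the one that holds the maximum. The Python sketch below ties the two together explicitly by joining each project's maximum back to table t. The table contents are made up for illustration, and longest_files is a hypothetical helper name.

```python
import sqlite3

# Join every row of t against its project's maximum line count, so the
# reported file is always the one that actually has that count.
LONGEST_PER_PROJECT = """
select t.project, t.file, t.nBlank + t.nComment + t.nCode as nL
  from t
  join (select project, max(nBlank + nComment + nCode) as maxL
          from t
         group by project) as m
    on t.project = m.project
   and t.nBlank + t.nComment + t.nCode = m.maxL
 order by nL
"""

def longest_files(db):
    """Return (project, file, total lines) for the longest file per project."""
    return db.execute(LONGEST_PER_PROJECT).fetchall()

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("create table t (project text, language text, file text, "
               "nBlank integer, nComment integer, nCode integer, nScaled real)")
    # Made-up rows, for illustration only.
    db.executemany("insert into t values (?, ?, ?, ?, ?, ?, ?)", [
        ("a", "C", "a/small.c", 1, 1, 10, 10.0),
        ("a", "C", "a/big.c", 5, 5, 90, 90.0),
        ("b", "C", "b/only.c", 2, 3, 20, 20.0),
    ])
    for row in longest_files(db):
        print(row)
```

If two files in a project tie for the maximum, this form returns both of them.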

Which files in each project have the most code lines?

> sqlite3 code.db 'select project,file,max(nCode) as nL from t group by project order by nL desc;' | sqlite_formatter

Project    File                                          nL    
__________ _____________________________________________ _____ 
sqlite     sqlite-3.6.22/config.sub                      66142 
mysql      mysql-5.1.42/netware/mysqldump.def            38555 
postgresql postgresql-8.4.2/contrib/pgcrypto/pgp-pgsql.c 36905 
python     Python-2.6.4/Lib/email/quoprimime.py          26705 
perl       perl-5.10.0/t/op/mkdir.t                      20079 

Which C source files with more than 300 lines have a comment ratio below 1%?

> sqlite3 code.db 'select project, language, file, nCode, nComment, (100.0*nComment)/(nComment+nCode) as comment_ratio from t 
   where language="C" and nCode > 300 and comment_ratio < 1 order by comment_ratio;' | sqlite_formatter

Project    Language File                                                                          nCode nComment comment_ratio      
__________ ________ _____________________________________________________________________________ _____ ________ __________________ 
mysql      C        mysql-5.1.42/scripts/mysql_fix_privilege_tables_sql.c                           658        0 0.0                
python     C        Python-2.6.4/Python/graminit.c                                                 2143        1 0.0466417910447761 
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_UTF_8_turkish.c          2095        1 0.0477099236641221 
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_UTF_8_french.c           1211        1 0.0825082508250825 
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_ISO_8859_1_french.c      1201        1 0.0831946755407654 
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_UTF_8_hungarian.c        1182        1 0.084530853761623  
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_ISO_8859_1_hungarian.c   1178        1 0.0848176420695505 
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_UTF_8_english.c          1072        1 0.0931966449207828 
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_ISO_8859_1_english.c     1064        1 0.0938967136150235 
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_UTF_8_spanish.c          1053        1 0.094876660341556  
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_ISO_8859_1_spanish.c     1049        1 0.0952380952380952 
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_UTF_8_italian.c          1031        1 0.0968992248062016 
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_ISO_8859_1_italian.c     1023        1 0.09765625         
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_UTF_8_portuguese.c        981        1 0.10183299389002   
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_ISO_8859_1_portuguese.c   975        1 0.102459016393443  
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_UTF_8_romanian.c          967        1 0.103305785123967  
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_ISO_8859_2_romanian.c     961        1 0.103950103950104  
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_UTF_8_finnish.c           720        1 0.13869625520111   
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_UTF_8_porter.c            717        1 0.139275766016713  
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_ISO_8859_1_finnish.c      714        1 0.13986013986014   
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_ISO_8859_1_porter.c       711        1 0.140449438202247  
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_KOI8_R_russian.c          660        1 0.151285930408472  
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_UTF_8_russian.c           654        1 0.152671755725191  
python     C        Python-2.6.4/Mac/Modules/qt/_Qtmodule.c                                       26705       42 0.157026956294164  
python     C        Python-2.6.4/Mac/Modules/icn/_Icnmodule.c                                      1521        3 0.196850393700787  
mysql      C        mysql-5.1.42/strings/ctype-extra.c                                             8348       17 0.203227734608488  
python     C        Python-2.6.4/Python/Python-ast.c                                               5910       17 0.286823013328834  
python     C        Python-2.6.4/Mac/Modules/menu/_Menumodule.c                                    3263       10 0.305530094714329  
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_UTF_8_dutch.c             596        2 0.334448160535117  
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_ISO_8859_1_dutch.c        586        2 0.340136054421769  
perl       C        perl-5.10.0/x2p/a2p.c                                                          2916       10 0.341763499658236  
python     C        Python-2.6.4/Mac/Modules/qd/_Qdmodule.c                                        6694       24 0.357249181303959  
python     C        Python-2.6.4/Mac/Modules/win/_Winmodule.c                                      3056       11 0.358656667753505  
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_UTF_8_german.c            476        2 0.418410041841004  
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_ISO_8859_1_german.c       470        2 0.423728813559322  
perl       C        perl-5.10.0/x2p/walk.c                                                         2024       10 0.491642084562439  
python     C        Python-2.6.4/Mac/Modules/ctl/_Ctlmodule.c                                      5442       28 0.511882998171846  
python     C        Python-2.6.4/Mac/Modules/ae/_AEmodule.c                                        1347        7 0.51698670605613   
python     C        Python-2.6.4/Mac/Modules/app/_Appmodule.c                                      1712        9 0.52295177222545   
mysql      C        mysql-5.1.42/strings/ctype-euc_kr.c                                            8691       49 0.560640732265446  
mysql      C        mysql-5.1.42/storage/archive/archive_reader.c                                   348        2 0.571428571428571  
python     C        Python-2.6.4/Mac/Modules/evt/_Evtmodule.c                                       504        3 0.591715976331361  
python     C        Python-2.6.4/Modules/expat/xmlrole.c                                           1250        8 0.635930047694754  
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_UTF_8_danish.c            312        2 0.636942675159236  
mysql      C        mysql-5.1.42/strings/ctype-gbk.c                                               9946       64 0.639360639360639  
postgresql C        postgresql-8.4.2/src/backend/snowball/libstemmer/stem_ISO_8859_1_danish.c       310        2 0.641025641025641  
mysql      C        mysql-5.1.42/strings/ctype-gb2312.c                                            5735       40 0.692640692640693  
python     C        Python-2.6.4/Mac/Modules/res/_Resmodule.c                                      1621       12 0.734843845682792  
python     C        Python-2.6.4/Mac/Modules/drag/_Dragmodule.c                                    1046        8 0.759013282732448  
postgresql C        postgresql-8.4.2/contrib/hstore/hstore_op.c                                     522        4 0.760456273764259  
python     C        Python-2.6.4/Mac/Modules/list/_Listmodule.c                                    1022        8 0.776699029126214  
python     C        Python-2.6.4/Mac/Modules/te/_TEmodule.c                                        1198       10 0.827814569536424  
python     C        Python-2.6.4/Mac/Modules/cg/_CGmodule.c                                        1190       10 0.833333333333333  
postgresql C        postgresql-8.4.2/contrib/hstore/hstore_io.c                                     451        4 0.879120879120879  
postgresql C        postgresql-8.4.2/src/interfaces/ecpg/preproc/preproc.c                        36905      330 0.886262924667651  
python     C        Python-2.6.4/Modules/clmodule.c                                                2379       23 0.957535387177352  
python     C        Python-2.6.4/Mac/Modules/folder/_Foldermodule.c                                 306        3 0.970873786407767  
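
The query above leans on two SQLite conveniences: it refers to the comment_ratio alias inside the where clause, and it writes the string C in double quotes. Other database engines may reject either one. A more portable form, sketched here in Python against a made-up in-memory table, repeats the ratio expression in the where clause and uses single quotes for the string literal. The helper name low_comment_c_files is illustrative.

```python
import sqlite3

LOW_COMMENT_C_FILES = """
select project, file, nCode, nComment,
       (100.0 * nComment) / (nComment + nCode) as comment_ratio
  from t
 where language = 'C'
   and nCode > 300
   and (100.0 * nComment) / (nComment + nCode) < 1
 order by comment_ratio
"""

def low_comment_c_files(db):
    """Return C files with more than 300 code lines and under 1% comments."""
    return db.execute(LOW_COMMENT_C_FILES).fetchall()

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("create table t (project text, language text, file text, "
               "nBlank integer, nComment integer, nCode integer, nScaled real)")
    # Made-up rows, for illustration only.
    db.executemany("insert into t values (?, ?, ?, ?, ?, ?, ?)", [
        ("p", "C", "p/a.c", 0, 1, 999, 0.0),      # 0.1% comments: kept
        ("p", "C", "p/b.c", 0, 50, 400, 0.0),     # about 11% comments: dropped
        ("p", "C", "p/c.c", 0, 0, 200, 0.0),      # too short: dropped
        ("p", "C++", "p/d.cpp", 0, 0, 1000, 0.0), # not C: dropped
        ("q", "C", "q/e.c", 0, 0, 500, 0.0),      # 0% comments: kept
    ])
    for row in low_comment_c_files(db):
        print(row)
```

The nCode > 300 condition also guarantees that the denominator is never zero.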

What are the ten longest files (based on code lines) that have no comments at all? Exclude header and YAML files.

> sqlite3 code.db 'select project, file, nCode from t where nComment = 0 and language not in ("C/C++ Header", "YAML") order by nCode desc limit 10;' | sqlite_formatter

Project File                                                  nCode 
_______ _____________________________________________________ _____ 
python  Python-2.6.4/PC/os2emx/python26.def                    1188 
python  Python-2.6.4/Lib/test/cjkencodings_test.py             1019 
python  Python-2.6.4/Tools/msi/schema.py                        920 
python  Python-2.6.4/Lib/msilib/schema.py                       920 
perl    perl-5.10.0/symbian/config.sh                           810 
perl    perl-5.10.0/uconfig.sh                                  771 
python  Python-2.6.4/Tools/pybench/Lookups.py                   700 
mysql   mysql-5.1.42/scripts/mysql_fix_privilege_tables_sql.c   658 
python  Python-2.6.4/Tools/pybench/Numbers.py                   637 
python  Python-2.6.4/Tools/pybench/Arithmetic.py                596

What are the most popular languages (in terms of lines of code) in each project?

> sqlite3 code.db 'select project, language, sum(nCode) as SumCode from t group by project,language order by project,SumCode desc;' | sqlite_formatter

Project    Language           SumCode 
__________ __________________ _______ 
mysql      C++                 521041 
mysql      C                   393602 
mysql      C/C++ Header        142779 
mysql      Bourne Shell         74525 
mysql      Perl                 22703 
mysql      m4                   10497 
mysql      make                  4447 
mysql      XML                   4107 
mysql      SQL                   3433 
mysql      Assembly              1304 
mysql      yacc                  1048 
mysql      lex                    879 
mysql      Teamcenter def         701 
mysql      Javascript             427 
mysql      Pascal                 377 
mysql      HTML                   250 
mysql      Bourne Again Shell      48 
mysql      DOS Batch               36 
perl       Perl                292281 
perl       C                   140483 
perl       C/C++ Header         44042 
perl       Bourne Shell         36882 
perl       Lisp                  7515 
perl       make                  2044 
perl       C++                   2000 
perl       XML                   1972 
perl       yacc                  1549 
perl       YAML                   489 
perl       DOS Batch              322 
perl       HTML                    98 
postgresql C                   563865 
postgresql C/C++ Header         40990 
postgresql Bourne Shell         28486 
postgresql SQL                  25862 
postgresql yacc                 22825 
postgresql Perl                  4894 
postgresql lex                   3638 
postgresql make                  3453 
postgresql m4                    1431 
postgresql Teamcenter def        1104 
postgresql HTML                   410 
postgresql DOS Batch              188 
postgresql XSLT                   111 
postgresql Assembly               105 
postgresql D                       65 
postgresql CSS                     44 
postgresql sed                     15 
postgresql Python                  12 
python     Python              365716 
python     C                   332551 
python     C/C++ Header         58234 
python     Bourne Shell         44626 
python     Assembly              9755 
python     m4                    7124 
python     Lisp                  2933 
python     HTML                  2912 
python     make                  2785 
python     Teamcenter def        2075 
python     Objective C            635 
python     DOS Batch              424 
python     CSS                    308 
python     vim script             105 
python     XML                     62 
python     Expect                  60 
python     NAnt scripts            30 
python     Visual Basic            12 
sqlite     C                    68944 
sqlite     Bourne Shell         25849 
sqlite     m4                    6557 
sqlite     C/C++ Header          1077 
sqlite     make                    13 

Third Generation Language Scale Factors^

Before version 1.50, cloc by default computed a rough estimate of how many lines of code would be needed to write the provided inputs in a hypothetical third-generation computer language. To produce this output one must now use the --3 switch.

Scale factors were derived from the 2006 version of language gearing ratios listed at the Mayes Consulting web site, http://softwareestimator.com/IndustryData2.htm, using this equation:

cloc scale factor for language X = 3rd generation default gearing ratio / language X gearing ratio

for example,

cloc 3rd generation scale factor for DOS Batch = 80 / 128 = 0.625
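
The same arithmetic as a short Python sketch. The gearing ratios 80 and 128 come from the DOS Batch example above. Multiplying a code count by the scale factor to obtain the third-generation equivalent is an assumption about how the two columns relate, and the function names are made up for illustration.

```python
def scale_factor(third_gen_ratio, language_ratio):
    """Scale factor = 3rd generation default gearing ratio / language ratio."""
    return third_gen_ratio / language_ratio

def third_gen_equivalent(n_code, factor):
    """Assumed relation: equivalent lines = counted code lines * scale factor."""
    return n_code * factor

if __name__ == "__main__":
    dos_batch = scale_factor(80, 128)
    print(dos_batch)
    # 970 is the DOS Batch code count from the summed report earlier on.
    print(third_gen_equivalent(970, dos_batch))
```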

The biggest flaw with this approach is that gearing ratios are defined for logical lines of source code, not for the physical lines that cloc counts. The values in cloc's 'scale' and '3rd gen. equiv.' columns should be taken with a large grain of salt.

Limitations ^

Identifying comments within source code is trickier than one might expect. Many languages would need a complete parser to be counted correctly. cloc does not attempt to parse any of the languages it aims to count and therefore is an imperfect tool. The following are known problems:

  1. Lines containing both source code and comments are counted as lines of code.
  2. Comment markers within strings or here-documents are treated as actual comment markers and not string literals. For example the following lines of C code
    printf(" /* ");
    for (i = 0; i < 100; i++) {
        a += i;
    }
    printf(" */ ");
    
    appear to cloc as two lines of C code (the two printf lines, each of which mixes code with text that cloc takes for a comment) and three lines of comments (the three lines of the for loop, which fall entirely between the /* and */ markers).
  3. Lua long comments are not recognized.
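
The first two limitations can be reproduced with a deliberately naive line classifier. The Python sketch below is not cloc's implementation; it is a stand-in that, like any counter without a real parser, looks for /* and */ without knowing about string literals. classify_c_lines is a made-up name.

```python
def classify_c_lines(source):
    """Label each line 'blank', 'comment' or 'code', ignoring string literals."""
    labels = []
    in_comment = False
    for line in source.splitlines():
        rest = []  # text left on this line once comment spans are removed
        i = 0
        while i < len(line):
            if in_comment:
                end = line.find("*/", i)
                if end == -1:
                    i = len(line)
                else:
                    in_comment = False
                    i = end + 2
            else:
                start = line.find("/*", i)
                if start == -1:
                    rest.append(line[i:])
                    i = len(line)
                else:
                    rest.append(line[i:start])
                    in_comment = True
                    i = start + 2
        if not line.strip():
            labels.append("blank")
        elif "".join(rest).strip():
            # Any leftover code makes it a code line, even with a comment.
            labels.append("code")
        else:
            labels.append("comment")
    return labels

SAMPLE = '''\
printf(" /* ");
for (i = 0; i < 100; i++) {
    a += i;
}
printf(" */ ");
'''

if __name__ == "__main__":
    for label, line in zip(classify_c_lines(SAMPLE), SAMPLE.splitlines()):
        print(label.ljust(8), line)
```

On the five-line sample the classifier reports code, comment, comment, comment, code, which matches the miscount described in item 2.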

How to Request Support for Additional Languages^

If cloc does not recognize a language you are interested in counting, post the following information to a Feature Request at cloc's SourceForge page:

  1. File extensions associated with the language. If the language does not rely on file extensions and instead works with fixed file names or with #! style program invocations, explain what those are.
  2. A description of how comments are defined.
  3. Links to sample code.

Author ^

Al Danial

Acknowledgments ^

Wolfram Rösler provided most of the code examples in the test suite. These examples come from his Hello World Collection.

Ismet Kursunoglu found errors with the MUMPS counter and provided access to a computer with a large body of MUMPS code to test cloc.

Tod Huggins gave helpful suggestions for the Visual Basic filters.

Anton Demichev found a flaw with the JSP counter in cloc v0.76 and wrote the XML output generator for the --xml option.

Reuben Thomas pointed out that ISO C99 allows // as a comment marker, provided code for the --no3 and --stdin-name options and for counting the m4 language, and suggested several user-interface enhancements.

Michael Bello provided code for the --opt-match-f, --opt-not-match-f, --opt-match-d, and --opt-not-match-d options.

Mahboob Hussain inspired the --original-dir and --skip-uniqueness options, found a bug in the duplicate file detection logic and improved the JSP filter.

Randy Sharo found and fixed an uninitialized variable bug for shell scripts having only one line.

Steven Baker found and fixed a problem with the YAML output generator.

Greg Toth provided code to improve blank line detection in COBOL.

Joel Oliveira provided code to let --exclude-list-file handle directory name exclusion.

Blazej Kroll provided code to produce an XSLT file, cloc-diff.xsl, when producing XML output for the --diff option.

Denis Silakov enhanced the code which generates cloc.xsl when using --by-file and --by-file-by-lang options, and provided an XSL file that works with --diff output.

Andy (awalshe@sf.net) provided code to fix several bugs: correct output of --counted so that only files that are used in the code count appear and that results are shown by language rather than file name; allow --diff output from multiple runs to be summed together with --sum-reports.

Jari Aalto created the initial version of cloc.1.pod and maintains the Debian package for cloc.

Mikkel Christiansen (mikkels@gmail.com) provided counter definitions for Clojure and ClojureScript.

Vera Djuraskovic from Webhostinggeeks.com provided the Serbo-Croatian translation.

Erik Gooven Arellano Casillas provided an update to the MXML counter to recognize Actionscript comments.

The development of cloc was partially funded by the Northrop Grumman Corporation.

Copyright ^

Copyright (c) 2006-2013, Northrop Grumman Corporation.

License ^

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

