turbo-linecount

turbo-linecount 1.0 Copyright 2015, Christien Rioux

Super-Fast Multi-Threaded Line Counter

turbo-linecount is a tool that simply counts the number of lines in a file, as fast as possible. It splits the file into large chunks, reads them in several threads, and quickly scans each chunk for line endings.
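The chunk-per-thread idea can be illustrated with a small Python sketch. This is not the tool's C++ implementation; the function names, the use of `ThreadPoolExecutor`, and the even split into byte ranges are assumptions made for illustration. It counts newline bytes, so a final line without a trailing newline is not counted here.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def count_newlines_in_range(path, offset, length, buffer_size=1 << 20):
    """Count newline bytes in one byte range, reading buffer_size bytes at a time."""
    count = 0
    with open(path, "rb") as f:
        f.seek(offset)
        remaining = length
        while remaining > 0:
            block = f.read(min(buffer_size, remaining))
            if not block:  # file shrank or ended early
                break
            count += block.count(b"\n")
            remaining -= len(block)
    return count


def count_lines(path, thread_count=None, buffer_size=1 << 20):
    """Split the file into one byte range per worker and sum the per-range counts."""
    size = os.path.getsize(path)
    if size == 0:
        return 0
    workers = thread_count or os.cpu_count() or 1
    span = -(-size // workers)  # ceiling division: bytes per worker
    ranges = [(off, min(span, size - off)) for off in range(0, size, span)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(
            lambda r: count_newlines_in_range(path, r[0], r[1], buffer_size),
            ranges,
        )
        return sum(counts)
```

Because a newline is a single byte, the ranges can be split at arbitrary offsets without any line being counted twice or missed. The sketch shows the structure only and makes no claim about matching the tool's speed.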

Many times, you have to count the number of lines in a text file on disk. The typical solution is to use wc -l on the command line. wc -l uses buffered streams to process the file, which has its advantages, but it is slower than direct memory-mapped file access. However, you can't pipe input to turbo-linecount; this may change in a future release.

How much faster is turbo-linecount? On large files, about 8 times faster than wc -l and 5 times faster than a naive Python implementation (see the Performance section for the measurements).
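For reference, a naive Python line counter might look like the sketch below. The exact Python code used for the comparison is not reproduced in this README, so treat `count_lines_naive` as a hypothetical stand-in for that baseline.

```python
def count_lines_naive(path):
    """Count lines by iterating over the file object in binary mode."""
    with open(path, "rb") as f:
        # Each iteration yields one line, including a final unterminated one.
        return sum(1 for _ in f)
```

Unlike wc -l, this sketch also counts a last line that has no trailing newline.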

To use turbo-linecount, just run the command line:

tlc <file>

where <file> is the path to the file whose lines you'd like to count.

### Help

To get help with turbo-linecount:

tlc -h
usage: tlc [options] <file>
    -h  --help                         print this usage and exit
    -b  --buffersize <BUFFERSIZE>      size of buffer per-thread to use when reading (default is 1MB)
    -t  --threadcount <THREADCOUNT>    number of threads to use (defaults to number of cpu cores)
    -v  --version                      print version information and exit

### Building

turbo-linecount is built with CMake. CMake 3.0.0 or higher is preferred as of this release. For simplified building on Windows, a Visual Studio 2013 solution file is also included.

To build with cmake:

cd build
cmake ..
make
make install

This will build and install the command line utility tlc, a shared library libturbo_linecount, a static library libturbo_linecount_static, and a header file turbo_linecount.h.

turbo-linecount is known to build on:

  • Windows 32/64 bit
  • Mac OS X
  • Linux
  • Cygwin

### Testing

You can test tlc against wc -l and python with the test scripts. To generate some random test files, run create_testfiles.sh; it creates four test files of 10MB, 100MB, 1GB, and 10GB. Feel free to delete these when you're done testing to save space.
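If you want a test file of a different size, a random text file can also be generated with a few lines of Python. This is a sketch and not what create_testfiles.sh does internally; the function name `create_testfile`, the line-length limit, and the alphabet are assumptions.

```python
import random
import string


def create_testfile(path, size_bytes, max_line_len=120, seed=0):
    """Write exactly size_bytes of random ASCII lines to path; return the line count."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits + " "
    written = 0
    lines = 0
    with open(path, "w", encoding="ascii", newline="\n") as f:
        while written < size_bytes:
            # Cap the line so that line plus newline never overshoots size_bytes.
            length = min(rng.randint(0, max_line_len), size_bytes - written - 1)
            f.write("".join(rng.choices(alphabet, k=length)) + "\n")
            written += length + 1
            lines += 1
    return lines
```

The returned line count gives a known-correct answer to compare against. Pure Python generation is slow for multi-gigabyte files, so this is best suited to smaller sizes.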

To run the test, run compare_testfiles.sh. This will generate output such as:

Timing for tlc
tlc: test_10MB.txt 0.006s
tlc: test_100MB.txt 0.015s
tlc: test_1GB.txt 0.127s
tlc: test_10GB.txt 1.196s
Timing for python
python: test_10MB.txt 0.025s
python: test_100MB.txt 0.084s
python: test_1GB.txt 0.661s
python: test_10GB.txt 6.165s
Timing for wc
wc: test_10MB.txt 0.012s
wc: test_100MB.txt 0.100s
wc: test_1GB.txt 0.933s
wc: test_10GB.txt 9.857s
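
To compare runs, the output format shown above can be parsed into a dictionary. This is a minimal sketch that assumes every timing line has the form `tool: file seconds` exactly as in the sample; `parse_timings` is a hypothetical helper, not part of the test scripts.

```python
import re

# Matches lines such as "tlc: test_10MB.txt 0.006s".
LINE_RE = re.compile(r"^(?P<tool>\w+): (?P<file>\S+) (?P<seconds>\d+(?:\.\d+)?)s$")


def parse_timings(output):
    """Return {tool: {file: seconds}} parsed from the comparison output."""
    timings = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # header lines such as "Timing for tlc" do not match
            per_tool = timings.setdefault(match["tool"], {})
            per_tool[match["file"]] = float(match["seconds"])
    return timings
```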

### Performance

Performance on Windows and Mac OS X is excellent for all file sizes. Performance on Linux and other operating systems is good, but could be better. Stay tuned.

  • Macbook Pro (Retina, 15-inch Mid 2014)
  • 2.8 GHz Intel Core i7
  • 1TB SSD hard drive
  • 16GB Memory

| File Size | `tlc`  | `python`      | `wc -l`       |
|-----------|--------|---------------|---------------|
| 10MB      | 0.006s | 0.025s (4.2x) | 0.012s (2.0x) |
| 100MB     | 0.015s | 0.084s (5.6x) | 0.100s (6.7x) |
| 1GB       | 0.127s | 0.661s (5.2x) | 0.933s (7.3x) |
| 10GB      | 1.196s | 6.165s (5.2x) | 9.857s (8.2x) |
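
The factors in parentheses are each tool's time divided by the `tlc` time for the same file. A short sketch of that arithmetic, using the timings from the table (the dictionary layout and the `slowdown` helper are illustrative only):

```python
# Timings in seconds, copied from the table above.
timings = {
    "10MB": {"tlc": 0.006, "python": 0.025, "wc -l": 0.012},
    "100MB": {"tlc": 0.015, "python": 0.084, "wc -l": 0.100},
    "1GB": {"tlc": 0.127, "python": 0.661, "wc -l": 0.933},
    "10GB": {"tlc": 1.196, "python": 6.165, "wc -l": 9.857},
}


def slowdown(row, baseline="tlc"):
    """Return how many times slower each tool is than the baseline, to one decimal."""
    base = row[baseline]
    return {tool: round(t / base, 1) for tool, t in row.items() if tool != baseline}


for size, row in timings.items():
    print(size, slowdown(row))
```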