Christien Rioux 2015-12-12 23:34:12 -08:00
parent 453d2bf46e
commit bf55e8383f
2 changed files with 33 additions and 17 deletions

@ -1,23 +1,24 @@
# linecount # turbo-linecount
linecount 1.0 Copyright 2015, Christien Rioux turbo-linecount 1.0 Copyright 2015, Christien Rioux
### Super-Fast Multi-Threaded Line Counter ### Super-Fast Multi-Threaded Line Counter
*linecount* is a tool that simply counts the number of lines in a file, as fast as possible. It reads the file in large chunks into several threads and quickly scans the file for line endings. *turbo-linecount* is a tool that simply counts the number of lines in a file, as fast as possible. It reads the file in large chunks into several threads and quickly scans the file for line endings.
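The chunk-and-scan idea described above can be sketched in a few lines of Python. This is only an illustration of the technique, not the tool's actual implementation: the names `count_range`, `count_lines`, `BLOCK_SIZE`, and the `workers` default are all hypothetical, and the sketch uses ordinary reads rather than memory mapping. It counts newline bytes, so a final line with no trailing newline is not counted (the same convention as `wc -l`); whether the tool follows that convention is not stated here.

```python
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 1 << 20  # hypothetical 1 MiB read size, not taken from the tool


def count_range(path, start, length):
    """Count newline bytes in the byte range [start, start + length) of path."""
    count = 0
    with open(path, "rb") as f:
        f.seek(start)
        remaining = length
        while remaining > 0:
            block = f.read(min(BLOCK_SIZE, remaining))
            if not block:  # file shrank or ended early
                break
            count += block.count(b"\n")
            remaining -= len(block)
    return count


def count_lines(path, workers=4):
    """Split the file into disjoint byte ranges, scan each in a thread, sum the counts."""
    size = os.path.getsize(path)
    if size == 0:
        return 0
    chunk = -(-size // workers)  # ceiling division: bytes per worker
    ranges = [(start, min(chunk, size - start)) for start in range(0, size, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda r: count_range(path, *r), ranges))
```

Because a newline is a single byte, the ranges can be cut at arbitrary offsets without miscounting, which is what makes the work easy to divide among threads.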
Many times, you have to count the number of lines in a text file on disk. The typical solution is to use 'wc -l' on the command line. 'wc' uses buffered streams to process the file, which has its advantages, but it is slower than direct memory-mapped file access. Many times, you have to count the number of lines in a text file on disk. The typical solution is to use 'wc -l' on the command line. 'wc' uses buffered streams to process the file, which has its advantages, but it is slower than direct memory-mapped file access. You can't 'pipe' to
How much faster is *linecount*? About 10 times faster than `wc` and 5 times faster than the naive Python implementation. How much faster is *turbo-linecount*? About 8 times faster than `wc` and 5 times faster than the naive Python implementation.
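The README does not show the "naive Python implementation" it compares against. A plausible baseline, offered here purely as an assumption, is to iterate over the file object and count the lines it yields; the function name `naive_count_lines` is hypothetical.

```python
def naive_count_lines(path):
    """Count lines by iterating over the file one line at a time."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)
```

Unlike counting newline bytes, this approach also counts a final line that has no trailing newline, so the two methods can differ by one on such files.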
To use *linecount*, just run the command line: To use *turbo-linecount*, just run the command line:
``` ```
lc <file> lc <file>
``` ```
where *\<file\>* is the path to the file whose lines you'd like to count. where *\<file\>* is the path to the file whose lines you'd like to count.
###Help ###Help
To get help with *linecount*: To get help with *turbo-linecount*:
``` ```
lc -h lc -h
@ -30,8 +31,9 @@ usage: lc [options] <file>
###Building ###Building
To build *linecount*, we use *cmake*. Cmake 3.3.0 is the preferred version as of this release. For building just the command line utility on Windows, a Visual Studio 2013 solution file is also included. To build *turbo-linecount*, we use *cmake*. Cmake 3.0.0 or higher is the preferred version as of this release. For simplified building on Windows, a Visual Studio 2013 solution file is also included.
To build with *cmake*:
``` ```
cd build cd build
cmake .. cmake ..
@ -39,24 +41,30 @@ make
make install make install
``` ```
This will build and install the command line utility `lc`, a shared library `liblinecount`, a static library `liblinecount_static`, and a header file `linecount.h`. This will build and install the command line utility `tlc`, a shared library `libturbo_linecount`, a static library `libturbo_linecount_static`, and a header file `turbo_linecount.h`.
Building *linecount* is known to be possible on Building *turbo-linecount* is known to be possible on
``` ```
Windows 32/64 bit Windows 32/64 bit
Mac OS X Mac OS X
Linux Linux
Cygwin
``` ```
###Testing ### Testing
Testing the tool against `wc` and `python` can be done with the test scripts. To generate some random test files, run `create_testfiles.sh`, which creates four files of 10MB, 100MB, 1GB, and 10GB. Feel free to delete these when you're done testing to save space. Testing the tool against `wc` and `python` can be done with the test scripts. To generate some random test files, run `create_testfiles.sh`, which creates four files of 10MB, 100MB, 1GB, and 10GB. Feel free to delete these when you're done testing to save space.
To run the test, run `compare_testfiles.sh`. This will generate output such as: To run the test, run `compare_testfiles.sh`. This will generate output such as:
### Performance
Performance on Windows and Mac OS X is excellent for all file sizes. Performance on Linux and other operating systems is good, but can be better. Stay tuned.
``` ```
Timing for lc Timing for tlc
lc: test_10MB.txt 0.006s lc: test_10MB.txt 0.006s
lc: test_100MB.txt 0.015s lc: test_100MB.txt 0.015s
lc: test_1GB.txt 0.127s lc: test_1GB.txt 0.127s
@ -71,4 +79,10 @@ wc: test_10MB.txt 0.012s
wc: test_100MB.txt 0.100s wc: test_100MB.txt 0.100s
wc: test_1GB.txt 0.933s wc: test_1GB.txt 0.933s
wc: test_10GB.txt 9.857s wc: test_10GB.txt 9.857s
``` ```
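As a rough check on the speed claim, the ratios can be worked out from the timings visible in the sample output. The sketch below uses only the three file sizes for which both timings appear in this excerpt (the line counter's 10GB timing is not shown, so it is left out), and the variable names are illustrative. These are single-run numbers, so the ratios are indicative only.

```python
# Timings in seconds, copied from the sample output above.
tlc_times = {"test_10MB.txt": 0.006, "test_100MB.txt": 0.015, "test_1GB.txt": 0.127}
wc_times = {"test_10MB.txt": 0.012, "test_100MB.txt": 0.100, "test_1GB.txt": 0.933}

# How many times longer wc took than the line counter on each file.
speedups = {name: wc_times[name] / tlc_times[name] for name in tlc_times}

for name, ratio in speedups.items():
    print(f"{name}: wc takes {ratio:.1f}x as long")
```

On these figures the gap is about 2x for the 10MB file and grows to roughly 7x for the 100MB and 1GB files, which is in line with the "about 8 times faster" claim for large inputs.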
| | | | | |
|---|---|---|---|---|
| | | | | |
| | | | | |
| | | | | |

@ -1,12 +1,14 @@
#!/bin/sh #!/bin/sh
if [ "$1" = "" ]; then
echo "specify path to tlc binary"
exit 1
else
TLC=$1
fi
tlctest() tlctest()
{ {
TLC=tlc
if [ -f ./tlc ]; then
TLC=./tlc
fi
OUT=`(time $TLC $1) 2>&1 | grep real | cut -f 2 | cut -c 3-` OUT=`(time $TLC $1) 2>&1 | grep real | cut -f 2 | cut -c 3-`
echo "tlc: $1 $OUT" echo "tlc: $1 $OUT"