You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

88 lines
2.9 KiB

9 years ago
# turbo-linecount
turbo-linecount 1.0 Copyright 2015, Christien Rioux
9 years ago
### Super-Fast Multi-Threaded Line Counter
9 years ago
*turbo-linecount* is a tool that simply counts the number of lines in a file, as fast as possible. It reads the file in large chunks into several threads and quickly scans the file for line endings.
9 years ago
9 years ago
Many times, you have to count the number of lines in text file on disk. The typical solution is to use 'wc -l' on the command line. 'wc' uses buffered streams to process the file, which has its advantages, but it is slower than direct memory mapped file access. You can't 'pipe' to
9 years ago
9 years ago
How much faster is *turbo-linecount*? About 8 times faster than `wc` and 5 times faster than the naive Python implementation.
9 years ago
9 years ago
To use *turbo-linecount*, just run the command line:
9 years ago
```
lc <file>
```
9 years ago
9 years ago
where *\<file\>* is the path to the file of which you'd like to count the lines.
###Help
9 years ago
To get help with *turbo-linecount*:
9 years ago
```
lc -h
usage: lc [options] <file>
-h --help print this usage and exit
-b --buffersize <BUFFERSIZE> size of buffer per-thread to use when reading (default is 1MB)
-t --threadcount <THREADCOUNT> number of threads to use (defaults to number of cpu cores)
-v --version print version information and exit
```
###Building
9 years ago
To build *turbo-linecount*, we use *cmake*. Cmake 3.0.0 or higher is the preferred version as of this release. For simplified building on Windows, a Visual Studio 2013 solution file is also included.
9 years ago
9 years ago
To build with *cmake*:
9 years ago
```
cd build
cmake ..
make
make install
```
9 years ago
This will build and install the command line utility `tlc`, a shared library `libturbo_linecount`, a static library `libturbo_linecount_static`, and a header file `turbo_linecount.h`.
9 years ago
9 years ago
Building *turbo-linecount* is known to be possible on
9 years ago
```
Windows 32/64 bit
Mac OS X
Linux
9 years ago
Cygwin
9 years ago
```
9 years ago
### Testing
9 years ago
Testing cmake against `wc` and `python` can be done with the test scripts. To generate some random test files, run `create_testfiles.sh`, and four test files, one 10MB, one 100MB, one 1GB, and one 10GB file will be created. Feel free to delete these when you're done testing to save space.
To run the test, run `compare_testfiles.sh`. This will generate output as such:
9 years ago
### Performance
Performance on Windows and Mac OS X is excellent for all file sizes. Performance on Linux and other operating systems is good, but can be better. Stay tuned.
9 years ago
```
9 years ago
Timing for tlc
9 years ago
lc: test_10MB.txt 0.006s
lc: test_100MB.txt 0.015s
lc: test_1GB.txt 0.127s
lc: test_10GB.txt 1.196s
Timing for python
python: test_10MB.txt 0.025s
python: test_100MB.txt 0.084s
python: test_1GB.txt 0.661s
python: test_10GB.txt 6.165s
Timing for wc
wc: test_10MB.txt 0.012s
wc: test_100MB.txt 0.100s
wc: test_1GB.txt 0.933s
wc: test_10GB.txt 9.857s
9 years ago
```
| | | | | |
|---|---|---|---|---|
| | | | | |
| | | | | |
| | | | | |