Compare commits

5 Commits
1.0 ... master

Author SHA1 Message Date
Christien Rioux 9bf2210d40 Update README.md 2018-04-25 15:54:28 -07:00
Christien Rioux 4ca4808a26 finish sentence 2015-12-13 09:36:06 -08:00
Christien Rioux edea23afe6 typo 2015-12-13 02:55:10 -08:00
Christien Rioux 7a2a4a601c update readme 2015-12-13 02:11:39 -08:00
Christien Rioux 82062d4e63 update readme 2015-12-13 02:10:47 -08:00

@@ -5,14 +5,14 @@ turbo-linecount 1.0 Copyright 2015, Christien Rioux
 *turbo-linecount* is a tool that simply counts the number of lines in a file, as fast as possible. It reads the file in large chunks into several threads and quickly scans the file for line endings.
-Many times, you have to count the number of lines in text file on disk. The typical solution is to use 'wc -l' on the command line. 'wc' uses buffered streams to process the file, which has its advantages, but it is slower than direct memory mapped file access. You can't 'pipe' to
+Many times, you have to count the number of lines in text file on disk. The typical solution is to use `wc -l` on the command line. `wc -l` uses buffered streams to process the file, which has its advantages, but it is slower than direct memory mapped file access. You can't 'pipe' to *turbo-linecount* however. This may change in a future release.
-How much faster is *turbo-linecount*? About 8 times faster than `wc` and 5 times faster than the naive Python implementation.
+How much faster is *turbo-linecount*? About 8 times faster than `wc -l` and 5 times faster than the naive Python implementation.
 To use *turbo-linecount*, just run the command line:
 ```
-lc <file>
+tlc <file>
 ```
 where *\<file\>* is the path to the file of which you'd like to count the lines.
@@ -21,8 +21,8 @@ where *\<file\>* is the path to the file of which you'd like to count the lines.
 To get help with *turbo-linecount*:
 ```
-lc -h
-usage: lc [options] <file>
+tlc -h
+usage: tlc [options] <file>
 -h --help print this usage and exit
 -b --buffersize <BUFFERSIZE> size of buffer per-thread to use when reading (default is 1MB)
 -t --threadcount <THREADCOUNT> number of threads to use (defaults to number of cpu cores)
@@ -54,16 +54,16 @@ Cygwin
 ### Testing
-Testing cmake against `wc` and `python` can be done with the test scripts. To generate some random test files, run `create_testfiles.sh`, and four test files, one 10MB, one 100MB, one 1GB, and one 10GB file will be created. Feel free to delete these when you're done testing to save space.
+Testing cmake against `wc -l` and `python` can be done with the test scripts. To generate some random test files, run `create_testfiles.sh`, and four test files, one 10MB, one 100MB, one 1GB, and one 10GB file will be created. Feel free to delete these when you're done testing to save space.
 To run the test, run `compare_testfiles.sh`. This will generate output as such:
 ```
 Timing for tlc
-lc: test_10MB.txt 0.006s
-lc: test_100MB.txt 0.015s
-lc: test_1GB.txt 0.127s
-lc: test_10GB.txt 1.196s
+tlc: test_10MB.txt 0.006s
+tlc: test_100MB.txt 0.015s
+tlc: test_1GB.txt 0.127s
+tlc: test_10GB.txt 1.196s
 Timing for python
 python: test_10MB.txt 0.025s
 python: test_100MB.txt 0.084s
@@ -85,9 +85,11 @@ Performance on Windows and Mac OS X is excellent for all file sizes. Performance
 * 1TB SSD hard drive
 * 16GB Memory
-| File Size | `tlc` | `python` | `wc -l` |
-|-----------|---|---|---|---|---|
-| 10MB | 0.006s | 0.025s (4.2x) | 0.012s (2.0x) |
-| 100MB | 0.015s | 0.084s (5.6x) | 0.100s (6.7x) |
-| 1GB | 0.127s | 0.661s (5.2x) | 0.933s (7.3x) |
-| 10GB | 1.196s | 6.165s (5.15x) | 9.857s (8.2x) |
+```
+| File Size | `tlc` | `python` | `wc -l` |
+|-----------|--------|----------------|----------------|
+| 10MB | 0.006s | 0.025s (4.2x) | 0.012s (2.0x) |
+| 100MB | 0.015s | 0.084s (5.6x) | 0.100s (6.7x) |
+| 1GB | 0.127s | 0.661s (5.2x) | 0.933s (7.3x) |
+| 10GB | 1.196s | 6.165s (5.15x) | 9.857s (8.2x) |
+```
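
For readers unfamiliar with the technique the README text in this diff describes (memory-mapped access, large per-thread chunks, scanning for line endings), here is a minimal Python sketch of the idea. It is not the tool's own code. The function name `count_lines` and its parameters are hypothetical; the defaults only mirror the documented `--buffersize` (1MB) and `--threadcount` (number of CPU cores) options. It counts `\n` bytes, which is an assumption about what "line endings" means here, and it shows the structure of the approach rather than its performance.

```python
import mmap
import os
from concurrent.futures import ThreadPoolExecutor


def count_lines(path, threads=None, chunk_size=1 << 20):
    """Count b'\\n' bytes in a file by scanning memory-mapped chunks in a thread pool.

    Hypothetical helper for illustration only; not part of turbo-linecount.
    """
    threads = threads or os.cpu_count() or 1
    size = os.path.getsize(path)
    if size == 0:
        return 0  # an empty file cannot be memory-mapped and has no lines

    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:

        def count_chunk(start):
            # Each task scans one fixed-size slice of the mapping.
            end = min(start + chunk_size, size)
            return mm[start:end].count(b"\n")

        with ThreadPoolExecutor(max_workers=threads) as pool:
            # Chunks do not overlap, so the per-chunk counts simply add up.
            return sum(pool.map(count_chunk, range(0, size, chunk_size)))
```

Because a newline is a single byte, chunk boundaries never split one, so no stitching between neighbouring chunks is needed. Called as `count_lines("test_10MB.txt")`, the sketch should agree with `wc -l` on the same file.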