Compare commits

5 Commits
1.0 ... master

| Author | SHA1 | Message | Date |
|-----------------|------------|------------------|---------------------------|
| Christien Rioux | 9bf2210d40 | Update README.md | 2018-04-25 15:54:28 -07:00 |
| Christien Rioux | 4ca4808a26 | finish sentence | 2015-12-13 09:36:06 -08:00 |
| Christien Rioux | edea23afe6 | typo | 2015-12-13 02:55:10 -08:00 |
| Christien Rioux | 7a2a4a601c | update readme | 2015-12-13 02:11:39 -08:00 |
| Christien Rioux | 82062d4e63 | update readme | 2015-12-13 02:10:47 -08:00 |

@@ -5,14 +5,14 @@ turbo-linecount 1.0 Copyright 2015, Christien Rioux
*turbo-linecount* is a tool that simply counts the number of lines in a file, as fast as possible. It reads the file in large chunks into several threads and quickly scans the file for line endings.
Many times, you have to count the number of lines in text file on disk. The typical solution is to use 'wc -l' on the command line. 'wc' uses buffered streams to process the file, which has its advantages, but it is slower than direct memory mapped file access. You can't 'pipe' to
Many times, you have to count the number of lines in text file on disk. The typical solution is to use `wc -l` on the command line. `wc -l` uses buffered streams to process the file, which has its advantages, but it is slower than direct memory mapped file access. You can't 'pipe' to *turbo-linecount* however. This may change in a future release.
How much faster is *turbo-linecount*? About 8 times faster than `wc` and 5 times faster than the naive Python implementation.
How much faster is *turbo-linecount*? About 8 times faster than `wc -l` and 5 times faster than the naive Python implementation.
To use *turbo-linecount*, just run the command line:
```
lc <file>
tlc <file>
```
where *\<file\>* is the path to the file of which you'd like to count the lines.
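The README text in this hunk describes the approach as reading the file in large chunks across several threads and scanning each chunk for line endings. As a rough illustration only, here is a minimal Python sketch of that idea. It is not the tool's actual implementation: the function names `count_chunk` and `count_lines`, and the `threadcount` and `buffersize` parameters, are hypothetical names chosen for this example, and the sketch uses ordinary buffered reads rather than the memory-mapped access the README mentions. It also assumes that a "line" means a `\n` byte, so a final line without a trailing newline is not counted.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def count_chunk(path, offset, length, buffersize=1 << 20):
    """Count newline bytes in the byte range [offset, offset + length) of path."""
    count = 0
    with open(path, "rb") as f:
        f.seek(offset)
        remaining = length
        while remaining > 0:
            # Read at most buffersize bytes, never past the end of this range.
            buf = f.read(min(buffersize, remaining))
            if not buf:
                break
            count += buf.count(b"\n")
            remaining -= len(buf)
    return count


def count_lines(path, threadcount=None, buffersize=1 << 20):
    """Split the file into one byte range per thread and sum the newline counts."""
    size = os.path.getsize(path)
    if size == 0:
        return 0
    threadcount = threadcount or os.cpu_count() or 1
    span = -(-size // threadcount)  # ceiling division: bytes per range
    ranges = [(off, min(span, size - off)) for off in range(0, size, span)]
    with ThreadPoolExecutor(max_workers=threadcount) as pool:
        counts = pool.map(
            lambda r: count_chunk(path, r[0], r[1], buffersize), ranges
        )
        return sum(counts)
```

Because each range is counted independently and newlines are single bytes, the ranges can be split at arbitrary offsets without any coordination between threads. Whether this sketch is actually faster than a single-threaded scan in Python is not something it demonstrates; it only shows the partitioning scheme.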
@@ -21,8 +21,8 @@ where *\<file\>* is the path to the file of which you'd like to count the lines.
To get help with *turbo-linecount*:
```
lc -h
usage: lc [options] <file>
tlc -h
usage: tlc [options] <file>
-h --help print this usage and exit
-b --buffersize <BUFFERSIZE> size of buffer per-thread to use when reading (default is 1MB)
-t --threadcount <THREADCOUNT> number of threads to use (defaults to number of cpu cores)
@@ -54,16 +54,16 @@ Cygwin
### Testing
Testing cmake against `wc` and `python` can be done with the test scripts. To generate some random test files, run `create_testfiles.sh`, and four test files, one 10MB, one 100MB, one 1GB, and one 10GB file will be created. Feel free to delete these when you're done testing to save space.
Testing cmake against `wc -l` and `python` can be done with the test scripts. To generate some random test files, run `create_testfiles.sh`, and four test files, one 10MB, one 100MB, one 1GB, and one 10GB file will be created. Feel free to delete these when you're done testing to save space.
To run the test, run `compare_testfiles.sh`. This will generate output as such:
```
Timing for tlc
lc: test_10MB.txt 0.006s
lc: test_100MB.txt 0.015s
lc: test_1GB.txt 0.127s
lc: test_10GB.txt 1.196s
tlc: test_10MB.txt 0.006s
tlc: test_100MB.txt 0.015s
tlc: test_1GB.txt 0.127s
tlc: test_10GB.txt 1.196s
Timing for python
python: test_10MB.txt 0.025s
python: test_100MB.txt 0.084s
@@ -85,9 +85,11 @@ Performance on Windows and Mac OS X is excellent for all file sizes. Performance
* 1TB SSD hard drive
* 16GB Memory
| File Size | `tlc` | `python` | `wc -l` |
|-----------|---|---|---|---|---|
| 10MB | 0.006s | 0.025s (4.2x) | 0.012s (2.0x) |
| 100MB | 0.015s | 0.084s (5.6x) | 0.100s (6.7x) |
| 1GB | 0.127s | 0.661s (5.2x) | 0.933s (7.3x) |
| 10GB | 1.196s | 6.165s (5.15x) | 9.857s (8.2x) |
```
| File Size | `tlc` | `python` | `wc -l` |
|-----------|--------|----------------|----------------|
| 10MB | 0.006s | 0.025s (4.2x) | 0.012s (2.0x) |
| 100MB | 0.015s | 0.084s (5.6x) | 0.100s (6.7x) |
| 1GB | 0.127s | 0.661s (5.2x) | 0.933s (7.3x) |
| 10GB | 1.196s | 6.165s (5.15x) | 9.857s (8.2x) |
```
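The multipliers in parentheses appear to be each tool's time divided by the `tlc` time for the same file. A small Python check of that arithmetic, using only the timings copied from the table (the names `TIMINGS` and `slowdown_factors` are made up for this sketch):

```python
# Timings in seconds, copied from the table: (tlc, python, wc -l).
TIMINGS = {
    "10MB": (0.006, 0.025, 0.012),
    "100MB": (0.015, 0.084, 0.100),
    "1GB": (0.127, 0.661, 0.933),
    "10GB": (1.196, 6.165, 9.857),
}


def slowdown_factors(timings):
    """Return {size: (python / tlc, wc / tlc)} for each table row."""
    return {
        size: (py / tlc, wc / tlc)
        for size, (tlc, py, wc) in timings.items()
    }


for size, (py_x, wc_x) in slowdown_factors(TIMINGS).items():
    print(f"{size}: python {py_x:.1f}x, wc -l {wc_x:.1f}x")
```

The computed ratios agree with the table to within rounding. Note that the table gives the 10GB `python` factor to two decimals (5.15x) while the other entries use one.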