mirror of
https://github.com/crioux/turbo-linecount.git
synced 2024-10-27 17:24:01 +00:00
readme
parent
453d2bf46e
commit
bf55e8383f
40
README.md
@@ -1,23 +1,24 @@
|
|||||||
# linecount
|
# turbo-linecount
|
||||||
linecount 1.0 Copyright 2015, Christien Rioux
|
turbo-linecount 1.0 Copyright 2015, Christien Rioux
|
||||||
|
|
||||||
### Super-Fast Multi-Threaded Line Counter
|
### Super-Fast Multi-Threaded Line Counter
|
||||||
|
|
||||||
*linecount* is a tool that simply counts the number of lines in a file, as fast as possible. It reads the file in large chunks into several threads and quickly scans the file for line endings.
|
*turbo-linecount* is a tool that simply counts the number of lines in a file, as fast as possible. It reads the file in large chunks into several threads and quickly scans the file for line endings.
|
||||||
|
|
||||||
Many times, you have to count the number of lines in text file on disk. The typical solution is to use 'wc -l' on the command line. 'wc' uses buffered streams to process the file, which has its advantages, but it is slower than direct memory mapped file access.
|
Many times, you have to count the number of lines in text file on disk. The typical solution is to use 'wc -l' on the command line. 'wc' uses buffered streams to process the file, which has its advantages, but it is slower than direct memory mapped file access. You can't 'pipe' to
|
||||||
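The README text in this hunk describes the approach only in prose: map the file into memory, split it into large chunks, and scan each chunk for line endings on its own thread. The following is a minimal Python sketch of that idea, not the project's actual implementation. The function name `count_lines` and the `workers` parameter are hypothetical, and the sketch counts `\n` bytes the way `wc -l` does, which may differ from how *turbo-linecount* treats a final line without a terminator. In CPython the interpreter lock may limit any parallel speedup, so treat this as an illustration of the structure rather than of the performance.

```python
import mmap
import os
from concurrent.futures import ThreadPoolExecutor


def count_lines(path, workers=4):
    """Count newline bytes in a file by scanning memory-mapped chunks on several threads."""
    size = os.path.getsize(path)
    if size == 0:
        return 0  # an empty file cannot be memory-mapped

    # Ceiling division, so at most `workers` chunks cover the whole file.
    chunk = -(-size // workers)
    ranges = [(start, min(start + chunk, size)) for start in range(0, size, chunk)]

    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            with ThreadPoolExecutor(max_workers=workers) as pool:
                # Each task copies one slice of the mapping and counts its newlines.
                counts = pool.map(lambda r: mm[r[0]:r[1]].count(b"\n"), ranges)
                return sum(counts)
```

Because every chunk is counted independently and newlines are single bytes, the chunk boundaries need no special handling: the per-chunk counts simply add up.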
|
|
||||||
How much faster is *linecount*? About 10 times faster than `wc` and 5 times faster than the naive Python implementation.
|
How much faster is *turbo-linecount*? About 8 times faster than `wc` and 5 times faster than the naive Python implementation.
|
||||||
|
|
||||||
To use *linecount*, just run the command line:
|
To use *turbo-linecount*, just run the command line:
|
||||||
|
|
||||||
```
|
```
|
||||||
lc <file>
|
lc <file>
|
||||||
```
|
```
|
||||||
|
|
||||||
where *\<file\>* is the path to the file of which you'd like to count the lines.
|
where *\<file\>* is the path to the file of which you'd like to count the lines.
|
||||||
|
|
||||||
###Help
|
###Help
|
||||||
To get help with *linecount*:
|
To get help with *turbo-linecount*:
|
||||||
|
|
||||||
```
|
```
|
||||||
lc -h
|
lc -h
|
||||||
@@ -30,8 +31,9 @@ usage: lc [options] <file>
|
|||||||
|
|
||||||
###Building
|
###Building
|
||||||
|
|
||||||
To build *linecount*, we use *cmake*. Cmake 3.3.0 is the preferred version as of this release. For building just the command line utility on Windows, a Visual Studio 2013 solution file is also included.
|
To build *turbo-linecount*, we use *cmake*. Cmake 3.0.0 or higher is the preferred version as of this release. For simplified building on Windows, a Visual Studio 2013 solution file is also included.
|
||||||
|
|
||||||
|
To build with *cmake*:
|
||||||
```
|
```
|
||||||
cd build
|
cd build
|
||||||
cmake ..
|
cmake ..
|
||||||
@@ -39,24 +41,30 @@ make
|
|||||||
make install
|
make install
|
||||||
```
|
```
|
||||||
|
|
||||||
This will build and install the command line utility `lc`, a shared library `liblinecount`, a static library `liblinecount_static`, and a header file `linecount.h`.
|
This will build and install the command line utility `tlc`, a shared library `libturbo_linecount`, a static library `libturbo_linecount_static`, and a header file `turbo_linecount.h`.
|
||||||
|
|
||||||
Building *linecount* is known to be possible on
|
Building *turbo-linecount* is known to be possible on
|
||||||
|
|
||||||
```
|
```
|
||||||
Windows 32/64 bit
|
Windows 32/64 bit
|
||||||
Mac OS X
|
Mac OS X
|
||||||
Linux
|
Linux
|
||||||
|
Cygwin
|
||||||
```
|
```
|
||||||
|
|
||||||
###Testing
|
### Testing
|
||||||
|
|
||||||
Testing cmake against `wc` and `python` can be done with the test scripts. To generate some random test files, run `create_testfiles.sh`, and four test files, one 10MB, one 100MB, one 1GB, and one 10GB file will be created. Feel free to delete these when you're done testing to save space.
|
Testing cmake against `wc` and `python` can be done with the test scripts. To generate some random test files, run `create_testfiles.sh`, and four test files, one 10MB, one 100MB, one 1GB, and one 10GB file will be created. Feel free to delete these when you're done testing to save space.
|
||||||
|
|
||||||
To run the test, run `compare_testfiles.sh`. This will generate output as such:
|
To run the test, run `compare_testfiles.sh`. This will generate output as such:
|
||||||
|
|
||||||
|
|
||||||
|
### Performance
|
||||||
|
|
||||||
|
Performance on Windows and Mac OS X is excellent for all file sizes. Performance on Linux and other operating systems is good, but can be better. Stay tuned.
|
||||||
|
|
||||||
```
|
```
|
||||||
Timing for lc
|
Timing for tlc
|
||||||
lc: test_10MB.txt 0.006s
|
lc: test_10MB.txt 0.006s
|
||||||
lc: test_100MB.txt 0.015s
|
lc: test_100MB.txt 0.015s
|
||||||
lc: test_1GB.txt 0.127s
|
lc: test_1GB.txt 0.127s
|
||||||
@@ -71,4 +79,10 @@ wc: test_10MB.txt 0.012s
|
|||||||
wc: test_100MB.txt 0.100s
|
wc: test_100MB.txt 0.100s
|
||||||
wc: test_1GB.txt 0.933s
|
wc: test_1GB.txt 0.933s
|
||||||
wc: test_10GB.txt 9.857s
|
wc: test_10GB.txt 9.857s
|
||||||
```
|
```
|
||||||
|
|
||||||
@@ -1,12 +1,14 @@
|
|||||||
#!/bin/sh
|
#!/bin/sh
|
||||||
|
|
||||||
|
if [ "$1" = "" ]; then
|
||||||
|
echo "specify path to tlc binary"
|
||||||
|
exit 1
|
||||||
|
else
|
||||||
|
TLC=$1
|
||||||
|
fi
|
||||||
|
|
||||||
tlctest()
|
tlctest()
|
||||||
{
|
{
|
||||||
TLC=tlc
|
|
||||||
if [ -f ./tlc ]; then
|
|
||||||
TLC=./tlc
|
|
||||||
fi
|
|
||||||
|
|
||||||
OUT=`(time $TLC $1) 2>&1 | grep real | cut -f 2 | cut -c 3-`
|
OUT=`(time $TLC $1) 2>&1 | grep real | cut -f 2 | cut -c 3-`
|
||||||
echo "tlc: $1 $OUT"
|
echo "tlc: $1 $OUT"
|
||||||
|