Commit 21a3ee7204 by Fares Salem, 2020-03-28 16:37:19 +02:00
4 changed files with 81 additions and 12 deletions


@@ -23,19 +23,15 @@ Here are our official chapters. Let us know if you are interested in [starting o
* [Bhubaneswar](https://www.facebook.com/groups/pwlbbsr/)
* [Boston](http://www.meetup.com/Papers-We-Love-Boston-Cambridge/)
* [Brasilia](http://www.meetup.com/papers-we-love-bsb)
* [Boulder](http://www.meetup.com/Papers-We-Love-Boulder/)
* [Bucharest](http://www.meetup.com/papers-we-love-bucharest/)
* [Buenos Aires](https://paperswelove.org/buenos-aires/)
* [Cairo](http://www.meetup.com/Papers-We-Love-Cairo/)
* [Chattanooga](http://www.meetup.com/Papers-We-Love-Chattanooga/)
* [Chicago](http://www.meetup.com/papers-we-love-chicago/)
* [Columbus, Ohio](http://www.meetup.com/Papers-We-Love-Columbus/)
* [Dallas](http://www.papersdallas.com/)
* [Gothenburg](https://www.meetup.com/Papers-We-Love-Gothenburg/)
* [Guadalajara](https://www.facebook.com/pwlgdl/)
* [Hamburg](http://www.meetup.com/Papers-We-Love-Hamburg/)
* [Hyderabad](http://www.meetup.com/papers-we-love-hyderabad/)
* [Iasi](http://www.meetup.com/Papers-We-Love-Iasi/)
* [Iowa City](https://www.meetup.com/techcorridorio)
* [Kathmandu](https://www.facebook.com/groups/PapersWeLoveKathmandu/)
* [Kyiv](https://www.facebook.com/groups/PapersWeLoveKyiv)
@@ -43,18 +39,11 @@ Here are our official chapters. Let us know if you are interested in [starting o
* [London](http://www.meetup.com/papers-we-love-london)
* [Los Angeles](http://www.meetup.com/papers-we-love-la)
* [Madrid](http://www.meetup.com/Papers-We-Love-Madrid/)
* [Medellín](https://www.meetup.com/paperswelovemde/)
* [Montreal](http://www.meetup.com/Papers-We-Love-Montreal/)
* [Mumbai](https://www.meetup.com/Papers-We-Love-Mumbai/)
* [Munich](http://www.meetup.com/Papers-We-Love-Munich/)
* [New York City](http://www.meetup.com/papers-we-love/)
* [Paris](http://www.meetup.com/Papers-We-Love-Paris/)
* [Philadelphia](http://www.meetup.com/Papers-We-Love-Philadelphia/)
* [Portland](http://www.meetup.com/Papers-We-Love-PDX/)
* [Porto](https://www.meetup.com/Papers-We-Love-Porto)
* [Pune](http://www.meetup.com/Doo-Things)
* [Raleigh-Durham](https://www.meetup.com/Papers-We-Love-Raleigh-Durham/)
* [Reykjavík](http://www.meetup.com/Papers-We-Love-Reykjavik)
* [Rio de Janeiro](https://www.meetup.com/pt-BR/papers-we-love-rio-de-janeiro/)
* [San Diego](http://www.meetup.com/Papers-We-Love-San-Diego/)
* [San Francisco](http://www.meetup.com/papers-we-love-too/)
@@ -119,6 +108,18 @@ Reading a paper is not the same as reading a blogpost or a novel. Here are a few
* Love a Paper - [@loveapaper](https://twitter.com/loveapaper)
### Download papers
Open your favourite terminal and run:
```bash
$ ./scripts/download.sh
```
This will scrape markdown files for links to PDFs and download papers to their respective directories.
See [README.md](./scripts/README.md) for more options.
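For illustration, the scraping step can be sketched as a small pipeline. The grep pattern mirrors the one in `scripts/download.sh`; the sample file path and example.com URL below are made up:

```shell
# Hypothetical sample file; the extraction pattern mirrors scripts/download.sh.
printf '* [Example Paper](https://example.com/paper.pdf)\n' > /tmp/sample.md
# Extract URLs, keep only PDF links, strip the trailing markdown paren.
grep -Eo 'https?://[^ ]+' /tmp/sample.md | grep '\.pdf' | tr -d ')'
```

which prints `https://example.com/paper.pdf`.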
## Contributing Guidelines
Please take a look at our [CONTRIBUTING.md](https://github.com/papers-we-love/papers-we-love/blob/master/.github/CONTRIBUTING.md) file.


@@ -3,7 +3,7 @@
## External Papers
* [Top 10 algorithms in data mining](http://www.cs.uvm.edu/~icdm/algorithms/10Algorithms-08.pdf)
* [Top 10 algorithms in data mining](https://www.researchgate.net/publication/29467751_Top_10_algorithms_in_data_mining)
While it is difficult to identify a definitive top 10, this paper covers 10 very important data mining/machine learning algorithms.

scripts/README.md Normal file

@@ -0,0 +1,22 @@
# Scripts
Scripts for working with repository content.
## Download Utility
A convenience script to download papers. It scrapes the markdown files for URLs pointing to PDFs and downloads them to their respective directories.
The download utility is idempotent and can be run multiple times safely.
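The idempotency comes from wget's `--no-clobber` flag, which refuses to re-download a file that already exists on disk. A minimal sketch of that guard in plain shell (the demo directory and filename below are hypothetical):

```shell
# Hypothetical demo directory and filename.
mkdir -p /tmp/pwl-demo
cd /tmp/pwl-demo
echo "stub" > paper.pdf
# On a second run, wget --no-clobber would skip this file; the
# equivalent check in plain shell:
if [ -e paper.pdf ]; then
  echo "paper.pdf exists, skipping re-download"
fi
```

Because existing files are never overwritten, interrupted runs can simply be restarted.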
### Usage
Open your favourite terminal and run:
```bash
$ ./scripts/download.sh
```
Optionally, to download specific topics, pass their directories as arguments:
```bash
$ ./scripts/download.sh android concurrency
```

scripts/download.sh Executable file

@@ -0,0 +1,46 @@
#!/bin/bash

# Guard clause: check that the required binaries are installed.
command -v wget > /dev/null || { echo "Error: wget not installed." ; exit 1 ; }
command -v egrep > /dev/null || { echo "Error: egrep not installed." ; exit 1 ; }

# Recursively traverse directories in the repo, scraping markdown files for
# URLs that point to PDFs. Downloads the PDFs into their respective directories.
download_for_directory() {
  cd "$1" || { echo "Error: directory not found." ; exit 1 ; }
  for f in *; do
    if [[ -d ${f} ]]; then
      download_for_directory "${f}" &
    fi
  done

  # Scrape URLs from markdown files. cat reads the files directly, so no
  # preceding ls is needed.
  urls=$(cat *.md 2> /dev/null | egrep -o 'https?://[^ ]+' | grep '\.pdf' | tr -d ')')

  # Iterate unquoted so each whitespace-separated URL is its own word;
  # an empty $urls simply yields zero iterations.
  for url in ${urls}; do
    wget "${url}" --no-clobber --quiet --timeout=5 --tries=2
  done
  cd ..
  echo "$1 done."
}

# If no directories are supplied, iterate over the entire repo.
if [[ "$#" -eq 0 ]]; then
  REPO_ROOT_DIR="$(dirname "$0")/.."
  download_for_directory "${REPO_ROOT_DIR}"
else
  # Iterate over the specified directories.
  for dir in "$@"; do
    download_for_directory "${dir}"
  done
fi

# Wait for backgrounded child downloads to terminate.
wait