diff --git a/README.md b/README.md
index f50c982..91e9310 100644
--- a/README.md
+++ b/README.md
@@ -23,19 +23,15 @@ Here are our official chapters. Let us know if you are interested in [starting o
 * [Bhubaneswar](https://www.facebook.com/groups/pwlbbsr/)
 * [Boston](http://www.meetup.com/Papers-We-Love-Boston-Cambridge/)
 * [Brasilia](http://www.meetup.com/papers-we-love-bsb)
-* [Boulder](http://www.meetup.com/Papers-We-Love-Boulder/)
 * [Bucharest](http://www.meetup.com/papers-we-love-bucharest/)
 * [Buenos Aires](https://paperswelove.org/buenos-aires/)
 * [Cairo](http://www.meetup.com/Papers-We-Love-Cairo/)
 * [Chattanooga](http://www.meetup.com/Papers-We-Love-Chattanooga/)
 * [Chicago](http://www.meetup.com/papers-we-love-chicago/)
 * [Columbus, Ohio](http://www.meetup.com/Papers-We-Love-Columbus/)
-* [Dallas](http://www.papersdallas.com/)
 * [Gothenburg](https://www.meetup.com/Papers-We-Love-Gothenburg/)
-* [Guadalajara](https://www.facebook.com/pwlgdl/)
 * [Hamburg](http://www.meetup.com/Papers-We-Love-Hamburg/)
 * [Hyderabad](http://www.meetup.com/papers-we-love-hyderabad/)
-* [Iasi](http://www.meetup.com/Papers-We-Love-Iasi/)
 * [Iowa City](https://www.meetup.com/techcorridorio)
 * [Kathmandu](https://www.facebook.com/groups/PapersWeLoveKathmandu/)
 * [Kyiv](https://www.facebook.com/groups/PapersWeLoveKyiv)
@@ -43,18 +39,11 @@ Here are our official chapters. Let us know if you are interested in [starting o
 * [London](http://www.meetup.com/papers-we-love-london)
 * [Los Angeles](http://www.meetup.com/papers-we-love-la)
 * [Madrid](http://www.meetup.com/Papers-We-Love-Madrid/)
-* [Medellín](https://www.meetup.com/paperswelovemde/)
 * [Montreal](http://www.meetup.com/Papers-We-Love-Montreal/)
-* [Mumbai](https://www.meetup.com/Papers-We-Love-Mumbai/)
-* [Munich](http://www.meetup.com/Papers-We-Love-Munich/)
 * [New York City](http://www.meetup.com/papers-we-love/)
 * [Paris](http://www.meetup.com/Papers-We-Love-Paris/)
-* [Philadelphia](http://www.meetup.com/Papers-We-Love-Philadelphia/)
-* [Portland](http://www.meetup.com/Papers-We-Love-PDX/)
-* [Porto](https://www.meetup.com/Papers-We-Love-Porto)
 * [Pune](http://www.meetup.com/Doo-Things)
 * [Raleigh-Durham](https://www.meetup.com/Papers-We-Love-Raleigh-Durham/)
-* [Reykjavík](http://www.meetup.com/Papers-We-Love-Reykjavik)
 * [Rio de Janeiro](https://www.meetup.com/pt-BR/papers-we-love-rio-de-janeiro/)
 * [San Diego](http://www.meetup.com/Papers-We-Love-San-Diego/)
 * [San Francisco](http://www.meetup.com/papers-we-love-too/)
@@ -119,6 +108,18 @@ Reading a paper is not the same as reading a blogpost or a novel. Here are a few
 
 * Love a Paper - [@loveapaper](https://twitter.com/loveapaper)
 
+### Download papers
+
+Open your favourite terminal and run:
+
+```bash
+$ ./scripts/download.sh
+```
+
+This will scrape markdown files for links to PDFs and download papers to their respective directories.
+
+See [README.md](./scripts/README.md) for more options.
+
 ## Contributing Guidelines
 
 Please take a look at our [CONTRIBUTING.md](https://github.com/papers-we-love/papers-we-love/blob/master/.github/CONTRIBUTING.md) file.
diff --git a/machine_learning/README.md b/machine_learning/README.md
index c3b94b1..10f6140 100644
--- a/machine_learning/README.md
+++ b/machine_learning/README.md
@@ -3,7 +3,7 @@
 
 ## External Papers
 
-* [Top 10 algorithms in data mining](http://www.cs.uvm.edu/~icdm/algorithms/10Algorithms-08.pdf)
+* [Top 10 algorithms in data mining](https://www.researchgate.net/publication/29467751_Top_10_algorithms_in_data_mining)
 
 While it is difficult to identify the top 10, this paper contains 10 very important data mining/machine learning algorithms
 
diff --git a/scripts/README.md b/scripts/README.md
new file mode 100644
index 0000000..bb4e7e5
--- /dev/null
+++ b/scripts/README.md
@@ -0,0 +1,22 @@
+# Scripts
+
+Scripts for working with repository content.
+
+## Download Utility
+A convenience script to download papers. This will scrape the README.md files for URLs containing links to pdfs and download them to their respective directories.
+
+The download utility is idempotent and can be run multiple times safely.
+
+### Usage
+Open your favourite terminal and run:
+
+```bash
+$ ./scripts/download.sh
+```
+
+
+Optionally, to download specific topics specify their directories as arguments:
+
+```bash
+$ ./scripts/download.sh android concurrency
+```
diff --git a/scripts/download.sh b/scripts/download.sh
new file mode 100755
index 0000000..d5139d4
--- /dev/null
+++ b/scripts/download.sh
@@ -0,0 +1,46 @@
+#!/bin/bash
+
+# Guard clause check if required binaries are installed
+which wget > /dev/null || { echo "Error: wget not installed." ; exit 1 ; }
+which egrep > /dev/null || { echo "Error: egrep not installed." ; exit 1 ; }
+
+# Recursively traverse directories in repo scraping markdown file for URLs
+# containing pdfs. Downloads pdfs into respective directories.
+download_for_directory() {
+    cd $1 || { echo "Error: directory not found." ; exit 1 ; }
+
+    for f in *; do
+        if [[ -d ${f} ]]; then
+            download_for_directory ${f} &
+        fi
+    done
+
+    # Scrape URLs from markdown files
+    urls=$(ls | cat *.md 2> /dev/null | egrep -o 'https?://[^ ]+' | grep '\.pdf' | tr -d ')')
+
+    for url in "$urls"; do
+        # Ignore empty URLs
+        if [[ ! -z ${url} ]]; then
+            wget ${url} --no-clobber --quiet --timeout=5 --tries=2
+        fi
+    done
+
+    cd ..
+    echo "$1 done."
+}
+
+# If no directories are supplied, iterate over the entire repo.
+if [[ "$#" -eq 0 ]]; then
+    REPO_ROOT_DIR="$(dirname $0)/.."
+    download_for_directory ${REPO_ROOT_DIR}
+else
+# Iterate over the specified directories
+    for dir in "$@"
+    do
+        download_for_directory ${dir}
+    done
+fi
+
+# Wait for child processes to terminate
+wait
+
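
Review note on the scrape-and-download step in `scripts/download.sh`: because `"$urls"` is quoted, `for url in "$urls"` runs once over the whole newline-separated string, and the downloads only work because the unquoted `${url}` is word-split again when passed to `wget`. The leading `ls |` is also redundant, since `cat *.md` ignores standard input when it is given file arguments, and recent GNU grep releases warn that `egrep` is obsolescent. A minimal sketch of one possible tightening follows. The helper names `extract_pdf_urls` and `download_pdfs_in` are hypothetical and not part of this PR, and the sketch assumes that no PDF URL contains a space or a closing parenthesis.

```bash
#!/bin/bash

# Hypothetical helper (not in the PR): print every URL containing ".pdf"
# that appears in the markdown files given as arguments, one per line.
extract_pdf_urls() {
    # -h omits file names, -o prints only the matched text, and -E enables
    # extended regular expressions (the role egrep plays in the PR).
    # Excluding ')' from the match stops at the end of a markdown link
    # target, so no separate `tr -d ')'` step is needed.
    grep -hoE 'https?://[^ )]+' -- "$@" 2> /dev/null | grep -F '.pdf'
}

# Hypothetical helper (not in the PR): download each PDF referenced by the
# markdown files directly inside directory $1 into that same directory.
download_pdfs_in() {
    # Reading line by line handles each URL as its own quoted argument.
    extract_pdf_urls "$1"/*.md | while IFS= read -r url; do
        wget --no-clobber --quiet --timeout=5 --tries=2 \
            --directory-prefix="$1" "$url"
    done
}
```

Calling `download_pdfs_in machine_learning` from the repository root would then fetch the PDFs linked from that directory's markdown files without changing the working directory. The recursion into subdirectories and the background jobs from the PR are left out of this sketch.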