Commit 21a3ee7204 by Fares Salem, 2020-03-28 16:37:19 +02:00
4 changed files with 81 additions and 12 deletions


@@ -23,19 +23,15 @@ Here are our official chapters. Let us know if you are interested in [starting o
* [Bhubaneswar](https://www.facebook.com/groups/pwlbbsr/)
* [Boston](http://www.meetup.com/Papers-We-Love-Boston-Cambridge/)
* [Brasilia](http://www.meetup.com/papers-we-love-bsb)
* [Boulder](http://www.meetup.com/Papers-We-Love-Boulder/)
* [Bucharest](http://www.meetup.com/papers-we-love-bucharest/)
* [Buenos Aires](https://paperswelove.org/buenos-aires/)
* [Cairo](http://www.meetup.com/Papers-We-Love-Cairo/)
* [Chattanooga](http://www.meetup.com/Papers-We-Love-Chattanooga/)
* [Chicago](http://www.meetup.com/papers-we-love-chicago/)
* [Columbus, Ohio](http://www.meetup.com/Papers-We-Love-Columbus/)
* [Dallas](http://www.papersdallas.com/)
* [Gothenburg](https://www.meetup.com/Papers-We-Love-Gothenburg/)
* [Guadalajara](https://www.facebook.com/pwlgdl/)
* [Hamburg](http://www.meetup.com/Papers-We-Love-Hamburg/)
* [Hyderabad](http://www.meetup.com/papers-we-love-hyderabad/)
* [Iasi](http://www.meetup.com/Papers-We-Love-Iasi/)
* [Iowa City](https://www.meetup.com/techcorridorio)
* [Kathmandu](https://www.facebook.com/groups/PapersWeLoveKathmandu/)
* [Kyiv](https://www.facebook.com/groups/PapersWeLoveKyiv)
@@ -43,18 +39,11 @@ Here are our official chapters. Let us know if you are interested in [starting o
* [London](http://www.meetup.com/papers-we-love-london)
* [Los Angeles](http://www.meetup.com/papers-we-love-la)
* [Madrid](http://www.meetup.com/Papers-We-Love-Madrid/)
* [Medellín](https://www.meetup.com/paperswelovemde/)
* [Montreal](http://www.meetup.com/Papers-We-Love-Montreal/)
* [Mumbai](https://www.meetup.com/Papers-We-Love-Mumbai/)
* [Munich](http://www.meetup.com/Papers-We-Love-Munich/)
* [New York City](http://www.meetup.com/papers-we-love/)
* [Paris](http://www.meetup.com/Papers-We-Love-Paris/)
* [Philadelphia](http://www.meetup.com/Papers-We-Love-Philadelphia/)
* [Portland](http://www.meetup.com/Papers-We-Love-PDX/)
* [Porto](https://www.meetup.com/Papers-We-Love-Porto)
* [Pune](http://www.meetup.com/Doo-Things)
* [Raleigh-Durham](https://www.meetup.com/Papers-We-Love-Raleigh-Durham/)
* [Reykjavík](http://www.meetup.com/Papers-We-Love-Reykjavik)
* [Rio de Janeiro](https://www.meetup.com/pt-BR/papers-we-love-rio-de-janeiro/)
* [San Diego](http://www.meetup.com/Papers-We-Love-San-Diego/)
* [San Francisco](http://www.meetup.com/papers-we-love-too/)
@@ -119,6 +108,18 @@ Reading a paper is not the same as reading a blogpost or a novel. Here are a few
* Love a Paper - [@loveapaper](https://twitter.com/loveapaper)
### Download papers
Open your favourite terminal and run:
```bash
$ ./scripts/download.sh
```
This will scrape markdown files for links to PDFs and download papers to their respective directories.
See [README.md](./scripts/README.md) for more options.
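For illustration, the scraping step can be sketched as a small pipeline. The grep pattern mirrors the one in `scripts/download.sh`; the sample file path and example.com URL below are made up:

```shell
# Hypothetical sample file; the extraction pattern mirrors scripts/download.sh.
printf '* [Example Paper](https://example.com/paper.pdf)\n' > /tmp/sample.md
# Extract URLs, keep only PDF links, strip the trailing markdown paren.
grep -Eo 'https?://[^ ]+' /tmp/sample.md | grep '\.pdf' | tr -d ')'
```

which prints `https://example.com/paper.pdf`.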
## Contributing Guidelines
Please take a look at our [CONTRIBUTING.md](https://github.com/papers-we-love/papers-we-love/blob/master/.github/CONTRIBUTING.md) file.


@@ -3,7 +3,7 @@
## External Papers
* [Top 10 algorithms in data mining](http://www.cs.uvm.edu/~icdm/algorithms/10Algorithms-08.pdf)
* [Top 10 algorithms in data mining](https://www.researchgate.net/publication/29467751_Top_10_algorithms_in_data_mining)
While it is difficult to identify a definitive top 10, this paper covers 10 very important data mining/machine learning algorithms.

scripts/README.md Normal file

@@ -0,0 +1,22 @@
# Scripts
Scripts for working with repository content.
## Download Utility
A convenience script to download papers. It scrapes the markdown files for URLs pointing to PDFs and downloads them to their respective directories.
The download utility is idempotent and can be run multiple times safely.
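The idempotency comes from wget's `--no-clobber` flag, which refuses to re-download a file that already exists on disk. A minimal sketch of that guard in plain shell (the demo directory and filename below are hypothetical):

```shell
# Hypothetical demo directory and filename.
mkdir -p /tmp/pwl-demo
cd /tmp/pwl-demo
echo "stub" > paper.pdf
# On a second run, wget --no-clobber would skip this file; the
# equivalent check in plain shell:
if [ -e paper.pdf ]; then
  echo "paper.pdf exists, skipping re-download"
fi
```

Because existing files are never overwritten, interrupted runs can simply be restarted.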
### Usage
Open your favourite terminal and run:
```bash
$ ./scripts/download.sh
```
Optionally, to download specific topics, pass their directories as arguments:
```bash
$ ./scripts/download.sh android concurrency
```

scripts/download.sh Executable file

@@ -0,0 +1,46 @@
#!/bin/bash

# Guard clause: check that the required binaries are installed.
command -v wget > /dev/null || { echo "Error: wget not installed." ; exit 1 ; }
command -v egrep > /dev/null || { echo "Error: egrep not installed." ; exit 1 ; }

# Recursively traverse directories in the repo, scraping markdown files for
# URLs that point to PDFs. Downloads the PDFs into their respective directories.
download_for_directory() {
  cd "$1" || { echo "Error: directory not found." ; exit 1 ; }
  for f in *; do
    if [[ -d ${f} ]]; then
      download_for_directory "${f}" &
    fi
  done

  # Scrape URLs from markdown files. cat reads the files directly, so no
  # preceding ls is needed.
  urls=$(cat *.md 2> /dev/null | egrep -o 'https?://[^ ]+' | grep '\.pdf' | tr -d ')')

  # Iterate unquoted so each whitespace-separated URL is its own word;
  # an empty $urls simply yields zero iterations.
  for url in ${urls}; do
    wget "${url}" --no-clobber --quiet --timeout=5 --tries=2
  done
  cd ..
  echo "$1 done."
}

# If no directories are supplied, iterate over the entire repo.
if [[ "$#" -eq 0 ]]; then
  REPO_ROOT_DIR="$(dirname "$0")/.."
  download_for_directory "${REPO_ROOT_DIR}"
else
  # Iterate over the specified directories.
  for dir in "$@"; do
    download_for_directory "${dir}"
  done
fi

# Wait for backgrounded child downloads to terminate.
wait