Added script to download all (pdf) papers locally (#597)

* Added script to download all PDFs from the Readmes

* Removed sleep

* Formatting

* Added guard clauses and some docs to download script. Added it to scripts folder. Added download script readme. Added section in root readme.

* Removed old download_all.sh

* Added support for specifying which directories you want to download.

* Removed dependency on xargs.

* Changed filename to download.sh. Updated READMEs.

* More README

* Fixed download.sh logic for multiple arguments. Removed Readme section about executing script from anywhere. Updated the parsing of URLs to be more specific.

@@ -108,6 +108,18 @@ Reading a paper is not the same as reading a blogpost or a novel. Here are a few
* Love a Paper - [@loveapaper](https://twitter.com/loveapaper)

### Download papers

Open your favourite terminal and run:

```bash
$ ./scripts/download.sh
```

This will scrape markdown files for links to PDFs and download papers to their respective directories.
See [README.md](./scripts/README.md) for more options.
## Contributing Guidelines

Please take a look at our [CONTRIBUTING.md](https://github.com/papers-we-love/papers-we-love/blob/master/.github/CONTRIBUTING.md) file.

@@ -0,0 +1,22 @@
# Scripts

Scripts for working with repository content.

## Download Utility

A convenience script to download papers. It scrapes the README.md files for links to PDFs and downloads the papers into their respective directories.

The download utility is idempotent and can be run multiple times safely.
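Idempotence comes from passing `--no-clobber` to `wget`: a paper that already exists on disk is skipped rather than downloaded again, so re-running the script only fetches papers that are new or were previously missed. For example:

```bash
$ ./scripts/download.sh android
$ ./scripts/download.sh android   # second run skips PDFs that are already present
```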
### Usage

Open your favourite terminal and run:
```bash
$ ./scripts/download.sh
```

Optionally, to download specific topics, specify their directories as arguments:
```bash
$ ./scripts/download.sh android concurrency
```
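
Directory arguments are resolved relative to where you run the script, so run it from the repository root. If a directory can't be found, the script exits with an error. For example (using a hypothetical topic name):

```bash
$ ./scripts/download.sh not_a_topic
Error: directory not found.
```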

@@ -0,0 +1,46 @@
#!/bin/bash
# Guard clauses: check that required binaries are installed
which wget > /dev/null || { echo "Error: wget not installed." ; exit 1 ; }
which egrep > /dev/null || { echo "Error: egrep not installed." ; exit 1 ; }
# Recursively traverse directories in the repo, scraping markdown files for URLs
# that point to PDFs. Downloads the PDFs into their respective directories.
download_for_directory() {
    cd "$1" || { echo "Error: directory not found." ; exit 1 ; }
    for f in *; do
        if [[ -d "${f}" ]]; then
            download_for_directory "${f}" &
        fi
    done
    # Scrape URLs from markdown files
    urls=$(cat *.md 2> /dev/null | egrep -o 'https?://[^ ]+' | grep '\.pdf' | tr -d ')')
    for url in ${urls}; do
        # Ignore empty URLs
        if [[ ! -z ${url} ]]; then
            wget "${url}" --no-clobber --quiet --timeout=5 --tries=2
        fi
    done
    cd ..
    echo "$1 done."
}
# If no directories are supplied, iterate over the entire repo.
if [[ "$#" -eq 0 ]]; then
    REPO_ROOT_DIR="$(dirname "$0")/.."
    download_for_directory "${REPO_ROOT_DIR}"
else
    # Iterate over the specified directories
    for dir in "$@"; do
        download_for_directory "${dir}"
    done
fi
# Wait for child processes to terminate
wait