3565b2ff01
The following major changes were made:

- Create the `./data_science` directory
- Add the *Tidy Data* pdf
- Create/update the `./data_science` README with the article information (including the scroll icon, link to the source, author, and publication year)

Decisions:

- Since no relevant folder existed, I created the `./data_science` directory. This is a broad subject, but until the number of articles becomes unmanageable, I think that keeping them together will help people find what they're interested in.
- The README does have a sub-category list (*Tidy Data* is under "Data Cleaning"), but there is no corresponding subdirectory. There are few enough raw articles that someone browsing the directory listing won't benefit from a subfolder (it would only cost them an extra click), whereas someone skimming the README might want to know more about the article categorization.
- The README listing includes scroll/title/author/link to source, but it does not include any abstract/rationale. The different READMEs take different approaches here, but this seems to be the best trade-off between a concise listing and providing useful information. I'm happy to add a rationale or summary if it would be useful, though.

Paper Rationale:

This paper describes a subset of data cleaning that had previously been largely neglected: data tidying, or the process of reforming data into a standardized structure to allow for easier manipulation and the development of better data tools.

The author is prominent in the data science community: he is the chief scientist at RStudio and has authored a number of highly regarded and very popular data science packages (e.g. `ggplot2` and `reshape2`). He was named a Fellow by the American Statistical Association in 2015 for "pivotal contributions to statistical practice through innovative and pioneering research in statistical graphics and computing." For more on Hadley Wickham, see his website: http://hadley.nz/

This is a fairly popular paper as well; according to jstatsoft, it has nearly 50k views. I've also seen it mentioned in several other popular media, including Johns Hopkins' very popular online Data Science MOOC.

The main reason that I'm adding this paper, however, is how well-written it is. I don't come from a data science background, but after reading this paper, I walked away with a decent understanding of the significance of Wickham's research and standardization efforts, the current (circa 2014) state of the field, and many of the technical details associated with his method of data tidying. It was easy to read, despite my lack of a data science background, but it's clear that Wickham did not "dumb down" the content in order to accomplish that.

I believe that other chapters and independent readers will find this to be an interesting, enjoyable paper, and I believe that it will continue to affect the field of data cleaning.

*This paper will be presented at the October meeting of Papers We Love Washington, D.C. & Northern VA.*

Copyright Information:

The raw paper can be legally included in this repository. *Tidy Data* falls under the [Creative Commons Attribution 3.0 Unported License], which allows for sharing and adaptation with attribution.

[Creative Commons Attribution 3.0 Unported License]: https://creativecommons.org/licenses/by/3.0/
- _meetups/SanFrancisco
- .github
- android
- api_design
- artificial_intelligence
- audio_comp_sci
- biocomputing
- caching
- clustering_algorithms
- combinatory_logic
- comp_sci_fundamentals_and_history
- computational_creativity
- computer_architecture
- computer_education
- computer_graphics
- computer_vision
- concurrency
- crash_only
- cryptography
- data_compression
- data_fusion
- data_replication
- data_science
- data_structures
- datastores
- design
- digital_currency
- distributed_systems
- economics
- ethics
- experimental_algorithmics
- gamification
- garbage_collection
- gossip
- information_retrieval
- information_theory
- languages
- logic_and_programming
- machine_learning
- macros
- memory_management
- networks
- non_blocking_algorithms
- operating_systems
- organizational_simulation
- paradigms
- pattern_matching
- physics
- plt
- processes
- program_verification
- quantum_computing
- robotics
- security
- speech_recognition
- sports_analytics
- stringology
- sublinear_algorithms
- testing
- time_series
- user_interfaces
- virtual_machines
- .gitignore
- CODE_OF_CONDUCT.md
- README.md
Papers We Love (PWL) is a community built around reading, discussing and learning more about academic computer science papers. This repository serves as a directory of some of the best papers the community can find, bringing together documents scattered across the web. You can also visit the Papers We Love site for more info.
Due to licensing restrictions, we cannot always host the papers themselves, but we can provide links to their locations. When we do host a paper, you will see a 📜 emoji next to its title in the directory README.
If you enjoy the papers, perhaps stop by a local chapter meetup and join in on the vibrant discussions around them. You can also discuss PWL events, the content in this repository, or anything else related to PWL on our Slack (after signing up to join it) or on our #paperswelove IRC channel on freenode.
Chapters
Here are our official chapters. Let us know if you are interested in starting one in your city!
- Amsterdam
- Bangalore
- Berlin
- Boston
- Brasilia
- Boulder
- Bucharest
- Chattanooga
- Columbus, Ohio
- Dallas
- Hamburg
- Hyderabad
- Iasi
- Kathmandu
- London
- Los Angeles
- Madrid
- Montreal
- Munich
- New York City
- Paris
- Philadelphia
- Portland
- Pune
- Reykjavík
- San Francisco
- Seattle
- Seoul, Korea
- Singapore
- St. Louis
- Toronto
- Vienna
- Washington, DC
- Winnipeg
All of our meetups follow our Code of Conduct.
Past Presentations
Check out our YouTube and MixCloud (audio-only format) channels.
Info
We're looking for pull requests related to papers we should add, better organization of the papers we do have, and/or links to other paper-repos we should point to.
Other Good Places to Find Papers
- Bell System Technical Journal, 1922-1983
- Best Paper Awards in Computer Science
- Google Scholar (choose a subcategory)
- Microsoft Research
- Functional Programming Books Review
- MIT's Artificial Intelligence Lab Publications
- MIT's Distributed Systems Reading Group
- arXiv Paper Repository
- SciRate
- cat-v.org
- y-archive
- netlib
- Services Engineering Reading List
- Readings in Distributed Systems
- Gradual Typing Bibliography
- Security Data Science Papers
- Research Papers from Robert Harper, Carnegie Mellon University
- Lobste.rs tagged as PDF
- The Morning Paper
Please check out our wiki page for links to blogs, books, and exchanges that are worth a good read.
How To Read a Paper
Reading a paper is not the same as reading a blog post or a novel. Here are a few handy resources to help you get started.
- How to read an academic article
- Advice on reading academic papers
- How to read and understand a scientific paper
- Should I Read Papers?
- The Refreshingly Rewarding Realm of Research Papers
Applications/Ideas built around Papers We Love
- Love a Paper - @lovepaper
Contributing Guidelines
Please take a look at our CONTRIBUTING.md file.
Copyright
The name "Papers We Love" and the logos for the organization are copyrighted, and under the ownership of Papers We Love Ltd, all rights reserved. When starting a chapter, please review our guidelines and ask us about using the logo.