papers-we-love_papers-we-love

mirror of https://github.com/papers-we-love/papers-we-love.git synced 2025-06-13 12:54:28 +00:00

awesome computer-science meetup papers programming read-papers theory


Lee Sharma 3565b2ff01 Add Tidy Data (2014) to list of articles (#414)

The following major changes were made:
- Create the `./data_science` directory
- Add the Tidy Data pdf
- Create/update the `./data_science` README with the article information (including the scroll icon, link to the source, author, and publication year)

Decisions:
- Since no relevant folder existed, I created the `./data_science` directory. This is a broad subject, but until the number of articles becomes unmanageable, I think keeping them together will help people find what they're interested in.
- The README does have a sub-category list (Tidy Data is under "Data Cleaning"), but there is no corresponding subdirectory. There are few enough raw articles that someone browsing the directory listing won't benefit from a subfolder (and it would cost them an extra click), but someone skimming the README might want to know how the articles are categorized.
- The README listing includes scroll/title/author/link to source, but it does not include any abstract/rationale. The different READMEs take different approaches here, but this seems to be the best trade-off between a concise listing and providing useful information. I'm happy to add a rationale or summary if it would be useful, though.

Paper Rationale:
This paper describes a subset of data cleaning that had previously been largely neglected: data tidying, the process of reshaping data into a standardized structure to allow for easier manipulation and the development of better data tools. The author is prominent in the data science community: he is the chief scientist at RStudio and has authored a number of highly regarded and very popular data science packages (e.g. `ggplot2` and `reshape2`). He was named a Fellow by the American Statistical Association in 2015 for "pivotal contributions to statistical practice through innovative and pioneering research in statistical graphics and computing." For more on Hadley Wickham, see his website: http://hadley.nz/

This is a fairly popular paper as well; according to jstatsoft, it has nearly 50k views. I've also seen it mentioned in several other popular media, including Johns Hopkins' very popular online Data Science MOOC.

The main reason I'm adding this paper, however, is how well written it is. I don't come from a data science background, but after reading this paper, I walked away with a decent understanding of the significance of Wickham's research and standardization efforts, the state of the field (circa 2014), and many of the technical details of his method of data tidying. It was easy to read despite my lack of a data science background, but it's clear that Wickham did not "dumb down" the content to accomplish that. I believe that other chapters and independent readers will find this an interesting, enjoyable paper, and that it will continue to affect the field of data cleaning.

This paper will be presented at the October meeting of Papers We Love Washington, D.C. & Northern VA.

Copyright Information:
The raw paper can be legally included in this repository. Tidy Data falls under the [Creative Commons Attribution 3.0 Unported License], which allows for sharing and adaptation with attribution.

[Creative Commons Attribution 3.0 Unported License]: https://creativecommons.org/licenses/by/3.0/

2016-08-30 10:47:06 -04:00
_meetups/SanFrancisco	link to SF meetup summary	2014-10-20 14:56:29 -07:00
.github	running w/ templates and README updates	2016-02-20 01:26:00 -05:00
android	added space after a md formatted link	2014-05-30 13:05:49 -07:00
api_design	Update to all READMEs for hosted content	2015-10-07 15:12:22 -04:00
artificial_intelligence	new nyc papers and such	2016-05-25 12:18:25 -04:00
audio_comp_sci	Update to all READMEs for hosted content	2015-10-07 15:12:22 -04:00
biocomputing	Update to all READMEs for hosted content	2015-10-07 15:12:22 -04:00
caching	Update to all READMEs for hosted content	2015-10-07 15:12:22 -04:00
clustering_algorithms	Update README.md	2014-11-14 11:11:46 -05:00
combinatory_logic	Adds section and paper on combinatory logic. Referenced by William Byrd on twitter.	2014-04-02 10:19:39 -04:00
comp_sci_fundamentals_and_history	create two versions of the same document (#404)	2016-07-06 18:26:22 -04:00
computational_creativity	add for pwl-nyc april	2016-03-18 14:19:07 -04:00
computer_architecture	add papers referred to by @skamille	2014-07-17 23:03:12 -05:00
computer_education	A Framework for Automated Generation of Questions Across Formal Domains	2015-10-09 13:12:09 +08:00
computer_graphics	add nyc nasser presentation paper and statement	2016-01-15 10:29:52 -05:00
computer_vision	Update to all READMEs for hosted content	2015-10-07 15:12:22 -04:00
concurrency	Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms (#395)	2016-05-31 07:33:43 -04:00
crash_only	Add crash-only software dir + articles	2015-05-30 09:34:43 -04:00
cryptography	Cryptography, non-blocking algorithm, quantum computing (#409)	2016-08-18 10:16:55 -04:00
data_compression	Update to all READMEs for hosted content	2015-10-07 15:12:22 -04:00
data_fusion	kalman filter and category (#410)	2016-08-24 11:18:00 -04:00
data_replication	Update to all READMEs for hosted content	2015-10-07 15:12:22 -04:00
data_science	Add Tidy Data (2014) to list of articles (#414)	2016-08-30 10:47:06 -04:00
data_structures	Fix markup for data_structures	2015-11-28 18:26:36 +01:00
datastores	reorganized non-hosted datastore papers in the distributed_systems folder	2016-02-21 01:14:57 -05:00
design	add No Silver Bullet to readme	2016-01-31 10:15:56 -08:00
digital_currency	Update to all READMEs for hosted content	2015-10-07 15:12:22 -04:00
distributed_systems	add tiered rep paper (#401)	2016-06-13 12:06:49 -04:00
economics	Added dates and authors to economics section	2015-10-29 15:31:53 +01:00
ethics	Update to all READMEs for hosted content	2015-10-07 15:12:22 -04:00
experimental_algorithmics	Update to all READMEs for hosted content	2015-10-07 15:12:22 -04:00
gamification	Update to all READMEs for hosted content	2015-10-07 15:12:22 -04:00
garbage_collection	add - haskell gc paper	2015-10-11 14:38:27 -07:00
gossip	Fix broken links	2016-03-22 13:21:38 +08:00
information_retrieval	Adding the HITS Algorithm paper	2016-02-20 15:32:29 -05:00
information_theory	Update to all READMEs for hosted content	2015-10-07 15:12:22 -04:00
languages	Add the Sketch-n-Sketch paper (#412)	2016-08-29 20:56:18 -07:00
logic_and_programming	spelling error in logic_and_programming readme	2016-02-09 11:35:32 -07:00
machine_learning	Updated Random forests paper location	2016-03-05 15:56:22 +01:00
macros	Update to all READMEs for hosted content	2015-10-07 15:12:22 -04:00
memory_management	Update to all READMEs for hosted content	2015-10-07 15:12:22 -04:00
networks	added paper on SPDY evaluation	2016-03-10 22:13:27 +00:00
non_blocking_algorithms	Cryptography, non-blocking algorithm, quantum computing (#409)	2016-08-18 10:16:55 -04:00
operating_systems	Update README.md	2016-08-18 10:17:55 -04:00
organizational_simulation	next paper nyc	2016-04-22 00:37:24 +01:00
paradigms	Move paradigms into a paradigms directory (#407)	2016-07-28 16:11:38 -04:00
pattern_matching	Update to all READMEs for hosted content	2015-10-07 15:12:22 -04:00
physics	add paper for presentation 11/19 nyc pwl	2015-10-20 16:49:38 -04:00
plt	Fixed typos	2016-04-14 17:04:59 +02:00
processes	Fix typo	2016-02-19 12:03:07 -06:00
program_verification	Add new category, program verification.	2014-05-02 13:52:23 -07:00
quantum_computing	Cryptography, non-blocking algorithm, quantum computing (#409)	2016-08-18 10:16:55 -04:00
robotics	Update README.md	2016-02-25 22:01:00 +01:00
security	Add files via upload	2016-06-08 13:25:52 +02:00
speech_recognition	Added tutorial link back in with new URL as provided by @DarrenN	2015-01-26 15:27:34 +00:00
sports_analytics	Update to all READMEs for hosted content	2015-10-07 15:12:22 -04:00
stringology	add "Fast String Searching"	2015-03-06 07:18:18 -05:00
sublinear_algorithms	Update to all READMEs for hosted content	2015-10-07 15:12:22 -04:00
testing	Move languages into languages dir. Move 'tdd' dir into testing (#403)	2016-07-10 23:04:17 -04:00
time_series	Update to all READMEs for hosted content	2015-10-07 15:12:22 -04:00
user_interfaces	init #clojurewest papers to research	2014-03-26 10:54:45 -07:00
virtual_machines	fastcommit	2015-07-07 00:04:27 +02:00
.gitignore	primecoin	2013-11-30 12:15:01 -05:00
CODE_OF_CONDUCT.md	Add "age" to Spelling It Out (#408)	2016-07-26 15:46:05 -04:00
README.md	README.md: Add Washington, DC chapter (#411)	2016-08-25 14:58:36 -04:00

README.md

Papers We Love (PWL) is a community built around reading, discussing and learning more about academic computer science papers. This repository serves as a directory of some of the best papers the community can find, bringing together documents scattered across the web. You can also visit the Papers We Love site for more info.

Due to licensing restrictions, we cannot always host the papers themselves (when we do, you will see a 📜 emoji next to the paper's title in the directory README), but we can provide links to their locations.

If you enjoy the papers, perhaps stop by a local chapter meetup and join in on the vibrant discussions around them. You can also discuss PWL events, the content in this repository, and/or anything related to PWL on our Slack (after signing up to join it) or in our #paperswelove IRC channel on freenode.

Chapters

Here are our official chapters. Let us know if you are interested in starting one in your city!

All of our meetups follow our Code of Conduct.

Past Presentations

Check out our YouTube and MixCloud (audio-only format) channels.

Info

We're looking for pull requests related to papers we should add, better organization of the papers we do have, and/or links to other paper-repos we should point to.

Other Good Places to Find Papers

Please check out our wiki page for links to blogs, books, and exchanges that are worth a good read.

How To Read a Paper

Reading a paper is not the same as reading a blog post or a novel. Here are a few handy resources to help you get started.

Applications/Ideas built around Papers We Love

Love a Paper - @lovepaper

Contributing Guidelines

Please take a look at our CONTRIBUTING.md file.

Copyright

The name "Papers We Love" and the logos for the organization are copyrighted, and under the ownership of Papers We Love Ltd, all rights reserved. When starting a chapter, please review our guidelines and ask us about using the logo.