papers-we-love_papers-we-love

mirror of https://github.com/papers-we-love/papers-we-love.git synced 2026-03-02 03:40:31 +00:00

Lee Sharma 3565b2ff01 Add Tidy Data (2014) to list of articles (#414)

The following major changes were made:

  - Create the `./data_science` directory

  - Add the *Tidy Data* pdf

  - Create/update the `./data_science` README with the article
    information (including the scroll icon, link to the source, author,
    and publication year)

Decisions:

  - Since no relevant folder existed, I created the `./data_science`
    directory. This is a broad subject, but until the number of articles
    gets to be unmanageable, I think that keeping them together will help
    people find what they're interested in.

  - The README does have a sub-category list (*Tidy Data* is under "Data
    Cleaning"), but there is no corresponding subdirectory. There are
    few enough raw articles that someone browsing the directory listing
    won't benefit from a subfolder (and it would cost them an extra
    click), whereas someone skimming the README might want to know more
    about the article categorization.

  - The README listing includes scroll/title/author/link to source, but
    it does not include any abstract/rationale. The different READMEs
    take different approaches here, but this seems to be the best
    trade-off between a concise listing and providing useful
    information. I'm happy to add a rationale or summary if it would be
    useful though.

Paper Rationale:

  This paper describes a subset of data cleaning that had previously
  been largely neglected: data tidying, or the process of reforming data
  into a standardized structure to allow for easier manipulation and the
  development of better data tools.

  The author is prominent in the data science community: he is the chief
  scientist at RStudio and has authored a number of highly regarded and
  very popular data science packages (e.g. `ggplot2` and `reshape2`).
  He was named a Fellow by the American Statistical Association in 2015
  for "pivotal contributions to statistical practice through innovative
  and pioneering research in statistical graphics and computing." For
  more on Hadley Wickham, see his website: http://hadley.nz/

  This is a fairly popular paper as well; according to jstatsoft, it has
  nearly 50k views. I've also seen it mentioned in several other popular
  venues, including Johns Hopkins' very popular online Data Science
  MOOC.

  The main reason that I'm adding this paper, however, is how well
  written it is. I don't come from a data science background, but after
  reading this paper, I walked away with a decent understanding of the
  significance of Wickham's research and standardization efforts, the
  current (circa 2014) state of the field, and many of the technical
  details associated with his method of data tidying. It was easy to
  read, despite my lack of a data science background, but it's clear
  that Wickham did not "dumb down" the content in order to accomplish
  that.

  I believe that other chapters and independent readers will find this
  to be an interesting, enjoyable paper, and I believe that it will
  continue to affect the field of data cleaning.

  *This paper will be presented at the October meeting of Papers We Love
  Washington, D.C. & Northern VA.*

Copyright Information:

  The raw paper can be legally included in this repository. *Tidy Data*
  falls under the [Creative Commons Attribution 3.0 Unported License],
  which allows for sharing and adaptation with attribution.

  [Creative Commons Attribution 3.0 Unported License]:
    https://creativecommons.org/licenses/by/3.0/

2016-08-30 10:47:06 -04:00

| Name | Last commit | Date |
| --- | --- | --- |
| `_meetups/SanFrancisco` | link to SF meetup summary | 2014-10-20 14:56:29 -07:00 |
| `.github` | running w/ templates and README updates | 2016-02-20 01:26:00 -05:00 |
| `android` | added space after a md formatted link | 2014-05-30 13:05:49 -07:00 |
| `api_design` | Update to all READMEs for hosted content | 2015-10-07 15:12:22 -04:00 |
| `artificial_intelligence` | new nyc papers and such | 2016-05-25 12:18:25 -04:00 |
| `audio_comp_sci` | Update to all READMEs for hosted content | 2015-10-07 15:12:22 -04:00 |
| `biocomputing` | Update to all READMEs for hosted content | 2015-10-07 15:12:22 -04:00 |
| `caching` | Update to all READMEs for hosted content | 2015-10-07 15:12:22 -04:00 |
| `clustering_algorithms` | Update README.md | 2014-11-14 11:11:46 -05:00 |
| `combinatory_logic` | Adds section and paper on combinatory logic. Referenced by William Byrd on twitter. | 2014-04-02 10:19:39 -04:00 |
| `comp_sci_fundamentals_and_history` | create two versions of the same document (#404) | 2016-07-06 18:26:22 -04:00 |
| `computational_creativity` | add for pwl-nyc april | 2016-03-18 14:19:07 -04:00 |
| `computer_architecture` | add papers referred to by @skamille | 2014-07-17 23:03:12 -05:00 |
| `computer_education` | A Framework for Automated Generation of Questions Across Formal Domains | 2015-10-09 13:12:09 +08:00 |
| `computer_graphics` | add nyc nasser presentation paper and statement | 2016-01-15 10:29:52 -05:00 |
| `computer_vision` | Update to all READMEs for hosted content | 2015-10-07 15:12:22 -04:00 |
| `concurrency` | Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms (#395) | 2016-05-31 07:33:43 -04:00 |
| `crash_only` | Add crash-only software dir + articles | 2015-05-30 09:34:43 -04:00 |
| `cryptography` | Cryptography, non-blocking algorithm, quantum computing (#409) | 2016-08-18 10:16:55 -04:00 |
| `data_compression` | Update to all READMEs for hosted content | 2015-10-07 15:12:22 -04:00 |
| `data_fusion` | kalman filter and category (#410) | 2016-08-24 11:18:00 -04:00 |
| `data_replication` | Update to all READMEs for hosted content | 2015-10-07 15:12:22 -04:00 |
| `data_science` | Add Tidy Data (2014) to list of articles (#414) | 2016-08-30 10:47:06 -04:00 |
| `data_structures` | Fix markup for data_structures | 2015-11-28 18:26:36 +01:00 |
| `datastores` | reorganized non-hosted datastore papers in the distributed_systems folder | 2016-02-21 01:14:57 -05:00 |
| `design` | add No Silver Bullet to readme | 2016-01-31 10:15:56 -08:00 |
| `digital_currency` | Update to all READMEs for hosted content | 2015-10-07 15:12:22 -04:00 |
| `distributed_systems` | add tiered rep paper (#401) | 2016-06-13 12:06:49 -04:00 |
| `economics` | Added dates and authors to economics section | 2015-10-29 15:31:53 +01:00 |
| `ethics` | Update to all READMEs for hosted content | 2015-10-07 15:12:22 -04:00 |
| `experimental_algorithmics` | Update to all READMEs for hosted content | 2015-10-07 15:12:22 -04:00 |
| `gamification` | Update to all READMEs for hosted content | 2015-10-07 15:12:22 -04:00 |
| `garbage_collection` | add - haskell gc paper | 2015-10-11 14:38:27 -07:00 |
| `gossip` | Fix broken links | 2016-03-22 13:21:38 +08:00 |
| `information_retrieval` | Adding the HITS Algorithm paper | 2016-02-20 15:32:29 -05:00 |
| `information_theory` | Update to all READMEs for hosted content | 2015-10-07 15:12:22 -04:00 |
| `languages` | Add the Sketch-n-Sketch paper (#412) | 2016-08-29 20:56:18 -07:00 |
| `logic_and_programming` | spelling error in logic_and_programming readme | 2016-02-09 11:35:32 -07:00 |
| `machine_learning` | Updated Random forests paper location | 2016-03-05 15:56:22 +01:00 |
| `macros` | Update to all READMEs for hosted content | 2015-10-07 15:12:22 -04:00 |
| `memory_management` | Update to all READMEs for hosted content | 2015-10-07 15:12:22 -04:00 |
| `networks` | added paper on SPDY evaluation | 2016-03-10 22:13:27 +00:00 |
| `non_blocking_algorithms` | Cryptography, non-blocking algorithm, quantum computing (#409) | 2016-08-18 10:16:55 -04:00 |
| `operating_systems` | Update README.md | 2016-08-18 10:17:55 -04:00 |
| `organizational_simulation` | next paper nyc | 2016-04-22 00:37:24 +01:00 |
| `paradigms` | Move paradigms into a paradigms directory (#407) | 2016-07-28 16:11:38 -04:00 |
| `pattern_matching` | Update to all READMEs for hosted content | 2015-10-07 15:12:22 -04:00 |
| `physics` | add paper for presentation 11/19 nyc pwl | 2015-10-20 16:49:38 -04:00 |
| `plt` | Fixed typos | 2016-04-14 17:04:59 +02:00 |
| `processes` | Fix typo | 2016-02-19 12:03:07 -06:00 |
| `program_verification` | Add new category, program verification. | 2014-05-02 13:52:23 -07:00 |
| `quantum_computing` | Cryptography, non-blocking algorithm, quantum computing (#409) | 2016-08-18 10:16:55 -04:00 |
| `robotics` | Update README.md | 2016-02-25 22:01:00 +01:00 |
| `security` | Add files via upload | 2016-06-08 13:25:52 +02:00 |
| `speech_recognition` | Added tutorial link back in with new URL as provided by @DarrenN | 2015-01-26 15:27:34 +00:00 |
| `sports_analytics` | Update to all READMEs for hosted content | 2015-10-07 15:12:22 -04:00 |
| `stringology` | add "Fast String Searching" | 2015-03-06 07:18:18 -05:00 |
| `sublinear_algorithms` | Update to all READMEs for hosted content | 2015-10-07 15:12:22 -04:00 |
| `testing` | Move languages into languages dir. Move 'tdd' dir into testing (#403) | 2016-07-10 23:04:17 -04:00 |
| `time_series` | Update to all READMEs for hosted content | 2015-10-07 15:12:22 -04:00 |
| `user_interfaces` | init #clojurewest papers to research | 2014-03-26 10:54:45 -07:00 |
| `virtual_machines` | fastcommit | 2015-07-07 00:04:27 +02:00 |
| `.gitignore` | primecoin | 2013-11-30 12:15:01 -05:00 |
| `CODE_OF_CONDUCT.md` | Add "age" to Spelling It Out (#408) | 2016-07-26 15:46:05 -04:00 |
| `README.md` | README.md: Add Washington, DC chapter (#411) | 2016-08-25 14:58:36 -04:00 |

README.md

Papers We Love (PWL) is a community built around reading, discussing and learning more about academic computer science papers. This repository serves as a directory of some of the best papers the community can find, bringing together documents scattered across the web. You can also visit the Papers We Love site for more info.

Due to licenses we cannot always host the papers themselves (when we do, you will see a 📜 emoji next to its title in the directory README) but we can provide links to their locations.

If you enjoy the papers, perhaps stop by a local chapter meetup and join in on the vibrant discussions around them. You can also discuss PWL events, the content in this repository, and/or anything related to PWL on our Slack, after signing up to join it, or on our #paperswelove IRC channel on freenode.

Chapters

Here are our official chapters. Let us know if you are interested in starting one in your city!

All of our meetups follow our Code of Conduct.

Past Presentations

Check out our YouTube and MixCloud (audio-only format) channels.

Info

We're looking for pull requests related to papers we should add, better organization of the papers we do have, and/or links to other paper-repos we should point to.

Other Good Places to Find Papers

Please check out our wiki page for links to blogs, books, and exchanges that are worth a good read.

How To Read a Paper

Reading a paper is not the same as reading a blog post or a novel. Here are a few handy resources to help you get started.

Applications/Ideas built around Papers We Love

Love a Paper - @lovepaper

Contributing Guidelines

Please take a look at our CONTRIBUTING.md file.

Copyright

The name "Papers We Love" and the logos for the organization are copyrighted, and under the ownership of Papers We Love Ltd, all rights reserved. When starting a chapter, please review our guidelines and ask us about using the logo.