mirror of https://github.com/papers-we-love/papers-we-love.git synced 2026-03-02 03:40:31 +00:00

Lee Sharma 3565b2ff01 Add Tidy Data (2014) to list of articles (#414)

The following major changes were made:

  - Create the `./data_science` directory

  - Add the *Tidy Data* pdf

  - Create/update the `./data_science` README with the article
    information (including the scroll icon, link to the source, author,
    and publication year)

Decisions:

  - Since no relevant folder existed, I created the `./data_science`
    directory. This is a broad subject, but until the number of articles
    becomes unmanageable, I think that keeping them together will help
    people find what they're interested in.

  - The README does have a sub-category list (*Tidy Data* is under "Data
    Cleaning"), but there is no corresponding subdirectory. There are
    few enough raw articles that someone browsing the directory listing
    won't benefit from a subfolder (it would only cost them an extra
    click), whereas someone skimming the README might want to know how
    the articles are categorized.

  - The README listing includes scroll/title/author/link to source, but
    it does not include any abstract/rationale. The different READMEs
    take different approaches here, but this seems to be the best
    trade-off between a concise listing and providing useful
    information. I'm happy to add a rationale or summary if it would be
    useful though.

Paper Rationale:

  This paper describes a subset of data cleaning that had previously
  been largely neglected: data tidying, or the process of restructuring
  data into a standardized form to allow for easier manipulation and the
  development of better data tools.

  The author is prominent in the data science community: he is the chief
  scientist at RStudio and has authored a number of highly regarded,
  very popular data science packages (e.g. `ggplot2` and `reshape2`).
  He was named a Fellow by the American Statistical Association in 2015
  for "pivotal contributions to statistical practice through innovative
  and pioneering research in statistical graphics and computing." For
  more on Hadley Wickham, see his website: http://hadley.nz/

  This is a fairly popular paper as well; according to jstatsoft, it has
  nearly 50k views. I've also seen it mentioned in several other popular
  venues, including Johns Hopkins' very popular online Data Science
  MOOC.

  The main reason that I'm adding this paper, however, is how well
  written it is. I don't come from a data science background, but after
  reading this paper, I walked away with a decent understanding of the
  significance of Wickham's research and standardization efforts, the
  current (circa 2014) state of the field, and many of the technical
  details associated with his method of data tidying. It was easy to
  read despite my lack of a data science background, yet it's clear
  that Wickham did not "dumb down" the content in order to accomplish
  that.

  I believe that other chapters and independent readers will find this
  to be an interesting, enjoyable paper, and I believe that it will
  continue to affect the field of data cleaning.

  *This paper will be presented at the October meeting of Papers We Love
  Washington, D.C. & Northern VA.*

Copyright Information:

  The raw paper can be legally included in this repository. *Tidy Data*
  falls under the [Creative Commons Attribution 3.0 Unported License],
  which allows for sharing and adaptation with attribution.

  [Creative Commons Attribution 3.0 Unported License]:
    https://creativecommons.org/licenses/by/3.0/

2016-08-30 10:47:06 -04:00

README.md

tidy_data.pdf

README.md

Data Science

Data Cleaning

📜 Tidy Data by Hadley Wickham (2014)