The link for *Teaching Garbage Collection without Implementing Compilers or Interpreters* was broken. This change points it at a link that is still active and hosts the same paper.
The following major changes were made:
- Create the `./data_science` directory
- Add the *Tidy Data* pdf
- Create/update the `./data_science` README with the article
information (including the scroll icon, link to the source, author,
and publication year)
Decisions:
- Since no relevant folder existed, I created the `./data_science`
directory. This is a broad subject, but until the number of articles
becomes unmanageable, I think that keeping them together will help
people find what they're interested in.
- The README does have a sub-category list (*Tidy Data* is under "Data
Cleaning"), but there is no corresponding subdirectory. There are few
enough raw articles that a subfolder wouldn't help someone browsing
the directory listing (it would only cost them an extra click), while
someone skimming the README might still want to know how the articles
are categorized.
- The README listing includes scroll/title/author/link to source, but
it does not include any abstract/rationale. The different READMEs
take different approaches here, but this seems to be the best
trade-off between a concise listing and providing useful
information. I'm happy to add a rationale or summary if it would be
useful, though.
Paper Rationale:
This paper describes a subset of data cleaning that had previously
been largely neglected: data tidying, the process of restructuring
data into a standardized form that allows for easier manipulation and
the development of better data tools.
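To make the idea concrete: tidying usually means reshaping a "wide"
table (one column per measurement) into a "long" one, with one column
per variable and one row per observation. The paper's own examples use
R tooling such as `reshape2`; the snippet below is only a rough,
hypothetical sketch of the same reshaping in pandas, loosely modeled on
the small treatment table from the paper's introduction.

```python
# Hypothetical illustration, not code from the paper: the paper works in
# R (reshape2), while this sketches the equivalent "melt" step in pandas.
import pandas as pd

# A "messy" layout: each treatment is its own column, so the variable
# "treatment" is encoded in the column names rather than in the values.
messy = pd.DataFrame({
    "person": ["John Smith", "Jane Doe", "Mary Johnson"],
    "treatmenta": [None, 16, 3],
    "treatmentb": [2, 11, 1],
})

# The tidy layout: one column per variable (person, treatment, result)
# and one row per observation.
tidy = messy.melt(
    id_vars="person",
    var_name="treatment",
    value_name="result",
)

print(tidy)
```

Once the data is in this shape, tools can refer to variables by name
("treatment", "result") rather than by column position, which is the
kind of uniform interface the paper argues tidy datasets provide.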
The author, Hadley Wickham, is prominent in the data science
community: he is the chief scientist at RStudio and has authored a
number of highly regarded and very popular data science packages
(e.g. `ggplot2` and `reshape2`).
He was named a Fellow by the American Statistical Association in 2015
for "pivotal contributions to statistical practice through innovative
and pioneering research in statistical graphics and computing." For
more on Hadley Wickham, see his website: http://hadley.nz/
This is a fairly popular paper as well; according to jstatsoft, it has
nearly 50k views. I've also seen it mentioned in several other popular
venues, including Johns Hopkins' very popular online Data Science
MOOC.
The main reason I'm adding this paper, however, is how well-written it
is. I don't come from a data science background, but after reading
this paper, I walked away with a decent understanding of the
significance of Wickham's research and standardization efforts, the
current (circa 2014) state of the field, and many of the technical
details of his method of data tidying. It was easy to read despite my
lack of a data science background, yet it's clear that Wickham did not
"dumb down" the content to accomplish that.
I believe that other chapters and independent readers will find this
to be an interesting, enjoyable paper, and that it will continue to
influence the field of data cleaning.
*This paper will be presented at the October meeting of Papers We Love
Washington, D.C. & Northern VA.*
Copyright Information:
The raw paper can be legally included in this repository. *Tidy Data*
falls under the [Creative Commons Attribution 3.0 Unported License],
which allows for sharing and adaptation with attribution.
[Creative Commons Attribution 3.0 Unported License]:
https://creativecommons.org/licenses/by/3.0/
- Classifies memory attacks into a hierarchy that is usable by both black- and white-hats.
- An excellent primer on the different memory-related vulnerabilities that exist and, more importantly, why they exist, as well as the ways in which various defences act to counter them.
Update README.md
Include year in README
Update README.md