Changes:
- add: *Combinatorial Analysis and Computers* (1965) to
`combinatory_logic/README.md` list
- add: year to the other paper in the README
- fix: tweak format of papers README for readability
Decisions:
- I put this in the `combinatory_logic` folder, but I think it would
  also fit in the `comp_sci_fundamentals_and_history` folder (given
  Knuth's historical importance to the field and the theoretical
  nature of the paper). `combinatory_logic` seemed the more direct fit.
---
Paper Title: Combinatorial Analysis and Computers
Author(s): Marshall Hall Jr. and Donald E. Knuth
Paper Year: 1965
Reasons for Including Paper:
Papers We Love DC/NoVA will be discussing this paper (and others) at our
November meetup.
This paper is included in Donald Knuth's book *Selected Papers
on Discrete Mathematics*. Knuth's writings have been extremely
important to the field of computer science, and I think most of
his papers would fit in well here. This one opens with a discussion of
computational complexity and the benefits and limits of computing, then
dives into several combinatorial problems.
I find it interesting because 1) it's a neat view of the possibilities and
limitations of computation early on, and 2) the problems the authors lay out
are still interesting exercises today.
The link for *Teaching Garbage Collection without Implementing Compilers or Interpreters* was broken. This change points it to a still-active link to the same paper.
The following major changes were made:
- Create the `./data_science` directory
- Add the *Tidy Data* pdf
- Create/update the `./data_science` README with the article
information (including the scroll icon, link to the source, author,
and publication year)
Decisions:
- Since no relevant folder existed, I created the `./data_science`
  directory. This is a broad subject, but until the number of articles
  becomes unmanageable, I think keeping them together will help
  people find what they're interested in.
- The README does have a sub-category list (*Tidy Data* is under "Data
  Cleaning"), but there is no corresponding subdirectory. There are
  few enough raw articles that a subfolder wouldn't help someone
  browsing the directory listing (it would only cost them an extra
  click), whereas someone skimming the README might want to know how
  the articles are categorized.
- The README listing includes scroll/title/author/link to source, but
it does not include any abstract/rationale. The different READMEs
take different approaches here, but this seems to be the best
trade-off between a concise listing and providing useful
information. I'm happy to add a rationale or summary if it would be
useful though.
Paper Rationale:
This paper describes a subset of data cleaning that had previously
been largely neglected: data tidying, the process of restructuring data
into a standardized layout that makes it easier to manipulate and
enables the development of better data tools.
The author, Hadley Wickham, is prominent in the data science community:
he is the chief scientist at RStudio and has authored a number of
highly regarded and very popular data science packages (e.g., `ggplot2`
and `reshape2`).
He was named a Fellow by the American Statistical Association in 2015
for "pivotal contributions to statistical practice through innovative
and pioneering research in statistical graphics and computing." For
more on Hadley Wickham, see his website: http://hadley.nz/
It is also a fairly popular paper: according to the Journal of
Statistical Software (jstatsoft), it has nearly 50k views. I've seen it
referenced elsewhere too, including in Johns Hopkins' very popular
online Data Science MOOC.
The main reason I'm adding this paper, however, is how well written
it is. I don't come from a data science background, but after reading
this paper, I walked away with a decent understanding of the
significance of Wickham's research and standardization efforts, the
state of the field at the time (circa 2014), and many of the technical
details of his approach to data tidying. It was easy to read despite
my lack of a data science background, yet Wickham clearly did not
"dumb down" the content to achieve that.
I believe other chapters and independent readers will find this an
interesting, enjoyable paper, and that it will continue to influence
the field of data cleaning.
*This paper will be presented at the October meeting of Papers We Love
Washington, D.C. & Northern VA.*
Copyright Information:
The raw paper can be legally included in this repository. *Tidy Data*
falls under the [Creative Commons Attribution 3.0 Unported License],
which allows for sharing and adaptation with attribution.
[Creative Commons Attribution 3.0 Unported License]:
https://creativecommons.org/licenses/by/3.0/