From 4961339f7c60fd05745e170d5cb377a85443a8c3 Mon Sep 17 00:00:00 2001
From: Shriphani Palakodety
Date: Tue, 18 Mar 2014 19:23:23 -0400
Subject: [PATCH] added IR readme with document and explanation

---
 information_retrieval/README.md | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
 create mode 100644 information_retrieval/README.md

diff --git a/information_retrieval/README.md b/information_retrieval/README.md
new file mode 100644
index 0000000..a6618f8
--- /dev/null
+++ b/information_retrieval/README.md
@@ -0,0 +1,19 @@
+## Information Retrieval
+
+Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources. (Says Wikipedia).
+
+The included documents are
+
+*
+  [Graph of Word and TW-IDF](http://www.lix.polytechnique.fr/~rousseau/papers/rousseau-cikm2013.pdf) - Francois Rousseau & Michalis Vazirgiannis
+
+  The traditional IR system stores term-specific statistics (typically
+  a term's frequency in each document - which we call TF) in an index.
+  Such a model ignores dependencies between terms and considers a
+  document's terms to occur independently of each other (and is aptly
+  called the bag-of-words model). In this paper the authors use a
+  statistic that uses a graph representation of a document to encode
+  dependencies between terms and replace the TF statistic with a new
+  TW statistic based on the graph constructed and achieve
+  significantly better results than popular existing models. This
+  paper won an honorable mention at CIKM 2013.