Information Theory


This assignment is designed to:

  • help you become more fluent with the terminology and the techniques of the course
  • give you an opportunity to work with some real text data in the context of probability and information theory
  • help you start working with probability distributions in code
  • help you get started conceiving design patterns for implementing probabilistic models


Show your work. Be clear and concise. This assignment should be typed.

Information Theory

[20 points]

(Adapted from: Manning & Schuetze, exercise 2.13)

Prove the following equality: $H_c(X,q) = E_p (log \frac{1}{q(x)})$ To help, note the following: $H_c(X,q) = H(X) + KL(p||q)$ and $-\sum_x p(x)log(q(x)) = E_p (log \frac{1}{q(x)})$

Be sure to apply good proof technique: justify each step in your proofs; set up your proofs in two-column format, with each step showing a statement on the left and a justification on the right.

Programming Exercise

The requirement for the programming part of the assignment is to learn distributions over characters for three texts of your choice and to compute important information-theoretic functions on those distributions:

  • Choose three digital text documents, each at least 1000 words long, such that two are from a similar source (e.g., chapters of the same book, books by the same author, talks from the same venue, etc.) and the third is from a dissimilar source (e.g., by a different author, in a different language, etc.).
  • [20 points] Distributions as bar charts: Produce bar charts representing the distributions over characters for each of your texts.
  • [30 points] Entropy: Report the entropy of the distribution of characters for each of your texts. Use base 2 logarithms for the Entropy calculation.
    • Remember that $ 0 \cdot log(0)$ is always defined to be 0.
  • [30 points] K-L Divergence: Report the K-L divergence of the character distribution for each pair of your chosen texts. Be sure to compute the K-L divergence in each direction (text 1 vs. text 2 and vice versa) for each pair of texts.
    • Use base 2 logarithms for the K-L Divergence calculation.
    • When implementing the K-L divergence of two distributions (call them p and q), you will want to avoid dividing by zero probabilities in q. To avoid such problems, you should “smooth” q by iterating over the support of p to notice any zeros in q. Be sure to copy q, replace any zero probability with a small quantity epsilon (use 1.0e-10), and then re-normalize the distribution after smoothing (to maintain the sum-to-1 property). Work with a copy of q; otherwise, you will permanently affect the distribution each time you compare models, which is undesirable.

Coding Considerations

  • Select a language and environment of your choice. Be able to briefly justify your choice
  • Plan to stick with that language and environment for the duration of the course.
  • Build classes that you can reuse in future projects


Your typed report should include:

  • Your work and solutions for the exercises
  • URLs for the documents you selected
  • The data itself. If too large (longer than a page), include just a brief description instead; e.g., “The Project Gutenberg copy of David Copperfield”.
  • The per-document character distributions as bar charts over characters. Use at most one page per bar chart.
  • The computed entropy and divergence values
  • Your code (just the classes you implemented)

Organize your report and use clear headings and explanations where needed.


Submit a .pdf document through Learning Suite.

cs-401r/assignment-4.txt · Last modified: 2014/09/26 23:44 by ringger
Back to top
CC Attribution-Share Alike 4.0 International = chi`s home Valid CSS Driven by DokuWiki do yourself a favour and use a real browser - get firefox!! Recent changes RSS feed Valid XHTML 1.0