Mark's Notes

These are my notes from McCallumNigam_NaiveBayes-aaaiws98.pdf (://faculty.cs.byu.edu/~ringger/papers/McCallumNigam_NaiveBayes-aaaiws98.pdf).

### Multi-variate Bernoulli Event Model

• Document is represented by a vector of binary attributes indicating which words occur and do not occur in the document.
• Word count is lost.
• Word order is lost.
• Document probability is the product of the probabilities of all attribute values (including the probabilities of non-occurrence for words that do not appear).
• Appropriate for tasks with a fixed number of attributes (fixed-length documents).
• More traditional in Bayesian networks field.
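
To make the points above concrete, here is a minimal Python sketch of the document likelihood under the multi-variate Bernoulli event model. The function name `bernoulli_log_likelihood`, the three-word vocabulary, and the probability values are all made up for illustration; they do not come from the paper. The sketch works in log space to avoid underflow.

```python
import math

def bernoulli_log_likelihood(doc_words, word_probs):
    """Log P(document | class) under the multi-variate Bernoulli model.

    doc_words:  iterable of words in the document (repeats are ignored).
    word_probs: dict mapping EVERY vocabulary word to an assumed
                P(word occurs in a document | class), strictly in (0, 1).
    """
    present = set(doc_words)  # word counts and word order are discarded here
    total = 0.0
    for word, p in word_probs.items():
        # Every vocabulary word contributes a factor:
        # p if the word occurs, (1 - p) if it does not.
        total += math.log(p) if word in present else math.log(1.0 - p)
    return total

# Hypothetical per-class word-occurrence probabilities.
word_probs = {"ball": 0.8, "goal": 0.5, "vote": 0.1}
doc = ["ball", "ball", "goal"]

log_p = bernoulli_log_likelihood(doc, word_probs)
```

Note that the absent word "vote" still contributes a factor of (1 - 0.1), and that the repeated "ball" changes nothing, which is exactly the loss of word count described in the list.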

### Multinomial Event Model

• Document is represented by the set of word occurrences from the document.
• Word order is lost.
• Word count (per document) is captured.
• Probability of a document is the product of the probabilities of the words that occur.
• Individual word occurrences are the 'events'; the document is a collection of word events.
• More traditional in statistical language modeling for speech recognition (unigram language model).
• Has been used by numerous people for text classification (see paper for references).
• Outperforms the multi-variate Bernoulli model at large and optimal vocabulary sizes.
• Provides, on average, a 27% reduction in error over the multi-variate Bernoulli model.
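
As a rough illustration of the multinomial event model, the following Python sketch computes the class-dependent part of the document likelihood: the product of word probabilities, each raised to that word's count. The function name `multinomial_log_likelihood` and the probability values are hypothetical. The sketch assumes every document word is in the vocabulary, and it omits the document-length and multinomial-coefficient terms, which are the same for every class and so do not affect classification.

```python
import math
from collections import Counter

def multinomial_log_likelihood(doc_words, word_probs):
    """Class-dependent part of log P(document | class), multinomial model.

    doc_words:  list of word occurrences (repeats matter, order does not).
    word_probs: dict mapping each vocabulary word to an assumed
                P(word | class); the values should sum to 1.
    """
    counts = Counter(doc_words)  # word counts are kept, word order is lost
    # Only words that occur contribute; each contributes count * log P(word | class).
    return sum(n * math.log(word_probs[w]) for w, n in counts.items())

# Hypothetical per-class word distribution (sums to 1).
word_probs = {"ball": 0.5, "goal": 0.25, "vote": 0.25}
doc = ["ball", "ball", "goal"]

log_p = multinomial_log_likelihood(doc, word_probs)
```

Unlike the Bernoulli case, the absent word "vote" contributes nothing here, while the second occurrence of "ball" multiplies in another factor of 0.5.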

### Bayesian Learning Framework

• Parametric generative model.
• Bayes-optimal estimates of the model parameters calculated using training data.
• Document classification using Bayes' rule to “turn the generative model around and calculate the posterior probability” that the document would have been generated by a particular class.
• Select the most probable generative class for classification.
• Mixture model parameterized by theta.
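
The framework above can be sketched end to end in a few lines of Python. This is only an illustration under stated assumptions: it uses the multinomial event model, estimates parameters with add-one (Laplace) smoothing, and skips test words that are not in the training vocabulary. The function names `train`, `posterior`, and `classify` and the toy training data are invented for this example.

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """Estimate class priors and word probabilities from (label, words) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, words in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    n_docs = sum(class_counts.values())
    priors = {c: n / n_docs for c, n in class_counts.items()}
    cond = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        # Add-one smoothing (an assumption of this sketch) keeps every
        # word probability strictly positive.
        cond[c] = {w: (1 + word_counts[c][w]) / (len(vocab) + total) for w in vocab}
    return priors, cond

def posterior(words, priors, cond):
    """Bayes' rule: P(class | doc) is proportional to P(class) * P(doc | class)."""
    scores = {}
    for c in priors:
        score = math.log(priors[c])
        for w in words:
            if w in cond[c]:  # words outside the vocabulary are skipped
                score += math.log(cond[c][w])
        scores[c] = score
    # Normalize in a numerically stable way so the posteriors sum to 1.
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return {c: math.exp(s - top) / z for c, s in scores.items()}

def classify(words, priors, cond):
    """Select the most probable generative class."""
    post = posterior(words, priors, cond)
    return max(post, key=post.get)

# Toy training data, made up for illustration.
training = [
    ("sports", ["ball", "goal", "ball"]),
    ("politics", ["vote", "law"]),
    ("sports", ["goal", "team"]),
]
priors, cond = train(training)
predicted = classify(["ball", "goal"], priors, cond)
```

Here `priors` and `cond` together play the role of the parameter vector theta: `train` estimates them from labeled data, and `classify` turns the generative model around to pick the class with the highest posterior.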

Let's assume that there is a finite number of generative classes and that one could gain access to all documents from each class. In other words, assume that one could gain access to every document ever written and that these documents represent every possible generative class. Could one use a clustering technique on this corpus to discover the generative classes? Perhaps each generative class only needs to be represented by a substantial number of documents (far fewer than all of them).

Consider a randomly selected document from the corpus of all documents ever written. Treat this document as a Bernoulli trial, where success denotes that the document is from a generative class of interest, with that class specified as a parameter of the trial. Is this feasible?