**The Exponential family of distributions can be written as:**

η: natural parameter (vector)

T(y): sufficient statistic (vector), usually T(y) = y

a(η): log partition function

e−a^{(η)}: normalization constant

By fixing a, b and T, p(y; η) defines a set of distributions which vary by η

As we vary η, we get different distributions within the set

Bernoulli (classification regression) and Gaussian (logistic regression) examples can be expressed within the exponential family

**Proof: Bernoulli Regression
**We need to find values of a,b and T such that as we vary η, we get different distributions of y and prove that every member of the Bernoulli set has a corresponding member in the Exponential family

Bernoulli distribution with a mean (φ), specifies a distribution over y ∈ {0, 1}

p(y = 1;φ) = φ; p(y = 0;φ) = 1−φ

The Bernoulli distribution:

We can map these components to a,b, T via

Now that we have values for a,b and T, we can find a value in one distribution and map to another

**Proof: Gaussian Regression
**We set σ2 = 1 since the value of σ2 had no effect on our final choice of θ and hθ(x)

We similarly can map a,b and T

**Poisson distribution**

Good for counts (visitors, hits) > lots of individual people and each has a minute chance of turning up in your count

**Generalized Learning Model**

A way to model y given some value of y and η

Where η is a linear

Where η is a vector

**Multinomial
**Example where there are multiple categories of data (ie. ebay listing)

Can we come up with a decision boundary that separates categories?