In logistic regression, we simply assume that the probability of x being classified as 1 is:

**P( y = 1 | x ) = 1 / ( 1 + exp ( -w^T x) ) = hw(x)**

w is the parameter vector that we need to learn and optimize from the training set.
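As a quick sketch, the hypothesis above can be written in Python (a minimal illustration; the names `sigmoid` and `h` are my own, not from any particular library):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def h(w, x):
    """Hypothesis h_w(x) = P(y = 1 | x), the probability that x is class 1."""
    return sigmoid(np.dot(w, x))
```

Note that when w^T x = 0 the model is maximally uncertain: h_w(x) = 0.5.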

This is the key to understanding logistic regression/classification: that function is our model/hypothesis/assumption of the world.

**How do we find a good w, and what is the guideline?**

Well, we need to maximize the probability of the training set/data given that w:

**P( hw ) = P( y_1 | x_1 ) * P( y_2 | x_2 ) * … * P( y_n | x_n )**

Since each y_i is either 0 or 1, we can rewrite each factor as:

P( y_i | x_i ) = hw(x_i)^( y_i ) * ( 1 – hw(x_i) )^( 1 – y_i )

Apply the log to turn the product into a sum, then negate it:

**J (hw) = – SUM( y_i log( hw(x_i)) + ( 1 – y_i) log( 1 – hw(x_i)) )**

That is called the cost function; we want to minimize the cost with respect to the choice of hw (we can think of hw as w0, w1, … etc.).
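The cost function can be sketched directly from the formula above (a minimal illustration; the name `cost` is my own):

```python
import numpy as np

def cost(w, X, y):
    """Negative log-likelihood J(h_w) = -sum( y_i log h_w(x_i) + (1 - y_i) log(1 - h_w(x_i)) ).
    X has one training example per row; y holds the 0/1 labels."""
    p = 1.0 / (1.0 + np.exp(-X @ w))  # h_w(x_i) for every row x_i
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```

For example, with w = 0 every prediction is 0.5, so each example contributes log 2 to the cost.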

For simplicity, let’s assume we have just w0 and w1, so J(hw) becomes J(w0, w1).

So we need to find a point on the (w0, w1) plane which minimizes J(w0, w1).

Using gradient descent (on the (w0, w1) plane), we can easily find that point!
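Gradient descent on this cost has a particularly clean gradient, X^T (h_w(x) – y). Here is a minimal sketch (the function name, learning rate, and step count are arbitrary choices of mine, not tuned values):

```python
import numpy as np

def fit(X, y, lr=0.1, steps=1000):
    """Plain gradient descent on the logistic-regression cost J(w).
    X has one example per row (include a column of 1s for the bias w0)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))  # current predictions h_w(x_i)
        grad = X.T @ (p - y)              # dJ/dw for the cross-entropy cost
        w -= lr * grad
    return w
```

On a tiny 1-D example with a bias column, this finds a boundary that separates the two classes.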

**Use cases**:

Use simple logistic regression when you have one nominal variable with two values (male/female, dead/alive, etc.) and one measurement variable. The nominal variable is the dependent variable, and the measurement variable is the independent variable.

Use multiple logistic regression when you have one nominal variable and two or more measurement variables, and you want to know how the measurement variables affect the nominal variable. You can use it to predict probabilities of the dependent nominal variable, or if you’re careful, you can use it for suggestions about which independent variables have a major effect on the dependent variable.

For example, you might want to know the effect that blood pressure, age, and weight have on the probability that a person will have a heart attack in the next year.
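A sketch of that heart-attack example using scikit-learn, with invented toy numbers purely for illustration (these are NOT real medical data, and the feature values are made up):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: systolic blood pressure, age, weight (kg) -- fictional values.
X = np.array([
    [120, 35, 70],
    [140, 55, 90],
    [115, 28, 62],
    [160, 64, 95],
    [130, 45, 80],
    [150, 60, 88],
])
y = np.array([0, 1, 0, 1, 0, 1])  # 1 = heart attack within a year (fictional labels)

model = LogisticRegression()
model.fit(X, y)

# Predicted probability of a heart attack for a new (fictional) person.
print(model.predict_proba([[145, 50, 85]])[0, 1])
```

`predict_proba` returns the probability of each class, which is exactly the h_w(x) we derived above.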
