The typical use case is discovering interesting relations between variables in large databases. For example, here are some transactions, where t is the transaction ID and i(t) is the set of items in it:

t    i(t)
1    ABDE
2    BCE
3    ABDE
4    ABCE
5    ABCDE
6    BCD

Let's say A, B, C, … are products that people bought. We want to find which itemsets are frequent. For example, BC is …
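Finding frequent itemsets in the transactions above can be sketched with a level-wise (Apriori-style) search. This is a minimal sketch, not necessarily the method the full post uses. The minimum-support threshold of 3 transactions and the names `apriori` and `support` are my own illustrative assumptions.

```python
from itertools import combinations

# The six transactions from the table above, one set of items per transaction.
TRANSACTIONS = [set("ABDE"), set("BCE"), set("ABDE"), set("ABCE"), set("ABCDE"), set("BCD")]

def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def apriori(transactions, min_support):
    """Return {frequent itemset: support} for all itemsets with support >= min_support."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items if support(frozenset([i]), transactions) >= min_support]
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s, transactions)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current_set = set(current)
        candidates = [c for c in candidates
                      if all(frozenset(sub) in current_set for sub in combinations(c, k - 1))]
        current = [c for c in candidates if support(c, transactions) >= min_support]
    return frequent
```

With a minimum support of 3, BC (contained in transactions 2, 4, 5 and 6, so support 4) comes out frequent, while AC (support 2) does not.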

K-means clustering is an unsupervised machine learning algorithm. Wikipedia has a great demonstration of how it works ("Demonstration of the standard algorithm"):
1. k initial "means" (in this case k=3) are randomly generated within the data domain (shown in color).
2. k clusters are created by associating every observation with the nearest mean. The …
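The steps above, plus the usual update step of moving each mean to the centroid of its cluster, can be sketched in plain Python as the standard (Lloyd's) algorithm. This is a minimal sketch under my own assumptions: points are tuples of numbers, the initial means are k randomly chosen data points (the "Forgy" variant) rather than random points in the data domain, and `kmeans`, `iterations` and `seed` are hypothetical names.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Return (means, labels) after running Lloyd's algorithm on `points`."""
    rng = random.Random(seed)
    # Step 1: choose k initial means (here: k distinct data points).
    means = rng.sample(points, k)
    for _ in range(iterations):
        # Step 2: assign every observation to its nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, means[j]))
            clusters[nearest].append(p)
        # Step 3: move each mean to the centroid of its cluster.
        new_means = []
        for j, cluster in enumerate(clusters):
            if cluster:
                new_means.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_means.append(means[j])  # keep the mean of an empty cluster unchanged
        if new_means == means:  # converged: assignments no longer change the means
            break
        means = new_means
    labels = [min(range(k), key=lambda j: math.dist(p, means[j])) for p in points]
    return means, labels
```

For example, `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` puts the first three points in one cluster and the last three in the other.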

(1) Maximizing the margin
An SVM is very easy to understand graphically: we just need to find a separating plane that maximizes the margin, as illustrated in the graph.
(2) How to calculate/find the maximum margin
Assuming the hard-margin case for mathematical simplicity, the separating plane can be expressed as:
w · x - b = 0
where …
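To make "the margin" concrete for a plane w · x - b = 0, here is a minimal sketch that only measures the margin of a given plane; it does not search for the best one. It assumes labels are +1/-1 and uses the standard hard-margin convention: rescale (w, b) so that the closest points satisfy y(w · x - b) = 1, after which the margin width is 2/||w||. The function names are hypothetical.

```python
import math

def canonical_plane(w, b, points, labels):
    """Rescale (w, b) so that min_i y_i (w·x_i - b) == 1 (labels must be +1/-1)."""
    functional = [y * (sum(wi * xi for wi, xi in zip(w, x)) - b) for x, y in zip(points, labels)]
    scale = min(functional)
    if scale <= 0:
        raise ValueError("the plane does not separate the data")
    return [wi / scale for wi in w], b / scale

def margin_width(w, b, points, labels):
    """Distance between the planes w·x - b = 1 and w·x - b = -1 in canonical form."""
    wc, _ = canonical_plane(w, b, points, labels)
    return 2 / math.sqrt(sum(wi * wi for wi in wc))
```

For the plane x1 + x2 = 0 (w = (1, 1), b = 0) and points (1, 1), (2, 2) labeled +1 and (-1, -1), (-2, -1) labeled -1, the width is 2√2, i.e. twice the distance √2 from (1, 1) to the plane. Finding the maximum-margin plane then amounts to minimizing ||w|| subject to y_i(w · x_i - b) ≥ 1 for every training point.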

The big picture is: a quadratic programming problem can be reduced to a linear programming problem. Here is how:
(1) KKT conditions
For any nonlinear program
max f(x), s.t. g(x) <= 0,
it has been proved that an optimal solution must satisfy the Karush–Kuhn–Tucker (KKT) conditions, provided that some regularity conditions are satisfied. How is this proved? It is …
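For reference, here are the KKT conditions written out for the maximization problem above. I use my own notation: g is split into m constraints g_i with multipliers μ_i, and the Lagrangian is taken as f(x) - Σ μ_i g_i(x). When f is quadratic and the g_i are linear (a quadratic program), every condition except complementary slackness is linear in (x, μ), which is roughly what makes an LP-style treatment possible.

```latex
\begin{aligned}
&\text{Problem: } \max_x f(x) \quad \text{s.t. } g_i(x) \le 0,\quad i = 1,\dots,m \\
&\text{Stationarity: } \nabla f(x^*) - \sum_{i=1}^{m} \mu_i \nabla g_i(x^*) = 0 \\
&\text{Primal feasibility: } g_i(x^*) \le 0 \\
&\text{Dual feasibility: } \mu_i \ge 0 \\
&\text{Complementary slackness: } \mu_i \, g_i(x^*) = 0
\end{aligned}
```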

Why study linear programming (LP)? LP has many use cases; one of them is the SVM (support vector machine). The SVM's Lagrangian dual gives a lower bound on the SVM's primal objective, and this Lagrangian dual can be solved by quadratic programming. The KKT conditions of this quadratic program can be solved by …
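As a tiny concrete LP, here is a toy sketch for two variables. It relies on the fact that a feasible, bounded LP attains its optimum at a vertex, so it just enumerates intersections of constraint boundaries. This is not how LPs are solved in practice (simplex or interior-point methods are), it does not detect unbounded problems, and `solve_lp_2d` and the example constraints are hypothetical.

```python
from itertools import combinations

def solve_lp_2d(c, constraints, eps=1e-9):
    """Maximize c[0]*x + c[1]*y subject to a*x + b*y <= r for each (a, b, r).

    Toy vertex enumeration: assumes the LP is feasible and bounded."""
    best, best_val = None, float("-inf")
    for (a1, b1, r1), (a2, b2, r2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:
            continue  # parallel boundaries have no unique intersection
        # Cramer's rule for a1*x + b1*y = r1, a2*x + b2*y = r2
        x = (r1 * b2 - r2 * b1) / det
        y = (a1 * r2 - a2 * r1) / det
        # Keep the intersection only if it satisfies every constraint (i.e. it is a vertex).
        if all(a * x + b * y <= r + eps for a, b, r in constraints):
            val = c[0] * x + c[1] * y
            if val > best_val:
                best, best_val = (x, y), val
    return best, best_val

# maximize 3x + 2y  s.t.  x + y <= 4,  x + 3y <= 6,  x <= 3,  x >= 0,  y >= 0
example = [(1, 1, 4), (1, 3, 6), (1, 0, 3), (-1, 0, 0), (0, -1, 0)]
```

Calling `solve_lp_2d((3, 2), example)` returns the vertex (3, 1) with objective value 11.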

In logistic regression, we assume that the probability of x being classified as 1 is:
P(y = 1 | x) = 1 / (1 + exp(-w^T x)) = h_w(x)
where w is the parameter vector that we need to learn (optimize) from the training set. This is …
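A minimal sketch of h_w(x) and one common way to learn w: batch gradient descent on the average log-loss. The training method, learning rate, epoch count and the convention of folding the bias into x as a constant first feature of 1 are my assumptions; the full post may optimize w differently.

```python
import math

def sigmoid(z):
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def h(w, x):
    """h_w(x) = P(y = 1 | x) = sigmoid(w^T x)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def train(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent on the average log-loss; X rows include a bias feature."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = h(w, xi) - yi  # gradient of the log-loss w.r.t. w^T x
            for j in range(len(w)):
                grad[j] += err * xi[j]
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w
```

For example, with `X = [[1, -2], [1, -1], [1, 1], [1, 2]]` (first column is the bias feature) and `y = [0, 0, 1, 1]`, the learned w gives h_w above 0.5 for x = [1, 2] and below 0.5 for x = [1, -2].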

Bayes' theorem:
P(A | B) = P(B | A) P(A) / P(B)
where A and B are events and P(B) ≠ 0. P(A) and P(B) are the probabilities of observing A and B without regard to each other. P(A | B), a conditional probability, is the probability of observing event A given that B is true. P(B | A) is the probability of observing event B given that A …
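A small numeric sketch of the theorem. P(B) is expanded with the law of total probability, P(B) = P(B | A) P(A) + P(B | not A) P(not A). The numbers in the example are hypothetical (think of A as "has a condition" and B as "tests positive"), not data from the post.

```python
def total_probability(p_b_given_a, p_b_given_not_a, p_a):
    """P(B) = P(B|A) P(A) + P(B|not A) (1 - P(A))."""
    return p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) P(A) / P(B), defined only when P(B) != 0."""
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_b_given_a * p_a / p_b
```

With P(A) = 0.01, P(B | A) = 0.9 and P(B | not A) = 0.05, we get P(B) = 0.0585 and P(A | B) = 0.009 / 0.0585, about 0.15: even a fairly accurate test leaves A unlikely when A is rare.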

A decision tree works just like an if statement in a programming language. In the AI/ML world, the problem is usually this: given a training set with features [(f1, f2, …), …] and known categories/labels [c1, …], how can we learn from this training data and design a decision tree so that, for any new data point, we can predict which …
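One common way to learn such a tree from categorical features is ID3-style splitting on information gain (entropy reduction). This is a minimal sketch of that technique, which may differ from the criterion the full post uses. Rows are tuples indexed by feature position, and `build_tree`, `predict` and the example data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_feature(rows, labels, features):
    """Pick the feature whose split gives the largest information gain."""
    base = entropy(labels)
    def gain(f):
        remainder = 0.0
        for v in set(r[f] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[f] == v]
            remainder += len(subset) / len(labels) * entropy(subset)
        return base - remainder
    return max(features, key=gain)

def build_tree(rows, labels, features):
    """Return a label (leaf) or a dict node {feature, branches, default}."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    f = best_feature(rows, labels, features)
    node = {"feature": f, "branches": {}, "default": majority}
    for v in set(r[f] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[f] == v]
        node["branches"][v] = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                         [g for g in features if g != f])
    return node

def predict(tree, row):
    """Follow the branches like nested if statements; unseen values fall back to the majority label."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["feature"]], tree["default"])
    return tree
```

For example, with rows `("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "hot")` labeled `"no", "no", "yes", "yes"`, the tree splits on feature 0 and predicts `"yes"` for any rainy day.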

Here is how it works in plain English: we have a training set (known, normalized features plus their classification), that is, many data points [(feature1, feature2, feature3, …), (f1, f2, f3, …), …] and the corresponding labels/classifications [category1, category2, …]. For any new data point t, calculate the distance between t and each point in the training set …
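The procedure above reads like k-nearest neighbors. Here is a minimal sketch that uses Euclidean distance and a majority vote among the k closest training points. The choice of distance, the default k = 3, the function name and the example data are my assumptions, and the features are assumed to be normalized already.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, t, k=3):
    """Classify t by majority vote among its k nearest training points."""
    # Distance from t to every training point, paired with that point's label.
    distances = sorted((math.dist(p, t), label) for p, label in zip(train_points, train_labels))
    top_k = [label for _, label in distances[:k]]
    return Counter(top_k).most_common(1)[0][0]
```

For example, with points (0, 0), (0, 1), (1, 0) labeled "a" and (5, 5), (5, 6), (6, 5) labeled "b", the new point (0.2, 0.2) is classified as "a".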