cherryman
Member level 1
- Joined
- Feb 28, 2013
- Messages
- 32
- Helped
- 9
- Reputation
- 18
- Reaction score
- 9
- Trophy points
- 1,288
- Location
- Poland
- Activity points
- 1,477
Hello all, I have several basic questions about Machine Learning to better solidify my knowledge.
I hope to find the answers here.
1. SVM
I have a one-dimensional feature vector (x) of length N and a target vector with the labels of two classes (e.g. 0 and 1). The case is not linearly separable, so the class-conditional histograms of the feature overlap.
Can you explain why the SVM generates a different hypothesis for each kernel function?
Is it related to the fact that, for the SVM, this is actually not a linear case? And when I use a kernel function, am I actually applying a transformation z = phi(x)?
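To make this concrete, here is a minimal sketch of what I mean (assuming scikit-learn; the data and parameters are made up). Class 1 sits on both sides of class 0, so no single threshold on x can separate them, and the linear and RBF kernels produce very different hypotheses:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# class 0 near the origin, class 1 on both sides of it: not linearly separable in 1-D
x0 = rng.normal(0.0, 1.0, 100)
x1 = np.concatenate([rng.normal(-4.0, 1.0, 50), rng.normal(4.0, 1.0, 50)])
X = np.concatenate([x0, x1]).reshape(-1, 1)
y = np.concatenate([np.zeros(100), np.ones(100)])

linear = SVC(kernel="linear").fit(X, y)  # a single threshold on x
rbf = SVC(kernel="rbf").fit(X, y)        # implicit transformation z = phi(x)

# the linear hypothesis cannot put class 1 on both sides of one threshold
print("linear:", linear.score(X, y), "rbf:", rbf.score(X, y))
```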
2. Kernels
Can you explain the basic concept of a kernel? What does it actually do to the data set? I know that it is an inner product of the given data (x'x), but how does it affect the data set?
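For illustration, here is a small sketch of the kernel idea as I understand it (plain NumPy; the degree-2 polynomial kernel is just an example). The kernel returns the same number as an explicit inner product in the mapped feature space, without ever constructing phi:

```python
import numpy as np

def phi(v):
    # explicit feature map for the kernel (x.z + 1)^2 with v = (a, b) in 2-D
    a, b = v
    return np.array([a*a, b*b, np.sqrt(2)*a*b, np.sqrt(2)*a, np.sqrt(2)*b, 1.0])

def poly_kernel(x, z):
    # kernel shortcut: inner product in the original space, then squared
    return (np.dot(x, z) + 1.0) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# the same number by two routes: explicit map vs. kernel trick
print(np.dot(phi(x), phi(z)), poly_kernel(x, z))
```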
3. SVM and classes with an uneven number of representatives
I have heard that the SVM is sensitive to imbalanced classes (classes with an uneven number of representatives). Can you tell me whether that is true, and if so, why?
On the other hand, to determine the hyperplane the SVM needs only the support vectors; the other data points are unnecessary, except those that are misclassified in a soft-margin SVM.
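As an illustration of what I mean by sensitivity to uneven classes, here is a sketch (assuming scikit-learn; the 950:50 imbalance is made up) comparing a plain soft-margin SVM with one where the penalty C is rescaled per class via class_weight="balanced":

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# 950 majority examples vs 50 minority examples, overlapping in 1-D
X = np.concatenate([rng.normal(0.0, 1.0, (950, 1)), rng.normal(2.0, 1.0, (50, 1))])
y = np.concatenate([np.zeros(950), np.ones(50)])

plain = SVC(kernel="linear").fit(X, y)
balanced = SVC(kernel="linear", class_weight="balanced").fit(X, y)

rare = X[y == 1]
plain_recall = (plain.predict(rare) == 1).mean()        # rare class largely ignored
balanced_recall = (balanced.predict(rare) == 1).mean()  # per-class C counters the imbalance
print("plain:", plain_recall, "balanced:", balanced_recall)
```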
4. Data normalization
Having a set of M real signals, I extracted a set of 2-D features from them. Each signal yields N 2-D features, so as a result I have M matrices of dimension Nx2. The task is to recognise two classes (0 and 1) from them, but one class occurs much more frequently than the other.
The main problem I have is as follows.
Within each signal (matrix) the features separate quite well, but when I build a training vector from all of the signals, the features of the different classes overlap heavily. So I suspect the problem lies in the normalization. But how do I normalize properly? I have read that it is good practice to subtract the mean from the features and then divide them by max(abs(x)) or by the standard deviation.
But if I do that on a vector that contains features from both classes, the result will differ from doing it on a vector that contains only one class.
My second hypothesis is that I have extracted weak features.
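Regarding the normalization question, here is a sketch of what I have tried so far (assuming scikit-learn's StandardScaler; M=3 and N=100 are made up): the per-column mean and standard deviation are estimated once on the pooled training data and the same transform is then reused for every signal, rather than re-fitting per signal or per class:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
# M=3 hypothetical signals, each an Nx2 matrix of N=100 2-D features
signals = [rng.normal(loc=i, scale=1.0, size=(100, 2)) for i in range(3)]

scaler = StandardScaler().fit(np.vstack(signals))     # fit once, on the pooled data
normalised = [scaler.transform(s) for s in signals]   # identical transform for every signal

pooled = np.vstack(normalised)
print(pooled.mean(axis=0), pooled.std(axis=0))  # per-column mean ~0, std ~1
```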
I will probably have more questions, but that is all for now. If somebody can help me, I will be very thankful.