Basic Machine Learning Models: Summary & Implementation

Posted by Z on November 14, 2019

SVM

I wrote up the detailed theory of SVM last year in the following notes:

  1. [Big Data Algorithms Course Notes] Lesson 6/7-SupportVectorMachine Theorem
  2. [Big Data Algorithms Course Notes] Lesson 8 - Optimal Condition & Dual SVM
  3. [Big Data Algorithms Course Notes] Lesson 9 - SVM & Algorithm (ADMM/ALM)
  4. [Big Data Algorithms Course Notes] Lesson 10 - SVM: Proximal Gradient Method

Code:
SVM - Python Code & Examples
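The notes listed above cover the theory and several solvers (dual SVM, ADMM/ALM, proximal gradient). For a quick, self-contained illustration, here is a minimal sketch that instead uses plain full-batch subgradient descent on the primal soft-margin objective (lam/2)*||w||^2 + mean(max(0, 1 - y_i*(w.x_i + b))). This is a swapped-in technique, not the method from the notes or the linked code; the function names `train_linear_svm` and `predict_svm`, the hyperparameters `lam`, `lr`, `epochs`, and the toy data are all hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the primal soft-margin SVM objective
    (lam/2)*||w||^2 + mean(max(0, 1 - y_i*(w.x_i + b))). Labels must be +1 or -1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 regularizer
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this sample: add its subgradient
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict_svm(w, b, x):
    """Sign of the decision function w.x + b."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy, linearly separable data (hypothetical)
X = [[2.0, 2.0], [3.0, 3.0], [2.5, 3.5], [-2.0, -2.0], [-3.0, -1.0], [-2.5, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict_svm(w, b, x) for x in X])
```

A fixed step size keeps the sketch short; a decaying step size, or one of the solvers from the notes, converges more reliably on harder data.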

KNN

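Algorithm steps:

  1. Compute the distance between the test sample and every training sample;
  2. Sort the training samples by increasing distance;
  3. Select the K points with the smallest distances;
  4. Count how often each class occurs among these K points;
  5. Return the most frequent class among the K points as the predicted class of the test sample.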
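The five steps above map almost one-to-one onto a short function. This is a minimal pure-Python sketch, assuming Euclidean distance and plain majority voting; the name `knn_predict` and the toy data are hypothetical and not taken from the linked code.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    # Step 1: Euclidean distance from the test point to every training point
    dists = [(math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y)]
    # Step 2: sort by increasing distance
    dists.sort(key=lambda pair: pair[0])
    # Step 3: keep the labels of the k nearest neighbors
    nearest_labels = [label for _, label in dists[:k]]
    # Steps 4-5: count class frequencies and return the most common class
    # (ties go to the class that appears first among the nearest neighbors)
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data (hypothetical)
train_X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, [1.1, 1.0], k=3))  # A
print(knn_predict(train_X, train_y, [5.0, 4.8], k=3))  # B
```

Because every query recomputes its distance to all n training samples, prediction costs O(n*d) per test point, which is the computational cost listed under Cons.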

Reference: https://www.cnblogs.com/jyroy/p/9427977.html

Code:
Define kNN - Python Code & Examples

Pros:

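  1. Simple and direct; no training phase is needed (the training data is simply stored)
  2. Well suited to classifying rare events
  3. Particularly suited to multi-class problems, where it can outperform SVM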

Cons:

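  1. Performs poorly on imbalanced datasets, since the majority class tends to dominate the K nearest neighbors
  2. Computationally expensive at prediction time: each query requires computing its distance to every training sample
  3. Poor interpretability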

Naive Bayes

Code:

Continuous features - Bayes Classifier and Boosting - Python Code & Examples
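For continuous features, one common choice (assumed here; the linked code may differ) is Gaussian Naive Bayes: each feature is modeled, per class, as an independent normal distribution, and prediction picks the class that maximizes log prior plus the summed per-feature log-likelihoods. The names `fit_gaussian_nb` and `predict_gaussian_nb`, the variance floor `eps`, and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate class priors plus per-class, per-feature mean and variance."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # eps keeps variances strictly positive so the log-density is defined
        variances = [sum((v - m) ** 2 for v in col) / n + eps
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, x):
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        # log P(c) + sum_j log N(x_j | mean_cj, var_cj): the "naive" independence assumption
        score = math.log(prior)
        for xj, m, v in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data (hypothetical)
X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 5.0], [3.2, 5.2], [2.9, 4.8]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.0]), predict_gaussian_nb(model, [3.1, 5.1]))
```

Working in log space avoids underflow when many small per-feature probabilities are multiplied together.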

Discrete features - Define Naive Bayes - Python Code & Examples
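For discrete features, Naive Bayes reduces to counting. The sketch below is a minimal version, assuming categorical feature values and Laplace (add-alpha) smoothing so that a value never seen with a class does not zero out the whole product; `fit_categorical_nb`, `predict_categorical_nb`, `alpha`, and the toy weather-style data are hypothetical, not taken from the linked code.

```python
import math
from collections import Counter, defaultdict


def fit_categorical_nb(X, y, alpha=1.0):
    """Count-based Naive Bayes for discrete features with add-alpha smoothing."""
    class_counts = Counter(y)
    # feature_counts[c][j][v] = number of class-c samples whose feature j equals v
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = [set() for _ in range(len(X[0]))]  # distinct values seen per feature
    for xi, yi in zip(X, y):
        for j, v in enumerate(xi):
            feature_counts[yi][j][v] += 1
            values[j].add(v)
    return class_counts, feature_counts, values, alpha, len(y)


def predict_categorical_nb(model, x):
    class_counts, feature_counts, values, alpha, n = model
    best_label, best_score = None, -math.inf
    for c, nc in class_counts.items():
        score = math.log(nc / n)  # log prior P(c)
        for j, v in enumerate(x):
            # smoothed P(x_j = v | c); an unseen value still gets numerator alpha > 0
            num = feature_counts[c][j][v] + alpha
            den = nc + alpha * len(values[j])
            score += math.log(num / den)
        if score > best_score:
            best_label, best_score = c, score
    return best_label


# Toy data (hypothetical)
X = [["sunny", "hot"], ["sunny", "mild"], ["rainy", "mild"],
     ["rainy", "cool"], ["overcast", "hot"], ["rainy", "cool"]]
y = ["no", "no", "yes", "yes", "yes", "yes"]
model = fit_categorical_nb(X, y)
print(predict_categorical_nb(model, ["sunny", "hot"]))   # no
print(predict_categorical_nb(model, ["rainy", "cool"]))  # yes
```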

Decision Tree & Forest

Code:

Decision Tree - Bayes Classifier and Boosting - Python Code & Examples
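As an illustration only (the linked code may use a different split criterion, such as information gain), here is a minimal sketch of a CART-style classification tree: at each node it tries every (feature, threshold) pair, keeps the split with the lowest weighted Gini impurity, and recurses until a node is pure or `max_depth` is reached. The functions `gini`, `best_split`, `build_tree`, `predict_tree`, the dict-based node format, and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    best = None  # (weighted_gini, feature_index, threshold)
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [yi for xi, yi in zip(X, y) if xi[j] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def build_tree(X, y, depth=0, max_depth=3):
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return {"leaf": majority}
    split = best_split(X, y)
    if split is None:  # every sample is identical on every feature
        return {"leaf": majority}
    _, j, t = split
    left_idx = [i for i, xi in enumerate(X) if xi[j] <= t]
    right_idx = [i for i, xi in enumerate(X) if xi[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([X[i] for i in left_idx], [y[i] for i in left_idx], depth + 1, max_depth),
        "right": build_tree([X[i] for i in right_idx], [y[i] for i in right_idx], depth + 1, max_depth),
    }


def predict_tree(node, x):
    # Walk down from the root until a leaf is reached
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]


# Toy data (hypothetical)
X = [[2.0, 1.0], [3.0, 1.5], [3.5, 0.5], [6.0, 3.0], [7.0, 2.5], [8.0, 3.5]]
y = ["a", "a", "a", "b", "b", "b"]
tree = build_tree(X, y)
print(tree)
print(predict_tree(tree, [3.0, 1.0]), predict_tree(tree, [7.5, 3.0]))
```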

Forest - Define Naive Bayes - Python Code & Examples
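A random forest, assumed here to be the standard bagging-plus-feature-subsampling variant (the linked code is not reproduced), trains many trees on bootstrap resamples of the data, lets each split consider only a random subset of features, and predicts by majority vote. To stay self-contained, the sketch below re-declares a compact Gini tree; `build_random_tree`, `fit_forest`, `predict_forest`, `n_trees`, `max_features`, `seed`, and the toy data are all hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_random_tree(X, y, rng, max_features, depth=0, max_depth=4):
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return {"leaf": majority}
    # Only a random subset of features is considered at this node
    features = rng.sample(range(len(X[0])), max_features)
    best = None  # (weighted_gini, feature_index, threshold)
    for j in features:
        for t in set(row[j] for row in X):
            left = [yi for xi, yi in zip(X, y) if xi[j] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[j] > t]
            if left and right:
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
                if best is None or score < best[0]:
                    best = (score, j, t)
    if best is None:
        return {"leaf": majority}
    _, j, t = best
    L = [i for i, xi in enumerate(X) if xi[j] <= t]
    R = [i for i, xi in enumerate(X) if xi[j] > t]
    return {"feature": j, "threshold": t,
            "left": build_random_tree([X[i] for i in L], [y[i] for i in L], rng, max_features, depth + 1, max_depth),
            "right": build_random_tree([X[i] for i in R], [y[i] for i in R], rng, max_features, depth + 1, max_depth)}


def predict_tree(node, x):
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]


def fit_forest(X, y, n_trees=15, max_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw len(X) row indices with replacement
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_random_tree([X[i] for i in idx], [y[i] for i in idx], rng, max_features))
    return forest


def predict_forest(forest, x):
    # Majority vote over the individual trees
    votes = Counter(predict_tree(tree, x) for tree in forest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical)
X = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [2.5, 2.5],
     [6.0, 6.0], [6.5, 7.0], [7.0, 6.5], [7.5, 7.5]]
y = ["a"] * 4 + ["b"] * 4
forest = fit_forest(X, y)
print(predict_forest(forest, [0.8, 0.9]), predict_forest(forest, [7.2, 7.8]))
```

Bootstrapping and per-split feature subsampling make the individual trees differ from one another, and averaging their votes reduces variance compared with a single deep tree.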