MIT 6.00SC Notes: Lectures 18 & 19



Lecture 18: Optimization Problems and Algorithms

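Coefficient of determination: R**2 = 1 - EE/MV, where EE is the sum of squared errors between the model's estimates and the measured data, and MV is the sum of squared deviations of the measured data from its mean

R**2 = 1 means the model explains all of the variability in the data: a perfect account of how the data varies

R**2 = 0 means there is no linear relationship between the model's predictions and the actual data: the model is useless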
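A minimal sketch of the R**2 formula, in plain Python. The function name `r_squared` and the sample data are made up for illustration, and EE and MV are read as sums of squares as described above.

```python
def r_squared(measured, estimated):
    """Return 1 - EE/MV for two equal-length sequences of numbers."""
    mean = sum(measured) / len(measured)
    # EE: squared error between the model's estimates and the measurements.
    ee = sum((e - m) ** 2 for e, m in zip(estimated, measured))
    # MV: squared deviation of the measurements from their own mean.
    # (If every measurement is identical, MV is 0 and this division fails.)
    mv = sum((m - mean) ** 2 for m in measured)
    return 1 - ee / mv


measured = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(measured, measured)        # model reproduces the data
mean_only = r_squared(measured, [2.5] * 4)     # model always predicts the mean
print(perfect, mean_only)
```

A model that reproduces the data exactly gets 1, and a model that always predicts the mean gets 0.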

1) start with an experiment

2) use computation to both find and evaluate a model

3) use some theory, analysis, and computation to derive a consequence of the model

Optimization problems

1) an objective function

2) a set of constraints

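Problem reduction: map a new problem onto an old one, so that a classic solution can be reused

The knapsack problem

Greedy algorithm: at each step choose the locally optimal solution

(In the 0/1 knapsack problem each item is either taken whole or not taken at all, in contrast to the continuous knapsack problem)

The algorithm runs in O(len(Items)*log(len(Items))) because of the sort, compared with O(len(Items)) for the while loop on its own

A combination of locally optimal choices is not necessarily the globally optimal solution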
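The greedy idea above can be sketched roughly as follows. The item tuples, the capacity of 50 and the two ranking functions are invented for illustration and are not taken from the lecture.

```python
def greedy(items, max_weight, key):
    """items: list of (name, value, weight); key ranks items, best first."""
    # The sort is the O(n log n) part.
    ranked = sorted(items, key=key, reverse=True)
    taken, total_value, total_weight = [], 0.0, 0.0
    # The single pass over the ranked items is the O(n) part.
    for name, value, weight in ranked:
        if total_weight + weight <= max_weight:
            taken.append(name)
            total_value += value
            total_weight += weight
    return taken, total_value


def by_value(item):
    return item[1]


def by_density(item):
    return item[1] / item[2]  # value per unit of weight


# Hypothetical items: (name, value, weight)
items = [("A", 60, 10), ("B", 100, 20), ("C", 120, 30)]
print(greedy(items, 50, by_value))
print(greedy(items, 50, by_density))
```

With this data, ranking by value reaches 220, while ranking by value density stops at 160 after taking A and B. The locally best choice at each step does not always add up to the best overall answer.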

1) item = <value, weight>

2) w is the maximum weight

3) I is the vector of available items

4) v is a vector where v[i] = 1 means I[i] has been taken
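With those four definitions, the 0/1 knapsack problem can be written as an objective function plus a constraint. This is a sketch in the notation of the notes, where n = len(I) is an added symbol:

```latex
\max_{v \in \{0,1\}^{n}} \; \sum_{i=0}^{n-1} v[i] \cdot I[i].\mathrm{value}
\quad \text{subject to} \quad
\sum_{i=0}^{n-1} v[i] \cdot I[i].\mathrm{weight} \le w
```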

 

Lecture 19: More Optimization and Clustering

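Brute-force algorithm

Greedy algorithm: locally optimal

Inherently exponential: no known method beats exponential time while still guaranteeing the optimal solution. (In practice, there are algorithms that solve the problem fast enough.)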
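To see where the exponential cost comes from, here is a rough brute-force sketch that tries every subset of the items. The function name and the sample items are hypothetical.

```python
from itertools import combinations


def brute_force(items, max_weight):
    """Try all 2**len(items) subsets of (name, value, weight) tuples."""
    best_value, best_subset = 0, ()
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            weight = sum(w for _, _, w in subset)
            value = sum(v for _, v, _ in subset)
            # Keep the most valuable subset that still fits.
            if weight <= max_weight and value > best_value:
                best_value, best_subset = value, subset
    return [name for name, _, _ in best_subset], best_value


# Hypothetical items: (name, value, weight)
items = [("A", 60, 10), ("B", 100, 20), ("C", 120, 30)]
print(brute_force(items, 50))
```

Every extra item doubles the number of subsets to check, so this is only practical for small inputs.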

Machine learning -> inductive inference

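『Supervised learning』 A label is associated with each example in the training set

If the labels are discrete, it is called a classification problem

If the labels are real-valued, it is called a regression problem.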

1 Are the labels accurate?

2 Is the past representative of the future?

3 Do you have enough data to generalize?

feature extraction

how tight should the fit be?

Minimize training error

『Unsupervised learning』

The task is to find regularities in the data

Clustering: how to divide the examples into groups

1 low intra-cluster dissimilarity

2 high inter-………

Computing intra-cluster dissimilarity: badness

 

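Add a constraint to the clustering: a maximum number of clusters. Without one, putting every example in its own cluster trivially drives the badness to zero.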
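A small sketch of why the constraint is needed. It assumes one common definition: the variance of a cluster is the sum of squared distances from its points to its mean, and the badness of a clustering is the sum of the variances of its clusters. The function names and points are made up.

```python
def centroid(cluster):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def variance(cluster):
    """Sum of squared distances from each point to the cluster mean (assumed definition)."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)


def badness(clusters):
    """Total dissimilarity of a clustering: the sum of its cluster variances."""
    return sum(variance(c) for c in clusters)


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
one_cluster = [points]
two_clusters = [points[:2], points[2:]]
singletons = [[p] for p in points]
print(badness(one_cluster), badness(two_clusters), badness(singletons))
```

The badness falls from 104 to 4 to 0 as the number of clusters grows, so minimizing badness alone always ends with one cluster per point.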

The two greedy algorithms most commonly used in practice:

k-means: you want k clusters, so find the best k clusters
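A rough sketch of the usual k-means iteration (assign each point to its nearest centroid, recompute the centroids, repeat). The notes do not spell the procedure out, so treat this as an assumption. The starting centroids are passed in by the caller, and all names and data are invented.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, initial_centroids, max_iters=100):
    """Return (clusters, centroids); k is len(initial_centroids)."""
    centroids = list(initial_centroids)
    clusters = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [mean(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # nothing moved, so the assignment is stable
        centroids = new_centroids
    return clusters, centroids


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
clusters, centroids = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
print(clusters)
```

The result depends on the starting centroids, which is why k-means is often run several times from different starting points.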

Hierarchical:

Given N items, an N*N matrix records the distance between every pair of items, for example a distance table
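A quick sketch of such a matrix. It assumes the items are points and the distance is Euclidean; the points are made up.

```python
import math


def distance_matrix(items):
    """Return an N*N list of lists where entry [i][j] is dist(items[i], items[j])."""
    n = len(items)
    return [[math.dist(items[i], items[j]) for j in range(n)] for i in range(n)]


points = [(0, 0), (3, 4), (6, 8)]
matrix = distance_matrix(points)
for row in matrix:
    print(row)
```

The matrix is symmetric with zeros on the diagonal, so only the entries above the diagonal carry information.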


 

Agglomerative

1) start by assigning each item to its own cluster.

2) find the most similar pair of clusters and merge them

3) continue the process

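Linkage criterion:

1) single linkage: the distance between two clusters is the distance between their two closest members

2) complete linkage: the distance between two clusters is the distance between their two farthest members

3) average linkage: the distance between two clusters is the average distance between members of one cluster and members of the other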
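The agglomerative steps and the three linkage criteria can be sketched together as below. This is an illustration, not the lecture's code. It assumes Euclidean distance, a target number of clusters as the stopping point, and average linkage taken as the mean over all cross-cluster pairs; all names are invented.

```python
import math
from itertools import product


def single_linkage(a, b):
    return min(math.dist(p, q) for p, q in product(a, b))


def complete_linkage(a, b):
    return max(math.dist(p, q) for p, q in product(a, b))


def average_linkage(a, b):
    # Assumed definition: mean distance over all cross-cluster pairs.
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def agglomerate(items, num_clusters, linkage):
    # Step 1: every item starts in its own cluster.
    clusters = [[item] for item in items]
    # Step 3: repeat until the requested number of clusters is left.
    while len(clusters) > num_clusters:
        # Step 2: find the closest pair of clusters and merge them.
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
print(agglomerate(points, 2, single_linkage))
```

On data this well separated all three criteria give the same grouping; they tend to differ on elongated or noisy clusters.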

Feature Selection

Feature vector


