Quick Reference Handbook of Common AI Terms
===========================================

AI
In an ANOVA table, Df is “degrees of freedom,” Sum Sq is “sum of squares,” Mean Sq is “mean squares” (short for mean-squared deviations), and F value is the F-statistic.
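A minimal sketch of how these four quantities relate in a one-way ANOVA. The function name `one_way_anova` and the three small groups are made up for illustration; they are not from any particular dataset or library.

```python
def one_way_anova(groups):
    """Compute the Df, Sum Sq, Mean Sq and F value entries of a one-way ANOVA table."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)

    # Sum Sq: squared deviations between group means and within each group.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    # Df: degrees of freedom for each source of variation.
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    # Mean Sq = Sum Sq / Df; F value = ratio of the two mean squares.
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "Df": (df_between, df_within),
        "Sum Sq": (ss_between, ss_within),
        "Mean Sq": (ms_between, ms_within),
        "F value": ms_between / ms_within,
    }

# Hypothetical data: three groups of three observations each.
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(table)
```

A large F value means the variation between group means is large relative to the variation within groups.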

A function f(x) that describes how a continuous random variable is distributed is known as the probability density function (PDF).
The PDF is the continuous-variable counterpart of the PMF, which describes discrete random
variables.
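The difference can be sketched numerically: a PMF assigns probabilities that sum to 1, while a PDF assigns densities that integrate to 1. The fair die and the standard normal distribution below are assumed examples, and `integrate` is a hypothetical helper using the midpoint rule.

```python
from statistics import NormalDist

# PMF of a fair six-sided die: each outcome has probability 1/6.
die_pmf = {face: 1 / 6 for face in range(1, 7)}
pmf_total = sum(die_pmf.values())

# PDF of a standard normal variable; f(x) is a density, not a probability.
normal = NormalDist(mu=0.0, sigma=1.0)

def integrate(f, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

# Probabilities come from integrating the density over an interval.
pdf_total = integrate(normal.pdf, -8.0, 8.0)
prob_within_one_sd = integrate(normal.pdf, -1.0, 1.0)

print(pmf_total, pdf_total, prob_within_one_sd)
```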

Tf-idf is a simple twist on the bag-of-words approach. It stands for term frequency–inverse document frequency.
tf-idf(w, d) = bow(w, d) * log(N / (number of documents in which word w appears)), where N is the total number of documents.
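The formula can be sketched directly in Python. The three-sentence corpus is a made-up example, and returning 0.0 for a word that appears in no document is an assumption of this sketch, not part of the formula.

```python
import math

def bow(word, document):
    """Bag-of-words count: how many times the word occurs in the document."""
    return document.count(word)

def tf_idf(word, document, corpus):
    """tf-idf(w, d) = bow(w, d) * log(N / number of documents containing w)."""
    n_docs = len(corpus)
    docs_with_word = sum(1 for d in corpus if word in d)
    if docs_with_word == 0:
        return 0.0  # assumption: a word absent from the corpus gets weight 0
    return bow(word, document) * math.log(n_docs / docs_with_word)

# Hypothetical toy corpus, tokenized by whitespace.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]

score_the = tf_idf("the", corpus[0], corpus)  # appears in every document
score_mat = tf_idf("mat", corpus[0], corpus)  # appears in one document only
print(score_the, score_mat)
```

A word that occurs in every document, such as "the", gets a weight of zero, while a rare word such as "mat" is emphasized.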

Zero-phase component analysis (ZCA) (Bell and Sejnowski, 1996) is a whitening transformation that is closely related to PCA but does not reduce the number of features. ZCA whitening uses the full set of principal components V without reduction, and includes an extra multiplication back onto V (Equation 6-12).
logit:分对数
PAC:probably approximately correct 可能大致正确；可能近似正确
stochastic gradient descent (SGD):随机梯度下降
MAP: maximum a posteriori 最大后验
a priori probability: 先验概率
a posteriori probability: 后验概率
cloud computing:云计算
Rectified Linear Unit (ReLU) Transformation 整流线性单元（ReLU）变换
convolutional layer 卷积层
image filter   图像滤波器
convolution 卷积
image gradient 图像梯度
Scale Invariant Feature Transform (SIFT) 尺度不变特征变换
Histogram of Oriented Gradients (HOG) 方向梯度直方图
part-of-speech tagging 词性标注
featurization 特征化
stopwords 停用词
bag-of-words (BoW)  词袋
logarithm 对数
singular value decomposition (SVD)
latent semantic analysis (LSA) 潜在语义分析
eigenvalues 本征值
eigendigit 本征数字
stochastic learning 随机学习
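iris: 鸢尾花 (also written 鸢花; 鸢 is pronounced yuān). The flower is shaped like a dancing butterfly. Irises are plants of the genus Iris, a general name for a group of herbaceous flowering plants.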

gradient boosting tree (GBT) 
column-wise 逐列;一列接着一列
Linear Discriminant Analysis (LDA) 线性判别分析: a feature transformation technique that is also a supervised classifier.
eigenvalue 特征值
eigenvector 特征向量
multicollinearity 多重共线性
Feature engineering (特征工程) is the process of formulating the most
appropriate features given the data, the model, and the task.

A probability mass function (PMF) describes a discrete random
variable.


We usually use H0 to denote the null hypothesis and Ha to denote the alternative hypothesis.

The null hypothesis is the statement being tested and is assumed true by default; it is
our starting point and our original hypothesis. The alternative hypothesis is the
statement that opposes the null hypothesis. Our test tells us whether the data give enough
evidence to reject the null hypothesis in favor of the alternative, or whether we fail to reject it.
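As an illustration, here is a minimal sketch of an exact two-sided binomial test. The scenario is assumed for the example: H0 says a coin is fair (p = 0.5), Ha says it is not, and we observe 60 heads in 100 flips. The function name and the significance level of 0.05 are choices made for this sketch.

```python
from math import comb

def binomial_two_sided_p_value(successes, trials, p0=0.5):
    """Exact two-sided p-value: total probability, under H0, of all outcomes
    that are no more likely than the observed one."""
    def pmf(k):
        return comb(trials, k) * p0 ** k * (1 - p0) ** (trials - k)

    observed = pmf(successes)
    # The small relative tolerance guards against floating-point ties.
    return sum(pmf(k) for k in range(trials + 1) if pmf(k) <= observed * (1 + 1e-9))

# Hypothetical observation: 60 heads in 100 flips.
p_value = binomial_two_sided_p_value(successes=60, trials=100)

alpha = 0.05  # assumed significance level
reject_null = p_value < alpha
print(p_value, reject_null)
```

With these numbers the p-value is slightly above 0.05, so under the assumed significance level we fail to reject H0. That is not the same as proving the coin is fair.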

This type of analysis, and the creation of these types, falls under a specific type
of unsupervised learning called clustering.

Bagging is short for bootstrap aggregation: models are trained on bootstrap samples (samples drawn with replacement from the training data) and their predictions are aggregated.
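A minimal sketch of the idea, using one-dimensional decision stumps as the base model and a majority vote as the aggregation step. The stump learner, the function names, and the toy data are all assumptions made for illustration; real bagging implementations usually use full decision trees.

```python
import random

def fit_stump(xs, ys):
    """Return the threshold t with the fewest training errors for the rule
    'predict 1 if x > t, else 0'."""
    candidates = [min(xs) - 1] + sorted(set(xs))

    def errors(t):
        return sum((x > t) != bool(y) for x, y in zip(xs, ys))

    return min(candidates, key=errors)

def bagged_stumps(xs, ys, n_models=25, seed=0):
    """Fit one stump per bootstrap sample and return the list of thresholds."""
    rng = random.Random(seed)
    n = len(xs)
    thresholds = []
    for _ in range(n_models):
        # Bootstrap sample: n indices drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        thresholds.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return thresholds

def predict(thresholds, x):
    """Aggregate the stumps by majority vote."""
    votes = sum(x > t for t in thresholds)
    return int(votes * 2 > len(thresholds))

# Hypothetical toy data: class 0 on the left, class 1 on the right.
xs = [1, 2, 3, 4, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
thresholds = bagged_stumps(xs, ys)
print(predict(thresholds, 2), predict(thresholds, 8))
```

Each stump sees a slightly different sample, so the thresholds vary; voting averages out that variation.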

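Recurrent Neural Network (RNN) 循环神经网络: an artificial neural network whose nodes are connected by directed links that form cycles. The internal state of such a network can exhibit dynamic temporal behavior. Unlike a feed-forward neural network, an RNN can use its internal memory to process input sequences of arbitrary length, which makes tasks such as unsegmented handwriting recognition and speech recognition easier to handle.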

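Null hypothesis 零假设 (also 原假设 or 虚无假设): a statistical term for the hypothesis set up in advance when carrying out a statistical test. When the null hypothesis holds, the relevant test statistic should follow a known probability distribution.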

null accuracy  零准确度
dummy encoding 虚拟编码
This execution plan is a directed acyclic graph (DAG) of transformations
deep neural networks (networks with many layers)  DNN 深度神经网络
back-propagation algorithm 反向传播算法
A multilayer perceptron (MLP) is a finite acyclic graph whose nodes are neurons with logistic activation.
perceptron  感知器
scale-invariant 尺度不变的
covariance 协方差
high variance 高方差
Ensembling techniques 集成
Principal Component Analysis (PCA)
The Silhouette Coefficient 轮廓系数
clustering  聚类
The mean squared error (MSE) 均方误差
The root mean squared error (RMSE) 均方根误差
The mean absolute error (MAE) 平均绝对误差
metrics 指标
Regression metrics   回归指标
point estimate 点估计值
radial basis function (RBF)
an ad hoc practice  临时实践
overarching principles 总体原则
Classification and Regression Tree (also known as CART)
JavaScript Object Notation (JSON)
Bagging =bootstrap aggregating(引导聚合)
ensemble learning 集成学习形式
signal-to-noise ratio (SNR)
K-Nearest Neighbors (KNN) 
generalized additive model (GAM) 
linear discriminant analysis (LDA)
Generalized linear models (GLMs)
Standardized residuals 标准化残差
confounding variable 混淆变量
coefficient  系数
Dummy variables 虚拟变量
Bayesian Information Criterion (BIC)
read-eval-print loop (REPL) 读取评估打印循环
bootstrap method (bootstrapping) 自助法；自助抽样法
exploratory data analysis (EDA)
Singular-Value Decomposition, or SVD 
AdaBoost, short for “Adaptive Boosting”
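
The three regression metrics listed above (MSE, RMSE, MAE) can be sketched in a few lines. The function names and the sample values are made up for illustration.

```python
import math

def mean_squared_error(y_true, y_pred):
    """MSE: average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    """RMSE: square root of the MSE, in the same units as the target."""
    return math.sqrt(mean_squared_error(y_true, y_pred))

def mean_absolute_error(y_true, y_pred):
    """MAE: average of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical targets and predictions.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

mse = mean_squared_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
print(mse, rmse, mae)
```

Because errors are squared before averaging, MSE and RMSE penalize large errors more heavily than MAE does.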

Loss Function / Cost Function / Error Function

Predicting discrete values is called classification (分类); predicting continuous values is called regression (回归).
Clustering: 聚类
Softmax回归 Softmax Regression
supervised learning: 有监督学习 
unsupervised learning: 无监督学习
deep learning: 深度学习 
logistic regression: 逻辑回归 
intercept term: 截距项 
binary classification：二元分类 
class label:类型标记 
hypothesis: 估值函数/估计值 
cost function: 代价函数, also known as loss function (损失函数)
multi-class classification: 多元分类 
weight decay: 权重衰减 
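
For the Softmax Regression entry above, the softmax function itself can be sketched as follows. It turns a list of class scores into probabilities for multi-class classification. The scores are made-up values, and subtracting the maximum score is a common numerical-stability choice rather than part of the definition.

```python
import math

def softmax(scores):
    """Map real-valued class scores to probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
```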

Artificial neural network (ANN): 人工神经网络
feed-forward neural network (FNN): 前馈神经网络. A special type of neural network
in which the links/connections between neurons
do not form a cycle.

convolutional neural network (CNN) 卷积神经网络: a specific kind of
deep learning architecture that uses the convolution operation to extract
relevant explanatory features from the input image.

Recurrent neural networks (RNNs) 循环神经网络: This set
of architectures enables us to provide contextual information for current
predictions, and includes specific architectures that deal with long-term
dependencies in any input sequence.

Principal component analysis (PCA)
Activation Function：激活函数
A multi-layer perceptron (MLP, 多层感知器) is a class of feedforward artificial neural network.
Logistic Regression cost function (log loss)
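
The log loss named above can be sketched for binary labels as follows. The function name, the clipping constant `eps`, and the sample predictions are assumptions of this sketch.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Average negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical labels with confident and hesitant predictions.
labels = [1, 0]
confident = log_loss(labels, [0.9, 0.1])
hesitant = log_loss(labels, [0.6, 0.4])
print(confident, hesitant)
```

Confident correct predictions give a lower loss than hesitant ones, and predicting 0.5 for everything gives a loss of log 2.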

Confusion matrix terms (混淆矩阵):
True Positive (TP, 真阳性): the class is correctly recognized. An instance in the test set that had a positive target feature value
and that was predicted to have a positive target feature value.
True Negative (TN, 真阴性): the absence of the class is correctly recognized. An instance in the test set that had a negative target feature value and that was predicted to have a negative target feature value.

False Positive (FP, 伪阳性): something that is not the class is predicted to be the class (a false alarm). An instance in the test set that had a negative target feature value but that was predicted to have a positive target feature value.
False Negative (FN, 伪阴性): the class is present but is not recognized (a miss). An instance in the test set that had a positive target feature value but that was predicted to have a negative target feature value.

durionis: TP=5, TN (真阴性)=0, FP (伪阳性)=0, FN (伪阴性)=2
    recall = TP/(TP+FN) = 5/(5+2) = 5/7 ≈ 0.714
    precision = TP/(TP+FP) = 5/(5+0) = 1.0
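
The same arithmetic can be checked with a short sketch. The counts are the ones given above for the durionis class; returning 0.0 when the denominator is zero is an assumed convention, not something stated in the text.

```python
def recall(tp, fn):
    """Share of actual positives that were found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0  # assumed convention for 0/0

def precision(tp, fp):
    """Share of predicted positives that were correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0  # assumed convention for 0/0

# Counts from the durionis example: TP=5, FP=0, FN=2.
durionis_recall = recall(tp=5, fn=2)
durionis_precision = precision(tp=5, fp=0)
print(round(durionis_recall, 3), durionis_precision)
```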