Flink 1.11.2 flink-ml-lib (formerly FlinkML in 1.8): notes on the directory structure

After further study, I found that flink-ml-lib is a higher-level wrapper built on top of flink-ml-api.
Below, I analyze the contents of this package in depth.

The package is divided into four modules, each corresponding to one of the four top-level headings of this post.

common


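  • linalg: linear algebra (vectors, matrices and BLAS routines)
  • mapper: mappers that transform one input row into another row, plus adapters that let them run as Flink map functions
  • model: model sources (ModelSource), which load a model from e.g. broadcast variables or in-memory rows
  • statistics: statistical algorithms; currently it contains only a multivariate Gaussian distribution
  • utils: common utility classes
  • MLEnvironment: stores the necessary Flink context; each instance is associated with a unique ID
  • MLEnvironmentFactory: factory that returns the MLEnvironment for a given ID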

linalg


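  • BLAS
    A utility class that provides BLAS routines over matrices and vectors.
    BLAS (Basic Linear Algebra Subprograms) is a standard set of low-level routines for common vector and matrix operations, such as dot products, scaled vector addition (axpy) and matrix multiplication.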

  • DenseMatrix
    DenseMatrix stores dense matrix data and provides some methods to operate on the matrix it represents.

  • DenseVector
    A dense vector represented by a values array.

  • MatVecOp (utility class)
    A utility class that provides operations over DenseVector, SparseVector and DenseMatrix.

  • SparseVector
    A sparse vector represented by an indices array and a values array.

  • Vector (base class with the methods common to DenseVector and SparseVector)
    The Vector class defines some common methods for both DenseVector and SparseVector.

  • VectorIterator (iterator used to traverse a Vector)
    An iterator over the elements of a vector.

  • VectorUtil (utility class for Vector and its subclasses)
    Utility class for the operations on Vector and its subclasses.
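To make the dense/sparse distinction and the BLAS idea concrete, here is a minimal self-contained Java sketch. SimpleDenseVector, SimpleSparseVector and SimpleBlas are hypothetical simplifications written for this note, not the flink-ml-lib classes or signatures: a dense vector keeps every element in one array, a sparse vector keeps an indices array plus a values array, and dot/axpy are the classic level-1 BLAS routines.

```java
// Hypothetical, simplified stand-ins for the ideas behind DenseVector, SparseVector and BLAS.
// These are NOT the flink-ml-lib classes; names and signatures are illustrative only.
final class SimpleDenseVector {
    final double[] values; // one slot per element, zeros included

    SimpleDenseVector(double[] values) {
        this.values = values;
    }
}

final class SimpleSparseVector {
    final int size;        // logical length of the vector
    final int[] indices;   // positions of the stored entries, strictly increasing
    final double[] values; // values[k] is the element at position indices[k]

    SimpleSparseVector(int size, int[] indices, double[] values) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values must have the same length");
        }
        this.size = size;
        this.indices = indices;
        this.values = values;
    }

    // Expand to a dense representation; positions not listed become 0.0.
    SimpleDenseVector toDense() {
        double[] out = new double[size];
        for (int k = 0; k < indices.length; k++) {
            out[indices[k]] = values[k];
        }
        return new SimpleDenseVector(out);
    }
}

final class SimpleBlas {
    // Level-1 BLAS "dot": sum of x[i] * y[i] over all positions.
    static double dot(SimpleDenseVector x, SimpleDenseVector y) {
        checkSize(x.values.length, y.values.length);
        double sum = 0.0;
        for (int i = 0; i < x.values.length; i++) {
            sum += x.values[i] * y.values[i];
        }
        return sum;
    }

    // Sparse-dense dot: only the stored entries of x contribute.
    static double dot(SimpleSparseVector x, SimpleDenseVector y) {
        checkSize(x.size, y.values.length);
        double sum = 0.0;
        for (int k = 0; k < x.indices.length; k++) {
            sum += x.values[k] * y.values[x.indices[k]];
        }
        return sum;
    }

    // Level-1 BLAS "axpy": y := a * x + y, updating y in place.
    static void axpy(double a, SimpleDenseVector x, SimpleDenseVector y) {
        checkSize(x.values.length, y.values.length);
        for (int i = 0; i < x.values.length; i++) {
            y.values[i] += a * x.values[i];
        }
    }

    private static void checkSize(int n, int m) {
        if (n != m) {
            throw new IllegalArgumentException("size mismatch: " + n + " vs " + m);
        }
    }
}
```

Because the sparse dot product visits only stored entries, its cost scales with the number of non-zeros rather than the vector length, which is the main reason to keep both representations.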

mapper


  • Mapper
    Abstract class for mappers. A mapper takes one row as input and transforms it into another row.
  • MapperAdapter
    A class that helps adapt a Mapper to a MapFunction so that the mapper can run in Flink.
  • ModelMapper
    An abstract class for Mappers with a model.
  • ModelMapperAdapter
    A class that adapts a ModelMapper to a Flink RichMapFunction so the model can be loaded in a Flink job.
    This adapter class holds the target ModelMapper and its ModelSource.
    Upon open(), it loads model rows from the ModelSource into the ModelMapper.
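The Mapper / ModelMapper / adapter split can be illustrated with a small sketch, assuming rows are plain Object[] arrays and the model source is a java.util.function.Supplier. All class names here (SketchMapper, SketchModelMapper, LinearModelMapper, SketchModelMapperAdapter) are hypothetical; the real adapters implement Flink's MapFunction/RichMapFunction, which this sketch replaces with Function plus an explicit open() so it runs without Flink.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// A mapper turns one input row into one output row (rows are Object[] in this sketch).
abstract class SketchMapper {
    abstract Object[] map(Object[] row);
}

// A mapper that needs a model loaded before it can map rows.
abstract class SketchModelMapper extends SketchMapper {
    abstract void loadModel(List<Object[]> modelRows);
}

// Toy model: a single row [weight, bias]; appends weight * x + bias, where x is column 1.
final class LinearModelMapper extends SketchModelMapper {
    private double weight;
    private double bias;

    @Override
    void loadModel(List<Object[]> modelRows) {
        Object[] first = modelRows.get(0);
        weight = (Double) first[0];
        bias = (Double) first[1];
    }

    @Override
    Object[] map(Object[] row) {
        double x = (Double) row[1];
        Object[] out = Arrays.copyOf(row, row.length + 1);
        out[row.length] = weight * x + bias;
        return out;
    }
}

// Holds the mapper and where its model comes from; loads the model once in open().
final class SketchModelMapperAdapter implements Function<Object[], Object[]> {
    private final SketchModelMapper mapper;
    private final Supplier<List<Object[]>> modelSource;
    private boolean opened = false;

    SketchModelMapperAdapter(SketchModelMapper mapper, Supplier<List<Object[]>> modelSource) {
        this.mapper = mapper;
        this.modelSource = modelSource;
    }

    void open() {
        mapper.loadModel(modelSource.get());
        opened = true;
    }

    @Override
    public Object[] apply(Object[] row) {
        if (!opened) {
            throw new IllegalStateException("open() must be called before apply()");
        }
        return mapper.map(row);
    }
}
```

Loading the model in open() rather than in every map call means the model is read once per task, after which each row is mapped with the already-loaded state.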

model


  • BroadcastVariableModelSource
    A ModelSource implementation that reads the model from a broadcast variable.

  • ModelSource
    An interface that loads the model from different sources, e.g. broadcast variables, lists of rows, etc.

  • RowsModelSource
    A ModelSource implementation that reads the model from memory.
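A ModelSource only has to answer one question: where do the model rows come from? The sketch below assumes a simplified signature in which a Map<String, List<Object[]>> stands in for the broadcast variables a Flink task can read at runtime; SketchModelSource, SketchRowsModelSource and SketchBroadcastModelSource are hypothetical names, not library classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical simplified ModelSource: the map stands in for runtime broadcast variables.
interface SketchModelSource {
    List<Object[]> getModelRows(Map<String, List<Object[]>> broadcastVariables);
}

// Reads the model from rows already held in memory (the idea behind RowsModelSource).
final class SketchRowsModelSource implements SketchModelSource {
    private final List<Object[]> rows;

    SketchRowsModelSource(List<Object[]> rows) {
        this.rows = new ArrayList<>(rows); // defensive copy of the list
    }

    @Override
    public List<Object[]> getModelRows(Map<String, List<Object[]>> broadcastVariables) {
        return Collections.unmodifiableList(rows); // broadcast variables are not needed here
    }
}

// Reads the model from a named broadcast variable (the idea behind BroadcastVariableModelSource).
final class SketchBroadcastModelSource implements SketchModelSource {
    private final String variableName;

    SketchBroadcastModelSource(String variableName) {
        this.variableName = variableName;
    }

    @Override
    public List<Object[]> getModelRows(Map<String, List<Object[]>> broadcastVariables) {
        List<Object[]> rows = broadcastVariables.get(variableName);
        if (rows == null) {
            throw new IllegalStateException("broadcast variable not found: " + variableName);
        }
        return rows;
    }
}
```

Since a model mapper only depends on the interface, the same mapper can be fed from memory in a test and from a broadcast variable in a distributed job.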

statistics - basicstatistic


  • MultivariateGaussian (multivariate Gaussian distribution)
    This class provides basic functionality for a Multivariate Gaussian (Normal) Distribution.
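For reference, the density of a k-dimensional Gaussian with mean μ and covariance Σ is p(x) = (2π)^(-k/2) |Σ|^(-1/2) exp(-0.5 (x-μ)ᵀ Σ⁻¹ (x-μ)). The hypothetical SketchMultivariateGaussian below evaluates it through a Cholesky factorization Σ = L·Lᵀ, so it assumes Σ is positive definite; how the library's MultivariateGaussian handles singular covariances is not covered by these notes.

```java
// Hypothetical sketch of a multivariate Gaussian density (not the library class).
final class SketchMultivariateGaussian {
    private final double[] mean;
    private final double[][] lower;    // Cholesky factor L with cov = L * L^T
    private final double logNormConst; // -0.5 * (k * ln(2*pi) + ln|cov|)

    SketchMultivariateGaussian(double[] mean, double[][] cov) {
        int k = mean.length;
        if (cov.length != k) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        this.mean = mean.clone();
        this.lower = cholesky(cov);
        double logDet = 0.0;
        for (int i = 0; i < k; i++) {
            logDet += 2.0 * Math.log(lower[i][i]); // ln|cov| = 2 * sum(ln L_ii)
        }
        this.logNormConst = -0.5 * (k * Math.log(2.0 * Math.PI) + logDet);
    }

    double logPdf(double[] x) {
        int k = mean.length;
        double[] z = new double[k];
        // Forward substitution solves L z = x - mean, so z.z = (x-mean)^T cov^-1 (x-mean).
        for (int i = 0; i < k; i++) {
            double s = x[i] - mean[i];
            for (int j = 0; j < i; j++) {
                s -= lower[i][j] * z[j];
            }
            z[i] = s / lower[i][i];
        }
        double quad = 0.0;
        for (double v : z) {
            quad += v * v;
        }
        return logNormConst - 0.5 * quad;
    }

    double pdf(double[] x) {
        return Math.exp(logPdf(x));
    }

    // Cholesky-Banachiewicz factorization; requires a symmetric positive-definite matrix.
    private static double[][] cholesky(double[][] a) {
        int n = a.length;
        double[][] l = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j <= i; j++) {
                double s = a[i][j];
                for (int p = 0; p < j; p++) {
                    s -= l[i][p] * l[j][p];
                }
                if (i == j) {
                    if (s <= 0.0) {
                        throw new IllegalArgumentException("covariance must be positive definite");
                    }
                    l[i][i] = Math.sqrt(s);
                } else {
                    l[i][j] = s / l[j][j];
                }
            }
        }
        return l;
    }
}
```

Working in log space and factoring Σ once in the constructor avoids forming Σ⁻¹ explicitly and keeps repeated density evaluations cheap.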

utils


  • DataSetConversionUtil
    Provides functions for converting between DataSet and Table.
  • DataStreamConversionUtil
    Provides functions for converting between DataStream and Table.
  • OutputColsHelper
    Utils for merging input data with output data.
    Input:
      1) Schema of the input data being predicted or transformed.
      2) Output column names of the prediction/transformation operator.
      3) Output column types of the prediction/transformation operator.
      4) Reserved column names: a subset of the input data's column names that we want to preserve.
    Output:
      1) The result data schema. The result data is a combination of the preserved columns and the operator's output columns.
    Several rules are followed:
      - If reserved columns are not given, then all columns of the input data are reserved.
      - The reserved columns are arranged ahead of the operator's output columns in the final output.
      - If some of the reserved column names overlap with those of the operator's output columns, then the operator's output columns override the conflicting reserved columns.
      - The reserved columns in the result table preserve their order as in the input table.
    For example, if the input data schema is ["id":INT, "f1":FLOAT, "f2":DOUBLE], the operator outputs a column "label" with type STRING, and we want to preserve the column "id", then the result schema is ["id":INT, "label":STRING].
    End users should not interact with this helper class directly; instead, it is used indirectly via concrete algorithms.
  • TableUtil
    Utility for operators to interact with Table contents, such as rows and columns.

  • VectorTypes (built-in Vector types)
    Built-in vector types.
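The OutputColsHelper rules quoted above can be sketched as a pure function over column names (types omitted). SketchOutputCols.resultColNames is a hypothetical helper, not the library method; where the rules leave a detail open, namely where an output column that overrides a reserved column ends up, this sketch simply keeps it among the output columns at the end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SketchOutputCols {
    private SketchOutputCols() {}

    // Computes result column names from input, output and (optional) reserved column names.
    static List<String> resultColNames(String[] inputCols, String[] outputCols, String[] reservedCols) {
        // Rule 1: if no reserved columns are given, every input column is reserved.
        Set<String> reserved = new HashSet<>(Arrays.asList(reservedCols == null ? inputCols : reservedCols));
        // Rule 3: an output column overrides a reserved column with the same name.
        Set<String> outputs = new HashSet<>(Arrays.asList(outputCols));

        List<String> result = new ArrayList<>();
        // Rules 2 and 4: reserved columns come first, in their input-table order.
        for (String col : inputCols) {
            if (reserved.contains(col) && !outputs.contains(col)) {
                result.add(col);
            }
        }
        // Then all of the operator's output columns, including any that overrode a reserved one.
        result.addAll(Arrays.asList(outputCols));
        return result;
    }
}
```

For the documented example, resultColNames({"id", "f1", "f2"}, {"label"}, {"id"}) yields [id, label].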

MLEnvironment

The MLEnvironment stores the necessary context in Flink.
Each MLEnvironment will be associated with a unique ID.
The operations associated with the same MLEnvironment ID will share the same Flink job context.

MLEnvironmentFactory

Factory to get the MLEnvironment using an MLEnvironment ID.
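Taken together, MLEnvironment and MLEnvironmentFactory behave like an ID-keyed registry: operations that carry the same ID look up the same environment and therefore share one Flink job context. The sketch below assumes a default environment under ID 0 and sequential IDs for newly registered ones; SketchMLEnvironment and SketchMLEnvironmentFactory are hypothetical, and the environment holds only a label instead of real Flink context.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for MLEnvironment: only a label instead of real Flink context.
final class SketchMLEnvironment {
    final String label;

    SketchMLEnvironment(String label) {
        this.label = label;
    }
}

// Hypothetical ID-keyed registry in the spirit of MLEnvironmentFactory.
final class SketchMLEnvironmentFactory {
    static final long DEFAULT_ID = 0L; // assumption: the default environment lives under ID 0
    private static final Map<Long, SketchMLEnvironment> ENVIRONMENTS = new HashMap<>();
    private static long nextId = DEFAULT_ID + 1;

    static {
        ENVIRONMENTS.put(DEFAULT_ID, new SketchMLEnvironment("default"));
    }

    private SketchMLEnvironmentFactory() {}

    // Operations carrying the same ID get the same environment instance.
    static synchronized SketchMLEnvironment get(long id) {
        SketchMLEnvironment env = ENVIRONMENTS.get(id);
        if (env == null) {
            throw new IllegalArgumentException("no environment registered for id " + id);
        }
        return env;
    }

    static synchronized SketchMLEnvironment getDefault() {
        return get(DEFAULT_ID);
    }

    // Registers a new environment and returns its freshly assigned ID.
    static synchronized long register(SketchMLEnvironment env) {
        long id = nextId++;
        ENVIRONMENTS.put(id, env);
        return id;
    }
}
```

Passing a small numeric ID around, instead of the environment object itself, keeps operators and their parameters lightweight while still letting them agree on a shared context.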

operator


  • TableSourceBatchOp
    Transform the Table to SourceBatchOp.
  • BatchOperator
    Base class of batch algorithm operators.
  • TableSourceStreamOp
    Transform the Table to SourceStreamOp.
  • StreamOperator
    Base class of stream algorithm operators.
  • AlgoOperator
    Base class for algorithm operators.
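As a rough mental model of how these operators compose, the sketch below chains a source operator into two map operators, each consuming the previous operator's output. The names SketchBatchOp, SketchSourceOp and SketchMapOp, the link/linkFrom methods, and the use of List<Double> in place of a Flink Table are all illustrative assumptions, not the flink-ml-lib signatures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical base operator: holds its output "table" (a List<Double> here).
abstract class SketchBatchOp {
    private List<Double> output;

    protected void setOutput(List<Double> output) {
        this.output = output;
    }

    List<Double> getOutput() {
        if (output == null) {
            throw new IllegalStateException("operator has no output yet; link it first");
        }
        return output;
    }

    // Computes this operator's output from its inputs and returns this operator.
    abstract SketchBatchOp linkFrom(SketchBatchOp... inputs);

    // Feeds this operator into next and returns next, so calls can be chained.
    SketchBatchOp link(SketchBatchOp next) {
        return next.linkFrom(this);
    }
}

// Source: wraps existing data, similar in spirit to turning a Table into a source operator.
final class SketchSourceOp extends SketchBatchOp {
    SketchSourceOp(List<Double> data) {
        setOutput(new ArrayList<>(data));
    }

    @Override
    SketchBatchOp linkFrom(SketchBatchOp... inputs) {
        throw new UnsupportedOperationException("a source operator takes no inputs");
    }
}

// Applies a function to every value of its single input.
final class SketchMapOp extends SketchBatchOp {
    private final UnaryOperator<Double> fn;

    SketchMapOp(UnaryOperator<Double> fn) {
        this.fn = fn;
    }

    @Override
    SketchBatchOp linkFrom(SketchBatchOp... inputs) {
        if (inputs.length != 1) {
            throw new IllegalArgumentException("expects exactly one input");
        }
        List<Double> out = new ArrayList<>();
        for (Double v : inputs[0].getOutput()) {
            out.add(fn.apply(v));
        }
        setOutput(out);
        return this;
    }
}
```

For example, new SketchSourceOp(Arrays.asList(1.0, 2.0, 3.0)).link(new SketchMapOp(v -> v * 2)).link(new SketchMapOp(v -> v + 1)) ends with the output [3.0, 5.0, 7.0].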

params


  • HasOutputCol
    An interface for classes with a parameter specifying the name of the output column.
  • HasOutputColDefaultAsNull
    An interface for classes with a parameter specifying the name of the output column with a null default value.
  • HasOutputCols
    An interface for classes with a parameter specifying the names of multiple output columns.
  • HasOutputColsDefaultAsNull
    An interface for classes with a parameter specifying the names of multiple output columns. The default parameter value is null.
  • HasPredictionCol
    An interface for classes with a parameter specifying the column name of the prediction.
  • HasPredictionDetailCol
    An interface for classes with a parameter specifying the column name of the prediction detail.
  • HasReservedCols
    An interface for classes with a parameter specifying the names of the columns to be retained in the output table.
  • HasSelectedCol
    An interface for classes with a parameter specifying the name of the table column.
  • HasSelectedColDefaultAsNull
    An interface for classes with a parameter specifying the name of the table column with a null default value.
  • HasSelectedCols
    An interface for classes with a parameter specifying the names of multiple table columns.
  • HasSelectedColsDefaultAsNull
    An interface for classes with a parameter specifying the names of multiple table columns with a null default value.
  • HasMLEnvironmentId
    An interface for classes with a parameter specifying the ID of the MLEnvironment.
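The HasXxx interfaces follow a mixin pattern: each contributes one typed getter/setter pair, and an algorithm class gains a parameter simply by implementing the interface. The sketch assumes a plain string-keyed map (SketchParams) in place of the ParamInfo/WithParams machinery that flink-ml-api provides; SketchHasOutputCol, SketchHasReservedCols and SketchScaler are hypothetical names. The setters return T so calls can be chained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical parameter container: parameter name -> value.
final class SketchParams {
    private final Map<String, Object> values = new HashMap<>();

    <V> void set(String name, V value) {
        values.put(name, value);
    }

    // Returns the stored value, or defaultValue when the parameter was never set.
    @SuppressWarnings("unchecked")
    <V> V get(String name, V defaultValue) {
        return values.containsKey(name) ? (V) values.get(name) : defaultValue;
    }
}

// T is the concrete class, so setters can return it for chaining.
interface SketchWithParams<T> {
    SketchParams getParams();
}

interface SketchHasOutputCol<T> extends SketchWithParams<T> {
    default String getOutputCol() {
        return getParams().get("outputCol", null);
    }

    @SuppressWarnings("unchecked")
    default T setOutputCol(String value) {
        getParams().set("outputCol", value);
        return (T) this;
    }
}

interface SketchHasReservedCols<T> extends SketchWithParams<T> {
    default String[] getReservedCols() {
        return getParams().get("reservedCols", null);
    }

    @SuppressWarnings("unchecked")
    default T setReservedCols(String... value) {
        getParams().set("reservedCols", value);
        return (T) this;
    }
}

// An algorithm gains both parameters just by implementing the two interfaces.
final class SketchScaler implements SketchHasOutputCol<SketchScaler>, SketchHasReservedCols<SketchScaler> {
    private final SketchParams params = new SketchParams();

    @Override
    public SketchParams getParams() {
        return params;
    }
}
```

Usage: new SketchScaler().setOutputCol("scaled").setReservedCols("id") configures both parameters in one chained expression.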

pipeline


  • EstimatorBase
    The base class for estimator implementations.

  • ModelBase
    The base class for a machine learning model.

  • PipelineStageBase
    The base class for a stage in a pipeline, either an EstimatorBase or a TransformerBase.

  • TransformerBase
    The base class for transformer implementations.
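The estimator/model/transformer contract can be sketched in a few lines: an estimator is fitted on data and yields a model, a model is itself a transformer, and fitting a pipeline replaces every estimator with the model it produced while feeding each stage's output to the next. All names below are hypothetical, and List<Double> stands in for a Flink Table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stage hierarchy; List<Double> stands in for a Table.
interface SketchStage {}

interface SketchTransformer extends SketchStage {
    List<Double> transform(List<Double> input);
}

interface SketchEstimator<M extends SketchTransformer> extends SketchStage {
    M fit(List<Double> input);
}

// Model produced by CenteringEstimator: subtracts the mean learned during fit.
final class CenteringModel implements SketchTransformer {
    final double mean;

    CenteringModel(double mean) {
        this.mean = mean;
    }

    @Override
    public List<Double> transform(List<Double> input) {
        List<Double> out = new ArrayList<>();
        for (double v : input) {
            out.add(v - mean);
        }
        return out;
    }
}

final class CenteringEstimator implements SketchEstimator<CenteringModel> {
    @Override
    public CenteringModel fit(List<Double> input) {
        double sum = 0.0;
        for (double v : input) {
            sum += v;
        }
        return new CenteringModel(input.isEmpty() ? 0.0 : sum / input.size());
    }
}

// Stateless transformer: multiplies every value by a constant.
final class ScaleTransformer implements SketchTransformer {
    private final double factor;

    ScaleTransformer(double factor) {
        this.factor = factor;
    }

    @Override
    public List<Double> transform(List<Double> input) {
        List<Double> out = new ArrayList<>();
        for (double v : input) {
            out.add(v * factor);
        }
        return out;
    }
}

final class SketchPipeline {
    private final List<SketchStage> stages = new ArrayList<>();

    SketchPipeline add(SketchStage stage) {
        stages.add(stage);
        return this;
    }

    // Estimators are fitted on the data reaching them and replaced by their models.
    List<SketchTransformer> fit(List<Double> input) {
        List<SketchTransformer> fitted = new ArrayList<>();
        List<Double> current = input;
        for (SketchStage stage : stages) {
            SketchTransformer t;
            if (stage instanceof SketchEstimator) {
                t = ((SketchEstimator<?>) stage).fit(current);
            } else {
                t = (SketchTransformer) stage;
            }
            fitted.add(t);
            current = t.transform(current);
        }
        return fitted;
    }

    // Applies already-fitted stages in order.
    static List<Double> transformAll(List<SketchTransformer> fitted, List<Double> input) {
        List<Double> current = input;
        for (SketchTransformer t : fitted) {
            current = t.transform(current);
        }
        return current;
    }
}
```

Fitting [1.0, 2.0, 3.0] through a scale-by-2 stage and then the centering estimator learns a mean of 4.0, because the estimator sees the already-scaled data.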

The next post will analyze the source code and its usage in detail.


Reposted from blog.csdn.net/weixin_42072754/article/details/114063012