Citation
BibTeX
@ARTICLE{7243331,
author={Y. Zhang and D. W. Gong and J. Cheng},
journal={IEEE/ACM Transactions on Computational Biology and Bioinformatics},
title={Multi-Objective Particle Swarm Optimization Approach for Cost-Based Feature Selection in Classification},
year={2017},
volume={14},
number={1},
pages={64-75},
keywords={bioinformatics;decision making;encoding;feature selection;particle swarm optimisation;probability;signal processing;PSO-based multiobjective feature selection algorithm;Pareto domination relationship;Pareto front;bioinformatics;classification performance;classification problems;cost-based feature selection problems;crowding distance;data-preprocessing technique;decision-makers;effective hybrid operator;external archive;feature subsets;multiobjective feature selection algorithms;multiobjective particle swarm optimization;nondominated solutions;probability-based encoding technology;signal processing;single-objective optimization problem;Bioinformatics;Classification algorithms;Genetic algorithms;IEEE transactions;Optimization;Particle swarm optimization;Search problems;Feature selection;cost;multi-objective;particle swarm optimization},
doi={10.1109/TCBB.2015.2476796},
ISSN={1545-5963},
month={Jan},
}
Plain text
Y. Zhang, D. W. Gong and J. Cheng, "Multi-Objective Particle Swarm Optimization Approach for Cost-Based Feature Selection in Classification," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 14, no. 1, pp. 64-75, Jan.-Feb. 1 2017.
doi: 10.1109/TCBB.2015.2476796
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7243331&isnumber=7842713
Abstract
Feature selection is an important data-preprocessing technique in classification problems such as bioinformatics and signal processing. Two objectives:
- maximizing the classification performance
- minimizing the cost that may be associated with features
cost-based feature selection
multi-objective particle swarm optimization (PSO)
- a probability-based encoding technology
- an effective hybrid operator
- the ideas of the crowding distance, the external archive, and the Pareto domination relationship
compared with: several multi-objective feature selection algorithms
on 5 benchmark datasets
Main content
PSO
FS
- filter
- wrapper
Multi-objective:
- the number of features
- the classification performance
support vector machine classifier
chaotic mappings:
- logistic
- tent
Algorithm
A Particle encoding
Decoding:
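The decoding step can be sketched as follows. This is a minimal illustration, assuming each particle component is a selection probability in [0, 1] and that a feature is kept when its component exceeds a fixed 0.5 threshold; the paper's exact decoding rule may differ.

```python
import numpy as np

def decode(particle, threshold=0.5):
    """Decode a probability-encoded particle into a binary feature mask.

    Each position holds a selection probability in [0, 1]; a feature is
    selected when its probability exceeds the threshold (the 0.5 value
    here is an illustrative assumption, not taken from the paper).
    """
    return (np.asarray(particle) > threshold).astype(int)

mask = decode([0.9, 0.2, 0.7, 0.4])  # features 0 and 2 selected
```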
B Fitness evaluation
Cost:
Total cost:
the classification error rate:
the leave-one-out cross-validation (LOOCV) of k-NN
the one nearest neighbor (1-NN) method:
In this method, a datum from the original dataset is selected as a testing sample, and the rest constitute the training samples. Then the 1-NN classifier predicts the class of the testing sample by calculating and sorting the distances between the testing sample and the training ones.
— repeated for each datum
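The LOOCV error of a 1-NN classifier described above can be sketched directly; each sample is held out in turn and labeled by its nearest Euclidean neighbor among the rest.

```python
import numpy as np

def loocv_error_1nn(X, y):
    """Leave-one-out cross-validation error rate of a 1-NN classifier.

    Each datum is held out as the testing sample; the remaining data form
    the training set, and the held-out sample is assigned the class of its
    nearest training neighbor (Euclidean distance).
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n = len(X)
    errors = 0
    for i in range(n):
        dists = np.linalg.norm(X - X[i], axis=1)
        dists[i] = np.inf              # exclude the testing sample itself
        nearest = int(np.argmin(dists))
        errors += int(y[nearest] != y[i])
    return errors / n
```

For a well-separated toy dataset the error rate is zero, since every sample's nearest neighbor shares its label.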
C External archive update
the crowding distance
the Pareto dominance comparison
D Updating Gbest and Pbest
a domination-based strategy — Pbest
archive — Gbest :
the diversity of non-dominated solutions — the crowding distance
the binary tournament — crowding distances
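The binary-tournament selection of Gbest from the archive can be sketched as follows; the tie-breaking rule is an assumption here, and `crowd` is assumed to hold the precomputed crowding distance of each archive member.

```python
import random

def select_gbest(archive, crowd):
    """Pick a global best from the external archive by binary tournament:
    draw two archive members at random and keep the one with the larger
    crowding distance, i.e., the one lying in a less crowded region."""
    i, j = random.sample(range(len(archive)), 2)
    return archive[i] if crowd[i] >= crowd[j] else archive[j]
```

Favoring larger crowding distances steers the swarm toward sparse regions of the Pareto front, preserving the diversity of non-dominated solutions.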
E Hybrid mutation
trapped in local optima
- the re-initialization operator — re-initializes the flying velocities of 10% of the particles in each generation
- the jumping mutation — each component jumps uniformly in any dimension of the space with a small probability (a partial re-initialization)
the two operators do not add much computational burden
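A minimal sketch of the two operators together, assuming probability-encoded positions in [0, 1]; the velocity range and the jumping probability used here are illustrative defaults, not the paper's settings.

```python
import numpy as np

def hybrid_mutation(velocities, positions, p_jump=0.01, reinit_frac=0.1,
                    v_range=(-1.0, 1.0), rng=None):
    """Hybrid operator sketch: re-initialize the velocities of a random
    fraction of particles, then let each position component jump to a
    uniform random value in [0, 1] with a small probability.
    """
    rng = rng if rng is not None else np.random.default_rng()
    n, d = velocities.shape
    # re-initialization: fresh random velocities for reinit_frac of the swarm
    chosen = rng.choice(n, size=max(1, int(reinit_frac * n)), replace=False)
    velocities[chosen] = rng.uniform(*v_range, size=(len(chosen), d))
    # jumping mutation: each dimension jumps uniformly with probability p_jump
    jump = rng.random((n, d)) < p_jump
    positions[jump] = rng.uniform(0.0, 1.0, size=int(jump.sum()))
    return velocities, positions
```

Both steps are elementwise over the swarm, which is why the operators add little computational burden per generation.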
F Algorithm framework
acceleration coefficients:
G Convergence analysis
H Complexity analysis
Space Complexity
the archive memorizer —
memorizer for the particles —
total —
Computational Complexity — main time complexity
the Pareto comparison —
basic operation
the crowding distance metric —
the Pbest update —
the Gbest update —
the worst case time complexity —
Experiments
A Datasets
B Compared algorithms and parameters
- DE-based multi-objective feature selection algorithm (DEMOFS)
- the NSGA-based feature selection algorithm (NSGAFS)
- the SPEA2-based feature selection algorithm (SPEAFS)
- NSGAFS — based on the idea of NSGA-II
- HMPSOFS (the algorithm proposed in this paper)
All the algorithms are wrapper approaches
K-nearest neighbor (KNN)
the jumping probability is set to 0.01
C Performance metrics
- the hyper-volume (HV) metric
- the two-set coverage (SC) — the degree to which one Pareto optimal set covers (dominates) another
- the SP metric — estimates the distribution of solutions throughout the Pareto optimal set
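The two-set coverage metric, for instance, can be sketched as below; this follows the common definition (fraction of one set weakly dominated by the other, under minimization), which may differ in detail from the paper's formulation.

```python
import numpy as np

def two_set_coverage(A, B):
    """Two-set coverage C(A, B): fraction of solutions in B that are
    weakly dominated by at least one solution in A (minimization).
    C(A, B) = 1 means A covers all of B; the metric is not symmetric,
    so C(A, B) and C(B, A) are usually both reported."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    covered = sum(1 for b in B if any(np.all(a <= b) for a in A))
    return covered / len(B)
```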
D Hybrid mutation analysis
- HMPSOFS/J — HMPSOFS without the jumping mutation
- HMPSOFS/JR — HMPSOFS without the two operators
the re-initialization proportion
E