Skyline Points Explained

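The skyline query is an important kind of point query over multidimensional data, first proposed by Börzsönyi et al. in 2001. A database may hold thousands of data objects (points in space), but we usually care about only some of them, and the skyline is one way to define which ones count as "more interesting". The figure below shows the skyline of Manhattan, New York. The district has a huge number of buildings, yet the ones we can actually see from a distance, the ones that form the skyline (which is where the algorithm gets its name), are quite few. Each visible building is either very close to us or very tall; in other words, only buildings that are not completely hidden behind others can be seen. The skyline query grows out of exactly this idea.
[Figure: the Manhattan skyline]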

1. Definition of dominance

Corresponding to the notion of being "hidden" discussed above, we first define dominance:

Definition of point dominance:
    A point A dominates another point B if and only if the coordinate of A on every axis
is less than or equal to the corresponding coordinate of B, and the two points are not equal
on all axes (that is, A is strictly smaller on at least one axis).

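Take seaside hotels as an example. We want to book a hotel by the sea that is as close to the beach as possible and also as cheap as possible. For hotels, dominance is then defined as follows: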

Definition: hotel i dominates hotel f
    if both the distance and the price of hotel i are less than or equal to those of hotel f,
AND the two hotels do not agree on both distance and price (one of the two values may be equal,
but not both).

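Take the figure below as an example: point a dominates b and e; point k dominates e and l; point i dominates every other point except a, b, and k (with the coordinates used in the code in section 3).
[Figure: the example hotels plotted as points, illustrating dominance]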

Some further examples of dominance:
[Figures: two additional dominance examples]

2. Definition of a skyline point

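With dominance defined, we can now define a skyline point:
[Figure: formal definition of a skyline point]
In short, a skyline point is a point in the set that is not dominated by any other point.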

So our goal is to find all points that are not dominated by any other point.

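Take the following hotel example:
[Figure: the example hotels with their distance and price]
Each hotel in the figure has two attributes: its distance to the beach (distance) and its price (price). Plotting distance on one axis and price on the other turns each hotel into a point in the plane.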

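Now we want to find the points that are not dominated by any other point. The approach is as follows:
First, maintain a min-heap of hotels keyed by the sum distance + price. For brevity, write x for distance and y for price, so the key is x + y. Finding the skyline points relies on the following two properties:
1. The first point in the heap (the minimum) is always a skyline point. Proof: suppose the minimum point m is not a skyline point, so some point v dominates m. By the definition of dominance, $x_v \le x_m$ and $y_v \le y_m$, with at least one of the two inequalities strict. Adding them gives $x_v + y_v < x_m + y_m$. But m is the minimum of the heap, so no point has a smaller $x + y$ than m, which is a contradiction. Hence no such v exists, and the first point in the heap is always a skyline point.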

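2. To decide whether a point v is dominated by some other point, it is enough to test v against the points already known to be skyline points; if none of them dominates v, no other point does either. Proof: any point that dominates v has x and y no larger than those of v and is strictly smaller in at least one of them, so its x + y is strictly smaller and it leaves the min-heap before v. Split the points popped before v into two sets: S, the ones that are skyline points, and U, the ones that are not. Every point in U is dominated by some point in S, either directly or through a chain of dominated points; the chain must end at a skyline point because x + y strictly decreases along it. Dominance is also transitive: if a dominates b and b dominates c, then a dominates c, which follows directly from the definition. So if some point in U dominated v, then by transitivity some point in S would dominate v as well. Equivalently, if no point in S dominates v, then no point in U does. We can therefore ignore U and test v against S only.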

3. Implementation

For the hotel example above, there are three skyline points: i, a, and k.

#include <iostream>
#include <queue>
#include <string>
#include <vector>
using namespace std;

/*
*   A hotel with its distance to the beach and its price
*/
struct Hotel{
    string name;
    int dis;
    int price;

    Hotel(string s, int d, int p){
        name = s;
        dis = d;
        price = p;
    }
};

// Comparator that turns priority_queue into a min-heap on dis + price
struct cmp{
    bool operator () (const Hotel &a, const Hotel &b) const {
        return a.dis + a.price > b.dis + b.price;
    }
};

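// Returns true if no hotel in S dominates h, i.e. h is a skyline point
// given the skyline points S found so far.
bool isSkyLinePoint(const vector<Hotel> &S, const Hotel &h){
    for(size_t i = 0; i < S.size(); i++){
        // S[i] dominates h: no worse in both attributes, strictly better in one
        if(S[i].dis <= h.dis && S[i].price <= h.price
           && (S[i].dis < h.dis || S[i].price < h.price))
            return false;
    }

    return true;
}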

int main(){
    priority_queue<Hotel, vector<Hotel>, cmp> hotels;   // min-heap holding all hotels

    hotels.push(Hotel("a", 1, 9));
    hotels.push(Hotel("b", 2, 10));
    hotels.push(Hotel("c", 4, 8));

    hotels.push(Hotel("d", 6, 7));
    hotels.push(Hotel("e", 9, 10));
    hotels.push(Hotel("f", 7, 5));

    hotels.push(Hotel("g", 5, 6));
    hotels.push(Hotel("h", 4, 3));
    hotels.push(Hotel("i", 3, 2));

    hotels.push(Hotel("k", 9, 1));
    hotels.push(Hotel("l", 10, 4));
    hotels.push(Hotel("m", 6, 2));

    hotels.push(Hotel("n", 8, 3));

    // Pop hotels from the min-heap one by one and check whether any point in S dominates the popped hotel
    vector<Hotel> S;

    while(!hotels.empty()){
        Hotel tmp = hotels.top();
        hotels.pop();

        if(isSkyLinePoint(S, tmp))
            S.push_back(tmp);
    }

    for(size_t i = 0; i < S.size(); i++)
        cout<<S[i].name<<endl;

    return 0;
}
