哈希表(上)

哈希冲突的闭散列处理方式

哈希的概念

在顺序搜索以及二叉树搜索树中，元素存储的位置与元素的关键码之间没有对应的关系，因此查找一个元素时，必须要经过关键码的多次比较，搜索效率取决于搜索过程中元素的比较次数。
理想的搜索方法是：可以不经过任何的比较，一次直接从中找到要搜索的元素。如果构造一种存储结构，通过某种函数使得元素的存储位置与元素的关键码之间有一一映射的关系，那么在查找过程中通过该哈希函数可以直接找到该元素。当向该存储结构中：

（1）插入元素时：根据某种函数利用待插元素的关键码计算出该元素的存储位置并按照此位置进行存放。

（2）搜索元素时：根据某种函数利用待搜索元素的关键码计算出该元素应该存储的位置，在该位置取出元素与搜索元素比较，若相等时则表示搜索成功，否则搜索失败。
该方式即为哈希（散列）方法，哈希方法中使用的某种函数称为哈希（散列）函数，构造出来的存储结构称为哈希（散列）表。

举个例子
数租={1,3,5,6,8} 哈希函数：Hash(key)=key%10
将key值代入到哈希函数中，从而取出该key值的存储位置，此时，搜索指定元素如8时，直接利用哈希函数计算出存储位置下标，将其对应的值与要搜索的值8比较，相等时，表示找到了，否则没找到。通过上述操作可以看出，搜索的速度非常快。

但是存在这样一个问题：当向数据集合中插入元素11时，将元素11存储在哪儿？依旧利用key=11计算出Hash(11)=11%10=1，那么原本应该存储在下标为1的位置，但是此时下标为1的位置已经有存储的元素1，对于出现的这种现象我们称为哈希冲突！
处理哈希冲突有两种常见的处理方式

闭散列处理哈希冲突

闭散列：也叫开放地址法，当发生哈希冲突时，如果哈希表未被装满，说明哈希表中必然还有空位置，那么就可以把key存放在表中的“下一个”空位中。那么下一个空位置如何查找呢？

我们用线性探测的方式寻找下一个空位置。下面举例介绍如何线性探测寻找下一个空位置去处理哈希冲突：

设关键码集合={1,3,5,6,8,11,12}，哈希表的大小为11，哈希函数的设计用除留余数法，即哈希函数为Hash(Key)=Key%10；
通过计算，
我们将1存放在下标为1的位置，
3存放在下标为3的位置，
5存放在下标为5的位置，
......
12存放是应存放在下标为1的位置，但是该位置已经存放了1，我们就探测1的下一位置是否为空，即2的位置存放11；

代码实现部分

头文件定义

#include <stdlib.h>
#include <assert.h>
#include <stdio.h>

typedef enum
{
	EXIST,
	DELETE,
	EMPTY
}State;

typedef int KeyType;
typedef int (*HashFunc)(KeyType key,int capacity);  //定义函数指针，参数类型为KeyType

typedef struct HashElem
{
	KeyType key;
	State state;
}HashElem;

typedef struct HashTable
{
	HashElem *table;   
	int size;			//数据个数
	int capacity;					//容量
	HashFunc hashfunc; 	//哈希函数
}HashTable;

初始化及销毁

//初始化/销毁
void HashInit(HashTable *pH,HashFunc hashfunc,int capacity)
{
	pH->table = (HashElem*)malloc(sizeof(HashElem)*capacity);
	assert(pH->table);

	pH->size = 0;
	pH->hashfunc = hashfunc;
	pH->capacity = capacity;
	for ( int i = 0; i < capacity; i++)
	{
		pH->table[i].state = EMPTY;
	}
}

void HashDestroy(HashTable *pH)
{
	free(pH->table);
}

插入

插入之前，我们需要判断是否需要扩大容量，减小哈希冲突的概率


int HashInsert(HashTable *pH, KeyType key);

int IsExpand(HashTable *pH)
{
	if (pH->size <=  pH->capacity * 0.8)
	{
		return 0;
	}
	
	HashTable newHT;
	HashInit(&newHT,pH->hashfunc, pH->capacity * 2);
	for (int i = 0; i < pH->capacity * 2; i++)
	{
		if (pH->table[i].state == EXIST)
		{
			HashInsert(&newHT, pH->table[i].key);
		}
	}

	free(pH->table);   //释放原来的table
	pH->table = newHT.table; //将新的table地址赋给
	pH->capacity = newHT.capacity;

	return 1;
}

int HashInsert(HashTable *pH,KeyType key)
{
	IsExpand(pH);

	int  index = pH->hashfunc(key, pH->capacity);

	while (1)
	{
		if (pH->table[index].key == key && pH->table[index].state == EXIST)
		{
			return -1;
		}

		if (pH->table[index].state != EXIST)
		{
			pH->table[index].key = key;
			pH->table[index].state = EXIST;
			pH->size++;
			return 1;
		}

		index = (index + 1) % pH->capacity;
	}
}

查找

//查找
int HashFind(HashTable *pH,KeyType  key)
{
	int index = pH->hashfunc(key,pH->capacity);
	while (pH->table[index].state != EMPTY)
	{
		if (pH->table[index].key == key)
		{
			return index;
		}
		index = (index + 1) % pH->capacity;
	}

	return -1;
}

删除

int HashRemove(HashTable *pH,KeyType key)
{
	int index = pH->hashfunc(key,pH->capacity);
	while (pH->table[index].state != EMPTY)
	{
		if (pH->table[index].key == key && pH->table[index].state == EXIST)
		{
			pH->table[index].state = DELETE;
			return 1;
		}
		index = (index + 1) % pH->capacity;
	}
	return -1;
}

打印出key所对应的value以及状态

void PrintKeyValue(HashTable *pH)
{
	printf("index --> value -->state\n");
	for (int i = 0; i < pH->size; i++)
	{
		int index = pH->hashfunc(pH->table[i].key,pH->capacity);
		printf("%d --> %d  -->%d\n",index,pH->table[index].key,pH->table[index].state);
	}
}

测试部分我就不在这里放了，有兴趣的话可以自行测试

哈希冲突的闭散列处理方式

猜你喜欢