(C Walkthrough) 05-Tree 9 Huffman Codes (Detailed Explanation)

05-Tree 9 Huffman Codes (30 points)
In 1953, David A. Huffman published his paper “A Method for the Construction of Minimum-Redundancy Codes”, and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string “aaaxuaxz”, we can observe that the frequencies of the characters ‘a’, ‘x’, ‘u’ and ‘z’ are 4, 2, 1 and 1, respectively. We may either encode the symbols as {‘a’=0, ‘x’=10, ‘u’=110, ‘z’=111}, or in another way as {‘a’=1, ‘x’=01, ‘u’=001, ‘z’=000}, both compress the string into 14 bits. Another set of code can be given as {‘a’=0, ‘x’=11, ‘u’=100, ‘z’=101}, but {‘a’=0, ‘x’=01, ‘u’=011, ‘z’=001} is NOT correct since “aaaxuaxz” and “aazuaxax” can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.

Input Specification:

Each input file contains one test case. For each case, the first line gives an integer N (2≤N≤63), then followed by a line that contains all the N distinct characters and their frequencies in the following format:

c[1] f[1] c[2] f[2] ... c[N] f[N]

where c[i] is a character chosen from {‘0’ - ‘9’, ‘a’ - ‘z’, ‘A’ - ‘Z’, ‘_’}, and f[i] is the frequency of c[i] and is an integer no more than 1000. The next line gives a positive integer M (≤1000), then followed by M student submissions. Each student submission consists of N lines, each in the format:

c[i] code[i]

where c[i] is the i-th character and code[i] is a non-empty string of no more than 63 '0's and '1's.

Output Specification:

For each test case, print in each line either “Yes” if the student’s submission is correct, or “No” if not.

Note: The optimal solution is not necessarily generated by Huffman algorithm. Any prefix code with code length being optimal is considered correct.

Sample Input:

7
A 1 B 1 C 1 D 3 E 3 F 6 G 6
4
A 00000
B 00001
C 0001
D 001
E 01
F 10
G 11
A 01010
B 01011
C 0100
D 011
E 10
F 11
G 00
A 000
B 001
C 010
D 011
E 100
F 101
G 110
A 00000
B 00001
C 0001
D 001
E 00
F 10
G 11

Sample Output:

Yes
Yes
No
No

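Detailed explanation first; the full source code is at the end of the post.

When you first get this problem, don't panic. Restating the problem in your own words matters a lot: even if you can't write the code yet, at least make sure you understand the problem. Let me walk through it.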

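Analyzing the problem

If your English is good, read the original statement yourself; if not, just follow my analysis. As the teacher said in the course, Huffman proposed Huffman coding, and when students build Huffman codes for the same input, the results turn out not to be unique. What to do? Write a program to judge each submission. The statement then gives the input and output specifications. These matter a great deal and you must understand them, because they determine how you write your main function. With the background covered, on to the input and output.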

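Input

First read N (between 2 and 63), the number of characters. Then come N pairs in a row: each pair is a character followed by its frequency. Stop and note this: the values arrive in pairs!
Next read M (M ≤ 1000), the number of student submissions. Each submission then gives its own codes, N lines of "character code", and we check whether it meets the requirements.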
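The pair-by-pair input format can be sketched as follows. This is only an illustration: `parse_pairs` is a hypothetical helper that is not part of the original solution, and it reads from a string with `sscanf` instead of from standard input so that it is easy to try out.

```c
#include <stdio.h>

/* Hypothetical helper: parses up to n "character frequency" pairs from a line.
   Returns the number of pairs actually read. */
int parse_pairs(const char *line, char ch[], int w[], int n)
{
    int count = 0, used = 0;
    /* " %c" skips leading whitespace; %n records how many characters were consumed */
    while (count < n &&
           sscanf(line, " %c %d%n", &ch[count], &w[count], &used) == 2) {
        line += used;
        count++;
    }
    return count;
}
```

Called on the sample line "A 1 B 1 C 1 D 3 E 3 F 6 G 6" with n = 7, it fills `ch` with the seven characters and `w` with their frequencies.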

Output

There are only two possible answers, Yes or No, one line per submission. That is all the program has to do. Now let's solve it.

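Reviewing the basics

Wang Jiangtao, also known as Daoist Wang, once said that reviewing the old matters more than learning the new: he has read the Tao Te Ching many times, yet each reading feels like the first and brings something new. Here is the material to review.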

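Criterion 1: WPL

A Huffman tree is built to minimize the WPL, so the tree I build has to come with a WPL as well. If you are writing this program without knowing what WPL is, go back and watch the lecture videos again.
WPL (weighted path length): for a binary tree with n leaves, where leaf k has weight wk and the path from the root to it has length lk, the WPL is the sum of wk*lk over all leaves.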
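For a submitted code table, the WPL is simply the total encoded length: each frequency times the length of its code. A minimal sketch, where `encoded_length` is a hypothetical name chosen for this example:

```c
#include <string.h>

/* Total encoded length: sum of freq[i] * strlen(code[i]) over n symbols. */
int encoded_length(const int freq[], const char *code[], int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += freq[i] * (int)strlen(code[i]);
    return total;
}
```

For the string "aaaxuaxz" from the problem statement (frequencies 4, 2, 1, 1) and the codes {0, 10, 110, 111}, this gives 4*1 + 2*2 + 1*3 + 1*3 = 14 bits, matching the statement.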

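Criterion 2: prefix code

What is a prefix code? Don't know? Then how could you possibly construct one? Here is the definition:

A prefix code is a code in which no character's code is a prefix of any other character's code.
How do we avoid ambiguity?
If every character sits at a leaf of the code tree, no character's code can be a prefix of another's.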
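The definition can be checked directly on the code strings, without any tree. The sketch below compares every pair with `strncmp`; it is a different technique from the tree-based check used later in this post, and the function names are hypothetical.

```c
#include <string.h>

/* Returns 1 if a is a prefix of b (equal strings also count). */
static int is_prefix(const char *a, const char *b)
{
    size_t la = strlen(a);
    return la <= strlen(b) && strncmp(a, b, la) == 0;
}

/* Returns 1 if no code is a prefix of another code, 0 otherwise. */
int is_prefix_free(const char *code[], int n)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (i != j && is_prefix(code[i], code[j]))
                return 0;
    return 1;
}
```

On the problem's examples, {0, 10, 110, 111} passes, while {0, 01, 011, 001} fails because "0" is a prefix of "01".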

Properties of Huffman trees

(Figure: properties of Huffman trees; the image is missing from this copy.)

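Planning the code

The course video solves this with a min-heap, so we will use a min-heap too. That means you need to know min-heaps first, but if you have made it this far you should already know a little. On top of the min-heap we also need to build a Huffman tree. How?
Take one node from the heap as the left child and another as the right child, set the parent's weight to the sum of their weights, and push the parent back into the heap. Once we can build the Huffman tree, how do we judge a submission? By the two criteria above: first, it must be a prefix code; second, its total length must be optimal. Now for the source code.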
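Only the optimal WPL is needed for judging, and it equals the sum of the weights of all merged (internal) nodes. The sketch below uses that fact with a plain linear scan instead of a min-heap, which is a swapped-in simplification rather than the approach used in this post. It assumes n is small (here at most 63) and it modifies the array it is given; the function names are hypothetical.

```c
/* Removes and returns the smallest value in w[0..*n-1]. */
static int take_min(int w[], int *n)
{
    int k = 0;
    for (int i = 1; i < *n; i++)
        if (w[i] < w[k])
            k = i;
    int v = w[k];
    (*n)--;
    w[k] = w[*n];          /* fill the hole with the last element */
    return v;
}

/* Optimal WPL for the given weights: repeatedly merge the two smallest
   weights and add the merged weight to the total. Overwrites w. */
int optimal_wpl(int w[], int n)
{
    int total = 0;
    while (n > 1) {
        int a = take_min(w, &n);
        int b = take_min(w, &n);
        total += a + b;    /* weight of the new internal node */
        w[n++] = a + b;    /* put the merged node back */
    }
    return total;
}
```

For the sample input weights 1 1 1 3 3 6 6 the merges produce 2, 3, 6, 9, 12 and 21, so the optimal length is 53, which is exactly the total length of the first two sample submissions.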

Code walkthrough

#define MaxSize 64
typedef struct TNode *HuffmanTree;
struct TNode {
	int Weight;
	HuffmanTree Left;
	HuffmanTree Right;
};// standard Huffman tree node definition
typedef struct HeapNode *MinHeap;
struct HeapNode {
	HuffmanTree Elements[MaxSize];
	int Size;
};
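char ch[MaxSize];// input characters
int N,w[MaxSize],TotalCodes;// number of characters, their frequencies, and the optimal total code length

These are the up-front struct and global declarations. They are clear enough that they need no further comment.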

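MinHeap CreateHeap();// create the min-heap
HuffmanTree CreatHuffman();// create a Huffman tree node
void Insert(MinHeap H,HuffmanTree X);// insert an element
HuffmanTree DeleteMin(MinHeap H);// remove and return the minimum element
HuffmanTree BuildHuffman(MinHeap H);// build the Huffman tree
int WPL(HuffmanTree root, int depth);// compute the WPL
// Weighted path length (WPL): for a binary tree with n leaves, where leaf k
// has weight wk and the path from the root to it has length lk, the WPL is
// the sum of wk*lk over all leaves.

// Optimal binary tree, or Huffman tree: the binary tree with the smallest WPL
int Judge();// judge one submission

Creating the min-heap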

MinHeap CreateHeap()
{
	MinHeap H;
	H = (MinHeap)malloc(sizeof(struct HeapNode));
	H->Size = 0;
	H->Elements[0] = (HuffmanTree)malloc(sizeof(struct TNode));
	H->Elements[0]->Weight = -1;
	H->Elements[0]->Left = H->Elements[0]->Right = NULL;
	return H;
}

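Don't be nervous when you see this code; it is just the most ordinary min-heap initialization. Elements[0] holds a sentinel node with weight -1, smaller than any real weight, so Insert never needs an explicit check for the root.

Creating a Huffman node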

HuffmanTree CreatHuffman()
{
	HuffmanTree  T;
	T = (HuffmanTree)malloc(sizeof(struct TNode));
	T->Left = T->Right = NULL;
	T->Weight = 0;
	return T;
}

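Think about what a Huffman tree node looks like: like any binary tree node it has left and right children, plus a weight.

Heap insertion and deletion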

void Insert(MinHeap H,HuffmanTree X)
{
	int i = ++H->Size;
	while(H->Elements[i/2]->Weight > X->Weight)
	{
		H->Elements[i] = H->Elements[i/2];
		i/=2;
	}
	H->Elements[i] = X;
}
// delete the minimum element from the min-heap
HuffmanTree DeleteMin(MinHeap H)
{
	HuffmanTree MinTtem,temp;
	int Parent,Child;
	MinTtem = H->Elements[1];
	temp = H->Elements[H->Size--];
	for(Parent = 1;Parent *2<=H->Size;Parent = Child) {
		Child = Parent * 2;
		if((Child != H->Size) && (H->Elements[Child]->Weight > H->Elements[Child+1]->Weight))
			Child ++;
		if(temp->Weight <= H->Elements[Child]->Weight) 
			break;
		else
			H->Elements[Parent] = H->Elements[Child];
	}
	H->Elements[Parent] = temp;
	return MinTtem;
}

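Both insertion and deletion order the elements by Weight, not by the pointers stored in the heap. That takes a moment to get used to; compare it with a plain min-heap.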
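For comparison, here is a minimal sketch of the same sentinel-based min-heap on plain integers. The type and function names are hypothetical, and it assumes keys are non-negative (so that -1 works as the sentinel) and that at most CAP-1 elements are stored.

```c
#define CAP 64

typedef struct {
    int data[CAP];   /* data[0] is a sentinel smaller than any real key */
    int size;
} IntHeap;

void heap_init(IntHeap *h)
{
    h->size = 0;
    h->data[0] = -1;
}

void heap_push(IntHeap *h, int x)
{
    int i = ++h->size;
    while (h->data[i / 2] > x) {      /* the sentinel stops the loop at the root */
        h->data[i] = h->data[i / 2];  /* move the parent down */
        i /= 2;
    }
    h->data[i] = x;
}

int heap_pop(IntHeap *h)
{
    int min = h->data[1];
    int last = h->data[h->size--];
    int parent, child;
    for (parent = 1; parent * 2 <= h->size; parent = child) {
        child = parent * 2;
        if (child != h->size && h->data[child] > h->data[child + 1])
            child++;                  /* pick the smaller child */
        if (last <= h->data[child])
            break;
        h->data[parent] = h->data[child];
    }
    h->data[parent] = last;
    return min;
}
```

Pushing 6, 1, 3, 1, 6, 3, 1 and then popping seven times returns the keys in non-decreasing order. The version in this post is the same logic, except that each slot holds a tree node pointer and the comparisons use its Weight field.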

Building the complete Huffman tree

HuffmanTree BuildHuffman(MinHeap H)
{
	HuffmanTree T;
	int num = H->Size;
	for(int i=1;i<num;i++)
	{
		T = CreatHuffman();
		T->Left = DeleteMin(H);
		T->Right = DeleteMin(H);
		T->Weight = T->Left->Weight + T->Right->Weight;
		Insert(H,T);
	}
	T = DeleteMin(H);
	return T;
}

I covered this in the review section, so I won't repeat it; every step replays what was done in class.

Computing the WPL

int WPL(HuffmanTree root,int depth)
{
	if((root->Left == NULL ) && (root->Right == NULL))
		return depth*root->Weight;
	else
		return WPL(root->Left,depth+1) + WPL(root->Right,depth+1);
}

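It is a recursive computation, just like computing the height of a tree; you can compare it with the tree-height code in
(C) Computing the Height of a Binary Tree (with test source code)
Verified: I wrote that one too. Take a look when you have time.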
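To make the comparison concrete, the sketch below puts a tree-height function next to a WPL function. It is not taken from the linked post; the `Node` type and both function names are assumptions made for this example, and height is counted in edges.

```c
#include <stddef.h>

struct Node {
    int weight;
    struct Node *left, *right;
};

/* Height in edges: an empty tree has height -1, a single leaf has height 0. */
int height(const struct Node *t)
{
    if (t == NULL)
        return -1;
    int hl = height(t->left);
    int hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}

/* Weighted path length: sum of depth * weight over all leaves. */
int wpl(const struct Node *t, int depth)
{
    if (t == NULL)
        return 0;
    if (t->left == NULL && t->right == NULL)
        return depth * t->weight;
    return wpl(t->left, depth + 1) + wpl(t->right, depth + 1);
}
```

Both functions recurse into the two subtrees and combine the results; height takes the maximum, while WPL carries the depth down and sums at the leaves.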

Judging a submission: Yes or No

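int Judge()
{
	HuffmanTree T,p;
	char chl,codes[MaxSize];// at most 63 code characters plus the terminating '\0'
	int length = 0,flag=1,wgh,j,len;
	T = CreatHuffman();// root of the tree built from this submission
	
	for(int i=0;i<N;i++)
	{
		scanf(" %c %63s",&chl,codes);
		len = (int)strlen(codes);
		if(len>=N)// an optimal code for N characters is at most N-1 bits long
			flag = 0;
		else{
			for(j = 0;j<N && chl !=ch[j];j++);// look up the character's frequency
			if(j==N) {// character not in the input: reject
				flag = 0;
				continue;
			}
			wgh = w[j];
			p = T;
			for(j=0;j<len;j++)
			{
				if(codes[j]=='0') {// '0' goes to the left child
					if(!p->Left)
						p->Left = CreatHuffman();
					p = p->Left;
					
				}else if(codes[j] == '1') {// '1' goes to the right child
					if(!p->Right)
						p->Right = CreatHuffman();
					p = p->Right;
				}
				// a nonzero Weight marks the end of an earlier code
				// (this assumes every frequency is positive)
				if(p->Weight) flag = 0;
			}
			if(p->Left || p->Right )// this code is a prefix of an earlier one
				flag = 0;
			else
				p->Weight = wgh;
			length += len*wgh;// use wgh directly; p may not be a valid leaf here
		}
	}
	if(length!=TotalCodes)
		flag = 0;
	return flag;
}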

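So how do we read this code? The review section gave two criteria: prefix code and WPL. We build a tree from the student's input: 0 goes to the left subtree, 1 goes to the right subtree. By definition, if some code does not end at a leaf, the encoding is ambiguous and is not a prefix code. So:

	if(length!=TotalCodes)
		flag = 0;

This checks that the total length is optimal, per the definition of the Huffman tree,

			if(p->Left || p->Right )
				flag = 0;

and this guards against ambiguity, following Teacher He's line from class: if every character sits at a leaf, no character's code can be a prefix of another's. The check if(p->Weight) flag = 0; inside the loop covers the opposite direction, where an earlier code ends on the path of the current one.
Pay attention, this is a classic. Copy it down!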
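The tree-based prefix check can also be isolated from the rest of Judge. The sketch below stores the tree in fixed arrays instead of malloc'd nodes and marks code ends with a separate flag rather than a nonzero Weight, so it does not depend on frequencies being positive. The name `trie_prefix_free` and the bound MAX_NODES are assumptions for this example (63 codes of at most 63 bits need fewer than 4096 nodes).

```c
#include <string.h>

#define MAX_NODES 4096   /* assumed bound: 63 codes * 63 bits + root < 4096 */

/* Returns 1 if the n codes (strings of '0'/'1') form a prefix code. */
int trie_prefix_free(const char *code[], int n)
{
    static int child[MAX_NODES][2];   /* child[p][b]: node reached from p on bit b, 0 if none */
    static char is_end[MAX_NODES];    /* 1 if some code ends at this node */
    int used = 1;                     /* node 0 is the root */
    memset(child, 0, sizeof child);
    memset(is_end, 0, sizeof is_end);

    for (int i = 0; i < n; i++) {
        int p = 0;
        for (const char *s = code[i]; *s; s++) {
            int b = (*s == '1');
            if (!child[p][b]) {
                if (used == MAX_NODES)
                    return 0;         /* out of space: treat as invalid */
                child[p][b] = used++;
            }
            p = child[p][b];
            if (is_end[p])
                return 0;             /* an earlier code is a prefix of (or equal to) this one */
        }
        if (child[p][0] || child[p][1])
            return 0;                 /* this code is a prefix of an earlier one */
        is_end[p] = 1;
    }
    return 1;
}
```

On the sample input, the first submission passes this check, while the fourth fails because E's code 00 lands on an internal node already created by A through D.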

The complete code:

#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#define MaxSize 64
typedef struct TNode *HuffmanTree;
struct TNode {
	int Weight;
	HuffmanTree Left;
	HuffmanTree Right;
};// standard Huffman tree node definition
typedef struct HeapNode *MinHeap;
struct HeapNode {
	HuffmanTree Elements[MaxSize];
	int Size;
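};// the min-heap used to build the Huffman tree
// No character's code is a prefix of another character's code.
// If every character sits at a leaf, no character's code can be a prefix of another's.
// Pick the two smallest nodes and merge them; then pick and merge again.
//
char ch[MaxSize];// input characters
int N,w[MaxSize],TotalCodes;// number of characters, their frequencies, and the optimal total code length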

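MinHeap CreateHeap();// create the min-heap
HuffmanTree CreatHuffman();// create a Huffman tree node
void Insert(MinHeap H,HuffmanTree X);// insert an element
HuffmanTree DeleteMin(MinHeap H);// remove and return the minimum element
HuffmanTree BuildHuffman(MinHeap H);// build the Huffman tree
int WPL(HuffmanTree root, int depth);// compute the WPL
// Weighted path length (WPL): for a binary tree with n leaves, where leaf k
// has weight wk and the path from the root to it has length lk, the WPL is
// the sum of wk*lk over all leaves.

// Optimal binary tree, or Huffman tree: the binary tree with the smallest WPL
int Judge();// judge one submission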

int main()
{
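	/*
	Build a Huffman tree with a min-heap to get the optimal total code length.
	Build a tree from each student's input and judge whether it is an optimal code, based on:
			the optimal total length is unique, so check that it matches the length from step one;
			whether there is ambiguity, i.e. whether some code is a prefix of another.
	*/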
	int M;
	HuffmanTree tmp,ROOT;
	scanf("%d",&N);
	MinHeap H = CreateHeap();
	// insert each input character's frequency into the min-heap, one node at a time
	for(int i=0;i<N;i++)
	{
		getchar();
		scanf("%c %d",&ch[i],&w[i]);
		tmp = CreatHuffman();
		tmp->Weight = w[i];
		Insert(H,tmp);
	}
	ROOT = BuildHuffman(H);
	TotalCodes = WPL(ROOT, 0);
	scanf("%d",&M);
	for(int i =0;i<M;i++)
	{
		if(Judge())
			printf("Yes\n");
		else
			printf("No\n");
	}
	return 0;
}
MinHeap CreateHeap()
{
	MinHeap H;
	H = (MinHeap)malloc(sizeof(struct HeapNode));
	H->Size = 0;
	H->Elements[0] = (HuffmanTree)malloc(sizeof(struct TNode));
	H->Elements[0]->Weight = -1;
	H->Elements[0]->Left = H->Elements[0]->Right = NULL;
	return H;
}
// create a Huffman tree node; note that its weight and both children are initialized
HuffmanTree CreatHuffman()
{
	HuffmanTree  T;
	T = (HuffmanTree)malloc(sizeof(struct TNode));
	T->Left = T->Right = NULL;
	T->Weight = 0;
	return T;
}
// min-heap insertion
void Insert(MinHeap H,HuffmanTree X)
{
	int i = ++H->Size;
	while(H->Elements[i/2]->Weight > X->Weight)
	{
		H->Elements[i] = H->Elements[i/2];
		i/=2;
	}
	H->Elements[i] = X;
}
// delete the minimum element from the min-heap
HuffmanTree DeleteMin(MinHeap H)
{
	HuffmanTree MinTtem,temp;
	int Parent,Child;
	MinTtem = H->Elements[1];
	temp = H->Elements[H->Size--];
	for(Parent = 1;Parent *2<=H->Size;Parent = Child) {
		Child = Parent * 2;
		if((Child != H->Size) && (H->Elements[Child]->Weight > H->Elements[Child+1]->Weight))
			Child ++;
		if(temp->Weight <= H->Elements[Child]->Weight) 
			break;
		else
			H->Elements[Parent] = H->Elements[Child];
	}
	H->Elements[Parent] = temp;
	return MinTtem;
}
// build the Huffman tree
HuffmanTree BuildHuffman(MinHeap H)
{
	HuffmanTree T;
	int num = H->Size;
	for(int i=1;i<num;i++)
	{
		T = CreatHuffman();
		T->Left = DeleteMin(H);
		T->Right = DeleteMin(H);
		T->Weight = T->Left->Weight + T->Right->Weight;
		Insert(H,T);
	}
	T = DeleteMin(H);
	return T;
}

// compute the optimal total code length from the Huffman tree and return it

int WPL(HuffmanTree root,int depth)
{
	if((root->Left == NULL ) && (root->Right == NULL))
		return depth*root->Weight;
	else
		return WPL(root->Left,depth+1) + WPL(root->Right,depth+1);
}

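// judge whether one submission is an optimal prefix code
int Judge()
{
	HuffmanTree T,p;
	char chl,codes[MaxSize];// at most 63 code characters plus the terminating '\0'
	int length = 0,flag=1,wgh,j,len;
	T = CreatHuffman();// root of the tree built from this submission
	
	for(int i=0;i<N;i++)
	{
		scanf(" %c %63s",&chl,codes);
		len = (int)strlen(codes);
		if(len>=N)// an optimal code for N characters is at most N-1 bits long
			flag = 0;
		else{
			for(j = 0;j<N && chl !=ch[j];j++);// look up the character's frequency
			if(j==N) {// character not in the input: reject
				flag = 0;
				continue;
			}
			wgh = w[j];
			p = T;
			for(j=0;j<len;j++)
			{
				if(codes[j]=='0') {// '0' goes to the left child
					if(!p->Left)
						p->Left = CreatHuffman();
					p = p->Left;
					
				}else if(codes[j] == '1') {// '1' goes to the right child
					if(!p->Right)
						p->Right = CreatHuffman();
					p = p->Right;
				}
				// a nonzero Weight marks the end of an earlier code
				// (this assumes every frequency is positive)
				if(p->Weight) flag = 0;
			}
			if(p->Left || p->Right )// this code is a prefix of an earlier one
				flag = 0;
			else
				p->Weight = wgh;
			length += len*wgh;// use wgh directly; p may not be a valid leaf here
		}
	}
	if(length!=TotalCodes)
		flag = 0;
	return flag;
}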

Reposted from blog.csdn.net/m0_37149062/article/details/105641639