版权声明:本文为博主原创文章,未经博主允许不得转载。 https://blog.csdn.net/Apple_hzc/article/details/50568952
Oulipo
Time Limit: 3000/1000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Others)
Total Submission(s): 8574 Accepted Submission(s): 3459
Total Submission(s): 8574 Accepted Submission(s): 3459
Problem Description
The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:
Tout avait Pair normal, mais tout s’affirmait faux. Tout avait Fair normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : stir son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…
Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.
So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.
Tout avait Pair normal, mais tout s’affirmait faux. Tout avait Fair normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : stir son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…
Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.
So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.
Input
The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:
One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.
One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.
Output
For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.
Sample Input
3 BAPC BAPC AZA AZAZAZA VERDI AVERDXIVYERDIAN
Sample Output
1
3
0
题意是要求输入一个单词和一段话,找出单词在这段话中出现的次数。典型的单匹配KMP算法,在这里简单回顾一下KMP算法的核心思想:
KMP是一种非常高效的字符串匹配算法,其核心是计算字符串f每一个位置之前的字符串的前缀和后缀公共部分的最大长度(不包括字符串本身,否则最大长度始终是字符串本身)。获得f每一个位置的最大公共长度之后,就可以利用该最大公共长度快速和字符串O比较。当每次比较到两个字符串的字符不同时,我们就可以根据最大公共长度将字符串f向前移动(已匹配长度-最大公共长度)位,接着继续比较下一个位置。事实上,字符串f的前移只是概念上的前移,只要我们在比较的时候从最大公共长度之后比较f和O即可达到字符串f前移的目的。
理解了kmp算法的基本原理,下一步就是要获得字符串f每一个位置的最大公共长度。这个最大公共长度在算法导论里面被记为next数组。在这里要注意一点,next数组表示的是长度,下标从1开始;但是在遍历原字符串时,下标还是从0开始。假设我们现在已经求得next[1]、next[2]、……next[i],分别表示长度为1到i的字符串的前缀和后缀最大公共长度,现在要求next[i+1]。由上图我们可以看到,如果位置i和位置next[i]处的两个字符相同(下标从零开始),则next[i+1]等于next[i]加1。如果两个位置的字符不相同,我们可以将长度为next[i]的字符串继续分割,获得其最大公共长度next[next[i]],然后再和位置i的字符比较。这是因为长度为next[i]前缀和后缀都可以分割成上部的构造,如果位置next[next[i]]和位置i的字符相同,则next[i+1]就等于next[next[i]]加1。如果不相等,就可以继续分割长度为next[next[i]]的字符串,直到字符串长度为0为止。
以下是KMP有参函数核心代码:
理解了以上内容,这个题就不难做出来了。
以下是这道题的AC代码:
1
3
0
题意是要求输入一个单词和一段话,找出单词在这段话中出现的次数。典型的单匹配KMP算法,在这里简单回顾一下KMP算法的核心思想:
KMP是一种非常高效的字符串匹配算法,其核心是计算字符串f每一个位置之前的字符串的前缀和后缀公共部分的最大长度(不包括字符串本身,否则最大长度始终是字符串本身)。获得f每一个位置的最大公共长度之后,就可以利用该最大公共长度快速和字符串O比较。当每次比较到两个字符串的字符不同时,我们就可以根据最大公共长度将字符串f向前移动(已匹配长度-最大公共长度)位,接着继续比较下一个位置。事实上,字符串f的前移只是概念上的前移,只要我们在比较的时候从最大公共长度之后比较f和O即可达到字符串f前移的目的。
理解了kmp算法的基本原理,下一步就是要获得字符串f每一个位置的最大公共长度。这个最大公共长度在算法导论里面被记为next数组。在这里要注意一点,next数组表示的是长度,下标从1开始;但是在遍历原字符串时,下标还是从0开始。假设我们现在已经求得next[1]、next[2]、……next[i],分别表示长度为1到i的字符串的前缀和后缀最大公共长度,现在要求next[i+1]。由上图我们可以看到,如果位置i和位置next[i]处的两个字符相同(下标从零开始),则next[i+1]等于next[i]加1。如果两个位置的字符不相同,我们可以将长度为next[i]的字符串继续分割,获得其最大公共长度next[next[i]],然后再和位置i的字符比较。这是因为长度为next[i]前缀和后缀都可以分割成上部的构造,如果位置next[next[i]]和位置i的字符相同,则next[i+1]就等于next[next[i]]加1。如果不相等,就可以继续分割长度为next[next[i]]的字符串,直到字符串长度为0为止。
以下是KMP有参函数核心代码:
void makeNext(const char P[],int next[])
{
int q,k;
int m = strlen(P);
next[0] = 0;
for (q = 1,k = 0; q < m; ++q)
{
while(k > 0 && P[q] != P[k])
k = next[k-1];
if (P[q] == P[k])
{
k++;
}
next[q] = k;
}
}
void kmp(const char T[],const char P[],int next[])
{
int n,m;
int i,q;
n = strlen(T);
m = strlen(P);
makeNext(P,next);
for (i = 0,q = 0; i < n; ++i)
{
while(q > 0 && P[q] != T[i])
q = next[q-1];
if (P[q] == T[i])
{
q++;
}
if (q == m)
{
printf("Pattern occurs with shift:%d\n",(i-m+1));
}
}
}
以下是KMP无参函数核心代码:
void makeNext()
{
int i = 0,j = -1;
next[0] = -1;
while (i<n)
{
if(j == -1 || a[i] == a[j])
{
i++;
j++;
next[i] = j;
}
else
j = next[j];
}
}
理解了以上内容,这个题就不难做出来了。
以下是这道题的AC代码:
#include<stdio.h>
#include<string.h>
#define max1 10005
#define max2 1000005
int next[max1]={0},count;
char str1[max1],str2[max2];
void makeNext(char P[max1],int next[max1])
{
int q,k;
int m = strlen(P);
next[0] = 0;
for (q = 1,k = 0; q < m; ++q)
{
while(k > 0 && P[q] != P[k])
k = next[k-1];
if (P[q] == P[k])
{
k++;
}
next[q] = k;
}
}
void kmp(char T[max2],char P[max1],int next[max1])
{
int n,m;
int i,q;
count=0;
n = strlen(T);
m = strlen(P);
makeNext(P,next);
for (i = 0,q = 0; i < n; ++i)
{
while(q > 0 && P[q] != T[i])
q = next[q-1];
if (P[q] == T[i])
{
q++;
}
if (q == m)
{
count++;
}
}
printf("%d\n",count);
}
int main()
{
int n;
while(~scanf("%d",&n))
{
while(n--)
{
scanf("%s",str1);
scanf("%s",str2);
kmp(str2,str1,next);
}
}
return 0;
}