【码上一点】给定两个字符串，使用第二个字符串去分割第一个，实现String中的split()方法的功能。要求不使用subString()和原生split()方法

前几天朋友丢给我这么一道编码题，问我有没有什么想法，我当时第一想法是这道题是不是leetcode那种，这货不会是要在我这装逼炫耀吧，那就尴尬了，作为一个还没深入接触过leetcode的程序员（ps：实在惭愧）,怎么可能短时间内把既准确又高效的代码甩给他，并当面给他重重一击呢。哈哈哈，开个玩笑，言归正传。

说起这种分割字符串的问题，我最先想到了学生时代接触到的游标概念，所谓游标，其实就是一个从某一端向另一端依次遍历的指针，所以我在想，用游标是不是可以作为这道题的一种解决策略呢。基于以上想法，快速梳理一下，要达到目的必须注意以下几个关键点：

将第二个字符串作为游标，从第一个字符串的下标0开始依次向后比对；

进行比对的左右两项都有可能是多字符的，既然无法使用subString()方法，那么就要避免左右两项作为字符串整体对比的逻辑，所以我能想到的是使用数组；

匹配上的部分如何同不匹配的部分区分开，哦，可以使用包装类，那就可以用null值作为特殊值达到区分的效果；

游标遍历结束后，一个null值或者连续的null值会被作为一个分割界限的整体，后续逻辑要以此为判断组装结果集以及完成流程控制；

话不多说直接上代码（以下代码可能有的bug点：特殊字符、null指针等，目前均尚不考虑）。

public class SplitDemo {

    public List<String> split(String str, String reg) {
        List<String> result = new ArrayList<>();
        Character[] strArr = toCharacterArray(str);
        Character[] regArr = toCharacterArray(reg);
        //游标逻辑
        for (int i = 0; i < strArr.length; i++) {
            //标识多字符是否完全匹配
            boolean flag = false;
            for (int j = 0; j < regArr.length; j++) {
                flag = regArr[j].equals(strArr[i + j]);
                if (!flag) {
                    break;
                }
            }
            if (!flag) {
                continue;
            }
            //特殊null值赋值
            for (int j = 0; j < regArr.length; j++) {
                strArr[i + j] = null;
            }
        }
        //结果集封装
        String item = "";
        for (Character character : strArr) {
            if (null != character) {
                item = item + character;
            } else {
                if (!"".equals(item)) {
                    result.add(item);
                    item = "";
                }
            }
        }
        if (!"".equals(item)) {
            result.add(item);
        }
        return result;
    }
    /**
    将字符串转为字符包装类数组
    */
    private Character[] toCharacterArray(String str) {
        Character[] result = new Character[str.length()];
        for (int i = 0; i < result.length; i++) {
            result[i] = str.charAt(i);
        }
        return result;
    }
}
复制代码

测试方法

    public static void main(String[] args) {
        List<String> result = new SplitDemo().split("asasdecdsdxa", "sd");
        for (String str : result) {
            System.out.println(str);
        }
    }
复制代码

输出结果

初步完成，但感觉有些繁琐，贴给朋友交流之后，有了另外的处理方法。直接上朋友的代码截图。

这段代码有几个主要的点：

indexOf(String str, int fromIndex)，返回第一个字符串第一次完整匹配到第二个字符串时首字符的下标，没有匹配到则返回-1；

以控制第一个字符串起始进行匹配判断的位置，也就是indexOf(String str, int fromIndex)入参中的formIndex，达到依次重复截取的效果；

整体逻辑非常清晰，而且代码量更少，初步判断效率更高，这里我不免对String类中的indexOf(String str, int formIndex)方法充满了好奇，下面让我们追一下JDK源码（版本为JDK8）。

    /**
     * Returns the index within this string of the first occurrence of the
     * specified substring, starting at the specified index.
     *
     * <p>The returned index is the smallest value <i>k</i> for which:
     * <blockquote><pre>
     * <i>k</i> &gt;= fromIndex {@code &&} this.startsWith(str, <i>k</i>)
     * </pre></blockquote>
     * If no such value of <i>k</i> exists, then {@code -1} is returned.
     *
     * @param   str         the substring to search for.
     * @param   fromIndex   the index from which to start the search.
     * @return  the index of the first occurrence of the specified substring,
     *          starting at the specified index,
     *          or {@code -1} if there is no such occurrence.
     */
    public int indexOf(String str, int fromIndex) {
        return indexOf(value, 0, value.length,
                str.value, 0, str.value.length, fromIndex);
    }
    
    /**
     * Code shared by String and StringBuffer to do searches. The
     * source is the character array being searched, and the target
     * is the string being searched for.
     *
     * @param   source       the characters being searched.
     * @param   sourceOffset offset of the source string.
     * @param   sourceCount  count of the source string.
     * @param   target       the characters being searched for.
     * @param   targetOffset offset of the target string.
     * @param   targetCount  count of the target string.
     * @param   fromIndex    the index to begin searching from.
     */
    static int indexOf(char[] source, int sourceOffset, int sourceCount,
            char[] target, int targetOffset, int targetCount,
            int fromIndex) {
        if (fromIndex >= sourceCount) {
            return (targetCount == 0 ? sourceCount : -1);
        }
        if (fromIndex < 0) {
            fromIndex = 0;
        }
        if (targetCount == 0) {
            return fromIndex;
        }

        char first = target[targetOffset];
        int max = sourceOffset + (sourceCount - targetCount);

        for (int i = sourceOffset + fromIndex; i <= max; i++) {
            /* Look for first character. */
            if (source[i] != first) {
                while (++i <= max && source[i] != first);
            }

            /* Found first character, now look at the rest of v2 */
            if (i <= max) {
                int j = i + 1;
                int end = j + targetCount - 1;
                for (int k = targetOffset + 1; j < end && source[j]
                        == target[k]; j++, k++);

                if (j == end) {
                    /* Found whole string. */
                    return i - sourceOffset;
                }
            }
        }
        return -1;
    }
复制代码

可以看到，其在内部调用了静态方法indexOf(char[] source, int sourceOffset, int sourceCount,char[] target, int targetOffset, int targetCount,int fromIndex)，传入参数从左往右依次可以理解为源字节数组（标题中的第一个字符串），源字节数组偏移量（内部调用时默认传入0），源字节数组长度，要查找的字节数组（标题中的第二个字符串）,要查找的字节数组偏移量（依旧是0），要查找的字节数组长度，以及初始匹配位置的下标。

进到方法内部，可以看到大体的流程是：

非法入参过滤

获得要查找的字节数组的第一个元素，计算得到可匹配到查找数组第一个元素的源字节数组的最大下标（这里表述起来比较绕~）

开启循环，初始位置其实就是调用这个方法时传入的最后一个参数，循环结束的标志是i超过了步骤2中的max值

快速找到匹配第一个字符的位置

开启内层循环，以步骤4得到的位置为起始点，快速匹配要查找字符数组的剩余元素，完全匹配则返回i，不匹配则结束本次内层循环、外层循环，进入下一次外层循环；

最终未找到则返回-1

究其内部原理，实际上跟作者一开始的游标方案有异曲同工之处，但更为精妙，尤其是流程控制语句、临界值运用都非常精湛，值得学习！

以上就是本次分享的全部内容啦，有不对的地方欢迎同学们指正！

【码上一点】给定两个字符串，使用第二个字符串去分割第一个，实现String中的split()方法的功能。要求不使用subString()和原生split()方法

猜你喜欢