Shell Programming Tips: Batch-Converting Markdown Files

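For various reasons I needed to convert all the Markdown files I had written into docx files as a backup. In particular, the images referenced in the original documents had to be embedded in the docx files so that everything is stored locally. Here is the script first: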

sudo yum -y install pandoc

# Split the output of find on newlines only, so paths containing spaces stay intact.
IFS=$'\n'

# Convert every .md file to a .docx file in the same directory.
for srcFile in $(find . -type f -name '*.md'); do
    sinkFile="$(dirname "$srcFile")/$(basename "$srcFile" .md).docx"
    echo "source file: $srcFile"
    echo "sink file: $sinkFile"
    pandoc -o "$sinkFile" "$srcFile"
done

# Restore the default IFS characters (space, tab, newline).
IFS=$' \t\n'

There are quite a few points worth knowing in this script. The following deserve particular emphasis:

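File names may contain spaces, and such names get split during iteration. Wrapping the loop variable in double quotes ("$srcFile") does not solve this, because by that point the elements being iterated over are no longer whole lines of file paths. Printing them with echo 'source file: '"$srcFile" exposes the problem. For this for loop, the fix is to set IFS to the newline character \n: only then is each line of find output, which may contain spaces, parsed as a single element.
When setting IFS to the newline character \n, it must be written IFS=$'\n', not IFS='\n'. The $ cannot be omitted: without it, IFS is set to the two literal characters backslash and n.
The same job can also be done with find . -type f -name '*.md' -exec sh -c '...' sh {} +. The advantage is that IFS needs no special setup: inside -exec, {} represents each path intact, so there is no splitting on spaces. In this case, however, we still need dirname and basename to build the target path from each file path, and a for loop is still unavoidable, so -exec offers no clear advantage and is harder to read. The traditional form above is more concise. The test commands further below for find's -exec and -execdir help in understanding how they are used.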
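The first two points can be checked without touching any real files. The following is a minimal sketch, assuming bash; the variable paths and the helper count_words are made up for illustration and simply mimic two lines of find output:

```shell
#!/usr/bin/env bash
# Hypothetical sample: two paths, one per line, the first containing a space.
paths=$'./notes/my file.md\n./notes/other.md'

count_words() {
    # Count how many words the unquoted expansion is split into.
    set -f          # disable globbing so only word splitting is in play
    set -- $paths   # unquoted on purpose: split according to the current IFS
    set +f
    echo "$#"
}

old_ifs=$IFS

IFS=$' \t\n'
default_count=$(count_words)   # the space inside the first path also splits

IFS=$'\n'
newline_count=$(count_words)   # only newlines split

IFS='\n'                       # wrong form: a backslash and the letter n
literal_chars=${#IFS}

IFS=$old_ifs
echo "default IFS: $default_count words"
echo "newline IFS: $newline_count words"
echo "length of the wrong IFS value: $literal_chars"
```

With the default IFS the two paths come out as three words, with IFS=$'\n' as two, and the wrong form leaves IFS two characters long instead of one.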

Tests

find . -type f -name '*.md' -exec ls {} \;
find . -type f -name '*.md' -execdir ls {} \;
find . -type f -name '*.md' -exec dirname {} \;
find . -type f -name '*.md' -execdir dirname {} \;
# output: 
find . -type f -name '*.md' -exec sh -c '
    for name do
        ls "$(dirname "$name")/$(basename "$name")"
    done' sh {} +

find . -type f -name '*.md' -exec sh -c '
    for name do
        ls "$(dirname "$name")/$(basename "$name" ".md")"
    done' sh {} +

find . -type f -name '*.md' -exec sh -c '
    for name do
        echo "$(dirname "$name")/$(basename "$name" ".md").docx"
    done' sh {} +
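
The dirname/basename concatenation used in the last test can probably also be written with the shell's suffix-removal expansion. This is a sketch rather than the method used in the script above; the sample path src is hypothetical, and the two forms are only expected to agree for paths that contain a directory part, which is always the case for output of find . since every path starts with ./

```shell
#!/usr/bin/env bash
# Hypothetical sample path containing spaces.
src='./my notes/first draft.md'

# Form used in the tests above: dirname + basename.
via_tools="$(dirname "$src")/$(basename "$src" .md).docx"

# Alternative: strip a trailing ".md" with ${var%pattern}, then append ".docx".
via_expansion="${src%.md}.docx"

echo "$via_tools"
echo "$via_expansion"
```

Both lines print ./my notes/first draft.docx. The expansion avoids two external commands per file, at the cost of being a little less self-explanatory.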

Additional notes:

	-execdir command {} +
		Like -exec, but the specified command is run from the subdirectory containing the matched file ...
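
The difference described in this excerpt can be observed on a throwaway directory tree. A minimal sketch, assuming bash, mktemp, and a PATH without relative entries (GNU find refuses -execdir otherwise); the layout docs/sub/note.md is invented for the demonstration:

```shell
#!/usr/bin/env bash
# Hypothetical scratch layout: <tmp>/docs/sub/note.md
tmp=$(mktemp -d)
mkdir -p "$tmp/docs/sub"
: > "$tmp/docs/sub/note.md"

# -exec: the command runs from the directory where find was started (docs).
exec_dir=$(cd "$tmp/docs" && find . -type f -name '*.md' \
    -exec sh -c 'basename "$(pwd)"' sh {} \;)

# -execdir: the command runs from the directory containing the match (sub).
execdir_dir=$(cd "$tmp/docs" && find . -type f -name '*.md' \
    -execdir sh -c 'basename "$(pwd)"' sh {} \;)

echo "-exec ran in:    $exec_dir"
echo "-execdir ran in: $execdir_dir"

rm -rf "$tmp"
```

The first line reports docs and the second reports sub. If I read the pandoc manual correctly, relative image paths are resolved against the working directory unless --resource-path is given, so this difference may matter when images need to be embedded. Treat that as an assumption to verify.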

References:

https://unix.stackexchange.com/questions/389705/understanding-the-exec-option-of-find
