作为一个脱胎于 Yandex,开源的,面向 OLAP(online analytical processing of queries)的分布式纵列数据库管理系统,ClickHouse 一直活跃在明星数据库的舞台上。
在 2022 年 4 月 26 日的一篇博客 ClickHouse Docs have a new look and feel! 中,Rich Raposa 表示,鉴于文档对帮助用户(开发者、DBA、架构师等)理解和使用产品的重要作用,ClickHouse 的文档做了很多努力在改善现有内容和增加新内容上。除去传统的面向功能的数据库操作手册(面向 DBA)和开发指南(面向数据库开发人员),ClickHouse 重点增加了一系列教程文档,方便开发者和 DBA 快速上手。这些文档包括:
- 帮助用户快速上手的快速开始和教程文档
- 一系列帮助用户连接知名 UI/BI 工具的教程,涵盖 Grafana, Metabase, Superset, and Tableau
- 一系列帮助用户连接外部工具的教程,例如 Kafka, AWS S3, PostgreSQL, MySQL, Airbyte
总体来看,ClickHouse 的新文档重点改善了新用户的上手体验,同时整体提升了文档的可操作性,增加了大量可实操的教程。另外,新文档补充了大量数据库与外部生态结合的内容,这就让技术内容不再局限于数据库本身。这对个人开发者无疑是非常友好的。
实际体验
当然,我们作为文档工程师,不能只看宣传,也要看实际效果。现在我们就跟着 ClickHouse 的教程实际操作一下。
根据官方文档,ClickHouse 兼容多种操作系统和 CPU 架构。
操作系统:
- Linux
- FreeBSD
- macOS
CPU 架构:
- x86_64
- AArch64
- PowerPC64LE
我们选择 Ubuntu 20.04 系统,体验 ClickHouse 的最新教程。
安装并启动 ClickHouse
ClickHouse 官方提供了一个 bash 脚本,帮助你自动化下载和安装。
# 下载 ClickHouse
curl https://clickhouse.com/ | sh
bash 脚本会自动进行下载。
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1292 0 1292 0 0 548 0 --:--:-- 0:00:02 --:--:-- 548
Will download https://builds.clickhouse.com/master/amd64/clickhouse
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 2140M 0 2848k 0 0 525k 0 1:09:28 0:00:05 1:09:23 569k
完成之后,运行以下命令进行安装:
sudo ./clickhouse install
Copying ClickHouse binary to /usr/bin/clickhouse.new
Renaming /usr/bin/clickhouse.new to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-server to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-client to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-local to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-benchmark to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-copier to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-obfuscator to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-git-import to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-compressor to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-format to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-extract-from-config to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-keeper to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-keeper-converter to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-disks to /usr/bin/clickhouse.
Creating clickhouse group if it does not exist.
groupadd -r clickhouse
Creating clickhouse user if it does not exist.
useradd -r --shell /bin/false --home-dir /nonexistent -g clickhouse clickhouse
Will set ulimits for clickhouse user in /etc/security/limits.d/clickhouse.conf.
Creating config directory /etc/clickhouse-server.
Creating config directory /etc/clickhouse-server/config.d that is used for tweaks of main server configuration.
Creating config directory /etc/clickhouse-server/users.d that is used for tweaks of users configuration.
Data path configuration override is saved to file /etc/clickhouse-server/config.d/data-paths.xml.
Log path configuration override is saved to file /etc/clickhouse-server/config.d/logger.xml.
User directory path configuration override is saved to file /etc/clickhouse-server/config.d/user-directories.xml.
OpenSSL path configuration override is saved to file /etc/clickhouse-server/config.d/openssl.xml.
Creating log directory /var/log/clickhouse-server.
Creating data directory /var/lib/clickhouse.
Creating pid directory /var/run/clickhouse-server.
chown -R clickhouse:clickhouse '/var/log/clickhouse-server'
chown -R clickhouse:clickhouse '/var/run/clickhouse-server'
chown clickhouse:clickhouse '/var/lib/clickhouse'
Enter password for default user:
在这里输入密码或直接回车。
ClickHouse has been successfully installed.
Start clickhouse-server with:
sudo clickhouse start
Start clickhouse-client with:
clickhouse-client
两条命令搞定安装。再敲一条命令,就来到了传说中的 A-ha moment。
sudo clickhouse start
$ sudo clickhouse start
chown -R clickhouse: '/var/run/clickhouse-server/'
Will run clickhouse su 'clickhouse' /usr/bin/clickhouse-server --config-file /etc/clickhouse-server/config.xml --pid-file /var/run/clickhouse-server/clickhouse-server.pid --daemon
Waiting for server to start
Waiting for server to start
Server started
数据库服务端启动了。
在 ClickHouse 里 CRUD
教程里提供的 Connect to ClickHouse 部分使用了自带的 HTML 操作界面,可用于 SQL 查询。但由于我使用的 Ubuntu 系统是不带 UI 的,所以只能跳到 The ClickHouse Client 部分使用 CLI 客户端。
感觉这里可以加个说明,告诉没有 UI 的用户直接去 CLI 客户端。体验更顺滑。
运行客户端
运行 CLI 客户端:
clickhouse-client
运行成功
$ clickhouse-client
ClickHouse client version 22.7.1.2100 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 22.7.1 revision 54456.
Warnings:
* Linux is not using a fast TSC clock source. Performance can be degraded. Check /sys/devices/system/clocksource/clocksource0/current_clocksource
* Linux threads max count is too low. Check /proc/sys/kernel/threads-max
* Maximum number of threads is lower than 30000. There could be problems with handling a lot of simultaneous queries.
localhost.localdomain :)
创建数据库
我使用的测试机性能不足,因此报了上述 warning。这里就从教程前面的建库开始:
CREATE DATABASE IF NOT EXISTS helloworld
localhost.localdomain :) CREATE DATABASE IF NOT EXISTS helloworld
CREATE DATABASE IF NOT EXISTS helloworld
Query id: 2807edaa-c084-4501-bb0f-a32701f359ff
Ok.
0 rows in set. Elapsed: 0.006 sec.
创建数据表
接下来建表。可以看到 ClickHouse 里的任何表都需要一个 ENGINE。教程里推荐使用 MergeTree。
CREATE TABLE helloworld.my_first_table
(
user_id UInt32,
message String,
timestamp DateTime,
metric Float32
)
ENGINE = MergeTree()
PRIMARY KEY (user_id, timestamp)
插入数据
在操作之前,文档专门通过 A Brief Intro to Primary Keys 讲解了 ClickHouse 的一个独特之处,即主键(Primary Key)不是独一无二的!这一点确实和大部分数据库不同。这个讲解看似突兀,实际必要。否则,用户看到下面插入的数据就会产生疑惑。
在产生疑惑之前打消疑惑是非常有效的做法。
插入数据:
INSERT INTO helloworld.my_first_table (user_id, message, timestamp, metric) VALUES
(101, 'Hello, ClickHouse!', now(), -1.0 ),
(102, 'Insert a lot of rows per batch', yesterday(), 1.41421 ),
(102, 'Sort your data based on your commonly-used queries', today(), 2.718 ),
(101, 'Granules are the smallest chunks of data read', now() + 5, 3.14159 )
插入成功:
localhost.localdomain :) INSERT INTO helloworld.my_first_table (user_id, message, timestamp, metric) VALUES
(101, 'Hello, ClickHouse!', now(), -1.0 ),
(102, 'Insert a lot of rows per batch', yesterday(), 1.41421 ),
(102, 'Sort your data based on your commonly-used queries', today(), 2.718 ),
(101, 'Granules are the smallest chunks of data read', now() + 5, 3.14159 )
INSERT INTO helloworld.my_first_table (user_id, message, timestamp, metric) FORMAT Values
Query id: 4d376221-2786-4d44-9ef9-129a8f03ddcf
Ok.
4 rows in set. Elapsed: 0.009 sec.
检查插入的数据:
SELECT * FROM helloworld.my_first_table
结果如下:
localhost.localdomain :) SELECT * FROM helloworld.my_first_table
SELECT *
FROM helloworld.my_first_table
Query id: 867f2212-6d8b-43ec-b1ff-efce3c879f73
┌─user_id─┬─message────────────────────────────────────────────┬───────────timestamp─┬──metric─┐
│ 101 │ Hello, ClickHouse! │ 2022-07-16 00:19:30 │ -1 │
│ 101 │ Granules are the smallest chunks of data read │ 2022-07-16 00:19:35 │ 3.14159 │
│ 102 │ Insert a lot of rows per batch │ 2022-07-15 00:00:00 │ 1.41421 │
│ 102 │ Sort your data based on your commonly-used queries │ 2022-07-16 00:00:00 │ 2.718 │
└─────────┴────────────────────────────────────────────────────┴─────────────────────┴─────────┘
4 rows in set. Elapsed: 0.002 sec.
从 CSV 文件插入数据
新建一个 CSV 文件 data.csv
并输入以下内容:
102,This is data in a file,2022-02-22 10:43:28,123.45
101,It is comma-separated,2022-02-23 00:00:00,456.78
103,Use FORMAT to specify the format,2022-02-21 10:43:30,678.90
将 CSV 的数据插入 helloworld
数据库的 my_first_table
表:
clickhouse-client --query='INSERT INTO helloworld.my_first_table FORMAT CSV' < data.csv
这里官方文档是通过 Web UI 来看数据的,我们通过客户端 CLI 操作一下:
SELECT *
FROM helloworld.my_first_table
ORDER BY timestamp ASC
结果如下,证明数据插入成功:
SELECT *
FROM helloworld.my_first_table
ORDER BY timestamp ASC
Query id: d7216864-2b85-4ad2-9073-6c0bef7ed0c6
┌─user_id─┬─message────────────────────────────────────────────┬───────────timestamp─┬──metric─┐
│ 103 │ Use FORMAT to specify the format │ 2022-02-21 10:43:30 │ 678.9 │
│ 102 │ This is data in a file │ 2022-02-22 10:43:28 │ 123.45 │
│ 101 │ It is comma-separated │ 2022-02-23 00:00:00 │ 456.78 │
│ 102 │ Insert a lot of rows per batch │ 2022-07-15 00:00:00 │ 1.41421 │
│ 102 │ Sort your data based on your commonly-used queries │ 2022-07-16 00:00:00 │ 2.718 │
│ 101 │ Hello, ClickHouse! │ 2022-07-16 00:19:30 │ -1 │
│ 101 │ Granules are the smallest chunks of data read │ 2022-07-16 00:19:35 │ 3.14159 │
└─────────┴────────────────────────────────────────────────────┴─────────────────────┴─────────┘
操作体验
整体来说非常顺滑。ClickHouse 本身也做了非常大的努力来提升数据库的易用性和文档的可操作性。对于 ENGINE、Primary Key 等概念,尽量避免了过于复杂的解释,而专注于引导用户按照文档顺利完成任务。
客户端部分,建议使用一个统一的客户端,例如 CLI 走完全程。而不必分成 Web UI 和 CLI 两个部分。