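Linux block framework (1) - Block device concepts

An introduction to the basic concepts of the Linux block layer.

1.Concepts

A block device is one class of I/O device. When the application layer reads or writes such a device, data is transferred in units of the sector size; if the amount of data read or written is smaller than a sector, a buffer is required. Data at any position on the device can be read or written at random. Examples include hard disks, USB drives, and SD cards, as well as the regular files (.txt, .c, and so on) stored on them.

A block device is a structured random-access device that is read and written one block at a time. It uses a buffer to hold data temporarily and, once conditions are right, writes the buffered data to the device in a single operation or reads data from the device into the buffer in a single operation. Because access is random, a block device must be able to move back and forth between different regions of the medium.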
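
The buffering behaviour described above can be illustrated with a small sketch. This is not kernel code: the `WriteBackBuffer` class, the `flush_threshold` parameter, and the `bytearray` standing in for the medium are all assumptions made for illustration. It only shows the idea of holding writes in memory and pushing them to the device in one go.

```python
class WriteBackBuffer:
    """Toy write-back buffer in front of a sector-addressed device (illustrative only)."""

    def __init__(self, device, sector_size=512, flush_threshold=4):
        self.device = device                  # bytearray standing in for the medium
        self.sector_size = sector_size
        self.flush_threshold = flush_threshold  # hypothetical "conditions are right" rule
        self.dirty = {}                       # sector number -> data not yet on the device
        self.device_writes = 0                # how many sectors actually reached the device

    def write_sector(self, sector, data):
        if len(data) != self.sector_size:
            raise ValueError("data must be exactly one sector long")
        self.dirty[sector] = bytes(data)      # a later write to the same sector replaces it
        if len(self.dirty) >= self.flush_threshold:
            self.flush()

    def read_sector(self, sector):
        if sector in self.dirty:              # newest data may still be in the buffer
            return self.dirty[sector]
        start = sector * self.sector_size
        return bytes(self.device[start:start + self.sector_size])

    def flush(self):
        # Write all pending sectors in ascending order, then empty the buffer.
        for sector in sorted(self.dirty):
            start = sector * self.sector_size
            self.device[start:start + self.sector_size] = self.dirty[sector]
            self.device_writes += 1
        self.dirty.clear()
```

Two writes to the same sector cost only one device write here, which is one reason buffering pays off.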

A block device provides storage for a large amount of data. Because I/O operations are costly, the kernel tries to maximize performance by caching data in memory. Every actual I/O operation is performed by a device driver, and the kernel provides various functions and data structures that device drivers use to register their handlers.

Another requirement of the block driver layer is to hide the properties of specific hardware and provide a general API for accessing the devices.

On a hard disk, the order of requests affects device I/O performance, so the kernel collects I/O requests and sorts them in the block layer before calling the device driver to process them.

In the block layer, therefore, the kernel tries to improve I/O performance by grouping contiguous sectors and sorting I/O requests.

The algorithm that sorts and groups I/O requests is called the elevator algorithm.
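
As a rough illustration of "sorting and grouping", the sketch below sorts requests by starting sector and merges those that touch or overlap. The function name `sort_and_merge` and the `(start_sector, sector_count)` tuple format are assumptions for this example. The real block layer applies more rules, for example it does not merge reads with writes.

```python
def sort_and_merge(requests):
    """Sort (start_sector, sector_count) requests and merge contiguous ones.

    `requests` is a list of tuples in arrival order; the result is a new list
    ordered by start sector, with adjacent or overlapping ranges combined.
    """
    merged = []
    for start, count in sorted(requests):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            # This request begins where the previous one ends (or inside it):
            # extend the previous request instead of issuing a new one.
            prev_start, prev_count = merged[-1]
            end = max(prev_start + prev_count, start + count)
            merged[-1] = (prev_start, end - prev_start)
        else:
            merged.append((start, count))
    return merged


arrival_order = [(100, 8), (0, 8), (108, 8), (8, 8), (500, 4)]
print(sort_and_merge(arrival_order))  # five requests become three
```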

1.1.Character devices

Character special files or character devices provide unbuffered, direct access to the hardware device. They do not necessarily allow programs to read or write single characters at a time; that is up to the device in question. The character device for a hard disk, for example, will normally require that all reads and writes are aligned to block boundaries and most certainly will not allow reading a single byte.

Character devices are sometimes known as raw devices to avoid the confusion surrounding the fact that a character device for a piece of block-based hardware will typically require programs to read and write aligned blocks.

1.2.Block devices

Block special files or block devices provide buffered access to hardware devices, and provide some abstraction from their specifics.[5] Unlike character devices, block devices will always allow the programmer to read or write a block of any size (including single characters/bytes) and any alignment. The downside is that because block devices are buffered, the programmer does not know how long it will take before written data is passed from the kernel’s buffers to the actual device, or indeed in what order two separate writes will arrive at the physical device. Additionally, if the same hardware exposes both character and block devices, there is a risk of data corruption due to clients using the character device being unaware of changes made in the buffers of the block device.
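
To make the contrast concrete, here is a minimal sketch of how byte-granular access can be layered on a device that only accepts whole, aligned sectors. `SectorDevice`, `read_bytes`, and `write_bytes` are hypothetical names for this illustration, and the read-modify-write shown here is a simplification of what the kernel's caches do.

```python
SECTOR_SIZE = 512


class SectorDevice:
    """Stand-in for hardware that only transfers whole, aligned sectors."""

    def __init__(self, num_sectors):
        self.data = bytearray(num_sectors * SECTOR_SIZE)

    def read_sector(self, n):
        return bytes(self.data[n * SECTOR_SIZE:(n + 1) * SECTOR_SIZE])

    def write_sector(self, n, buf):
        if len(buf) != SECTOR_SIZE:
            raise ValueError("only whole sectors can be written")
        self.data[n * SECTOR_SIZE:(n + 1) * SECTOR_SIZE] = buf


def read_bytes(dev, offset, length):
    """Read any byte range by fetching every sector that covers it."""
    if length <= 0:
        return b""
    first = offset // SECTOR_SIZE
    last = (offset + length - 1) // SECTOR_SIZE
    buf = b"".join(dev.read_sector(n) for n in range(first, last + 1))
    skip = offset - first * SECTOR_SIZE
    return buf[skip:skip + length]


def write_bytes(dev, offset, payload):
    """Write any byte range with read-modify-write on the covering sectors."""
    if not payload:
        return
    first = offset // SECTOR_SIZE
    last = (offset + len(payload) - 1) // SECTOR_SIZE
    buf = bytearray(b"".join(dev.read_sector(n) for n in range(first, last + 1)))
    skip = offset - first * SECTOR_SIZE
    buf[skip:skip + len(payload)] = payload      # patch the bytes in the buffer
    for i, n in enumerate(range(first, last + 1)):
        dev.write_sector(n, bytes(buf[i * SECTOR_SIZE:(i + 1) * SECTOR_SIZE]))
```

A five-byte write at offset 510 straddles two sectors, so both are read, patched in memory, and written back whole.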

Most systems create both block and character devices to represent hardware like hard disks. FreeBSD and Linux notably do not; the former has removed support for block devices,[6] while the latter creates only block devices. In Linux, to get a character device for a disk one must use the “raw” driver, though one can get the same effect as opening a character device by opening the block device with the Linux-specific O_DIRECT flag.

1.3.Block devices vs. character devices
[Figure: comparison of block devices and character devices]

2.Related properties

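Sectors: the basic unit of data transfer for any block device hardware. A sector is usually 512 bytes. (The device's point of view.)
- small block
- size depends on the hardware
- the kernel core expects 512-byte sectors
- if the device uses a different sector size, the kernel adapts only partially, which can cause problems
- the driver must correctly convert the sector numbers passed by the kernel
Blocks: the basic unit of data handling defined by Linux for the kernel, filesystems, and so on. A block usually consists of one or more sectors. (The Linux operating system's point of view.)
- fixed-size chunk of data
- often 4096 bytes
Segments: made up of several adjacent blocks. In Linux memory management, a segment is a memory page or part of a memory page.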
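
The relationship between these units is simple arithmetic. The sketch below assumes the kernel numbers sectors in 512-byte units, as stated above; the helper names `sectors_per_block` and `to_hardware_sector` are made up for this example and do not correspond to kernel functions.

```python
KERNEL_SECTOR_SIZE = 512  # unit in which the kernel core counts sectors


def sectors_per_block(block_size, sector_size=KERNEL_SECTOR_SIZE):
    """How many sectors make up one block."""
    if block_size % sector_size:
        raise ValueError("block size must be a multiple of the sector size")
    return block_size // sector_size


def to_hardware_sector(kernel_sector, hw_sector_size):
    """Convert a 512-byte kernel sector number for a device with larger sectors.

    Returns (hardware_sector, byte_offset_inside_that_sector). This is the kind
    of conversion a driver has to perform when the hardware sector size differs
    from 512 bytes.
    """
    byte_offset = kernel_sector * KERNEL_SECTOR_SIZE
    return divmod(byte_offset, hw_sector_size)


print(sectors_per_block(4096))        # a 4096-byte block spans 8 kernel sectors
print(to_hardware_sector(17, 4096))   # kernel sector 17 on a 4096-byte-sector device
```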

3.Block framework

Let us suppose, for instance, that a process issued a read() system call on some disk file. Here is what the kernel typically does to service the process request:

1.The service routine of the read() system call activates a suitable VFS function, passing to it a file descriptor and an offset inside the file.
2.The VFS function determines if the requested data is already available and, if necessary, how to perform the read operation.
3.Let’s assume that the kernel must read the data from the block device, thus it must determine the physical location of that data. To do this, the kernel relies on the mapping layer, which typically executes two steps:
- It determines the block size of the filesystem including the file and computes the extent of the requested data in terms of file block numbers. Essentially, the file is seen as split in many blocks, and the kernel determines the numbers (indices relative to the beginning of file) of the blocks containing the requested data.
- Next, the mapping layer invokes a filesystem-specific function that accesses the file’s disk inode and determines the position of the requested data on disk in terms of logical block numbers. Essentially, the disk is seen as split in blocks, and the kernel determines the numbers (indices relative to the beginning of the disk or partition) corresponding to the blocks storing the requested data. Because a file may be stored in nonadjacent blocks on disk, a data structure stored in the disk inode maps each file block number to a logical block number.
4.The kernel can now issue the read operation on the block device. It makes use of the generic block layer, which starts the I/O operations that transfer the requested data. In general, each I/O operation involves a group of blocks that are adjacent on disk. Because the requested data is not necessarily adjacent on disk, the generic block layer might start several I/O operations. Each I/O operation is represented by a “block I/O” (in short, “bio”) structure, which collects all information needed by the lower components to satisfy the request.
The generic block layer hides the peculiarities of each hardware block device, thus offering an abstract view of the block devices. Because almost all block devices are disks, the generic block layer also provides some general data structures that describe “disks” and “disk partitions.”
5.Below the generic block layer, the “I/O scheduler” sorts the pending I/O data transfer requests according to predefined kernel policies. The purpose of the scheduler is to group requests of data that lie near each other on the physical medium.
6.Finally, the block device drivers take care of the actual data transfer by sending suitable commands to the hardware interfaces of the disk controllers.
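
Steps 3 and 4 above can be sketched in a few lines. Everything here is a simplification under stated assumptions: `block_map` is a plain dictionary standing in for the mapping stored in the disk inode, and each `(first_block, count)` run returned by `group_adjacent` plays the role of one I/O operation (one bio).

```python
def file_blocks(offset, length, block_size):
    """File block numbers (relative to the start of the file) covering a byte range."""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))


def map_to_disk(file_block_numbers, block_map):
    """Translate file block numbers to logical block numbers on the disk.

    `block_map` is a hypothetical stand-in for the per-inode mapping structure.
    """
    return [block_map[fb] for fb in file_block_numbers]


def group_adjacent(logical_blocks):
    """Group runs of adjacent logical blocks; each run can be one I/O operation."""
    runs = []
    for lb in logical_blocks:
        if runs and lb == runs[-1][0] + runs[-1][1]:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((lb, 1))                        # start a new run
    return runs


# A file whose blocks are not stored contiguously on disk.
block_map = {0: 100, 1: 101, 2: 250, 3: 251, 4: 102}
wanted = file_blocks(offset=5000, length=8000, block_size=4096)
print(group_adjacent(map_to_disk(wanted, block_map)))
```

Because the file's blocks are not adjacent on disk, a single read() turns into more than one I/O operation.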

As you can see, there are many kernel components that are concerned with data stored in block devices; each of them manages the disk data using chunks of different length:

- The controllers of the hardware block devices transfer data in chunks of fixed length called “sectors.” Therefore, the I/O scheduler and the block device drivers must manage sectors of data.
- The Virtual Filesystem, the mapping layer, and the filesystems group the disk data in logical units called “blocks.” A block corresponds to the minimal disk storage unit inside a filesystem.
- Block device drivers should be able to cope with “segments” of data: each segment is a memory page, or a portion of a memory page, including chunks of data that are physically adjacent on disk.
- The disk caches work on “pages” of disk data, each of which fits in a page frame.
- The generic block layer glues together all the upper and lower components, thus it knows about sectors, blocks, segments, and pages of data.

3.1.The Generic Block Layer [A block layer introduction part 1: the bio layer]

The generic block layer is a kernel component that handles the requests for all block devices in the system. Thanks to its functions, the kernel may easily:

- Implement, with some additional effort, a “zero-copy” scheme, where disk data is directly put in the User Mode address space without being copied to kernel memory first.
- Manage logical volumes, such as those used by LVM (the Logical Volume Manager) and RAID (Redundant Array of Inexpensive Disks): several disk partitions, even on different block devices, can be seen as a single partition.
- Exploit the advanced features of the most recent disk controllers.
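
The logical-volume idea, several partitions seen as one, can be sketched as a simple linear concatenation. This is only an illustration: the names `build_linear_volume` and `resolve`, the table layout, and the device names are assumptions, and real volume managers support many other mappings (striping, mirroring, and so on).

```python
def build_linear_volume(partitions):
    """Concatenate partitions into one logical volume.

    `partitions` is a list of (device_name, start_sector, sector_count).
    Returns a table of (logical_start, sector_count, device_name, start_sector).
    """
    table = []
    logical_start = 0
    for name, start, count in partitions:
        table.append((logical_start, count, name, start))
        logical_start += count
    return table


def resolve(table, logical_sector):
    """Map a sector of the logical volume to (device_name, sector_on_device)."""
    for logical_start, count, name, start in table:
        if logical_start <= logical_sector < logical_start + count:
            return name, start + (logical_sector - logical_start)
    raise ValueError("sector is beyond the end of the volume")


# Two partitions on different (hypothetical) disks, seen as one 1500-sector volume.
volume = build_linear_volume([("sda1", 2048, 1000), ("sdb2", 4096, 500)])
print(resolve(volume, 999))    # last sector that lives on the first partition
print(resolve(volume, 1000))   # first sector that lives on the second partition
```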

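3.2.I/O scheduling

I/O scheduling is the elevator algorithm. A disk reads and writes data by mechanically moving its head. In theory a disk satisfies the random-access requirement of a block device, but to reduce wear on the disk and improve efficiency we want the head, once it is at a given position, to write all pending data destined for nearby locations together, rather than writing a little here, a little there, and then coming back. I/O scheduling reorders the I/O requests issued by the upper layers and merges multiple requests, which achieves both goals. The approach is named after an elevator. Consider a running elevator with one passenger going from floor 20 to floor 1 and another going from floor 15 to floor 5. The elevator does not take the first passenger to floor 1 and then return to floor 15 for the second; it descends from floor 20, stops at floor 15 to pick up the second passenger, lets that passenger out at floor 5, and finally reaches floor 1. In short, the order in which the elevator algorithm serves requests is not the order in which the buttons were pressed. The Linux kernel has provided the following elevator algorithms for I/O scheduling: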

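- No-op I/O scheduler: implements only a simple FIFO queue and performs only the most basic merging; it is well suited to flash-based storage.
- Anticipatory I/O scheduler: delays I/O requests briefly (on the order of a few milliseconds) in the hope of sorting them and achieving higher efficiency.
- Deadline I/O scheduler: tries to minimize the latency of each request and also reorders bios; it is particularly suited to read-heavy workloads such as databases.
- CFQ I/O scheduler: distributes I/O bandwidth evenly among all tasks in the system to provide a fair environment; in multimedia workloads it helps ensure that audio and video data is read from disk in time. It was the default scheduler of the legacy (single-queue) block layer for many kernel releases.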
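
The elevator analogy above can be turned into a small sketch. It is not the code of any of the schedulers listed: the functions `elevator_order` and `total_seek` are illustrative names, and the head position and request list are made-up sample values. The sketch serves every request in the current direction of travel first and only then reverses.

```python
def elevator_order(pending, head, direction="up"):
    """Order pending sector numbers the way an elevator would serve floors.

    Requests ahead of the head in the current direction are served first,
    in order of travel; the remaining ones are served on the way back.
    """
    if direction == "up":
        ahead = sorted(s for s in pending if s >= head)
        behind = sorted((s for s in pending if s < head), reverse=True)
    else:
        ahead = sorted((s for s in pending if s <= head), reverse=True)
        behind = sorted(s for s in pending if s > head)
    return ahead + behind


def total_seek(order, head):
    """Total head movement needed to serve requests in the given order."""
    distance = 0
    for sector in order:
        distance += abs(sector - head)
        head = sector
    return distance


arrival_order = [98, 183, 37, 122, 14, 124, 65, 67]   # sample request positions
head = 53
swept = elevator_order(arrival_order, head, direction="down")
print(total_seek(arrival_order, head))   # serving in arrival (FIFO) order
print(total_seek(swept, head))           # serving in elevator order
```

With these sample values the elevator order moves the head far less than serving requests in arrival order, which is the effect the schedulers above aim for on rotating disks.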

Hacker_Albert
