1 Overview
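DEVFREQ: generic DVFS framework with device-specific OPPs. devfreq was written by MyungJoo Ham of Samsung.
A device with OPPs (Operating Performance Points) usually supports several frequency/voltage pairs, so the system has to pick one of these levels at any given time. DVFS (dynamic voltage and frequency scaling) lets the system cut power consumption by lowering frequency and voltage without hurting performance too much.
DEVFREQ provides DVFS for non-CPU devices, and its structure closely mirrors drivers/cpufreq. However, the CPUFREQ driver model does not allow multiple devices to register. It is also a poor fit for several heterogeneous devices that each run a different governor.
Typically, DVFS sets the frequency according to the device's demand and then selects the matching voltage. That demand comes from a governor, such as the CPU governors interactive and schedutil. DEVFREQ works the same way. Each device has its own governor, and DEVFREQ sets the target frequency and voltage according to that governor's recommendation. The chosen OPP value is then passed to the device driver's "target" callback, which carries out the actual frequency and voltage change.
The DEVFREQ framework standardizes the whole DVFS flow for devices. It also standardizes the user-space control interface.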
2 Standardized interface
The sysfs nodes are documented in kernel/Documentation/ABI/testing/sysfs-class-devfreq.
The common ones live under /sys/class/devfreq/xxxx-device/:
- min_freq
- max_freq
- governor
- target_freq
- available_governors
- available_frequencies
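User space usually reads available_frequencies before it writes a limit or a frequency. The small C sketch below turns the node's contents into an array. It assumes the node prints space-separated decimal values followed by a newline, in the same unit as the device's freq_table. Reading the file itself is left out, and parse_available_freqs() is a hypothetical helper, not a kernel or libc API.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse the text of an available_frequencies node
 * (assumed: space-separated decimal frequencies plus a trailing newline)
 * into out[]. Returns the number of entries parsed (at most max),
 * or -1 if a token is not a valid number.
 */
static int parse_available_freqs(const char *text, unsigned long *out, int max)
{
	const char *p = text;
	char *end;
	int n = 0;

	while (n < max) {
		/* Skip separators: spaces and the trailing newline. */
		while (*p == ' ' || *p == '\n')
			p++;
		if (*p == '\0')
			break;

		errno = 0;
		unsigned long f = strtoul(p, &end, 10);
		if (end == p || errno == ERANGE)
			return -1; /* malformed or out-of-range token */
		out[n++] = f;
		p = end;
	}
	return n;
}
```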
3 The DEVFREQ driver
The driver code lives in kernel/drivers/devfreq/devfreq.c.
INIT
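During init, the framework first creates the devfreq directory under /sys/class/. Every registered device appears under this path. It also creates the freezable workqueue devfreq_wq, which runs the governors' periodic polling work.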
static int __init devfreq_init(void)
{
	devfreq_class = class_create(THIS_MODULE, "devfreq");
	if (IS_ERR(devfreq_class)) {
		pr_err("%s: couldn't create class\n", __FILE__);
		return PTR_ERR(devfreq_class);
	}

	devfreq_wq = create_freezable_workqueue("devfreq_wq");
	if (!devfreq_wq) {
		class_destroy(devfreq_class);
		pr_err("%s: couldn't create workqueue\n", __FILE__);
		return -ENOMEM;
	}
	devfreq_class->dev_groups = devfreq_groups;

	return 0;
}
subsys_initcall(devfreq_init);
sysfs nodes
These nodes are created in the device's directory once the device has been added:
static struct attribute *devfreq_attrs[] = {
	&dev_attr_governor.attr,
	&dev_attr_available_governors.attr,
	&dev_attr_cur_freq.attr,
	&dev_attr_available_frequencies.attr,
	&dev_attr_target_freq.attr,
	&dev_attr_polling_interval.attr,
	&dev_attr_min_freq.attr,
	&dev_attr_max_freq.attr,
	&dev_attr_trans_stat.attr,
	NULL,
};
ATTRIBUTE_GROUPS(devfreq);
Typical workflow
devfreq itself is a framework. Its core job is therefore to interact with devices and governors.
How a device is added to the devfreq framework
A device is added with the devfreq core's devfreq_add_device():
/**
 * devfreq_add_device() - Add devfreq feature to the device
 * @dev:	the device to add devfreq feature.
 * @profile:	device-specific profile to run devfreq.
 * @governor_name:	name of the policy to choose frequency.
 * @data:	private data for the governor. The devfreq framework does not
 *		touch this value.
 */
struct devfreq *devfreq_add_device(struct device *dev,
				   struct devfreq_dev_profile *profile,
				   const char *governor_name,
				   void *data)
{
	struct devfreq *devfreq;
	struct devfreq_governor *governor;
	/*** removed some ***/

	// 1. Allocate the devfreq structure
	devfreq = kzalloc(sizeof(struct devfreq), GFP_KERNEL);

	// 2. Initialize the devfreq structure
	devfreq->dev.parent = dev;
	devfreq->dev.class = devfreq_class;
	devfreq->dev.release = devfreq_dev_release;
	devfreq->profile = profile;
	strncpy(devfreq->governor_name, governor_name, DEVFREQ_NAME_LEN);
	devfreq->previous_freq = profile->initial_freq;
	devfreq->last_status.current_frequency = profile->initial_freq;
	devfreq->data = data;
	devfreq->nb.notifier_call = devfreq_notifier_call;

	// 3. Build the frequency table
	if (!devfreq->profile->max_state && !devfreq->profile->freq_table) {
		mutex_unlock(&devfreq->lock);
		// Reads every frequency level from the device's OPP table and
		// fills the devfreq->profile->freq_table array with them
		devfreq_set_freq_table(devfreq);
		mutex_lock(&devfreq->lock);
	}

	// 4. Set min_freq and max_freq
	// Takes the minimum and maximum from freq_table and stores them in
	// devfreq->min_freq and devfreq->max_freq
	devfreq_set_freq_limits(devfreq);

	// 5. Set the device name and register the device; this creates a
	//    directory with that name under /sys/class/devfreq/
	dev_set_name(&devfreq->dev, "%s", dev_name(dev));
	err = device_register(&devfreq->dev);
	/*** removed some ***/

	// 6. Add the device to the devfreq_list list
	list_add(&devfreq->node, &devfreq_list);

	// 7. Look up the governor whose name was passed to this function
	governor = find_devfreq_governor(devfreq->governor_name);
	/*** removed some ***/
	devfreq->governor = governor;
	err = devfreq->governor->event_handler(devfreq, DEVFREQ_GOV_START,
						NULL);
	/*** removed some ***/
}
EXPORT_SYMBOL(devfreq_add_device);
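Steps 3 and 4 build a table and then take its extremes. The user-space sketch below shows the idea behind the min/max step. freq_limits_from_table() and struct freq_limits are hypothetical names, and the real devfreq_set_freq_limits() may differ in detail. The sketch deliberately does not assume the table is sorted.

```c
#include <stddef.h>

/* Hypothetical result type for the min/max step. */
struct freq_limits {
	unsigned long min_freq;
	unsigned long max_freq;
};

/*
 * Scan an (unsorted) frequency table and record its smallest and largest
 * entries. Returns 0 on success, -1 if the table is empty.
 */
static int freq_limits_from_table(const unsigned long *table, size_t n,
				  struct freq_limits *lim)
{
	size_t i;

	if (n == 0)
		return -1; /* nothing to derive limits from */

	lim->min_freq = lim->max_freq = table[0];
	for (i = 1; i < n; i++) {
		if (table[i] < lim->min_freq)
			lim->min_freq = table[i];
		if (table[i] > lim->max_freq)
			lim->max_freq = table[i];
	}
	return 0;
}
```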
How a governor is added to the devfreq framework
A governor is added with the devfreq core's devfreq_add_governor():
/**
 * devfreq_add_governor() - Add devfreq governor
 * @governor:	the devfreq governor to be added
 */
int devfreq_add_governor(struct devfreq_governor *governor)
{
	struct devfreq_governor *g;
	struct devfreq *devfreq;
	/*** removed some (locking, error handling) ***/

	// 1. Add the new devfreq_governor to the devfreq_governor_list list
	g = find_devfreq_governor(governor->name);
	list_add(&governor->node, &devfreq_governor_list);

	// 2. Walk devfreq_list and bind the governor to every device
	//    that asked for it by name
	list_for_each_entry(devfreq, &devfreq_list, node) {
		int ret = 0;
		struct device *dev = devfreq->dev.parent;

		if (!strncmp(devfreq->governor_name, governor->name,
			     DEVFREQ_NAME_LEN)) {
			/* The following should never occur */
			if (devfreq->governor) {
				dev_warn(dev,
					 "%s: Governor %s already present\n",
					 __func__, devfreq->governor->name);
				ret = devfreq->governor->event_handler(devfreq,
							DEVFREQ_GOV_STOP, NULL);
				if (ret) {
					dev_warn(dev,
						 "%s: Governor %s stop = %d\n",
						 __func__,
						 devfreq->governor->name, ret);
				}
				/* Fall through */
			}
			devfreq->governor = governor;
			ret = devfreq->governor->event_handler(devfreq,
						DEVFREQ_GOV_START, NULL);
		}
	}

	return err;
}
EXPORT_SYMBOL(devfreq_add_governor);
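The loop in step 2 is essentially a match by name. Below is a hypothetical user-space model of it. mini_gov and mini_dev stand in for struct devfreq_governor and struct devfreq, and a plain array replaces devfreq_list. start() models event_handler(DEVFREQ_GOV_START). The "already present" STOP branch is omitted.

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 16 /* stands in for DEVFREQ_NAME_LEN */

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct mini_gov {
	char name[NAME_LEN];
	int (*start)(const char *dev_name); /* models DEVFREQ_GOV_START */
};

struct mini_dev {
	const char *name;
	char governor_name[NAME_LEN];   /* requested when the device was added */
	const struct mini_gov *governor; /* NULL until a matching governor binds */
};

static int started; /* counts successful start() calls */

static int demo_start(const char *dev_name)
{
	printf("governor started for %s\n", dev_name);
	started++;
	return 0;
}

/*
 * Models step 2 of devfreq_add_governor(): bind the governor to every
 * device whose governor_name matches, then start it.
 * Returns how many devices were bound and started successfully.
 */
static int add_governor(const struct mini_gov *g, struct mini_dev *devs,
			int ndevs)
{
	int i, bound = 0;

	for (i = 0; i < ndevs; i++) {
		if (strncmp(devs[i].governor_name, g->name, NAME_LEN))
			continue; /* this device asked for another governor */
		devs[i].governor = g;
		if (g->start(devs[i].name) == 0)
			bound++;
	}
	return bound;
}
```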
update_devfreq(): get the target frequency from the governor and apply it to the device
/**
 * update_devfreq() - Reevaluate the device and configure frequency.
 * @devfreq:	the devfreq instance.
 *
 * Note: Lock devfreq->lock before calling update_devfreq
 *	 This function is exported for governors.
 */
int update_devfreq(struct devfreq *devfreq)
{
	...
	/* Reevaluate the proper frequency */
	err = devfreq->governor->get_target_freq(devfreq, &freq);
	...
	err = devfreq->profile->target(devfreq->dev.parent, &freq, flags);
	...
}
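The two calls above are the core of update_devfreq(). It asks the governor for a frequency and then hands that frequency to the driver. The sketch below models only that sequence in user space. All mini_* types and the demo callbacks are hypothetical. The parts elided with ... in the excerpt (locking, limit clamping, flag handling, notifications) are left out.

```c
#include <stdio.h>

struct mini_devfreq;

/* Hypothetical stand-in for struct devfreq_governor. */
struct mini_governor {
	const char *name;
	int (*get_target_freq)(struct mini_devfreq *df, unsigned long *freq);
};

/* Hypothetical stand-in for struct devfreq_dev_profile. */
struct mini_profile {
	int (*target)(void *dev, unsigned long *freq, unsigned int flags);
};

struct mini_devfreq {
	void *parent; /* models devfreq->dev.parent */
	const struct mini_governor *governor;
	const struct mini_profile *profile;
	unsigned long previous_freq;
};

/* Same two-step shape as update_devfreq(): governor first, then driver. */
static int mini_update_devfreq(struct mini_devfreq *df)
{
	unsigned long freq;
	int err;

	if (!df->governor)
		return -1; /* no policy attached yet */
	err = df->governor->get_target_freq(df, &freq);
	if (err)
		return err;
	err = df->profile->target(df->parent, &freq, 0);
	if (err)
		return err;
	/* target() may have rounded freq to a supported level. */
	df->previous_freq = freq;
	return 0;
}

/* Demo governor: always asks for 500000 (kHz, illustrative). */
static int fixed_gov(struct mini_devfreq *df, unsigned long *freq)
{
	(void)df;
	*freq = 500000;
	return 0;
}

/* Demo driver: pretends 499200 is the highest level it supports. */
static int round_down_target(void *dev, unsigned long *freq,
			     unsigned int flags)
{
	(void)dev;
	(void)flags;
	if (*freq > 499200)
		*freq = 499200;
	printf("driver set %lu kHz\n", *freq);
	return 0;
}
```

Because the driver reports back the frequency it actually set, previous_freq records the real operating point and not the governor's request.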
4 Device
Next comes one of the simplest device implementations, devfreq_simple_dev.
First, the device tree binding documentation describes a device named devfreq-simple-dev:
```
Devfreq simple device

devfreq-simple-dev is a device that represents a simple device that cannot do any status reporting and uses a clock that can be scaled by one or more devfreq governors. It provides a list of usable frequencies for the device and some additional optional parameters.

Required properties:
- compatible:	Must be "devfreq-simple-dev"
- clock-names:	Must be "devfreq_clk"
- clocks:	Must refer to the clock that's fed to the device.

Optional properties:
- polling-ms:	Polling interval for the device in milliseconds. Default: 50
- governor:	Initial governor to use for the device. Default: "performance"
- qcom,prepare-clk:	Prepare the device clock during initialization.
- freq-tbl-khz:	A list of usable frequencies (in kHz) for the device clock.

Example:

	qcom,cache {
		compatible = "devfreq-simple-dev";
		clock-names = "devfreq_clk";
		clocks = <&clock_krait clk_l2_clk>;
		polling-ms = 50;
		governor = "cpufreq";
		freq-tbl-khz =
			<  300000 >,
			<  345600 >,
			<  422400 >,
			<  499200 >,
			<  576000 >,
			<  652800 >,
			<  729600 >,
			<  806400 >,
			<  883200 >,
			<  960000 >,
			< 1036800 >,
			< 1113600 >,
			< 1190400 >,
			< 1267200 >,
			< 1344000 >,
			< 1420800 >,
			< 1497600 >,
			< 1574400 >,
			< 1651200 >,
			< 1728000 >;
	};
```
Next, the devfreq_simple_dev.c driver:
- Registering the devfreq device: go straight to the probe function
static int devfreq_clock_probe(struct platform_device *pdev)
{
	struct device *dev = &pdev->dev;
	struct dev_data *d;
	struct devfreq_dev_profile *p;
	u32 poll;
	const char *gov_name;
	int ret;

	d = devm_kzalloc(dev, sizeof(*d), GFP_KERNEL);
	if (!d)
		return -ENOMEM;
	platform_set_drvdata(pdev, d);

	d->clk = devm_clk_get(dev, "devfreq_clk");
	if (IS_ERR(d->clk))
		return PTR_ERR(d->clk);

	// 1. Parse the frequency table (freq-tbl-khz) from the DT node above
	ret = parse_freq_table(dev, d);
	if (ret)
		return ret;

	p = &d->profile;
	// 2. The target callback of devfreq_dev_profile performs the actual
	//    frequency change; update_devfreq() ends up calling it
	p->target = dev_target;
	p->get_cur_freq = dev_get_cur_freq;
	ret = dev_get_cur_freq(dev, &p->initial_freq);
	if (ret)
		return ret;

	p->polling_ms = 50;
	if (!of_property_read_u32(dev->of_node, "polling-ms", &poll))
		p->polling_ms = poll;

	if (of_property_read_string(dev->of_node, "governor", &gov_name))
		gov_name = "performance";

	if (of_property_read_bool(dev->of_node, "qcom,prepare-clk")) {
		ret = clk_prepare(d->clk);
		if (ret)
			return ret;
	}

	// 3. Add the device to the devfreq framework's device list
	d->df = devfreq_add_device(dev, p, gov_name, NULL);
	if (IS_ERR(d->df)) {
		ret = PTR_ERR(d->df);
		goto add_err;
	}

	return 0;

add_err:
	if (of_property_read_bool(dev->of_node, "qcom,prepare-clk"))
		clk_unprepare(d->clk);
	return ret;
}
- Frequency scaling
static int dev_target(struct device *dev, unsigned long *freq, u32 flags)
{
	struct dev_data *d = dev_get_drvdata(dev);
	unsigned long rfreq;

	find_freq(&d->profile, freq, flags);

	rfreq = clk_round_rate(d->clk, d->freq_in_khz ? *freq * 1000 : *freq);
	if (IS_ERR_VALUE(rfreq)) {
		dev_err(dev, "devfreq: Cannot find matching frequency for %lu\n",
			*freq);
		return rfreq;
	}

	return clk_set_rate(d->clk, rfreq);
}
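find_freq() is not shown above. The sketch below makes an assumption about its behavior and snaps a requested frequency to the table according to the flag. With a least-upper-bound request, it takes the highest entry that is not above the request. Otherwise, it takes the lowest entry that is not below it. round_to_table(), FLAG_LEAST_UPPER_BOUND and table_to_clk_rate() are hypothetical names. table_to_clk_rate() mirrors the `d->freq_in_khz ? *freq * 1000 : *freq` conversion in dev_target(), because the clock API works in Hz.

```c
/* Stands in for DEVFREQ_FLAG_LEAST_UPPER_BOUND (assumed semantics). */
#define FLAG_LEAST_UPPER_BOUND 0x1

/*
 * Hypothetical model of find_freq(): snap *freq to an entry of tbl[]
 * (n > 0, any order). Requests outside the table clamp to its ends.
 */
static void round_to_table(const unsigned long *tbl, int n,
			   unsigned long *freq, unsigned int flags)
{
	unsigned long lo = tbl[0], hi = tbl[0];
	unsigned long atmost, atleast;
	int i;

	for (i = 1; i < n; i++) {
		if (tbl[i] < lo)
			lo = tbl[i];
		if (tbl[i] > hi)
			hi = tbl[i];
	}

	atmost = lo;  /* fallback when every entry is above *freq */
	atleast = hi; /* fallback when every entry is below *freq */
	for (i = 0; i < n; i++) {
		if (tbl[i] <= *freq && tbl[i] > atmost)
			atmost = tbl[i];
		if (tbl[i] >= *freq && tbl[i] < atleast)
			atleast = tbl[i];
	}

	*freq = (flags & FLAG_LEAST_UPPER_BOUND) ? atmost : atleast;
}

/* kHz table values are scaled to Hz before they reach the clock API. */
static unsigned long table_to_clk_rate(unsigned long f, int freq_in_khz)
{
	return freq_in_khz ? f * 1000 : f;
}
```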
5 Governor
The devfreq_governor structure is as follows:
/**
 * struct devfreq_governor - Devfreq policy governor
 * @node:		list node - contains registered devfreq governors
 * @name:		Governor's name
 * @immutable:		Immutable flag for governor. If the value is 1,
 *			this governor is never changeable to other governor.
 * @get_target_freq:	Returns desired operating frequency for the device.
 *			Basically, get_target_freq will run
 *			devfreq_dev_profile.get_dev_status() to get the
 *			status of the device (load = busy_time / total_time).
 *			If no_central_polling is set, this callback is called
 *			only with update_devfreq() notified by OPP.
 * @event_handler:	Callback for devfreq core framework to notify events
 *			to governors. Events include per device governor
 *			init and exit, opp changes out of devfreq, suspend
 *			and resume of per device devfreq during device idle.
 *
 * Note that the callbacks are called with devfreq->lock locked by devfreq.
 */
struct devfreq_governor {
	struct list_head node;

	const char name[DEVFREQ_NAME_LEN];
	const unsigned int immutable;
	int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
	int (*event_handler)(struct devfreq *devfreq,
				unsigned int event, void *data);
};
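The comment above defines load as busy_time / total_time. The sketch below shows how that ratio could become a frequency request. The proportional policy and its target_load parameter are made up for illustration and do not reproduce any in-tree governor. struct dev_status_sample only mirrors the total_time, busy_time and current_frequency fields that a driver reports through get_dev_status().

```c
/* Mirrors the fields reported through get_dev_status() (illustrative). */
struct dev_status_sample {
	unsigned long total_time;
	unsigned long busy_time;
	unsigned long current_frequency;
};

/* load % = busy_time / total_time * 100; 0 when there is no sample yet. */
static unsigned int load_percent(const struct dev_status_sample *s)
{
	if (s->total_time == 0)
		return 0;
	if (s->busy_time >= s->total_time)
		return 100;
	return (unsigned int)((unsigned long long)s->busy_time * 100 /
			      s->total_time);
}

/*
 * Hypothetical proportional policy: scale the current frequency so the
 * predicted load lands near target_load percent, then clamp to limits.
 */
static unsigned long proportional_target(const struct dev_status_sample *s,
					 unsigned int target_load,
					 unsigned long min_f, unsigned long max_f)
{
	unsigned long long f;

	if (target_load == 0)
		return max_f; /* avoid dividing by zero: run flat out */

	f = (unsigned long long)s->current_frequency * load_percent(s) /
	    target_load;
	if (f < min_f)
		f = min_f;
	if (f > max_f)
		f = max_f;
	return (unsigned long)f;
}
```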
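The simplest governor is userspace.
It lives in kernel/drivers/devfreq/governor_userspace.c and is only a little over 100 lines long.
The userspace governor creates a userspace directory under /sys/class/devfreq/xxxx-device/. That directory holds a single node, set_freq, into which the user writes the desired frequency (taken from available_frequencies).
First, the governor registers itself through the framework's devfreq_add_governor():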
static int __init devfreq_userspace_init(void)
{
	return devfreq_add_governor(&devfreq_userspace);
}
When a device switches to userspace, a userspace directory is created under the device's path. It contains the readable and writable set_freq node.
The written frequency is pushed into the devfreq framework through update_devfreq():
static ssize_t store_freq(struct device *dev, struct device_attribute *attr,
			  const char *buf, size_t count)
{
	struct devfreq *devfreq = to_devfreq(dev);
	struct userspace_data *data;
	unsigned long wanted;
	int err = 0;

	mutex_lock(&devfreq->lock);
	data = devfreq->data;

	sscanf(buf, "%lu", &wanted);
	data->user_frequency = wanted;
	data->valid = true;
	// call provided by the devfreq framework
	err = update_devfreq(devfreq);
	if (err == 0)
		err = count;
	mutex_unlock(&devfreq->lock);
	return err;
}

static ssize_t show_freq(struct device *dev, struct device_attribute *attr,
			 char *buf)
{
	struct devfreq *devfreq = to_devfreq(dev);
	struct userspace_data *data;
	int err = 0;

	mutex_lock(&devfreq->lock);
	data = devfreq->data;

	if (data->valid)
		err = sprintf(buf, "%lu\n", data->user_frequency);
	else
		err = sprintf(buf, "undefined\n");
	mutex_unlock(&devfreq->lock);
	return err;
}

static DEVICE_ATTR(set_freq, 0644, show_freq, store_freq);
static struct attribute *dev_entries[] = {
	&dev_attr_set_freq.attr,
	NULL,
};
static struct attribute_group dev_attr_group = {
	.name	= "userspace",
	.attrs	= dev_entries,
};

static int userspace_init(struct devfreq *devfreq)
{
	int err = 0;
	struct userspace_data *data = kzalloc(sizeof(struct userspace_data),
					      GFP_KERNEL);

	if (!data) {
		err = -ENOMEM;
		goto out;
	}
	data->valid = false;
	devfreq->data = data;

	err = sysfs_create_group(&devfreq->dev.kobj, &dev_attr_group);
out:
	return err;
}
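store_freq() ignores the return value of sscanf(). A non-numeric write such as `echo abc > set_freq` therefore leaves `wanted` uninitialized. The sketch below shows a more defensive parse; parse_set_freq() is hypothetical. Inside the kernel, kstrtoul() provides this kind of strict parsing and also accepts a single trailing newline.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical defensive parser for the string written to set_freq.
 * Accepts one decimal number with an optional trailing newline.
 * Returns 0 and stores the value, or -EINVAL for anything else.
 */
static int parse_set_freq(const char *buf, unsigned long *wanted)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(buf, &end, 10);
	if (end == buf || errno == ERANGE)
		return -EINVAL; /* no digits, or value out of range */

	/* Allow the newline that "echo" appends, nothing else. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	*wanted = v;
	return 0;
}
```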
update_devfreq
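store_freq above first saves the frequency written from user space in the user_frequency field of userspace_data.
It then calls update_devfreq(devfreq). The frequency is not passed directly through the devfreq argument. A generic function such as update_devfreq also cannot be expected to interpret user_frequency itself. The next step shows how the value reaches it.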
The implementation of update_devfreq answers this. It goes through a callback, and every governor has to implement get_target_freq:
/* Reevaluate the proper frequency */
err = devfreq->governor->get_target_freq(devfreq, &freq);
get_target_freq
static struct devfreq_governor devfreq_userspace = {
	.name = "userspace",
	.get_target_freq = devfreq_userspace_func,
	.event_handler = devfreq_userspace_handler,
};
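This function processes the user-requested frequency described earlier. The requested frequency must stay within the min and max allowed for this device. If it is above max, the target becomes max. If it is below min, the target becomes min. Otherwise, the requested frequency is used as is. A limit of 0 is treated as unset and skips that check.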
static int devfreq_userspace_func(struct devfreq *df, unsigned long *freq)
{
	struct userspace_data *data = df->data;

	if (data->valid) {
		unsigned long adjusted_freq = data->user_frequency;

		if (df->max_freq && adjusted_freq > df->max_freq)
			adjusted_freq = df->max_freq;

		if (df->min_freq && adjusted_freq < df->min_freq)
			adjusted_freq = df->min_freq;

		*freq = adjusted_freq;
	} else {
		*freq = df->previous_freq; /* No user freq specified yet */
	}
	return 0;
}
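The clamping rules can be checked in isolation with a user-space mirror of devfreq_userspace_func(). userspace_target() and struct userspace_state are hypothetical names. The logic follows the function above, including the rule that a zero limit means "no limit".

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct userspace_data. */
struct userspace_state {
	bool valid;                   /* has the user written set_freq yet? */
	unsigned long user_frequency; /* last value written to set_freq */
};

static unsigned long userspace_target(const struct userspace_state *u,
				      unsigned long min_freq,
				      unsigned long max_freq,
				      unsigned long previous_freq)
{
	unsigned long f;

	if (!u->valid)
		return previous_freq; /* no request yet: keep the current level */

	f = u->user_frequency;
	/* A zero limit means "unset", matching the df->max_freq && ... tests. */
	if (max_freq && f > max_freq)
		f = max_freq;
	if (min_freq && f < min_freq)
		f = min_freq;
	return f;
}
```

The min check runs after the max check. If the two limits ever cross, min_freq therefore wins.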
6 Summary
devfreq framework
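- Provides registration functions for devices and governors. Registered devices go into the device list, and registered governors go into the governor list.
- Provides update_devfreq(), which a governor calls when it decides to change the frequency. It gets the target frequency from the governor and passes it to the device through the target callback.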
device
- Must implement the get_cur_freq() callback to report the current frequency
- Must implement the target() callback to change the frequency
governor
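- Must implement the get_target_freq() callback that returns the target frequency
- Defines the policy and calls update_devfreq() when it decides the frequency should change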