版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、Linux I/O Block-递交I/O请求分类:Linux驱动程序2012-12-09 20:421983人阅读评论(1)收藏举报 在通用块层中,bio用来描述单一的I/O请求,它记录了一次I/O操作所必需的相关信息,如用于I/O操作的数据缓存位置,I/O操作的块设备起始扇区,是读操作还是写操作等等。struct bio的定义如下struct bio sector_tbi_sector;/* device address in 512 byte sectors */struct bio*bi_next;/* request queue link */struct block_device*
2、bi_bdev;unsigned longbi_flags;/* status, command, etc */unsigned longbi_rw;/* bottom bits READ/WRITE, * top bits priority */unsigned shortbi_vcnt;/* how many bio_vecs */unsigned shortbi_idx;/* current index into bvl_vec */* Number of segments in this BIO after * physical address coalescing is perfor
3、med. */unsigned intbi_phys_segments;unsigned intbi_size;/* residual I/O count */* * To keep track of the max segment size, we account for the * sizes of the first and last mergeable segments in this bio. */unsigned intbi_seg_front_size;unsigned intbi_seg_back_size;unsigned intbi_max_vecs;/* max bvl_
4、vecs we can hold */unsigned intbi_comp_cpu;/* completion CPU */atomic_tbi_cnt;/* pin count */struct bio_vec*bi_io_vec;/* the actual vec list */bio_end_io_t*bi_end_io;void*bi_private;#if defined(CONFIG_BLK_DEV_INTEGRITY)struct bio_integrity_payload *bi_integrity; /* data integrity */#endifbio_destruc
5、tor_t*bi_destructor;/* destructor */* * We can inline a number of vecs at the end of the bio, to avoid * double allocations for a small number of bio_vecs. This member * MUST obviously be kept at the very end of the bio. */struct bio_vecbi_inline_vecs0;bi_sector:该I/O操作的起始扇区号bi_rw:指明了读写方向bi_vcnt:该I/O
6、操作中涉及到了多少个缓存向量,每个缓存向量由page,offset,len来描述bi_idx:指示当前的缓存向量bi_io_vec:缓存向量数组缓存向量的定义:struct bio_vec struct page*bv_page;unsigned intbv_len;unsigned intbv_offset;struct request用于描述提交给块设备的I/O请求,bio会动态地添加进request,因此一个request往往会包含若干相邻的bio。struct request struct list_head queuelist;struct call_single_data csd;
7、int cpu;struct request_queue *q;unsigned int cmd_flags;enum rq_cmd_type_bits cmd_type;unsigned long atomic_flags;/* the following two fields are internal, NEVER access directly */sector_t _sector;/* sector cursor */unsigned int _data_len;/* total data len */struct bio *bio;struct bio *biotail;struct
8、 hlist_node hash;/* merge hash */* * The rb_node is only used inside the io scheduler, requests * are pruned when moved to the dispatch queue. So let the * completion_data share space with the rb_node. */union struct rb_node rb_node;/* sort/lookup */void *completion_data;/* * two pointers are availa
9、ble for the IO schedulers, if they need * more they have to dynamically allocate it. */void *elevator_private;void *elevator_private2;struct gendisk *rq_disk;unsigned long start_time;/* Number of scatter-gather DMA addr+len pairs after * physical address coalescing is performed. */unsigned short nr_
10、phys_segments;unsigned short ioprio;void *special;/* opaque pointer available for LLD use */char *buffer;/* kaddr of the current segment if available */int tag;int errors;int ref_count;/* * when request is used as a packet command carrier */unsigned short cmd_len;unsigned char _cmdBLK_MAX_CDB;unsign
11、ed char *cmd;unsigned int extra_len;/* length of alignment and padding */unsigned int sense_len;unsigned int resid_len;/* residual count */void *sense;unsigned long deadline;struct list_head timeout_list;unsigned int timeout;int retries;/* * completion callback. */rq_end_io_fn *end_io;void *end_io_d
12、ata;/* for bidi */struct request *next_rq;queuelist:用于将request链入请求队列的链表元素q:指向所属的请求队列_sector:下一个要传输的bio的起始扇区号_data_len:request要传输的数据字节数bio,biotail:用于维护request中的bio链表在之前介绍的gendisk结构中,可以看到每个块设备(或分区)都对应了一个request_queue的结构,该结构用来容纳request,并且包含了相应的递交request以及I/O调度的方法递交一个bio的主要工作是从generic_make_request()函数开始
13、的,我们以此为入口来分析一个bio的递交过程。在每个进程的task_struct中,都包含有两个变量-struct bio *bio_list, *bio_tail,generic_make_request()的主要工作就是用这两个变量来维护当前待添加的bio链表,实际的提交操作会由generic_make_request()调用_generic_make_request()函数完成。而在_generic_make_request()中,会调用到queue_list中定义的make_request_fn函数,也就是特定于设备的提交请求函数来完成后续的工作。在这里便会有一些问题,大部分设备的ma
14、ke_request_fn都可以直接定义为内核实现的_make_request函数,而一些设备需要使用自己的make_request_fn,而自行实现的make_request_fn有可能会递归调用gerneric_make_request(),由于内核的堆栈十分有限,因此在generic_make_request()的实现中,玩了一些小把戏,使得递归的深度不会超过一层。我们注意到bio_tail是一个二级指针,这个值最初是NULL,当有bio添加进来,bio_tail将会指向bio-bi_next(如果bio全都递交上去了,则bio_tail将会指向bio_list),也就是说除了第一次调用
15、外,其他每次递归调用generic_make_request()函数都会出现bio_tail不为NULL的情形,因此当bio_tail不为NULL时,则只将bio添加到由bio_list和bio_tail维护的链表中,然后直接返回,而不调用_generic_make_request(),这样便防止了多重递归的产生void generic_make_request(struct bio *bio)if (current-bio_tail) /current-bio_tail不为空则表明有bio正在提交,也就是说是处于递归调用/* make_request is active */bio-bi_n
16、ext = NULL;/*这里current-tail有两种情况,当current的bio链表为空时,bio_tail指向的是bio_list当current的bio链表不为空时,bio_tail指向的是最后一个bio的bi_next指针,因此这句的实际作用就是将bio添加到了current的bio链表的尾部*/*(current-bio_tail) = bio;current-bio_tail = &bio-bi_next;/*这里直接返回,遍历并且提交bio的工作永远都是交给最先调用的generic_make_request来处理的,避免了多重递归*/return;/* following
17、 loop may be a bit non-obvious, and so deserves some * explanation. * Before entering the loop, bio-bi_next is NULL (as all callers * ensure that) so we have a list with a single bio. * We pretend that we have just taken it off a longer list, so * we assign bio_list to the next (which is NULL) and b
18、io_tail * to &bio_list, thus initialising the bio_list of new bios to be * added. _generic_make_request may indeed add some more bios * through a recursive call to generic_make_request. If it * did, we find a non-NULL value in bio_list and re-enter the loop * from the top. In this case we really did
19、 just take the bio * of the top of the list (no pretending) and so fixup bio_list and * bio_tail or bi_next, and call into _generic_make_request again. * * The loop was structured like this to make only one call to * _generic_make_request (which is important as it is large and * inlined) and to keep
20、 the structure simple. */BUG_ON(bio-bi_next);do current-bio_list = bio-bi_next;/这里取current的待提交bio链表的下一个bioif (bio-bi_next = NULL)/bi_next为空,也就是说待提交链表已经空了,只剩下最后一个bio了current-bio_tail = t-bio_list;/bio_tail指向bio_listelsebio-bi_next = NULL;/否则将bio提取出来_generic_make_request(bio);/提交biobio = current-bio_l
21、ist;/取新的待提交bio while (bio);current-bio_tail = NULL; /* deactivate */_generic_make_request()首先由bio对应的block_device获取等待队列q,然后要检查对应的设备是不是分区,如果是分区的话要将扇区地址进行重新计算,最后调用make_request_fn完成bio的递交static inline void _generic_make_request(struct bio *bio)struct request_queue *q;sector_t old_sector;int ret, nr_sect
22、ors = bio_sectors(bio);/提取bio的大小,以扇区为单位dev_t old_dev;int err = -EIO;might_sleep();/这里检查bio的传输起始扇区是否超过设备的最大扇区,并且两者之间的差不能小于nr_sectorif (bio_check_eod(bio, nr_sectors)goto end_io;/* * Resolve the mapping until finished. (drivers are * still free to implement/resolve their own stacking * by explicitly r
23、eturning 0) * * NOTE: we dont repeat the blk_size check for each new device. * Stacking drivers are expected to know what they are doing. */old_sector = -1;old_dev = 0;do char bBDEVNAME_SIZE;q = bdev_get_queue(bio-bi_bdev);/获取对应设备的请求队列if (unlikely(!q) printk(KERN_ERR generic_make_request: Trying to
24、access nonexistent block-device %s (%Lu)n,bdevname(bio-bi_bdev, b),(long long) bio-bi_sector);goto end_io;/*下面做一些必要的检查*/if (unlikely(!bio_rw_flagged(bio, BIO_RW_DISCARD) & nr_sectors queue_max_hw_sectors(q) printk(KERN_ERR bio too big device %s (%u %u)n, bdevname(bio-bi_bdev, b), bio_sectors(bio), q
25、ueue_max_hw_sectors(q);goto end_io;if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q-queue_flags)goto end_io;if (should_fail_request(bio)goto end_io;/* * If this device has partitions, remap block n * of partition p to block n+start(p) of the disk. */ /如果bio指定的是一个分区,则传输点要重新进行计算blk_partition_remap(bio);if (b
26、io_integrity_enabled(bio) & bio_integrity_prep(bio)goto end_io;if (old_sector != -1)trace_block_remap(q, bio, old_dev, old_sector);old_sector = bio-bi_sector;old_dev = bio-bi_bdev-bd_dev;if (bio_check_eod(bio, nr_sectors)goto end_io;if (bio_rw_flagged(bio, BIO_RW_DISCARD) & !blk_queue_discard(q) err
27、 = -EOPNOTSUPP;goto end_io;trace_block_bio_queue(q, bio);ret = q-make_request_fn(q, bio);/这里是关键,调用请求队列中的make_request_fn函数处理请求 while (ret);return;end_io:bio_endio(bio, err);辅助函数blk_partition_remap():static inline void blk_partition_remap(struct bio *bio)struct block_device *bdev = bio-bi_bdev;/*首先要保证
28、传输的大小不能小于1个扇区并且bdev确实是分区*/if (bio_sectors(bio) & bdev != bdev-bd_contains) struct hd_struct *p = bdev-bd_part;/获取分区信息bio-bi_sector += p-start_sect;/在传输起点的原基础上加上分区的起始扇区号bio-bi_bdev = bdev-bd_contains;/将bio的bdev置为主设备trace_block_remap(bdev_get_queue(bio-bi_bdev), bio, bdev-bd_dev, bio-bi_sector - p-sta
29、rt_sect);可以看到这里将bio的参考对象设置为了主设备,而不是分区,因此对应的扇区起始号也要计算为扇区的绝对值。大多数的make_request_fn函数都可以直接定义为_make_request(),我们通过这个函数来分析递交bio的关键操作static int _make_request(struct request_queue *q, struct bio *bio)struct request *req;int el_ret;unsigned int bytes = bio-bi_size;const unsigned short prio = bio_prio(bio);co
30、nst bool sync = bio_rw_flagged(bio, BIO_RW_SYNCIO);const bool unplug = bio_rw_flagged(bio, BIO_RW_UNPLUG);const unsigned int ff = bio-bi_rw & REQ_FAILFAST_MASK;int rw_flags;/*如果BIO_RW_BARRIER被置位(表示必须得让请求队列中的所有bio传递完毕才处理自己), 但是不支持hardbarrier,不能进行bio的提交*/if (bio_rw_flagged(bio, BIO_RW_BARRIER) & (q-ne
31、xt_ordered = QUEUE_ORDERED_NONE) bio_endio(bio, -EOPNOTSUPP);return 0;/* * low level driver can indicate that it wants pages above a * certain limit bounced to low memory (ie for highmem, or even * ISA dma in theory) */blk_queue_bounce(q, &bio);spin_lock_irq(q-queue_lock);/如果BIO_RW_BARRIER被置位或者请求队列为
32、空,则情况比较简单,不用进行bio的合并,跳转到get_rq处处理if (unlikely(bio_rw_flagged(bio, BIO_RW_BARRIER) | elv_queue_empty(q)goto get_rq;/*请求队列不为空*/*elv_merge()试图寻找一个已存在的request,将bio并入其中*/el_ret = elv_merge(q, &req, bio);switch (el_ret) case ELEVATOR_BACK_MERGE:BUG_ON(!rq_mergeable(req);/*相关检查*/if (!ll_back_merge_fn(q, re
33、q, bio)break;trace_block_bio_backmerge(q, bio);if (req-cmd_flags & REQ_FAILFAST_MASK) != ff)blk_rq_set_mixed_merge(req);/*这里将bio插入到request尾部*/req-biotail-bi_next = bio;req-biotail = bio;req-_data_len += bytes;req-ioprio = ioprio_best(req-ioprio, prio);if (!blk_rq_cpu_valid(req)req-cpu = bio-bi_comp_
34、cpu;drive_stat_acct(req, 0);if (!attempt_back_merge(q, req)elv_merged_request(q, req, el_ret);goto out;case ELEVATOR_FRONT_MERGE:BUG_ON(!rq_mergeable(req);if (!ll_front_merge_fn(q, req, bio)break;trace_block_bio_frontmerge(q, bio);if (req-cmd_flags & REQ_FAILFAST_MASK) != ff) blk_rq_set_mixed_merge(
35、req);req-cmd_flags &= REQ_FAILFAST_MASK;req-cmd_flags |= ff;/*这里将bio插入到request的头部*/bio-bi_next = req-bio;req-bio = bio;/* * may not be valid. if the low level driver said * it didnt need a bounce buffer then it better * not touch req-buffer either. */req-buffer = bio_data(bio);req-_sector = bio-bi_s
36、ector;req-_data_len += bytes;req-ioprio = ioprio_best(req-ioprio, prio);if (!blk_rq_cpu_valid(req)req-cpu = bio-bi_comp_cpu;drive_stat_acct(req, 0);if (!attempt_front_merge(q, req)elv_merged_request(q, req, el_ret);goto out;/* ELV_NO_MERGE: elevator says dont/cant merge. */default:;get_rq:/*下面的代码对应请
37、求队列为空的情况,需要先分配一个request,再将bio插入*/* * This sync check and mask will be re-done in init_request_from_bio(), * but we need to set it earlier to expose the sync flag to the * rq allocator and io schedulers. */rw_flags = bio_data_dir(bio);/确定读写标识if (sync)rw_flags |= REQ_RW_SYNC;/* * Grab a free request.
38、This is might sleep but can not fail. * Returns with the queue unlocked. */req = get_request_wait(q, rw_flags, bio);/分配一个新的request/* * After dropping the lock and possibly sleeping here, our request * may now be mergeable after it had proven unmergeable (above). * We dont worry about that case for efficiency. It wont happen * often, and the elevators are able to handle it. */ /根据bio初始化新分配的request,并将bio插入到request中init_request_from_bio(req, bio);spin_lock_irq(q-queue_lock);if (test_bit(QUEUE_FL
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
评论
0/150
提交评论