From Linux Raid Wiki

OBSOLETE CONTENT

This wiki has been archived and the content is no longer updated.
For latest Linux RAID documentation, see Linux Docs.

Back to Hardware issues Forward to Detecting, querying and testing

RAID setup

General setup

This is what you need for any of the RAID levels:

  • A kernel with the appropriate md support, either as modules or built-in. Preferably a kernel from the 4.x series, although most of this should work fine with later 3.x kernels too.
  • The mdadm tool
  • Patience, Pizza, and your favorite caffeinated beverage.

The first two items are included as standard in most GNU/Linux distributions today.

If your system has RAID support, you should have a file called /proc/mdstat. Remember it: that file is your friend. If you do not have that file, your kernel probably does not have RAID support.

If you're sure your kernel has RAID support, you may need to run modprobe raid[RAID mode] to load RAID support into your kernel, e.g. to support raid5:

modprobe raid456

See what the file contains, by doing a

cat /proc/mdstat

It should tell you that you have the right RAID personality (e.g. RAID mode) registered, and that no RAID devices are currently active. See the /proc/mdstat page for more details.
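
On an idle system with the raid456 module loaded, the output might look something like this (the exact personality list depends on which modules you have loaded):

   Personalities : [raid6] [raid5] [raid4]
   unused devices: <none>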

Preparing and partitioning your disk devices

Arrays can be built on top of entire disks or on partitions.

This leads to 2 frequent questions:

  • Should I use entire device or a partition?
  • What partition type?

Both are discussed in Partition Types.

Downloading and installing mdadm - the RAID management tool

mdadm is now the standard RAID management tool and should be found in any modern distribution.

You can retrieve the most recent version of mdadm with

git clone git://neil.brown.name/mdadm

In the absence of any other preferences, do that in the /usr/local/src directory. As a Linux-specific program there is no autoconf step - just follow the instructions in the INSTALL file.

Alternatively just use the normal distribution method for obtaining the package:

Debian, Ubuntu:

 apt-get install mdadm

Gentoo:

 emerge mdadm

RedHat:

 yum install mdadm

[open]SUSE:

 zypper in mdadm

Mdadm modes of operation

mdadm is well documented in its manpage - well worth a read.

   man mdadm

mdadm has 7 major modes of operation. Normal operation just uses the 'Create', 'Assemble' and 'Monitor' commands - the rest come in handy when you're messing with your array; typically fixing it or changing it.

1. Create

Create a new array with per-device superblocks (normal creation).

2. Assemble

Assemble the parts of a previously created array into an active array. Components can be explicitly given or can be searched for. mdadm checks that the components do form a bona fide array, and can, on request, fiddle superblock information so as to assemble a faulty array. Typically you do this in the init scripts after rebooting.

3. Follow or Monitor

Monitor one or more md devices and act on any state changes. This is only meaningful for raid1, 4, 5, 6, 10 or multipath arrays as only these have interesting state. raid0 or linear never have missing, spare, or failed drives, so there is nothing to monitor. Typically you do this after rebooting too.
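
For example, to watch all configured arrays in the background and mail alerts to an administrator (a sketch; the mail address and the 30-minute polling interval are just examples):

   mdadm --monitor --scan --daemonise --delay=1800 --mail=root@localhost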

4. Build

Build an array that doesn't have per-device superblocks. For these sorts of arrays, mdadm cannot differentiate between initial creation and subsequent assembly of an array. It also cannot perform any checks that appropriate devices have been requested. Because of this, the Build mode should only be used together with a complete understanding of what you are doing.

5. Grow

Grow, shrink or otherwise reshape an array in some way. Currently supported growth options include changing the active size of component devices in RAID levels 1/4/5/6 and changing the number of active devices in RAID-1.
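
For example (device names are illustrative), growing the array to the full size of its enlarged component partitions, or growing a RAID-1 from two to three active devices after adding a third disk:

   mdadm --grow /dev/md0 --size=max
   mdadm /dev/md0 --add /dev/sdd1
   mdadm --grow /dev/md0 --raid-devices=3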

6. Manage

This is for doing things to specific components of an array such as adding new spares and removing faulty devices.
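
For example (device names are illustrative), adding a new spare and then failing and removing a broken device:

   mdadm /dev/md0 --add /dev/sde1
   mdadm /dev/md0 --fail /dev/sdc1 --remove /dev/sdc1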

7. Misc

This is an 'everything else' mode that supports operations on active arrays, operations on component devices such as erasing old superblocks, and information gathering operations.
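
For example (the device name is illustrative), examining a component's superblock, or erasing an old superblock so the device can be reused elsewhere - note that --zero-superblock destroys the array membership information on that device:

   mdadm --examine /dev/sdb1
   mdadm --zero-superblock /dev/sdb1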


Create RAID device

Below we'll see how to create arrays of various types; the basic approach is:

   mdadm --create /dev/md0 <blah>
   mdadm --monitor /dev/md0

If you want to access all the latest and upcoming features such as fully named RAID arrays so you no longer have to memorize which partition goes where, you'll want to make sure to use persistent metadata in the version 1.0 or higher format, as there is no way (currently or planned) to convert an array to a different metadata version. Current recommendations are to use metadata version 1.2 except when creating a boot partition, in which case use version 1.0 metadata and RAID-1.[1]

Booting from a 1.2 raid is only supported when booting with an initramfs, as the kernel can no longer assemble or recognise an array - it relies on userspace tools. Booting directly from 1.0 is supported because the metadata is at the end of the array, and the start of a mirrored 1.0 array just looks like a normal partition to the kernel.

NOTE: A work-around to upgrade metadata from version 0.90 to 1.0 is contained in the section RAID superblock formats.

To change the metadata version (the default is now version 1.2 metadata) add the --metadata option after the switch stating what you're doing in the first place. This will work:

   mdadm --create /dev/md0 --metadata 1.0 <blah>

This, however, will not work:

   mdadm --metadata 1.0 --create /dev/md0 <blah>

Linear mode

Ok, so you have two or more partitions which are not necessarily the same size (but of course can be), which you want to append to each other.

Spare-disks are not supported here. If a disk dies, the array dies with it. There's no information to put on a spare disk.

Using mdadm, a single command like

    mdadm --create --verbose /dev/md0 --level=linear --raid-devices=2 /dev/sdb6 /dev/sdc5

should create the array. The parameters speak for themselves. The output might look like this:

   mdadm: chunk size defaults to 64K
   mdadm: array /dev/md0 started.

Have a look in /proc/mdstat. You should see that the array is running.

Now, you can create a filesystem, just like you would on any other device, mount it, include it in your /etc/fstab and so on.

RAID-0

You have two or more devices, of approximately the same size, and you want to combine their storage capacity and also combine their performance by accessing them in parallel.

    mdadm --create --verbose /dev/md0 --level=stripe --raid-devices=2 /dev/sdb6 /dev/sdc5

Like in Linear mode, spare disks are not supported here either. RAID-0 has no redundancy, so when a disk dies, the array goes with it.

Having run mdadm you have initialised the superblocks and started the raid device. Have a look in /proc/mdstat to see what's going on. You should see that your device is now running.

/dev/md0 is now ready to be formatted, mounted, used and abused.

RAID-1

You have two devices of approximately the same size, and you want the two to be mirrors of each other. You may also have more devices, which you want to keep as stand-by spare-disks that will automatically become part of the mirror if one of the active devices breaks.

    mdadm --create --verbose /dev/md0 --level=mirror --raid-devices=2 /dev/sdb1 /dev/sdc1

If you have spare disks, you can add them to the end of the device specification like

    mdadm --create --verbose /dev/md0 --level=mirror --raid-devices=2 /dev/sdb1 /dev/sdc1 --spare-devices=1 /dev/sdd1

Ok, now we're all set to start initializing the RAID. The mirror must be constructed, i.e. the contents (however unimportant now, since the device is still not formatted) of the two devices must be synchronized.

Check out the /proc/mdstat file. It should tell you that the /dev/md0 device has been started, that the mirror is being reconstructed, and an ETA of the completion of the reconstruction.
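
While the initial synchronization is running, the output might look something like this (device names, sizes and speeds are purely illustrative):

   md0 : active raid1 sdc1[1] sdb1[0]
         976630336 blocks super 1.2 [2/2] [UU]
         [==>..................]  resync = 12.3% (120412416/976630336) finish=103.8min speed=137412K/sec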

Reconstruction is done using idle I/O bandwidth. So, your system should still be fairly responsive, although your disk LEDs should be glowing nicely.

The reconstruction process is transparent, so you can actually use the device even though the mirror is currently under reconstruction.

Try formatting the device while the reconstruction is running. It will work. You can also mount it and use it while reconstruction is running. Of course, if the wrong disk breaks while the reconstruction is running, you're out of luck.

RAID-4/5/6

You have three or more devices (four or more for RAID-6) of roughly the same size, and you want to combine them into a larger device while still maintaining a degree of redundancy for data safety. You may also have a number of devices to use as spare-disks, which will not take part in the array until another device fails.

If you use N devices where the smallest has size S, the size of the entire raid-5 array will be (N-1)*S, or (N-2)*S for raid-6. This "missing" space is used for parity (redundancy) information. Thus, if any disk fails, all the data stays intact. But if two disks fail on raid-5, or three on raid-6, all data is lost.
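
For example, with four 2 TB disks, a RAID-5 array gives (4-1)*2 TB = 6 TB of usable space, while RAID-6 on the same four disks gives (4-2)*2 TB = 4 TB.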

The default chunk-size is 128 kB. That's the default I/O size on a spindle.

Ok, enough talking. Let's see if raid-5 works. Run your command:

    mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1 --spare-devices=1 /dev/sde1

and see what happens. Hopefully your disks start working like mad, as they begin the reconstruction of your array. Have a look in /proc/mdstat to see what's going on.

If the device was successfully created, the reconstruction process has now begun. Your array is not consistent until this reconstruction phase has completed. However, the array is fully functional (except for the handling of device failures of course), and you can format it and use it even while it is reconstructing.

The initial reconstruction will always appear as though the array is degraded and is being rebuilt onto a spare, even if you added exactly the required number of devices and no spares. This is done to optimize the initial reconstruction process. It may be confusing or worrying, but it is intentional and done for good reason. For more information, please check this source, directly from Neil Brown.

Now, you can create a filesystem. See the section on special options to mke2fs before formatting the filesystem. You can now mount it, include it in your /etc/fstab and so on.

Saving your RAID configuration (2011)

After you've created your array, it's important to save the configuration in the proper mdadm configuration file. In Ubuntu, this is file /etc/mdadm/mdadm.conf. In some other distributions, this is file /etc/mdadm.conf. Check your distribution's documentation, or look at man mdadm.conf, to see what applies to your distribution.

To save the configuration information:

Ubuntu:

mdadm --detail --scan >> /etc/mdadm/mdadm.conf

Others (check your distribution's documentation):

mdadm --detail --scan >> /etc/mdadm.conf
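
The appended line will look something like this (the name and UUID shown here are purely illustrative):

 ARRAY /dev/md0 metadata=1.2 name=myhost:0 UUID=a26bf396:31389f83:0df1722d:f404fe4c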

Note carefully that if you do this before your array has finished initialization, you may have an inaccurate spares= clause.

In Ubuntu, if you neglect to save the RAID creation information, you will get peculiar errors when you try to assemble the RAID device (described below). There will be errors generated that the hard drive is busy, even though it seems to be unused. For example, the error might be similar to this: "mdadm: Cannot open /dev/sdd1: Device or resource busy". This happens because if there is no RAID configuration information in the mdadm.conf file, the system may create a RAID device from one disk in the array, activate it, and leave it unmounted. You can identify this problem by looking at the output of "cat /proc/mdstat". If it lists devices such as "md_d0" that are not part of your RAID setup, then first stop the extraneous device (for example: "mdadm --stop /dev/md_d0") and then try to assemble your RAID array as described below.

Create and mount filesystem

Have a look in /proc/mdstat. You should see that the array is running.

Now, you can create a filesystem, just like you would on any other device, mount it, include it in your /etc/fstab, and so on.

Common filesystem creation commands are mke2fs and mkfs.ext3. Please see Options for mke2fs for an example and details.
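
As a minimal sketch (the ext4 filesystem type and the /mnt/raid mount point are just examples):

   mkfs.ext4 /dev/md0
   mkdir /mnt/raid
   mount /dev/md0 /mnt/raid

with a corresponding /etc/fstab entry along the lines of:

   /dev/md0   /mnt/raid   ext4   defaults   0   2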


Using the Array

Stopping a running RAID device is easy:

   mdadm --stop /dev/md0

Starting is a little more complex; you may think that:

   mdadm --run /dev/md0

would work - but it doesn't.

Linux raid devices don't really exist on their own; they have to be assembled each time you want to use them. Assembly is like creation insofar as it pulls the component devices together into an active array.

If you earlier ran:

mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

then

mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

would work.

However, the easy way to do this if you have a nice simple setup is:

  mdadm --assemble --scan 

For complex cases (e.g. you pull in disks from other machines that you're trying to repair) this has the potential to start arrays you don't really want started. A safer mechanism is to use the uuid parameter and run:

  mdadm --scan --assemble --uuid=a26bf396:31389f83:0df1722d:f404fe4c

This will only assemble the array that you want - but it will work no matter what has happened to the device names. This is particularly cool if, for example, you add in a new SATA controller card and all of a sudden /dev/sda becomes /dev/sde!!!
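
If you don't know the UUID, you can read it from an existing array or from its component devices, for example:

   mdadm --detail /dev/md0 | grep UUID
   mdadm --examine --scan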

The Persistent Superblock (2011)

Back in "The Good Old Days" (TM), the raidtools would read your /etc/raidtab file and then initialize the array. However, this required that the filesystem on which /etc/raidtab resided was mounted. This was unfortunate if you wanted to boot from RAID.

Also, the old approach led to complications when mounting filesystems on RAID devices. They could not be put in the /etc/fstab file as usual, but would have to be mounted from the init-scripts.

The persistent superblocks solve these problems. When an array is created with the persistent-superblock option (the default now), a special superblock is written to a location (different for different superblock versions) on all disks participating in the array. This allows the kernel to read the configuration of RAID devices directly from the disks involved, instead of reading from some configuration file that may not be available at all times.

It's not a bad idea to maintain a consistent /etc/mdadm.conf file, since you may need this file for later recovery of the array, although this is largely unnecessary today.

A persistent superblock is mandatory for auto-assembly of your RAID devices upon system boot.

NOTE: Were persistent superblocks necessary for kernel raid support? This support has been moved into user space so this section may (or may not) be seriously out of date.

Superblock physical layouts are listed on RAID superblock formats .

External Metadata (2011)

MDRAID has always used its own metadata format. There are two different major formats for the MDRAID native metadata, the 0.90 and the version-1. The old 0.90 format limits the arrays to 28 components and 2 terabytes. With the latest mdadm, version 1.2 is the default.

Starting with Linux kernel v2.6.27 and mdadm v3.0, external metadata formats are supported. These formats have long been supported by DMRAID and allow booting RAID volumes from the Option ROM, depending on the vendor.

The first format is the DDF (Disk Data Format) defined by SNIA as the "Industry Standard" RAID metadata format. When a DDF array is constructed, a container is created, within which normal RAID arrays can be created.

The second format is the Intel(r) Matrix Storage Manager metadata format. This also creates a container that is managed similarly to DDF. On some platforms (depending on the vendor), this format is supported by the option-ROM in order to allow booting. [2]


To report the RAID information from the Option ROM:

   mdadm --detail-platform
 Platform : Intel(R) Matrix Storage Manager
         Version : 8.9.0.1023
     RAID Levels : raid0 raid1 raid10 raid5
     Chunk Sizes : 4k 8k 16k 32k 64k 128k
       Max Disks : 6
     Max Volumes : 2
  I/O Controller : /sys/devices/pci0000:00/0000:00:1f.2
           Port0 : /dev/sda (3MT0585Z)
           Port1 : - non-disk device (ATAPI DVD D  DH16D4S) -
           Port2 : /dev/sdb (WD-WCANK2850263)
           Port3 : /dev/sdc (3MT005ML)
           Port4 : /dev/sdd (WD-WCANK2850441)
           Port5 : /dev/sde (WD-WCANK2852905)
           Port6 : - no device attached –

To create RAID volumes that are external metadata, we must first create a container:

   mdadm --create --verbose /dev/md/imsm /dev/sd[b-e] --raid-devices 4 --metadata=imsm

In this example we created an IMSM based container for 4 RAID devices. Now we can create volumes within the container.

   mdadm --create --verbose /dev/md/vol0 /dev/md/imsm --raid-devices 4 --level 5

Of course, the --size option can be used to limit the amount of disk space used by a volume during creation, in order to create multiple volumes within the container. One important note is that the various volumes within the container MUST span the same disks, e.g. a RAID10 volume and a RAID5 volume must span the same set of disks, as in the sketch below.
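
A sketch of creating two volumes in the container from the example above (the volume names and the --size value are illustrative; --size limits the first volume so that space remains for the second):

   mdadm --create --verbose /dev/md/vol0 /dev/md/imsm --raid-devices 4 --level 10 --size=500G
   mdadm --create --verbose /dev/md/vol1 /dev/md/imsm --raid-devices 4 --level 5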

Advanced Options

Chunk sizes

The chunk-size deserves an explanation. You can never write completely parallel to a set of disks. If you had two disks and wanted to write a byte, you would have to write four bits on each disk. Actually, every second bit would go to disk 0 and the others to disk 1. Hardware just doesn't support that. Instead, we choose some chunk-size, which we define as the smallest "atomic" mass of data that can be written to the devices. A write of 16 kB with a chunk size of 4 kB will cause the first and the third 4 kB chunks to be written to the first disk and the second and fourth chunks to be written to the second disk, in the RAID-0 case with two disks. Thus, for large writes, you may see lower overhead by having fairly large chunks, whereas arrays that are primarily holding small files may benefit more from a smaller chunk size.

Chunk sizes must be specified for all RAID levels, including linear mode. However, the chunk-size does not make any difference for linear mode.

For optimal performance, you should experiment with the chunk-size, as well as with the block-size of the filesystem you put on the array. For other experiments and performance charts, check out our Performance page. You can get chunk-size graphs galore.

RAID-0

Data is written "almost" in parallel to the disks in the array. Actually, chunk-size bytes are written to each disk, serially.

If you specify a 4 kB chunk size, and write 16 kB to an array of three disks, the RAID system will write 4 kB to disks 0, 1 and 2, in parallel, then the remaining 4 kB to disk 0.

A 32 kB chunk-size is a reasonable starting point for most arrays. But the optimal value depends very much on the number of drives involved, the content of the file system you put on it, and many other factors. Experiment with it, to get the best performance.


RAID-0 with ext2

The following tip was contributed by michael@freenet-ag.de:

NOTE: this tip is no longer needed since the ext2 fs supports dedicated options: see "Options for mke2fs" below

There is more disk activity at the beginning of ext2fs block groups. On a single disk, that does not matter, but it can hurt RAID0, if all block groups happen to begin on the same disk.

Example:

With a raid using a chunk size of 4k (also called stride-size), and filesystem using a block size of 4k, each block occupies one stride. With two disks, the #disk * stride-size product (also called stripe-width) is 2*4k=8k. The default block group size is 32768 blocks, which is a multiple of the stripe-width of 2 blocks, so all block groups start on disk 0, which can easily become a hot spot, thus reducing overall performance. Unfortunately, the block group size can only be set in steps of 8 blocks (32k when using 4k blocks), which also happens to be a multiple of the stripe-width, so you can not avoid the problem by adjusting the blocks per group with the -g option of mkfs(8).

If you add a disk, the stripe-width (#disk * stride-size product) is 12k, so the first block group starts on disk 0, the second block group starts on disk 2 and the third on disk 1. The load caused by disk activity at the block group beginnings spreads over all disks.

In case you can not add a disk, try a stride size of 32k. The stripe-width (#disk * stride-size product) is then 64k. Since you can change the block group size in steps of 8 blocks (32k), using 32760 blocks per group solves the problem.

Additionally, the block group boundaries should fall on stride boundaries. The examples above get this right.

RAID-1

For writes, the chunk-size doesn't affect the array, since all data must be written to all disks no matter what. For reads however, the chunk-size specifies how much data to read serially from the participating disks. Since all active disks in the array contain the same information, the RAID layer has complete freedom in choosing from which disk information is read - this is used by the RAID code to improve average seek times by picking the disk best suited for any given read operation.

RAID-4

When a write is done on a RAID-4 array, the parity information must be updated on the parity disk as well.

The chunk-size affects read performance in the same way as in RAID-0, since reads from RAID-4 are done in the same way.


RAID-5

On RAID-5, the chunk size has the same meaning for reads as for RAID-0. Writing on RAID-5 is a little more complicated: When a chunk is written on a RAID-5 array, the corresponding parity chunk must be updated as well. Updating a parity chunk requires either

  • The original chunk, the new chunk, and the old parity block
  • Or, all chunks (except for the parity chunk) in the stripe

The RAID code will pick the easiest way to update each parity chunk as the write progresses. Naturally, if your server has lots of memory and/or if the writes are nice and linear, updating the parity chunks will only impose the overhead of one extra write going over the bus (just like RAID-1). The parity calculation itself is extremely efficient, so while it does of course load the main CPU of the system, this impact is negligible. If the writes are small and scattered all over the array, the RAID layer will almost always need to read in all the untouched chunks from each stripe that is written to, in order to calculate the parity chunk. This will impose extra bus-overhead and latency due to extra reads.

A reasonable chunk-size for RAID-5 is 128 kB. A study showed that, with 4 drives (an even number of drives might make a difference), large chunk sizes of 512-2048 kB gave superior results [3]. As always, you may want to experiment with this or check out our Performance page.

Also see the section on special options to mke2fs. This affects RAID-5 performance.


ext2, ext3, and ext4 (2011)

There are special options available when formatting RAID-4 or -5 devices with mke2fs or mkfs. The -E stride=nn,stripe-width=mm options will allow mke2fs to better place different ext2/ext3 specific data-structures in an intelligent way on the RAID device.

Note: The commands mkfs or mkfs.ext3 or mkfs.ext2 are all versions of the same command, with the same options; use whichever is supported, and decide whether you are using ext2 or ext3 (non-journaled vs journaled). See the two versions of the same command below; each makes a different filesystem type.

Note that ext3 no longer exists in the kernel - it has been subsumed into the ext4 driver, although ext3 filesystems can still be created and used.

Here is an example, with its explanation below:

   mke2fs -v -m .1 -b 4096 -E stride=32,stripe-width=64 /dev/md0
   or
   mkfs.ext3 -v -m .1 -b 4096 -E stride=32,stripe-width=64 /dev/md0
   Options explained:
     The first command makes an ext2 filesystem, the second makes an ext3 filesystem
     -v verbose
     -m .1 leave .1% of the disk to root (so it doesn't fill up and cause problems)
     -b 4096 block size of 4 kB (recommended above for large-file systems)
     -E stride=32,stripe-width=64 see the calculation below

Calculation

  • chunk size = 128kB (set by mdadm cmd, see chunk size advise above)
  • block size = 4kB (recommended for large files, and most of time)
  • stride = chunk / block = 128kB / 4k = 32
  • stripe-width = stride * ( (n disks in raid5) - 1 ) = 32 * ( (3) - 1 ) = 32 * 2 = 64

If the chunk-size is 128 kB, it means, that 128 kB of consecutive data will reside on one disk. If we want to build an ext2 filesystem with 4 kB block-size, we realize that there will be 32 filesystem blocks in one array chunk.

stripe-width=64 is calculated by multiplying the stride=32 value with the number of data disks in the array.

A raid5 with n disks has n-1 data disks, one being reserved for parity. (Note: the mke2fs man page incorrectly states n+1; this is a known bug in the man-page docs that is now fixed.) A raid10 (1+0) with n disks is actually a raid 0 of n/2 raid1 subarrays with 2 disks each.

Performance

RAID-{4,5,10} performance is severely influenced by the stride and stripe-width options. It is uncertain how the stride option will affect other RAID levels. If anyone has information on this, please add to the knowledge.

The ext2fs blocksize severely influences the performance of the filesystem. You should always use 4kB block size on any filesystem larger than a few hundred megabytes, unless you store a very large number of very small files on it.

Changing after creation

It is possible to change the parameters with

   tune2fs -E stride=n,stripe-width=m /dev/mdx

XFS

xfsprogs and the mkfs.xfs utility automatically select the best stripe size and stripe width for underlying devices that support it, such as Linux software RAID devices. Earlier versions of xfs used a built-in libdisk and the GET_ARRAY_INFO ioctl to gather the information; newer versions make use of enhanced geometry detection in libblkid. When using libblkid, accurate geometry may also be obtained from hardware RAID devices which properly export this information.

To create XFS filesystems optimized for RAID arrays manually, you'll need two parameters:

  • chunk size: same as used with mdadm
  • number of "data" disks: number of disks that store data, not disks used for parity or spares. For example:
    • RAID 0 with 2 disks: 2 data disks (n)
    • RAID 1 with 2 disks: 1 data disk (n/2)
    • RAID 10 with 10 disks: 5 data disks (n/2)
    • RAID 5 with 6 disks (no spares): 5 data disks (n-1)
    • RAID 6 with 6 disks (no spares): 4 data disks (n-2)

With these numbers in hand, you then want to use mkfs.xfs's su and sw parameters when creating your filesystem.

  • su: Stripe unit, which is the RAID chunk size, in bytes
  • sw: Multiplier of the stripe unit, i.e. number of data disks

If you have a 4-disk RAID 5 and are using a chunk size of 64 KiB, the command to use is:

mkfs -t xfs -d su=64k -d sw=3 /dev/md0

Alternately, you may use the sunit/swidth mkfs options to specify stripe unit and width in 512-byte-block units. For the array above, it could also be specified as:

mkfs -t xfs -d sunit=128 -d swidth=384 /dev/md0

The result is exactly the same; however, the su/sw combination is often simpler to remember. Beware that sunit/swidth are inconsistently used throughout XFS' utilities (see xfs_info below).

To check the parameters in use for an XFS filesystem, use xfs_info.

xfs_info /dev/md0
meta-data=/dev/md0               isize=256    agcount=32, agsize=45785440 blks
         =                       sectsz=4096  attr=2
data     =                       bsize=4096   blocks=1465133952, imaxpct=5
         =                       sunit=16     swidth=48 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=0
realtime =none                   extsz=196608 blocks=0, rtextents=0

Here, rather than displaying 512-byte units as used in mkfs.xfs, sunit and swidth are shown as multiples of the filesystem block size (bsize), another file system tunable. This inconsistency is for legacy reasons, and is not well-documented.

For the above example, sunit (sunit×bsize: 16×4096 = 64 KiB, i.e. su) and swidth (swidth×bsize: 48×4096 = 192 KiB, i.e. su×sw) are optimal and correctly reported.

While the stripe unit and stripe width cannot be changed after an XFS file system has been created, they can be overridden at mount time with the sunit/swidth options, similar to ones used by mkfs.xfs.
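
For example, reusing the values from the 4-disk RAID-5 example above (the /mnt mount point is just an example):

   mount -t xfs -o sunit=128,swidth=384 /dev/md0 /mnt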

From Documentation/filesystems/xfs.txt in the kernel tree:

 sunit=value and swidth=value
       Used to specify the stripe unit and width for a RAID device or
       a stripe volume.  "value" must be specified in 512-byte block
       units.
       If this option is not specified and the filesystem was made on
       a stripe volume or the stripe width or unit were specified for
       the RAID device at mkfs time, then the mount system call will
       restore the value from the superblock.  For filesystems that
       are made directly on RAID devices, these options can be used
       to override the information in the superblock if the underlying
       disk layout changes after the filesystem has been created.
       The "swidth" option is required if the "sunit" option has been
       specified, and must be a multiple of the "sunit" value.

Source: Samat Says: Tuning XFS for RAID

Back to Hardware issues Forward to Detecting, querying and testing