Using LoadAvg for Performance Optimization

 

Linux and other Unix systems have an excellent metric of system load called “loadavg”. In fact, load average is three numbers, which correspond to the load average calculated over one, five and 15 minutes. It is computed as an exponential moving average, so the most recent load has more weight in the value than older load.
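
To make the exponential moving average idea concrete, here is a minimal floating-point sketch. The 5 second sampling interval and the function names are assumptions made for illustration; real kernels use their own sampling and fixed-point arithmetic, so treat this as a model rather than the actual implementation.

```python
import math

# Assumed sampling interval for this sketch, in seconds.
SAMPLE_INTERVAL = 5.0


def update_loadavg(previous, active_tasks, window_seconds, interval=SAMPLE_INTERVAL):
    """Fold one new sample into an exponentially decaying average."""
    decay = math.exp(-interval / window_seconds)
    # Old value fades by `decay`, the new sample fills in the rest.
    return previous * decay + active_tasks * (1.0 - decay)


def simulate(samples, window_seconds):
    """Run a sequence of samples through the average, starting from an idle box."""
    load = 0.0
    for active in samples:
        load = update_loadavg(load, active, window_seconds)
    return load


# Two active tasks for one minute (12 samples) on a previously idle box.
one_minute_of_load = [2] * 12
for window in (60, 300, 900):
    print(window, round(simulate(one_minute_of_load, window), 3))
```

The one minute figure reacts quickly to the burst, while the 15 minute figure barely moves, which is why comparing the three numbers tells you whether load is rising or falling.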


What does Load Average correspond to? At least on Linux, it is the number of processes which are in the “running” state (on a CPU or waiting for one) or in the “uninterruptible sleep” state, which typically corresponds to disk IO. You can also map LoadAvg to vmstat output - it is something like a moving average of the sum of the “r” and “b” columns from vmstat.
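
As a rough illustration on Linux, the three averages can be read from /proc/loadavg, and the instantaneous running and blocked counts (roughly what vmstat shows as “r” and “b”) from the procs_running and procs_blocked lines of /proc/stat. The helper names below are hypothetical, and the sketch assumes a Linux /proc filesystem.

```python
from pathlib import Path


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute averages from /proc/loadavg content."""
    one, five, fifteen = (float(field) for field in text.split()[:3])
    return one, five, fifteen


def parse_running_and_blocked(stat_text):
    """Return (running, blocked) task counts from /proc/stat content."""
    counts = {}
    for line in stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("procs_running", "procs_blocked"):
            counts[parts[0]] = int(parts[1])
    return counts.get("procs_running", 0), counts.get("procs_blocked", 0)


if __name__ == "__main__":
    loadavg_file = Path("/proc/loadavg")
    stat_file = Path("/proc/stat")
    if loadavg_file.exists() and stat_file.exists():
        print("loadavg:", parse_loadavg(loadavg_file.read_text()))
        running, blocked = parse_running_and_blocked(stat_file.read_text())
        print("running + blocked right now:", running + blocked)
```

A single reading of running + blocked is noisy; it is the average of that sum over time that LoadAvg approximates.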

Obviously the minimum value for LoadAvg is zero, which corresponds to a completely idle system, and there is no maximum :)


The first thing to understand about LoadAvg is that it does not really tell you whether the load is CPU bound or IO bound. For example, a LoadAvg of 10 may mean there are 10 processes/threads actively consuming CPU, or it could be the same 10 processes waiting on disk IO, in which case you can see CPU utilization close to zero.

The second thing to understand is that LoadAvg values are relative to your system size. If you have a single CPU and one disk, a LoadAvg of 2 can be considered significant. If you have 16 CPUs and 2 disks, a LoadAvg of 4 can be light if it is CPU bound, because the system can execute many more CPU bound tasks in parallel, or high if it is disk bound.
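
One simple way to account for system size is to divide LoadAvg by the number of CPUs. The sketch below does that for the two examples above and then, where the platform supports it, for the live system. The helper name load_per_cpu is hypothetical, and the ratio is only meaningful if the load is CPU bound; for disk bound load you would compare against the number of disks instead.

```python
import os


def load_per_cpu(loadavg, cpu_count):
    """Normalize a load average by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return loadavg / cpu_count


# The two examples from the text.
print(load_per_cpu(2.0, 1))    # single CPU box: 2.0 per CPU
print(load_per_cpu(4.0, 16))   # 16 CPU box: 0.25 per CPU

# Live reading; os.getloadavg() is not available on every platform.
if hasattr(os, "getloadavg"):
    try:
        one_minute = os.getloadavg()[0]
        cpus = os.cpu_count() or 1
        print("current load per CPU:", round(load_per_cpu(one_minute, cpus), 2))
    except OSError:
        print("load average unavailable on this system")
```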

A low Load Average does not mean there are no performance problems. For example, if you run a single batch job on a server with MySQL, Load Average is likely to be close to 1 even if there are a lot of CPUs and disks - the system may be quite idle and performance still poor because the application is not parallel enough. Similar situations can happen if there is a lot of network IO involved, if there are a lot of locks (table/row level locks), or if there are other limiting factors such as innodb_thread_concurrency.

The most interesting question, I think, is how LoadAvg represents box load in terms of how much load the box can handle before it starts to slow down or becomes completely unable to handle the load, and it is a tricky question. For both CPUs and disks there are two stages a request can be in: it can be either currently executing or queued for later execution. The time needed to complete a request is the sum of the time it was really executing and the time it spent in the queue. As the system gets loaded, response time starts to increase mainly because of the time requests spend waiting in various queues and waiting on locks; the time of true execution may well remain constant. This is a bit of a simplification, as there are a number of other effects coming into play, but it is good enough for the sake of explanation.


What does it mean from the LoadAvg standpoint? You need to understand where parallel execution ends and where waiting in the queue starts. If you have a fully CPU bound workload which is rather parallel (i.e. many queries run at once) and you have 4 CPUs, then as long as your LoadAvg is below 4 little time is spent waiting for CPUs to be free to do the work. There is some wait, but not much. So if you have a LoadAvg of 1 and your workload scales linearly with the number of connections and CPUs (i.e. there are no row lock waits involved), you can assume the box can handle up to 3-4 times more load before response time starts to suffer.

If however the LoadAvg is 4 already, it may take a rather insignificant increase in load to take it up to 8, and you will see some delays due to queuing. If there are 4 CPUs (cores) and LoadAvg is 16 for a CPU bound workload, it often means requests take about 4 times longer to complete than they would on an idle box, due to waiting in the queue.
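
The rule of thumb above can be written down as back-of-the-envelope arithmetic. This is a deliberately crude sketch for a fully CPU bound, well parallelized workload; the function names are hypothetical and it ignores locks, disk IO and everything else that complicates real systems.

```python
def estimated_slowdown(loadavg, cpus):
    """Rough response time multiplier compared to an idle box.

    Below one runnable task per CPU there is little queuing, so the
    multiplier stays at 1. Above that, each CPU is shared by
    loadavg / cpus tasks on average.
    """
    return max(1.0, loadavg / cpus)


def headroom(loadavg, cpus):
    """Roughly how many times load can grow before queuing sets in."""
    if loadavg <= 0:
        return float("inf")
    return cpus / loadavg


print(estimated_slowdown(16, 4))  # 4.0: requests take about 4 times longer
print(estimated_slowdown(1, 4))   # 1.0: little waiting for a free CPU
print(headroom(1, 4))             # 4.0: up to about 4 times more load
print(headroom(4, 4))             # 1.0: no room left before queuing
```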

The same is true for a pure disk IO bound workload, with the small difference that disks are not interchangeable (if you’re waiting on one drive you can’t use another drive instead), and the fact that disks can optimize multiple outstanding requests a bit better than requests coming one after another.

For a mixed workload, which is what we usually see in practice, you have to make some assumptions and guesses or do further analysis if you want good estimates. For example, you may want to check mpstat, vmstat and iostat to see where the load comes from. But the general rule remains the same: as long as you’re able to exploit the parallel abilities of the box it will perform well; as soon as you need to do a lot of queuing, performance starts to suffer.

Let us clarify the last point - how much more load the box can handle before it overloads, LoadAvg skyrockets and it becomes as good as down. First, for many applications request inflow is not constant: if a web site gets poor response time, users do not spend so much time on it any more, so load drops. This is only temporary relief, however, as there are stubborn users who will not go away even with a slow responding site until their browsers time out, which is as good as the site being down. There are too many variables to come up with exact numbers, but generally, once long queuing has started, it may take just 10-20% extra load to overload the system. So it is better to keep LoadAvg low - below the number of CPUs and/or disks you have.

I must note that LoadAvg is not a perfect tool for the task. It is just almost always available, unlike other metrics. It is best to have profiling information so you can see when response time for your requests starts to grow. As soon as it starts to grow for no good reason, I would start to worry whatever LoadAvg shows.

P.S. I acknowledge that some of the explanations simplify things for the sake of clarity.


 
