RSS
热门关键字:  数据挖掘  数据仓库  商业智能  人工智能  搜索引擎

Oracle RAC Administration - Part 13: Cache Coherency

来源: 作者:unkonwn 时间:2007-01-23 点击:

Some Performance metrics specific to the RAC environment

Cache Coherency

What exactly do we mean by Cache Coherency? Our Oracle RAC environment needs some added sets of metrics rather than a regular Oracle RAC installation, which I sometimes refer to as a “Single-Node RAC”. I call it a Single-Node RAC because someday that Oracle application will also grow and need a ticket to “RACdom”. A typical production DBA, responsible for uptime and upkeep of his RAC Database needs more that just some statspack runs; he will need to measure the health of his HSI (High Speed Interconnects) Network interfaces, he will have to monitor and diagnose the traffic volume across the nodes and response times. A typical high intensive OLTP environment can keep you pretty busy. To measure the traffic we will concentrate on two categories:

数据挖掘研究院

  • GCS (Global Cache Services)
  • GES (Global Enqueue Services)

So what are they? 数据挖掘研究院

Global Cache Service

According to the manual:

Process that implement Cache Fusion. It maintains the block mode for blocks in the global role. It is responsible for block transfers between instances. The Global Cache Service employs various background processes such as the Global Cache Service Processes (LMSn) and Global Enqueue Service Daemon (LMD). 数据挖掘研究院

It actually is more or less like your buffer cache, but here it acts globally across the nodes. This process is an integral part to the cache-fusion concepts. So what does a buffer have, data blocks obviously. Simply said, the coherency in the Global Buffer Cache is maintained by making sure that whenever an attempt to modify the database block is made, a global lock is acquired. . Now this “asking instance” will have both the past copy of the block (for redo purposes) as well as the current version of the block containing both committed and uncommitted transactions. Should another node come asking for that block, then it is the GCS’s responsibility to do a “Block Version Lookup” at the node, which is currently holding the global lock to the block. The LMSn processes are crucial for a successful operation of GCS and do the block version lookup, block mode etc. 数据挖掘研究院

Global Enqueue Service

According to the manual:

数据挖掘研究院

A service that coordinates enqueues that are shared globally.

The blocks in your RAC environment do most of the work themselves, but there is a crucial area when GES or the Global Enqueue Services come in. A seamless coordination across the nodes is crucial for RAC's operation. The GES is primarily responsible for maintaining coherency in the dictionary and library caches. The dictionary cache consists of the data dictionary master information for each node in its SGA (System Global Area) primarily for quicker lookup and access. Any DDL committed from a requesting node needs to be sync’ed and written across all data dictionaries in all nodes of the RAC environment. The GES makes sure that the changes remain consistent across the nodes and that there are no discrepancies. Moreover, with the same directive, the locks must be created and maintained across the nodes and GES must ensure that there are no deadlocks across requesting nodes over access to the same objects. LMON, LCK and LMD processes work in tandem to make the GES operate in a smooth and seamless fashion. 数据挖掘研究院

GV$ Views

Obviously, the meat part of the whole equation is, “Where are my RAC views?” RAC environment has additional views known as Global Views. A typical view for a Single Node installation is V$ but for RAC you have GV$ views. In addition, all these views have additional columns like INST_ID to identify nodes across the RAC environment. So a typical 4 node RAC (like that of our VMware ESX 3.0 nodes) will give you four nodes in our 4-node RAC with their own data when querying the GV$ view. Obviously, you can query individual nodes from any node. To get started try doing this: 数据挖掘研究院

SQL> select * from gv$sysstat where name like '%gcs %'; 

数据挖掘研究院

This will give you a result set with specific attention to GCS messages sent across the nodes. If this value is inconsistent across nodes or if huge differences are apparent then it might be time to investigate.

数据挖掘研究院

Finding which NIC was used for block transfer

Try to do the following to find out which NIC (private obviously) was used for Cache Fusion:

SQL> oradebug setmypid
Statement processed.
SQL> oradebug ipc
Information written to trace file.
SQL> oradebug tracefile_name
/oracle/app/oracle/product/10.2.0.1/admin/…/10gr2_ora_12443.trc
SQL>
 数据挖掘研究院 

The trace file will contain the details of the IPC information along with the interconnect details: 数据挖掘研究院

SKGXPCTX: 0xad95d70 ctx
admono 0xbcfc2b9 admport:
SSKGXPT 0xad95e58 flags         info for network 0
        socket no 8     IP 10.0.0.25  UDP 38206
        sflags SSKGXPT_UP
        info for network 1
        socket no 0     IP 0.0.0.0      UDP 0
        sflags SSKGXPT_DOWN
        active 0        actcnt 1
context timestamp 0
  

It is clear that it is our node 1 using private port 10.0.0.25 via UDP. 数据挖掘实验室

What networks do various Oss use?

OS

数据挖掘研究院

Network

Linux (RHEL/Centos 4.4)

Gigabit Ethernet 数据挖掘研究院

Windows 2003 R2

数据挖掘研究院

Gigabit Ethernet

AIX 数据挖掘研究院

UDP High-Perf Switch, UDP

数据挖掘实验室

HP-True64

RDG (Reliable Data Gram) Memory Channel

数据挖掘研究院

Sun 数据挖掘研究院

RSM (Remote Shared Memory Firelink) 数据挖掘研究院

Several vendors, like Veritas (Symantec now), have their own protocols to enable a fast and efficient cache fusion operation. 数据挖掘研究院

Conclusion

In our next article, we will carefully start moving towards Workload Management. (AWR, FAN, etc.) I will try to keep the articles more fun and examples oriented. Although I must stress on the fact that it is not all examples and we must keep coming back to fundamentals. 数据挖掘研究院

最新评论共有 0 位网友发表了评论
发表评论
评论内容:不能超过250字,需审核,请自觉遵守互联网相关政策法规。
匿名?