Oracle Clusterware is the cross-platform cluster software required to run the Real Application Clusters (RAC) option for Oracle Database. It provides the basic clustering services at the operating-system level that enable Oracle Database software to run in clustered mode. In earlier releases (9i and earlier), RAC required vendor-supplied clusterware such as Sun Cluster or Veritas Cluster Server, except when running on Linux or Microsoft Windows.
Oracle Clusterware is the software that enables the nodes to communicate with each other, allowing them to form a cluster of nodes which behaves as a single logical server. Oracle Clusterware is run by Cluster Ready Services (CRS), consisting of two key components: the Oracle Cluster Registry (OCR), which records and maintains the cluster and node membership information, and the voting disk, which polls for consistent heartbeat information from all
a CPU overhead of running the communication protocol software. Concurrency control becomes an issue when more than one person or client accesses the same file or block and wants to update it: updates to the file from one client must not interfere with access and updates from other clients. This problem is more complex with file systems due to concurrent overlapping writes, where different writers write to overlapping regions of
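The need to serialize such updates can be illustrated with POSIX advisory locking. This is a minimal sketch of one concurrency-control primitive on a POSIX system, not how any particular clustered file system implements its locking:

```python
import fcntl
import os
import tempfile

# Minimal sketch of advisory locking, one primitive a client can use to
# keep its updates from interleaving with another writer's (POSIX only).
path = os.path.join(tempfile.mkdtemp(), "shared.dat")

writer_a = open(path, "a")
fcntl.flock(writer_a, fcntl.LOCK_EX)              # writer A takes an exclusive lock

writer_b = open(path, "a")
try:
    fcntl.flock(writer_b, fcntl.LOCK_EX | fcntl.LOCK_NB)
    result = "acquired"                           # writes could now interleave
except BlockingIOError:
    result = "blocked"                            # B must wait for A to release

fcntl.flock(writer_a, fcntl.LOCK_UN)
print(result)                                     # → blocked
```

Because the lock is advisory, it only serializes clients that agree to take it; a clustered file system must enforce equivalent coordination for every client.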
a clustered file system is the amount of time needed to satisfy service requests. In conventional systems, this time consists of a disk-access time and a small amount of CPU-processing time. But in a clustered file system, a remote access has additional overhead due to the distributed structure. This includes the time to deliver the request to a server, the time to deliver the response to the client, and, for each direction,
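A back-of-the-envelope model makes this overhead concrete. The timings below are assumptions chosen for illustration, not measurements of any real system:

```python
# Hypothetical timings (assumptions for illustration, not measurements).
disk_access_ms = 8.0          # disk seek + transfer
local_cpu_ms = 0.2            # local CPU processing
local_ms = disk_access_ms + local_cpu_ms          # conventional system

network_one_way_ms = 0.5      # delivering the request (or the response)
protocol_cpu_ms = 0.3         # running the protocol software, per direction
remote_ms = local_ms + 2 * (network_one_way_ms + protocol_cpu_ms)

print(remote_ms)              # same disk access, plus round-trip cost
```

Even with a fast network, the two message deliveries and the per-direction protocol processing are pure overhead relative to a local access.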
a database). Distributed file systems may aim for "transparency" in a number of aspects. That is, they aim to be "invisible" to client programs, which "see" a system similar to a local file system. Behind the scenes, the distributed file system handles locating files, transporting data, and potentially providing other features listed below. The Incompatible Timesharing System used virtual devices for transparent inter-machine file system access in
a distributed file system and a distributed data store is that a distributed file system allows files to be accessed using the same interfaces and semantics as local files, for example, mounting/unmounting, listing directories, read/write at byte boundaries, and the system's native permission model. Distributed data stores, by contrast, require using a different API or library and have different semantics (most often those of
a file system, like a shared-disk file system on top of a storage area network (SAN). NAS typically uses file-based protocols (as opposed to the block-based protocols a SAN would use) such as NFS (popular on UNIX systems), SMB/CIFS (Server Message Block/Common Internet File System, used with MS Windows systems), AFP (used with Apple Macintosh computers), or NCP (used with OES and Novell NetWare). The failure of disk hardware or
a given storage node in a cluster can create a single point of failure that can result in data loss or unavailability. Fault tolerance and high availability can be provided through data replication of one sort or another, so that data remains intact and available despite the failure of any single piece of equipment. For examples, see the lists of distributed fault-tolerant file systems and distributed parallel fault-tolerant file systems. A common performance measurement of
a permanent child process called "evmlogger" and generates events. The evmlogger child process spawns new child processes on demand and scans the callout directory to invoke callouts. EVMd restarts automatically on failure, and death of the EVMd process does not halt the instance. EVMd runs as the "oracle" user. OPROCd provides the server-fencing solution for Oracle Clusterware. It
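The callout pattern described above can be sketched as follows. The directory layout, event names, and function are illustrative assumptions, not Oracle's actual callout interface:

```python
import os
import stat
import subprocess

def run_callouts(callout_dir, event_name):
    """Hypothetical sketch of the callout pattern: on each event, scan a
    callout directory and run every executable found there, passing the
    event name as an argument."""
    invoked = []
    for name in sorted(os.listdir(callout_dir)):
        path = os.path.join(callout_dir, name)
        mode = os.stat(path).st_mode
        if stat.S_ISREG(mode) and mode & stat.S_IXUSR:    # regular, executable
            subprocess.run([path, event_name], check=False)
            invoked.append(name)
    return invoked
```

Administrators drop scripts into the callout directory to react to cluster events (logging, paging, custom cleanup) without modifying the daemon itself.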
a process called OCLSOMON, which causes a cluster node to reboot if OPROCd hangs.

Fencing (computing)

Fencing is the process of isolating a node of a computer cluster or protecting shared resources when a node appears to be malfunctioning. As the number of nodes in a cluster increases, so does the likelihood that one of them may fail at some point. The failed node may have control over shared resources that need to be reclaimed, and if
a type of clustered file system that spreads data across multiple storage nodes, usually for redundancy or performance. A shared-disk file system uses a storage area network (SAN) to allow multiple computers to gain direct disk access at the block level. Access control and translation from the file-level operations that applications use to the block-level operations used by the SAN must take place on
is a distributed group membership system that allows applications to coordinate activities to achieve a common result. As such, it provides synchronization services between nodes, access to the node membership information, and basic cluster services, including cluster group services and cluster locking. It can also run without integration with vendor clusterware. Failure of OCSSd causes
is a possibility that a malfunctioning node could itself consider the rest of the cluster to be the one that is malfunctioning, a split-brain condition could ensue and cause data corruption. Instead, the system has to assume the worst-case scenario and always fence in case of problems. There are two classes of fencing methods: one disables the node itself; the other disallows access to resources such as shared disks. In some cases, it
is also known as Common Internet File System (CIFS). In 1986, IBM announced client and server support for Distributed Data Management Architecture (DDM) for the System/36, System/38, and IBM mainframe computers running CICS. This was followed by support for the IBM Personal Computer, AS/400, IBM mainframe computers under the MVS and VSE operating systems, and FlexOS. DDM also became
is assumed that if a node does not respond within a given time threshold it is non-operational, although there are counterexamples, e.g. a long paging rampage. The STONITH method stands for "Shoot The Other Node In The Head", meaning that the suspected node is disabled or powered off. For instance, power fencing uses a power controller to turn off an inoperable node. The node may then restart itself and join
is shared by being simultaneously mounted on multiple servers. There are several approaches to clustering, most of which do not employ a clustered file system (only direct attached storage for each node). Clustered file systems can provide features like location-independent addressing and redundancy, which improve reliability or reduce the complexity of the other parts of the cluster. Parallel file systems are
is the process monitor for Oracle Clusterware; it uses the hangcheck timer or watchdog timer (depending on the implementation) to preserve cluster integrity. OPROCd is locked in memory and runs as a real-time process. It sleeps for a fixed time and runs as the "root" user. Failure of the OPROCd process causes the node to restart. OPROCd is so important that it is itself monitored by
the 1960s. More file servers were developed in the 1970s. In 1976, Digital Equipment Corporation created the File Access Listener (FAL), an implementation of the Data Access Protocol as part of DECnet Phase II, which became the first widely used network file system. In 1984, Sun Microsystems created the file system called "Network File System" (NFS), which became the first widely used Internet Protocol based network file system. Other notable network file systems are Andrew File System (AFS), Apple Filing Protocol (AFP), NetWare Core Protocol (NCP), and Server Message Block (SMB), which
the client node. The most common type of clustered file system, the shared-disk file system, provides a consistent and serializable view of the file system by adding mechanisms for concurrency control, avoiding corruption and unintended data loss even when multiple clients try to access the same files at the same time. Shared-disk file systems commonly employ some sort of fencing mechanism to prevent data corruption in case of node failures, because an unfenced device can cause data corruption if it loses communication with its sister nodes and tries to access
the cluster later. However, there are approaches in which an operator is informed of the need for a manual restart of the node. The resource-fencing approach disallows access to resources without powering off the node, for example by revoking a node's persistent reservation on shared disks or disabling its port on the SAN switch. When the cluster has only two nodes, the reserve/release method may be used as a two-node STONITH: upon detecting that node B has 'failed', node A will issue
the foundation for Distributed Relational Database Architecture, also known as DRDA. There are many peer-to-peer network protocols for open-source distributed file systems for the cloud, as well as closed-source clustered file systems, e.g.: 9P, AFS, Coda, CIFS/SMB, DCE/DFS, WekaFS, Lustre, PanFS, Google File System, Mnet, Chord Project. Network-attached storage (NAS) provides both storage and
the machine to reboot to avoid a split-brain situation. This is also required in a single-instance configuration if Automatic Storage Management (ASM) is used; ASM was a new feature in Oracle 10g. OCSSd runs as the "oracle" user. The following functions are provided by the Oracle Cluster Synchronization Services daemon (OCSSd): The third component in CRS is the Event Volume Manager daemon (EVMd). EVMd spawns
the node is acting erratically, the rest of the system needs to be protected. Fencing may thus either disable the node or disallow shared storage access, thus ensuring data integrity. A node fence (or I/O fence) is a virtual "fence" that separates nodes which must not have access to a shared resource from that resource. It may separate an active node from its backup. If the backup crosses the fence and, for example, tries to control
the nodes to avoid the corruption of data (due to the possible failure of communication between the nodes), also known as fencing. The CRS daemon runs as "root" (superuser) on UNIX platforms and runs as a service on Windows platforms. The following functions are provided by the Oracle Cluster Ready Services daemon (CRSd): The Oracle Cluster Synchronization Services daemon (OCSSd) provides basic 'group services' support. Group Services
the nodes when the cluster is running, and acts as a tiebreaker during communication failures. The CRS service has four components, each handling a variety of functions: the Cluster Ready Services daemon (CRSd), the Oracle Cluster Synchronization Services daemon (OCSSd), the Event Volume Manager daemon (EVMd), and the Oracle Process Clusterware daemon (OPROCd). Failure or death of the CRS daemon can cause node failure, which triggers automatic reboots of
the reserve and obtain all resources (e.g. the shared disk) for itself. Node B will be disabled if it tries to do I/O (in case it was temporarily hung): on node B, the I/O failure triggers code that kills the node. Persistent reservation is essentially a match on a key: the node that holds the right key can do I/O, otherwise its I/O fails. Therefore, it is sufficient to change the key on a failure to ensure
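The key-matching behavior can be modeled in a few lines. This is an illustrative model of the idea, not a SCSI-3 persistent-reservation implementation, and the node names are made up:

```python
class ReservedDisk:
    """Toy model of a disk guarded by a persistent-reservation key."""
    def __init__(self, key):
        self.key = key

    def write(self, node_key, data):
        if node_key != self.key:                 # key mismatch: I/O is fenced
            raise PermissionError("stale reservation key")
        return len(data)

disk = ReservedDisk(key="node-A")
disk.write("node-A", b"ok")          # the key holder's I/O succeeds

disk.key = "node-B"                  # fence node A by changing the key
try:
    disk.write("node-A", b"late write")
    outcome = "allowed"
except PermissionError:
    outcome = "fenced"               # node A's I/O now fails
print(outcome)                       # → fenced
```

The appeal of this scheme is that the failed node needs no cooperation: its very next I/O attempt fails at the storage layer, regardless of its own state.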
the reserve/release mechanism of SCSI, have existed since at least 1985. Fencing is required because it is impossible to distinguish between a real failure and a temporary hang. If the malfunctioning node is really down, then it cannot do any damage, so theoretically no action would be required (it could simply be brought back into the cluster with the usual join process). However, because there
the right behavior during failure. However, it may not always be possible to change the key on the failed node. STONITH is an easier and simpler method to implement across multiple clusters, while the various approaches to resource fencing require a specific implementation for each cluster platform.

Shared disk file system

A clustered file system (CFS) is a file system which
the same disk array as the primary, a data hazard may occur. Mechanisms such as STONITH are designed to prevent this condition. Isolating a node means ensuring that I/O can no longer be done from it. Fencing is typically done automatically, by cluster infrastructure such as shared-disk file systems, in order to protect processes from other active nodes modifying the resources during node failures. Mechanisms to support fencing, such as
the same information other nodes are accessing. The underlying storage area network may use any of a number of block-level protocols, including SCSI, iSCSI, HyperSCSI, ATA over Ethernet (AoE), Fibre Channel, network block device, and InfiniBand. There are different architectural approaches to a shared-disk file system. Some distribute file information across all the servers in a cluster (fully distributed). Distributed file systems do not share block-level access to
the same storage but use a network protocol. These are commonly known as network file systems, even though they are not the only file systems that use the network to send data. Distributed file systems can restrict access to the file system depending on access lists or capabilities on both the servers and the clients, depending on how the protocol is designed. The difference between
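Server-side restriction via access lists can be sketched with a toy check. The exported path, client names, and permission model are illustrative assumptions, not any particular protocol's scheme:

```python
# Toy access list: per exported path, per client, a set of allowed operations.
ACL = {
    "/export/data": {
        "clientA": {"read", "write"},
        "clientB": {"read"},
    },
}

def authorize(client, path, op):
    """Return True only if the access list grants this client the operation."""
    return op in ACL.get(path, {}).get(client, set())

print(authorize("clientB", "/export/data", "read"))    # → True
print(authorize("clientB", "/export/data", "write"))   # → False
```

A capability-based design inverts this: instead of the server consulting a list per request, the client presents an unforgeable token that already encodes what it may do.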