Join with the other

    322

Subscribe Blog Via Email

Flag Counter

Step by Step Build of Standby (dataguard) in two node RAC

Hello All,

Here are the steps to implement the Standby in RAC and the following is the test environment

Production RAC:
oinfo12cprmy1
oinfo12cprmy2
Standby RAC:-
oinfo12cdr1
oinfo12cdr2

1 . Add standby logs on Primary Database

alter database add standby logfile thread 1 group 10 (‘+PRMY_DATA’) size 500M;
alter database add standby logfile thread 1 group 11 (‘+PRMY_DATA’) size 500M;
alter database add standby logfile thread 1 group 12 (‘+PRMY_DATA’) size 500M;
alter database add standby logfile thread 2 group 13 (‘+PRMY_DATA’) size 500M;
alter database add standby logfile thread 2 group 14 (‘+PRMY_DATA’) size 500M;
alter database add standby logfile thread 2 group 15 (‘+PRMY_DATA’) size 500M;

2. Enable force logging on Primary Database

alter database force logging;

3. In the standby database home, create and start a listener that offers a static SID entry for the standby database .

In Database home

LISTENER_oinfo12cdr1 =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL=TCP)(HOST = oinfo12cdr1-vip.localdomain)(PORT = 1521))
)
)
)

SID_LIST_LISTENER_oinfo12cdr1 =
(SID_LIST =
(SID_DESC =
(ORACLE_HOME = /u01/sq/oracle/db/11.2.0.4)
(SID_NAME = STBY)
)
)

tnsnames.ora

PRMY =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = oinfo12cprmy1-vip.localdomain )(PORT = 2001))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = PRMY.localdomain)
)
)

STBY =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = oinfo12cdr1-vip.localdomain)(PORT = 1521))
(CONNECT_DATA = (UR=A)
(SERVER = DEDICATED)
(SERVICE_NAME = STBY)
)
)

NOTE1 : for STBY tns string “(UR=A)” this required to connect to the standby instance even though the standby instance broguht down or in blocked state .

NOTE2 : Create a dedicated Primary database connection (tns entry shld point directly to any of the instance using VIP ) .Scan ip shld not be used .

4 .Create a TNS entry on Primary server for standby entry.

STBY =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = oinfo12cdr1-vip.localdomain)(PORT = 1521))
(CONNECT_DATA = (UR=A)
(SERVER = DEDICATED)
(SERVICE_NAME = STBY)
)
)

4. Time being modify the tnsnames.ora in primary to local vip or create new tns rather scan as like below

PRMY =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = oinfo12cprmy1-vip.localdomain )(PORT = 2001))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = PRMY.localdomain)
)
)

5. Copy the Passwordfile from Primary server to the standby server and rename it as per the standby instance name.

scp $ORACLE_HOME/dbs/orapwPRMY1 oinfo12cdr1-vip.localdomain:/u01/sq/oracle/db/11.2.0.4/dbs/orapwSTBY

6. on standby host create a pfile as given below.

cat initSTBY1.ora
DB_NAME=PRMY
db_unique_name=’STBY’
STBY1.instance_name=’STBY1′
STBY2.instance_name=’STBY2′
STBY1.instance_number=1
STBY2.instance_number=2
local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=oinfo12cdr1-vip.localdomain)(PORT=1521))))’

NOTE : Local_listener parameter is required because we have another listener running from GRID . We are explicituly registering the STBY1 auxiliary instance with the Static Listener

7. Create Audit directory on standby server . Look the Primart database value and create the same directory structure on DR.

mkdir -p /u01/sq/oracle/admin/STBY/adump
export ORACLE_SID=STBY
sqlplus
startup nomount

8 . TNSPING all the tns alias on both primary and standby to crosscheck everthing is working fine .

9 . create a RMAN script on DR server as below and execute it from RMAN prompt

cat rman_script.sql

########################
From production server
########################

connect target sys/*****@PRMY;

connect auxiliary sys/*****@STBY;

run
{
allocate channel prmy1 type disk;
allocate channel prmy2 type disk;
allocate channel prmy3 type disk;
allocate channel prmy4 type disk;
allocate auxiliary channel stby1 type disk;
allocate auxiliary channel stby2 type disk;
allocate auxiliary channel stby3 type disk;
allocate auxiliary channel stby4 type disk;

DUPLICATE DATABASE FOR STANDBY FROM ACTIVE DATABASE NOFILENAMECHECK SPFILE
PARAMETER_VALUE_CONVERT ‘PRMY’,’STBY’
SET instance_name=’STBY1′
SET instance_number=’1′
SET db_unique_name=’STBY’
SET control_files=’+STBY_DATA’,’+STBY_FRA’
SET db_file_name_convert=’+PRMY_DATA’,’+STBY_DATA’,’+PRMY_FRA01′,’+STBY_FRA’,’+PRMY_DATA/PRMY’,’+STBY_DATA/STBY’,’+PRMY_FRA/PRMY’,’+STBY_FRA/STBY’
SET log_file_name_convert=’+PRMY_DATA’,’+STBY_DATA’,’+PRMY_FRA01′,’+STBY_FRA’,’+PRMY_DATA/PRMY’,’+STBY_DATA/STBY’,’+PRMY_FRA/PRMY’,’+STBY_FRA/STBY’
SET db_recovery_file_dest=’+STBY_FRA’
SET db_recovery_file_dest_size=’20G’
SET log_archive_max_processes=’5′
SET fal_client=’STBY’
SET fal_server=’PRMY’
SET standby_file_management=’AUTO’
SET log_archive_config=’dg_config=(PRMY,STBY)’
SET log_archive_dest_2=’service=PRMY lgwr async noaffirm COMPRESSION=ENABLE valid_for=(online_logfiles,primary_role) db_unique_name=PRMY';

SQL channel prmy1 “alter system set log_archive_config=”dg_config=(PRMY,STBY)””;
SQL channel prmy1 “alter system set log_archive_dest_2=”service=STBY lgwr async noaffirm COMPRESSION=ENABLE valid_for=(online_logfiles,primary_role) db_unique_name=STBY ””;
SQL channel prmy1 “alter system set log_archive_max_processes=5″;
SQL channel prmy1 “alter system set fal_client=STBY”;
SQL channel prmy1 “alter system set fal_server=PRMY”;
SQL channel prmy1 “alter system set standby_file_management=”AUTO””;
SQL channel prmy1 “alter system archive log current”;
sql channel stby1 “alter database recover managed standby database using current logfile disconnect from session”;
}

exit

rman
@rman_script.sql

10 . copy the Password file to the second instance .

scp $ORACLE_HOME/dbs/orapwPRMY1 oinfo12cdr1:/u01/sq/oracle/db/11.2.0.4/dbs/orapwSTBY1
scp $ORACLE_HOME/dbs/orapwPRMY1 oinfo12cdr2:/u01/sq/oracle/db/11.2.0.4/dbs/orapwSTBY2

11 . create pfile from the curennt spfile and then create spfile in to the ASM .

create pfile=’/home/oracle/test.ora’ from spfile;

Modify the parameters, PRMY1 to STBY1 and PRMY2 to STBY2 (attached)

create spfile=’+STBY_DATA/STBY/spfileSTBY.ora’ from pfile=’/home/oracle/test.ora';

12 . Create pfile with the instance names on standby nodes to point to the spfile .

Host: Primary
cd $ORACLE_HOME/dbs/
vi initSTBY1.ora
spfile=’+STBY_DATA/STBY/spfileSTBY.ora’

Host: Second standby host
cd $ORACLE_HOME/dbs/
vi initSTBY2.ora
spfile=’+STBY_DATA/STBY/spfileSTBY.ora’

13 . Register the database with the crs.

srvctl add database -d STBY -o /u01/sq/oracle/db/11.2.0.4/ -p +STBY_DATA01/STBY/spfileSTBY.ora
srvctl add instance -d STBY -i STBY1 -n oinfo12cdr1
srvctl add instance -d STBY -i STBY2 -n oinfo12cdr2
srvctl modify database -d STBY -n STBY -o /u01/sq/oracle/db/11.2.0.4/ -r physical_standby -s mount
srvctl modify database -d STBY -o /u01/sq/oracle/db/11.2.0.4/ -p +STBY_DATA/STBY/spfileSTBY.ora

14 .stop and start the database using srvctl .

srvctl start database -d STBY

15. start the recovery mode

alter database recover managed standby database using current logfile disconnect from session;

16 .Check the log synch status on primary and DR . (optional)

set lines 200 pages 1000
select PROCESS,CLIENT_PROCESS,THREAD#,sequence#,status from v$managed_standby;

17. Also set the remote_listener parameter in standby to scap_ip to ensure the connectivity.

Hope this helps!

Dataguard : Network Tuning Parameters

Whilst, was working on a dataguard network performance issue thought should write something on this.

The most important aspect of the dataguard is network transport, and if the undersized or improper configuration may lead to the redo transport issue and production may show you the log synch waits etc.

The following parameters may be tuned if extra performance is required. Before and after testing should be performed to check the results priory to any Live implementation. Pay attention to log_synch waits in AWR.

Properly Configure TCP Send / Receive Buffer Sizes

Gains have been realised when setting send and receive socket buffer settings up to three times the BDP. BDP is product of the network bandwidth and latency. Socket buffer sizes are set using the Oracle Net parameters RECV_BUF_SIZE and SEND_BUF_SIZE, so that the socket buffer size setting affects only Oracle TCP connections. The operating system may impose limits on the socket buffer size that must be adjusted so Oracle can use larger values. For example, on Linux, the parameters net.core.rmem_max and net.core.wmem_max limit the socket buffer size and must be set larger than RECV_BUF_SIZE and SEND_BUF_SIZE.

Set the send and receive buffer sizes at either the value you calculated or 10 MB, whichever is larger.
For example, if bandwidth is 622 Mbits and latency is 30 ms, then you would calculate the minimum size for the RECV_BUF_SIZE and SEND_BUF_SIZE parameters as follows: 622,000,000 / 8 x 0.030 = 2,332,500 bytes. Then, multiply the BDP 2,332,500 x 3 for a total of 6,997,500.

In this example, you would set the initialisation parameters as follows as the calculated figure is <10 MB.
RECV_BUF_SIZE=10000000
SEND_BUF_SIZE=10000000

Increase SDU Size

With Oracle Net Services it is possible to control data transfer by adjusting the size of the Oracle Net setting for the session data unit (SDU). Oracle testing has shown that setting the SDU to its maximum value of 65535 can improve performance for the SYNC transport. You can set SDU on a per connection basis using the SDU parameter in the local naming configuration file (TNSNAMES.ORA) and the listener configuration file (LISTENER.ORA), or you can set the SDU for all Oracle Net connections with the profile parameter DEFAULT_SDU_SIZE in the SQLNET.ORA file.

Note that the ASYNC transport uses the new streaming protocol and increasing the
SDU size from the default has no performance benefit.

TCP protocol stack

To preempt delays in buffer flushing in the TCP protocol stack, disable the TCP Nagle
algorithm by setting TCP.NODELAY=YES in the SQLNET.ORA file on both the
primary and standby systems. However with this, you’ll end up with a larger number of smaller packets on the network, and if latency is a problem, this will make matters worse, not better.

Hope this helps

Cluster not starting- cssd(12103)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).

Hello All,

Long time !!! Its been very busy schedule and could not spend much on the blog at all.

Anyways back now, and here this time with gpnp profile.

Issue is , the node 2 not starting due to gpnp profile missing/mismatch. Following in the crs alert log in node 2


[ohasd(5347)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 0 were announced and 0 errors occurred
2014-11-16 10:14:54.881:
[gipcd(5858)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_COMM_ERR (Communication error).
[ohasd(11783)]CRS-2769:Unable to failover resource ‘ora.gpnpd’.
2014-11-16 10:01:38.505:
[cssd(12103)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2014-11-16 10:01:38.506:
[cssd(12103)]CRS-1703:Initialization of the required component GPNP failed because the GPNP server daemon is not up; details at (:CSSSC00004:) in
2014-11-16 10:01:41.511:
[cssd(12103)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in
2014-11-16 10:01:44.344:
[ohasd(11783)]CRS-2771:Maximum restart attempts reached for resource ‘ora.gpnpd'; will not restart.

Those who does not know about gpnp profile, gpnp profile is new in 11gr2 and used extensively by cluster when start and any modifications done to cluster, for example, any of the following tools used


    – srvctl
    – oifcfg
    – crsctl

Now upon investigation, understood that the following changes were made

    – Private network has been modified, 192.168.1 to 10.10.1 and 192.168.2 to 10.10.2
    – While doing modification using oifcfg the node 2 is down

 

Well, this is not good, since any changes to cluster configuration will have to populate in the gpnp profile in all nodes, located at 

    $ORACLE_HOME/gpnp/peer/profiles/profile.xml –> global profile

    $ORACLE_HOME/gpnp/rac02/profiles/peer/profile.xml –> local profile

So when the oifcfg used it updates the local profile and copies that profile to other nodes using gpnp agent.

Now, in our case the gpnp is not in sync across the nodes the node 2 got failed to start.

To investigate, first check the node 2 profile using gpnptool

Note: gpnptool is only way to update your profile, do not modify it manually, since this got signature and wallet update the profile may get corrupted.

Note: the profile is edited for clarity.

[root@rac02 gpnp]#  gpnptool get
Warning: some command line parameters were defaulted. Resulting command line:
         /u01/gi/app/oracle/12.1.0.1/grid/bin/gpnptool.bin get -o-

Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
GPnP service is not running on localhost. Found locally cached profile…

<gpnp:Network id="net1" IP="192.168.56.0" Adapter="eth0" Use="public"/><gpnp:Network id="net2" IP="192.168.1.0" Adapter="eth1" Use="cluster_interconnect"/><gpnp:Network id="net3" IP="192.168.2.0" Adapter="eth2" Use="cluster_interconnect"/></gpnp:HostNetwork></gpnp:Network-Profile><orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/><orcl:ASM-Profile id="asm" DiscoveryString="" SPFile="+OCRDG/rac-cluster/ASMPARAMETERFILE/registry.253.862511131" Mode="legacy"/ Success.

Error CLSGPNP_NO_DAEMON getting profile.

 

Error is due to GPNP daemon not running and also if you look the network shows old address 192.168* rather 10.10.

I have tried to use rget which is remote profile get, and it is intact showing correct private network address. 10.10.

[root@rac01 peer]# gpnptool rget
Warning: some command line parameters were defaulted. Resulting command line:
         /u01/gi/app/oracle/12.1.0.1/grid/bin/gpnptool.bin rget -o-
Found 1 gpnp service instance(s) to rget profile from.

get-profile request to tcp://rac01:40082 (mdns:service:gpnp._tcp.local.://rac01:40082/agent=gpnpd,cname=rac-cluster,guid=01292fcb30feff20ff18cd0b98bb3adc,host=rac01,pid=6228/gpnpd h:rac01 c:rac-cluster u:01292fcb30feff20ff18cd0b98bb3adc):

<gpnp:Network-Profile><gpnp:HostNetwork id="gen" HostName="*"><gpnp:Network id="net1" IP="192.168.56.0" Adapter="eth0" Use="public"/><gpnp:Network id="net3" Adapter="eth2" Use="cluster_interconnect" IP="10.10.2.0"/><gpnp:Network id="net2" Adapter="eth1" Use="cluster_interconnect" IP="10.10.1.0"/></gpnp:HostNetwork></gpnp:Network-Profile><orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/><orcl:ASM-Profile id="asm" DiscoveryString="" SPFile="+OCRDG/rac-cluster/ASMPARAMETERFILE/registry.253.862511131" Mode="legacy"/

Success.

Lets proceed to modify the profile.xml in node 2, Navigate to local profile location in node2

[root@rac02 peer]# cd /u01/gi/app/oracle/12.1.0.1/grid/gpnp/rac02/profiles/peer

Copy the profile to profile.back.

[root@rac02 peer]# cp profile.xml profile.bak

Un-sign the profile using gpnptool unsign, this will remove the signature from profile.

[root@rac02 peer]# gpnptool unsign -p=profile.bak
Warning: some command line parameters were defaulted. Resulting command line:
         /u01/gi/app/oracle/12.1.0.1/grid/bin/gpnptool.bin unsign -p=profile.bak -o-

<gpnp:Network-Profile><gpnp:HostNetwork id="gen" HostName="*"><gpnp:Network id="net1" IP="192.168.56.0" Adapter="eth0" Use="public"/><gpnp:Network id="net2" IP="192.168.1.0" Adapter="eth1" Use="cluster_interconnect"/><gpnp:Network id="net3" IP="192.168.2.0" Adapter="eth2" Use="cluster_interconnect"/></gpnp:HostNetwork></gpnp:Network-Profile><orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/><orcl:ASM-Profile id="asm" DiscoveryString="" SPFile="+OCRDG/rac-cluster/ASMPARAMETERFILE/registry.253.862511131" Mode="legacy"/></gpnp:GPnP-Profile>
Success.

Edit the profile, overwrite and change the sequence as well.

<pre>
[root@rac02 peer]# gpnptool edit  -net2:net_ip="10.10.1.0" -net3:net_ip="10.10.2.0" -prf_sq=7 -p=profile.bak -o=profile.bak -ovr
Resulting profile written to "profile.bak".
Success.
[root@rac02 peer]#

</pre>

Sign back the profile, this will add the signature in profile and as well as updates the wallet, here take the wallet location.

[root@rac02 peer]# gpnptool sign -p=profile.bak -w=file:/u01/gi/app/oracle/12.1.0.1/grid/gpnp/rac02/wallets/peer -o=profile.new

Copy back the profile.bak to profile.xml

[root@rac02 peer]# cp profile.bak profile.xml

Verify the IP address changed to 10.10.

[root@rac02 peer]# cat profile.xml
<gpnp:Network-Profile><gpnp:HostNetwork id="gen" HostName="*"><gpnp:Network id="net1" IP="192.168.56.0" Adapter="eth0" Use="public"/><gpnp:Network id="net2" IP="10.10.1.0" Adapter="eth1" Use="cluster_interconnect"/><gpnp:Network id="net3" IP="10.10.2.0" Adapter="eth2" Use="cluster_interconnect"/></gpnp:HostNetwork></gpnp:Network-Profile><orcl:CSS-Profile id="css" DiscoveryString="" LeaseDuration="400"/><orcl:ASM-Profile id="asm" DiscoveryString="" SPFile="+OCRDG/rac-cluster/ASMPARAMETERFILE/registry.253.862511131" Mode="legacy"/>

[root@rac02 peer]#

Noticed? the private address got changed now, hope so this will work

stop the crs , use -f option since the HAS stack only running no crs is not running.

[root@rac02 peer]# crsctl stop crs -f

start the crs, normally

[root@rac02 peer]# crsctl start crs

In alert log,

[gpnpd(7017)]CRS-2328:GPNPD started on node rac02.
2014-11-16 10:58:27.978:

Check in the configuration in node 2 using oifcfg

[root@rac02 rac02]# oifcfg getif -global
eth0  192.168.56.0  global  public
eth2  10.10.2.0  global  cluster_interconnect
eth1  10.10.1.0  global  cluster_interconnect

Just in case if you want to edit the spfile the following command can be used,
<pre>
[root@rac02 peer]# gpnptool edit -asm:asm_spf=’+OCRDG/rac-cluster/ASMPARAMETERFILE/registry.253.862511131′ -asm_dis=” -p=profile.bak -o=profile.bak -ovr -prf_sq=8
Resulting profile written to "profile.new".
Success.
[root@rac02 peer]#
</pre>

-Thanks

Sureshgandhi

ACFS-9317: No ADVM/ACFS distribution media detected at location: root.sh failed

Again, OEL 7 (UEK) and 12c Grid Infrastructure issue. OEL UEK 7 does not shipped or support ACFS yet, but we tried and stumbled :(.

Here is the log for root.sh failure.

ACFS-9317: No ADVM/ACFS distribution media detected at location: ‘/u02/app/11.2.0.2/grid1/install/usm/EL5/x86_64/2.6.18-8/2.6.18-8.el5-x86_64/bin’
root@ol5-112-rac1 ~]# /u02/app/11.2.0.2/grid1/bin/acfsroot install
ACFS-9320: Missing file: ‘advmutil’.
ACFS-9320: Missing file: ‘advmutil.bin’.
ACFS-9320: Missing file: ‘fsck.acfs’.
ACFS-9320: Missing file: ‘fsck.acfs.bin’.
ACFS-9320: Missing file: ‘mkfs.acfs’.
ACFS-9320: Missing file: ‘mkfs.acfs.bin’.
ACFS-9320: Missing file: ‘mount.acfs’.
ACFS-9320: Missing file: ‘mount.acfs.bin’.
ACFS-9320: Missing file: ‘acfsdbg’.
ACFS-9320: Missing file: ‘acfsdbg.bin’.
ACFS-9320: Missing file: ‘acfsutil’.
ACFS-9320: Missing file: ‘acfsutil.bin’.
ACFS-9320: Missing file: ‘umount.acfs’.
ACFS-9320: Missing file: ‘umount.acfs.bin’.
ACFS-9301: ADVM/ACFS installation can not proceed:
ACFS-9317: No ADVM/ACFS distribution media detected at location: ‘/u02/app/11.2.      

.. and so on, the cluster is not started, the voting disk is not formatted and left in insomnia state :(. Really trying now 12c on different kernels may take your lot of time.

Anyways, to fix this, I have tweaked the code in rootcrs.pl and crsconfig_lib.pm , as this is a test environment and I am really not worrrying about ACFS at the moment.

So hence forth back, root.sh calls /u02/app/11.2.0.2/grid1/crs/install/rootcrs.pl and that in turns see a file called /u02/app/11.2.0.2/grid1/crs/install/crsconfig_lib.pm which had lot of functions to install what required,

Now open the file rootcrs.pl (take a backup ofcourse) and comment all the following lines, search for USM keyword in the file

image Open the /u02/app/11.2.0.2/grid1/crs/install/crsconfig_lib.pm and remove the usminstall function in the call, remove the highlighted keyword completely

Note: USM calls for universal storage management which installs ACFS and ADVM as part of root.sh

image

Now run the root.sh again it will work fine.

By the way, do not install 12.1 on OEL 7 UEK for now since it seems to have lot of issues, but I see that 12.1.0.2 has not having the same

ORA-12547: TNS: lost contact on DBCA in 12c with OEL 7 (UEK)

While running dbca we are receiving the ora-12547 and we could not able to create database using DBCA anymore.

After googling and metalink search found many notes which does not resolve the issue, the reasons can be multifold

  1. Environments- SID, PATH, LD_LIBRARY_PATH – Not resolved
  2. Permission on oracle executable to 6751 – Not resolved
  3. Package gcc issue, package already installed – Not resolved
  4. Package libaio issue, package already installed – Not resolved
  5. Listeners are up and running , reload and tns changes – Not resolved

Finally, I have ended to understand the oracle and other executables are not relinked properly (while installation we ignored some errors, our bad) apparently the relink got failed.

Whoa,! wait, we did not got any pre-requisities failures when we run cluvfy, how could it been issue with relinking as all of packages are installed.

Struck now, then recollected we tweaked some files in rdbms/lib for 12.1.0.1 on OEL – 7, while installing grid, and it seems the same issue.

I really want to curse on our fate, :(, seriously , because we are not able to understand is this problem with shipping the software (oracle does it by mistake) or our installation issue, leaving that apart the following has solved our issue.

Installation log shows,

/usr/bin/ld: note: ‘__tls_get_addr@@GLIBC_2.3′ is defined in
DSO /lib64/ld-linux-x86-64.so.2 so try adding it to the linker
command line /lib64/ld-linux-x86-64.so.2: could not read symbols:
Invalid operation

INFO: collect2: error: ld returned 1 exit status

cd $ORACLE_HOME/rdbms/lib

cp env_rdbms.mk env_rdbms.mk.bck

    make changes in $ORACLE_HOME/rdbms/lib/env_rdbms.mk

modify line 176

LINKTTLIBS=$(LLIBCLNTSH) $(ORACLETTLIBS) $(LINKLDLIBS)

to

LINKTTLIBS=$(LLIBCLNTSH) $(ORACLETTLIBS) $(LINKLDLIBS) -lons

modify line 279 and 280

LINK=$(FORT_CMD) $(PURECMDS) $(ORALD) $(LDFLAGS) $(COMPSOBJS)
LINK32=$(FORT_CMD) $(PURECMDS) $(ORALD) $(LDFLAGS32) $(COMPSOBJS)

to

LINK=$(FORT_CMD) $(PURECMDS) $(ORALD) $(LDFLAGS) $(COMPSOBJS) -Wl,–no-as-needed
LINK32=$(FORT_CMD) $(PURECMDS) $(ORALD) $(LDFLAGS32) $(COMPSOBJS) -Wl,–no-as-needed

modify line 3041 and 3042

TG4PWD_LINKLINE= $(LINK) $(OPT) $(TG4PWDMAI) \
        $(LLIBTHREAD) $(LLIBCLNTSH) $(LINKLDLIBS)

to

TG4PWD_LINKLINE= $(LINK) $(OPT) $(TG4PWDMAI) \
        $(LLIBTHREAD) $(LLIBCLNTSH) $(LINKLDLIBS) -lnnz12

Now, we tried to relink the libraries, and its thrown below error

cd $ORACLE_HOME/bin

relink all

INFO: /u01/app/oracle/product/12.1.0/db_1/rdbms/lib/config.o: file not recognized: File truncated collect2: error: ld returned 1 exit status

To resolve this, you will have to remove the config.o file and relink the oracle to create new config.o (metalink note

 

make -f ins_rdbms.mk config.o

make -f ins_rdbms.mk ioracle

and perform relink all again, verify the logs in $ORACLE_HOME/install/relinkaction…log for any errors

And finally, it resolved the dbca is working properly and other executables.

Conclusion: I am not favour in running 12.1.0.1 on OEL 7 (UEK) any more since this is sequence of issues coming up while performing installations, rather try 6.3 (not even 6.4) to ensure you have seemless installations

-Happy Reading

Sureshgandhi

Upgrading 11gr2 to 12c using Transient logical standby – Rolling upgrade

Well, this is fairly an old topic but explored lately by me. Very big post click on read more to scroll down to full page.

Ah, I was trying to explore the upgrade from 11gr2 to 12c but not traditional way, and go these two excellent white papers.

http://www.oracle.com/technetwork/database/features/availability/maa-wp-11g-transientlogicalrollingu-1-131927.pdf

http://www.oracle.com/technetwork/database/features/availability/maa-wp-11g-upgrades-made-easy-131972.pdf

The high level overview of this feature is something like below,

  1. Environment consists of Primary & Physical standby
  2. Guaranteed restore point created
  3. Physical standby will be converted into a logical standby temporarily to allow the upgrade
  4. Startup the logical standby in 12c home (assuming you have the 12c software only installation done)
  5. Upgrade the logical standby
  6. Switch over the primary to physical standby and logical standby to primary
  7. Flashback the primary database to guaranteed restore point taken
  8. Apply the redo’s generated in logical standby in primary (no upgrade required here)

To allow you the failed upgrade,

  1. You should create flashback guaranteed restore point
  2. Do not change the compatible parameter until all the databases (primary/standby) are upgraded

Let’s have a look of my environment

  1. Primary:- 11.2.0.2 – Prod1 (on same host oinfo11g)
  2. Standby:- 11.2.0.2 – Prod1DR (Physical Standby) (on same host oinfo11g)
  3. 12c software installed on both sides – /u01/app/oracle/product/12.1.0/db_1
  4. Flash back enabled on both sides

In order to perform all the activities (6 steps) you will have to do lot of manual work, instead Oracle provide you the script to physru.

Oracle11g Data Guard: Database Rolling Upgrade Shell Script(Doc ID 949322.1)

Now executed the script, (no prereqs, no changes, nothing just that both DB’s are up and running)

Step 1

Explanation:- This script takes 6 inputs, username, Primary, Standby, PrimaryTns, StandbyTns, Targetversion

In below, if you see,

Stage1 :- Script stops the media recovery, create restore point on both standby and primary and also backs up the control file

Stage2:- Script create the dictionary build and converts the logical standby to physical standby and awaits until the dictionary load happen and finally open the Prod1DR in open mode (logical standby)

Continue reading Upgrading 11gr2 to 12c using Transient logical standby – Rolling upgrade

12c Server Pools in action : Instance and service failover to server pools

In the Previous post we have converted the orcl database to server pool OLTP_SP which is running on three nodes rac01,rac03,rac04 and free server is available in rac02.

Okay lets have some discussion before proceeding,

1. Does in pre 11g version , if a instance crashed does it failed over to any node?

No. Because there is no free node available, and Instances will run on all nodes, and bound to start on specific node. Means there is no instance failover instead service will failover to available node.

Keeping in mind, with server pools we have flexibility in failing over the instances along with it services if we keep our databases in policy managed where oracle grid infrastructure will make your instances running on the serverpool it defined. That’s is the reason when you define the database to a server pool (like below) you just specify the server pool name only rather how many instances etc etc.

Ex: srvctl modify database –d orcl –g OLTP_SP

note: OLTP_SP is a server pool with 3 nodes runnings, so now although i have four nodes my database will have only running 3 instances in three nodes.

Read on now,

If we kill a pmon or crash the node rac01, the free pool node rac02 will be assigned to server pool OLTP_SP, lets see in action.

Configuration check,

(BTW, Many Thanks Deepak (again) for sending this wonderful screenshots)

image

Lets kill the pmon in rac01,

 

image

Now monitor the server pool and the read  the comments updated in the screenshots.

As above the pmon has been killed, now orcl_4 is not running rac01, initially the free pool has 1 server and after a while the free pool has given away 1 server  (as you can see 0 at last) to server pool , and finally you can see the three instances are back, but this time it is rac02,rac03,rac04.

image

As we have seen the instance crash is failing over its instance to free pool, Next Post lets see Service failover with in server pools

Thanks for reading,

Sureshgandhi

12c Database : Convert Admin Managed Database to Policy Managed Database

From 11g onwards, the database can be managed in two modes, Administer managed, i.e instance allocation/deletion service allocation/deletion and evictions will be managed by Administrator.

Where in from the 11gr2 onwards the instance allocation to hosts will be based on the cardinality and services running is also based on their configuration with in the serverpools.

To convert the database to policy managed one must configured the server pools as like in the previous post

First check our database configuration.

 

7 Database configuration

Check if you have any services, yes we do have a service server_taf as like created in this post

5. Configuaration setting 

CRS Profile setting

6 Profile setting -1

Here is the notes for above screenshot,

  • I have a database running orcl on 4 nodes with orc1,2,3,4 instances. Where the server_taf is running on orcl1 and orcl2.
  • Now we want to run our database should run on server pool OLTP_SP (read here) which is having 3 nodes in his pool. So ideally my database should run only 3 instances rather 4.
  • Also if you observe the the instances name marked as orcl1,orcl2,orcl3 which will be changed once we moved to policy managed.

Lets convert the database do policy managed and run on OLTP_SP server pool

image uhu, got error, the reason is that my service server_taf (here) is running on two nodes orcl1 and orcl2, for server pools concept the service should either run on all nodes or one node i.e uniform and singleton.

I will have to remove the service or modify the service to run either in one node or all nodes. I will to remove it. I tried to modify the service to uniform which will fail as such my database is in admin managed mode.

image

image Now I have removed the service, now let me try again to modify the database to server pool OLTP_SP as policy managed database, The following step will stop the instances, be noted while doing in production. Use –f option will stop the instances.

image 

 

Check the configuration of the database now.

12

We have now successfully moved our database to policy managed , start the database see the instances where they are running

imageAs you observed above, the rac02 is not having any instances since oltp_sp server pool contains maximum 3 nodes, so rac01,rac03,rac04 has been picked up oracle and rac02 has been given to free pool, in case of failure or some crash rac02 will be given to oltp_sp pool.

Next Post initiate a failover on rac01 and see if the free server rac02 is assigned to server pool oltp_sp

Thanks for reading,

12c Server Pools : Creating & Managing Server Pools

Hello All,

If you have not read server pool concept so far please read it from here.

In this post we were going to see,

1. create the server pools OLTP_SP with 2 nodes minimum and maximum 2 nodes with importance 4

2. That leaves 2 node in free pool out of my 4 nodes

3. Modify the server pool OLTP_SP to minimum 3 nodes and maximum 3 nodes

Read on,

1. create the server pools OLTP_SP with 2 nodes minimum and maximum 2 nodes with importance 4

[root@grid#] srvctl add serverpool –g OLTP_SP –min 2 –max 2 –importance 4

2. Check the server pool configuration

image 3. Modifying the server pool to have minimum 3 nodes.

[root@grid#] srvctl modify serverpool –g OLTP_SP –min 3 –max 3 –importance 4

[root@grid#] srvctl config serverpool –g OLTP_SP

image

 

Generic Notes on server pool:-

1. Server pools gives you the flexibility in running the pool of instances/services

2. RAC Instances can run in any of the pools defined by it , rather instance to node bounding, earlier we need to srvctl add instance but now there is no requirement to add instances (11g as well), srvctl add database would do the addition of instances automatically.

3. Further, services can run on server pool, for example in our case OLTP_SP has three nodes running i.e three instance i can create an instances either to run on single node (singleton) or all nodes(uniform). It eliminates the preferred and available method.

4. Even the rac instances names would be dynamically assigned rather orcl1, orcl2 etc, it will dynamically allocated the instances based on the pools availability.

Next Post modifying database to serverpool managed, i.e policy managed databases.

Managing Grid Infrastructure 12c : Creating and Testing TAF Services

Hello,

In this post we will create TAF (Transparent Application Failover) Services as such pre-11gr2 method that is available and preferred instances way.

Environment:-

1. 4 Node RAC, rac01,rac02,rac03,rac04 running orcl database and its instances.

2. A service server_taf is created, for which orcl1 is available instance and orcl2 is the preferred instance.

TAF Setup & Testing:-

1. Create Service

srvctl add service -d orcl -s server_taf –r orcl1 –a orcl2 -P BASIC

2. Start the service

srvctl start service -d rac -s server_taf

3. Check the services

srvctl config service -d rac

3. Modify the service to have load balancing options (optional)

SQL> execute dbms_service.modify_service (service_name => ‘server_taf’ –
, aq_ha_notifications => true –
, failover_method => dbms_service.failover_method_basic –
, failover_type => dbms_service.failover_type_select –
, failover_retries => 180 –
, failover_delay => 5 –
, clb_goal => dbms_service.clb_goal_long);

 

4. Keep tns entries in both client and server side (on all nodes too)

SERVERTAF =
(DESCRIPTION =
(LOAD_BALANCE = yes)

(ADDRESS = (PROTOCOL = TCP)(HOST = scan.localdomain)(PORT = 1521))

(CONNECT_DATA =(SERVICE_NAME = server_taf.localdomain)

(FAILOVER_MODE= (TYPE=select)(METHOD=method)

)

5. Check the configuration

All the screenshot has been sent by Deepak kumar from his test server, Much thanks to him for his time and effort and permitted us to upload for all benefit1

 

6. Check the Service run state, server_taf is running on orcl1

2. Service is running in NODE01 or orcl1 currently ( Initially)

7. Test the TAF Service

sqlplus system/manager@server_taf

check in the node with following query where is the session and failover mode status

SELECT MACHINE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER, COUNT(*)
FROM V$SESSION
GROUP BY MACHINE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER;

 

8. Run a very big query in the session you have connected

SQL> Select * from dba_objects;

This will run approx 5-10 mins in my test machine

9. Kill the pmon at OS level to simulate crash on Node 1 and apparently service should failover to the available node, and also the workload i.e the above dba_objects query rather the session disconnection and disruption

 

4. On NODE01 we had kill at OS Level and checking status at DB Level or SRVCTL

On node 2, check the v$session with following query, observe the failed_over column shows that rac01 session has been failoved over to orcl2 and marked yes.3. Service was killed m NOD01 and not running in NODE02 or orcl2 currently ( relocated)

And you may also observe the session in orcl1 (node1) is still running as such.

10. Start the instance orcl1 on node 1 and relocate the services back to orcl1 from orcl24b . RELOCATE it back to orcl1 once the pmon or DB is up in RAC01 from RAC02

11. Check the configuration

5. Configuaration setting

In Next post we will see to create a server pools and then add this service to serverpool. Basically in this method, the services are bounded to instances , available node and preferred node, but in serverpools concept the services bound to nodes anywhere.

-Thanks