Grid Infrastructure Permission issues – ora-27303 startup egid – 60321 (oinstall), current egid = 60329 (asmadmin)

It is easy to fall into a permission trap with 11gR2 Grid Infrastructure. The separation of users (GI owner, RDBMS owner) and the additional groups (asmadmin, asmoper, etc.) cause a lot of confusion, especially when the wrong groups or owners are selected during installation.

I was asked to look at one environment where the database could not be created with DBCA, although the GI and RDBMS installations had been successful.

This environment uses only the oracle user for both GI and RDBMS, so in principle there should have been no issue.

DBCA was failing with a permission issue on disk group DG4: ORA-27303, startup egid = 60321 (oinstall), current egid = 60329 (asmadmin)

I verified the DG4 permissions using crsctl getperm resource ora.DG4.dg; the owner is oracle and the group is oinstall, so there is no problem there.

[root@orars4 dbs]# crsctl getperm  resource ora.DG4.dg
Name: ora.DG4.dg
owner:oracle:rwx,pgrp:oinstall:r-x,other::r--,group:oinstall:r-x,user:oracle:rwx

It could be a permission issue with the executables, so I checked the ownership and permissions of the oracle binary in both homes.

 

GI

[oracle@rac1 trace]$ ls $ORACLE_HOME/bin/oracle
/u01/app/11204/grid/bin/oracle
[oracle2@orars4 trace]$ ls -l  $ORACLE_HOME/bin/oracle
-rwsr-s--x. 1 oracle oinstall 239626683 Feb 20 14:19 /u01/app/11204/grid/bin/oracle

RDBMS

[oracle@rac1 trace]$ ls $ORACLE_HOME/bin/oracle
/u01/app/11204/db_1/bin/oracle
[oracle2@orars4 trace]$ ls -l  $ORACLE_HOME/bin/oracle
-rwsr-s--x. 1 oracle asmadmin 239626683 Feb 20 14:19 /u01/app/11204/db_1/bin/oracle

As you can see, the RDBMS executable has its group set to asmadmin; it should be oinstall.
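The comparison above can be scripted. This is only a minimal sketch, assuming GNU stat on Linux; the helper names show_owner and same_group are my own and not part of any Oracle tooling, and the paths in the usage comment are simply the ones from the listings above.

```shell
# Print "owner:group mode path" for a file (GNU stat syntax).
show_owner() {
    stat -c '%U:%G %A %n' "$1"
}

# Return 0 when both files have the same group owner, non-zero otherwise.
same_group() {
    [ "$(stat -c '%G' "$1")" = "$(stat -c '%G' "$2")" ]
}

# Usage (adjust the paths to your own GI and RDBMS homes):
#   show_owner /u01/app/11204/grid/bin/oracle
#   show_owner /u01/app/11204/db_1/bin/oracle
#   same_group /u01/app/11204/grid/bin/oracle /u01/app/11204/db_1/bin/oracle \
#       || echo "group mismatch between GI and RDBMS oracle binaries"
```

Whether the two groups should match depends on how the homes were installed; in this single-owner environment they were expected to match.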

Resolution:-

1. Changed the group manually with chown oracle:oinstall oracle (did not work)

2. There is an executable called setasmgidwrap in the $ORACLE_HOME/bin directory of the Grid home, which can be used to set the group ID of the RDBMS oracle executable to match the GI oracle executable:

[oracle@rac1 lib]$ cd $GRID_HOME/bin
[oracle@rac1 bin]$  ./setasmgidwrap o=/u01/app/11204/db_1/bin/oracle
[oracle@rac1 bin]$ ls -l /u01/app/11204/db_1/bin/oracle
-rwxr-s--x. 1 oracle oinstall 239626683 Feb 20 14:19 /u01/app/11204/db_1/bin/oracle

This did not work either; after running dbca, the group changed back to asmadmin.

3. Looking further into the script, it relies on the config.o file in $ORACLE_HOME/rdbms/lib, which contains the groups you selected during the Grid installation.

Note:- This requires a bounce of the CRS/RDBMS instances, because you need to perform a relink after changing this file. Do this at your own risk.

cd $ORACLE_HOME/rdbms/lib

vi config.c

Change the following lines (group name)

#define SS_DBA_GRP "asmdba"
#define SS_OPER_GRP "asmoper"
#define SS_ASM_GRP "asmadmin"

to

#define SS_DBA_GRP "dba"
#define SS_OPER_GRP "oper"
#define SS_ASM_GRP "oinstall"

Shut down the CRS stack and relink (take a backup of the Oracle home before relinking)
#  crsctl stop crs

[oracle@rac1 grid]$  cd $ORACLE_HOME/rdbms/lib
[oracle@rac1 lib]$ $ORACLE_HOME/bin/relink all

Ensure there are no errors in relink.log.

4. Finally, run the setasmgidwrap script to set the group on the RDBMS oracle binary.

[oracle@rac1 lib]$ cd $GRID_HOME/bin

[oracle@rac1 bin]$ ./setasmgidwrap o=/u01/app/oracle/11204/db_1/bin/oracle

 

5. Run dbca and check whether the problem has disappeared.

11gR2 gipcd connection refused: one of the nodes not starting

We had an issue where one of the nodes would not start. It is a two-node Grid Infrastructure cluster, and the logs showed the following.

In the CRS alert log (alert_rac01.log):

CRSD not started – unable to locate OCR

In gipcd.log:

2014-09-10 06:23:00.186: [    GIPC][1117619840] gipcCheckInitialization: possible incompatible non-threaded init from [prom.c : 730], original from [clsCrsctlUtil.cpp : 2934]
2014-09-10 06:23:00.186: [  OCRMSG][1117619840]prom_waitconnect: CONN NOT ESTABLISHED (0,29,1,2)
2014-09-10 06:23:00.186: [  OCRMSG][1117619840]GIPC error [29] msg [gipcretConnectionRefused]
2014-09-10 06:23:00.186: [  OCRMSG][1117619840]prom_connect: error while waiting for connection complete [24]
2014-09-10 06:24:24.809: [  OCRMSG][240011008]GIPC error [29] msg [gipcretConnectionRefused]

The private and public networks were verified and are working fine.

Another important observation: if you stop the running node and start the other node, it works fine. In other words, only one node can be up at any point in time.

This is a somewhat known issue. I wrote about it back in 2012, when instances were crashing because of a cluster crash; it is the same old issue.

To make everyone aware: ensure that no internal devices such as usb0 are enabled on the cluster nodes. The private network uses HAIP with link-local addresses in the 169.254.* range, and such internal devices use the same range to communicate. As a result, crsd gets confused, reaches the USB device rather than the other cluster node, and does not find any OCR. It sounds funny, but it is true. Read more here:

http://oracle-info.com/2012/09/24/pmon-ospid-nnnn-terminating-the-instance-due-to-error-481/
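To spot such devices quickly, something like the following can help. It is only a sketch: the function name list_link_local is my own, and it assumes the one-line output format of the Linux `ip -o -4 addr show` command. HAIP addresses on the private interfaces will be listed too, which is expected; anything else in the list (usb0, for example) deserves a closer look.

```shell
# Reads `ip -o -4 addr show` output on stdin and prints "interface address"
# for every IPv4 address in the 169.254.0.0/16 link-local range.
list_link_local() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "inet" && $(i + 1) ~ /^169\.254\./)
                print $2, $(i + 1)
    }'
}

# Usage on a cluster node:
#   ip -o -4 addr show | list_link_local
```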

 

Hope this helps

 

Convert Single instance to rac to a different server

Convert Single Instance (ASM) to RAC Instance (ASM) to different Server

Assuming you have already the following environment setup and ready

    1. 11gR2 Clusterware is already installed on your target machine (XYZ)
    2. Target XYZ is configured with +DATA as the disk group
    3. Source ABC has a single instance in ASM with DATA as the disk group

##################################################
Step 1:- Take the backup in source server using RMAN
##################################################

rman> backup full database plus archivelog device type disk format '/backups/fullbackup_%t_%s_%d';

rman> backup current controlfile device type disk format '/backups/controlfilebackup';

Create a pfile from the spfile

##################################################
Step 2: copy the backups
scp the backup to target location
cp the pfile to target server dbs directory
cp the password file to target server dbs directory
##################################################

##################################################
Step 3: startup no mount in the target server
##################################################
With the pfile copied, set your environment as on the source. Note: you will need to check and modify the following parameters

control_files=<to the disk group that exists on the target server>
diagnostic_dest=<to appropriate directory>
db_create_file_dest=<to the disk group that exists on this server>
db_recovery_file_dest=<to the FRA disk group that exists on this server>

export ORACLE_SID=<sourcedbname>

sqlplus / as sysdba
startup nomount

##################################################
step 4: restore controlfile from backup
##################################################
rman> restore controlfile from '/location of the controlfile backup you have copied';

This restores the control file from the backup. Once it succeeds, mount the database:

rman> sql 'alter database mount';

##################################################
step 4: restore the database
##################################################

If the backup location on the target differs from the location on the source server, you have to catalog the backups. For example:

source location: /backups

target location: /u02/backups

then

rman> catalog start with '/u02/backups/';

Once cataloged:

rman> restore database;

- This restores the database files from the backup. Once done, recover the database.

rman> recover database;

- This recovers the database. It may need some archived logs, which are restored from the backups.

- Once these steps are over, open the database with resetlogs.

SQL> alter database open resetlogs;

SQL> Shutdown immediate

##################################################
step 5: Preparing the database for rac (target database)
##################################################

SQL> Startup mount

a) Create redo threads

alter database add logfile thread 2
group 4 ('+DATA') size 50M,
group 5 ('+DATA') size 50M,
group 6 ('+DATA') size 50M;

b) SQL> Alter database open ;

c) SQL> alter database enable public thread 2;

d) Create multiple undo tablespaces

SQL> CREATE UNDO TABLESPACE UNDOTBS2 DATAFILE '+DATA' size 25M;

e) change the following database parameters

*.cluster_database = TRUE
*.cluster_database_instances = 2
*.undo_management=AUTO
prod1.undo_tablespace=UNDOTBS1
prod1.instance_name=prod1
prod1.instance_number=1
prod1.thread=1
prod2.instance_name=prod2
prod2.instance_number=2
prod2.thread=2
prod2.undo_tablespace=UNDOTBS2

prod2.local_listener=listener_rac1
prod1.local_listener=listener_rac2

f) Shut down and start up, then run catclust.sql

SQL> shut immediate

SQL> startup

SQL> @?/rdbms/admin/catclust.sql

##################################################
Step 6: Prepare node 2
##################################################

a)
bash-3.00$ export ORACLE_HOME=/u01/app/oracle/product/10.2.0/db
bash-3.00$ export ORACLE_SID=prod2

Create initprod2.ora on the second node, similar to node 1.

In this case you have to copy the spfile to the second node as well. You can also keep the spfile in a shared location (/u03 in my case) and put the same path in initprod2.ora

cd /u01/app/oracle/product/10.2.0/db/dbs

bash-3.00$ ls -lrt spfileprod.ora

-rw-r-----  1 oracle oinstall 3584 Feb 19 12:36 spfileprod.ora

bash-3.00$ cat initprod2.ora

spfile='/u01/app/oracle/product/10.2.0/db/dbs/spfileprod.ora'

b) Create new password file for instance 2

bash-3.00$ orapwd file=orapwprod2 password=welcome1

c) Start the second instance

SQL> startup pfile=initprod2.ora
ORACLE instance started.
Total System Global Area  838860800 bytes
Fixed Size                  1222168 bytes
Variable Size             213912040 bytes
Database Buffers          620756992 bytes
Redo Buffers                2969600 bytes
Database mounted.
Database opened.

##################################################
Step 6: Finally create the spfile from pfile to asm disk
##################################################
SQL> create spfile='+DATA' from pfile='/tmp/pfile.ora';

##################################################
Step 7: add the database to cluster
##################################################

bash-3.00$ srvctl add database -d prod -o <oraclehome> -p <spfile location>

bash-3.00$ srvctl add instance -d prod -i prod1 -n <hostname1>

bash-3.00$ srvctl add instance -d prod -i prod2 -n <hostname2>

-Thanks

Sureshgandhi

addnode.sh failed with bin/lsnodes: struct size 0

Adding a node was failing with the following errors in addnode.log

INFO: /u01/app/11.2.0.4/grid/oui/bin/../bin/lsnodes: struct size 0
INFO: Vendor clusterware is not detected.
INFO: Error ocurred while retrieving node numbers of the existing nodes. Please check if clusterware home is properly configured.
SEVERE: Error ocurred while retrieving node numbers of the existing nodes. Please check if clusterware home is properly configured.
INFO: User Selected: Yes/OK

 

There are a few likely reasons for this:

1. A bug in 11.2.0.4 that overwrites lsnodes

Resolution:- Copy olsnodes.bin to lsnodes.bin and olsnodes to lsnodes on the node where you are running addnode.sh

2. Check the permissions of your grid home

Resolution: chown -R oracle:oinstall /u01/app/grid/11.2.0.4/grid

3. The node you are adding was not properly deleted, i.e. you did not update the node list after your rootcrs.pl deconfig

Resolution: On any node other than the deleted node, run ./runInstaller -updateNodeList ORACLE_HOME= "CLUSTER_NODES={rac1}" CRS=TRUE

 

-Thanks

Sureshgandhi

12c Database: Loss of data files and its impact

Hello,

I was reading another blog that asked what happens when a data file is lost. Here are the questions.

1) If a system or user data file of a pluggable database is lost, what happens to the CDB?

ANS: Nothing happens to the CDB; it keeps running. It is similar to a non-CDB database: if a user data file is lost, does it stop your database from running? No. In the same way, it will not affect the other PDBs or the CDB, only the users who are using that data file.

2) If you have 10 PDBs in one CDB and a single data file of one PDB (pdb1) is lost?

ANS: Same as above.

3) If you lose a data file in the CDB, would that have an effect?

ANS: Yes, this affects the whole database, since the CDB is the master (container) for all the PDBs. Losing any of its data files, or a system data file, will hamper the PDBs.

Step by Step Build of Standby (dataguard) in two node RAC

Hello All,

Here are the steps to implement a standby database in RAC. The following is the test environment:

Production RAC:
oinfo12cprmy1
oinfo12cprmy2
Standby RAC:-
oinfo12cdr1
oinfo12cdr2

1 . Add standby logs on Primary Database

alter database add standby logfile thread 1 group 10 ('+PRMY_DATA') size 500M;
alter database add standby logfile thread 1 group 11 ('+PRMY_DATA') size 500M;
alter database add standby logfile thread 1 group 12 ('+PRMY_DATA') size 500M;
alter database add standby logfile thread 2 group 13 ('+PRMY_DATA') size 500M;
alter database add standby logfile thread 2 group 14 ('+PRMY_DATA') size 500M;
alter database add standby logfile thread 2 group 15 ('+PRMY_DATA') size 500M;

2. Enable force logging on Primary Database

alter database force logging;

3. In the standby database home, create and start a listener that offers a static SID entry for the standby database .

In Database home

LISTENER_oinfo12cdr1 =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL=TCP)(HOST = oinfo12cdr1-vip.localdomain)(PORT = 1521))
)
)
)

SID_LIST_LISTENER_oinfo12cdr1 =
(SID_LIST =
(SID_DESC =
(ORACLE_HOME = /u01/sq/oracle/db/11.2.0.4)
(SID_NAME = STBY)
)
)

tnsnames.ora

PRMY =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = oinfo12cprmy1-vip.localdomain )(PORT = 2001))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = PRMY.localdomain)
)
)

STBY =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = oinfo12cdr1-vip.localdomain)(PORT = 1521))
(CONNECT_DATA = (UR=A)
(SERVER = DEDICATED)
(SERVICE_NAME = STBY)
)
)

NOTE1 : In the STBY TNS string, "(UR=A)" is required to connect to the standby instance even though the standby instance has been brought down or is in a blocked state .

NOTE2 : Create a dedicated connection to the primary database (the TNS entry should point directly to one of the instances using its VIP) . The SCAN IP should not be used .

4 .Create a TNS entry on Primary server for standby entry.

STBY =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = oinfo12cdr1-vip.localdomain)(PORT = 1521))
(CONNECT_DATA = (UR=A)
(SERVER = DEDICATED)
(SERVICE_NAME = STBY)
)
)

4. For the time being, modify tnsnames.ora on the primary to use the local VIP rather than the SCAN, or create a new TNS entry as below

PRMY =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = oinfo12cprmy1-vip.localdomain )(PORT = 2001))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = PRMY.localdomain)
)
)

5. Copy the Passwordfile from Primary server to the standby server and rename it as per the standby instance name.

scp $ORACLE_HOME/dbs/orapwPRMY1 oinfo12cdr1-vip.localdomain:/u01/sq/oracle/db/11.2.0.4/dbs/orapwSTBY

6. on standby host create a pfile as given below.

cat initSTBY1.ora
DB_NAME=PRMY
db_unique_name='STBY'
STBY1.instance_name='STBY1'
STBY2.instance_name='STBY2'
STBY1.instance_number=1
STBY2.instance_number=2
local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=oinfo12cdr1-vip.localdomain)(PORT=1521))))'

NOTE : The local_listener parameter is required because we have another listener running from GRID . We are explicitly registering the STBY1 auxiliary instance with the static listener

7. Create the audit directory on the standby server . Look up the value on the primary database and create the same directory structure on DR.

mkdir -p /u01/sq/oracle/admin/STBY/adump
export ORACLE_SID=STBY
sqlplus
startup nomount

8 . TNSPING all the TNS aliases on both primary and standby to cross-check that everything is working fine .

9 . Create an RMAN script on the DR server as below and execute it from the RMAN prompt

cat rman_script.sql

########################
From production server
########################

connect target sys/*****@PRMY;

connect auxiliary sys/*****@STBY;

run
{
allocate channel prmy1 type disk;
allocate channel prmy2 type disk;
allocate channel prmy3 type disk;
allocate channel prmy4 type disk;
allocate auxiliary channel stby1 type disk;
allocate auxiliary channel stby2 type disk;
allocate auxiliary channel stby3 type disk;
allocate auxiliary channel stby4 type disk;

DUPLICATE DATABASE FOR STANDBY FROM ACTIVE DATABASE NOFILENAMECHECK SPFILE
PARAMETER_VALUE_CONVERT 'PRMY','STBY'
SET instance_name='STBY1'
SET instance_number='1'
SET db_unique_name='STBY'
SET control_files='+STBY_DATA','+STBY_FRA'
SET db_file_name_convert='+PRMY_DATA','+STBY_DATA','+PRMY_FRA01','+STBY_FRA','+PRMY_DATA/PRMY','+STBY_DATA/STBY','+PRMY_FRA/PRMY','+STBY_FRA/STBY'
SET log_file_name_convert='+PRMY_DATA','+STBY_DATA','+PRMY_FRA01','+STBY_FRA','+PRMY_DATA/PRMY','+STBY_DATA/STBY','+PRMY_FRA/PRMY','+STBY_FRA/STBY'
SET db_recovery_file_dest='+STBY_FRA'
SET db_recovery_file_dest_size='20G'
SET log_archive_max_processes='5'
SET fal_client='STBY'
SET fal_server='PRMY'
SET standby_file_management='AUTO'
SET log_archive_config='dg_config=(PRMY,STBY)'
SET log_archive_dest_2='service=PRMY lgwr async noaffirm COMPRESSION=ENABLE valid_for=(online_logfiles,primary_role) db_unique_name=PRMY';

SQL channel prmy1 "alter system set log_archive_config=''dg_config=(PRMY,STBY)''";
SQL channel prmy1 "alter system set log_archive_dest_2=''service=STBY lgwr async noaffirm COMPRESSION=ENABLE valid_for=(online_logfiles,primary_role) db_unique_name=STBY''";
SQL channel prmy1 "alter system set log_archive_max_processes=5";
SQL channel prmy1 "alter system set fal_client=PRMY";
SQL channel prmy1 "alter system set fal_server=STBY";
SQL channel prmy1 "alter system set standby_file_management=''AUTO''";
SQL channel prmy1 "alter system archive log current";
sql channel stby1 "alter database recover managed standby database using current logfile disconnect from session";
}

exit

rman
@rman_script.sql

10 . copy the Password file to the second instance .

scp $ORACLE_HOME/dbs/orapwPRMY1 oinfo12cdr1:/u01/sq/oracle/db/11.2.0.4/dbs/orapwSTBY1
scp $ORACLE_HOME/dbs/orapwPRMY1 oinfo12cdr2:/u01/sq/oracle/db/11.2.0.4/dbs/orapwSTBY2

11 . Create a pfile from the current spfile and then create the spfile in ASM .

create pfile='/home/oracle/test.ora' from spfile;

Modify the parameters, PRMY1 to STBY1 and PRMY2 to STBY2 (attached)

create spfile='+STBY_DATA/STBY/spfileSTBY.ora' from pfile='/home/oracle/test.ora';

12 . Create a pfile named after each instance on the standby nodes, pointing to the spfile .

Host: First standby host
cd $ORACLE_HOME/dbs/
vi initSTBY1.ora
spfile='+STBY_DATA/STBY/spfileSTBY.ora'

Host: Second standby host
cd $ORACLE_HOME/dbs/
vi initSTBY2.ora
spfile='+STBY_DATA/STBY/spfileSTBY.ora'

13 . Register the database with the crs.

srvctl add database -d STBY -o /u01/sq/oracle/db/11.2.0.4/ -p +STBY_DATA01/STBY/spfileSTBY.ora
srvctl add instance -d STBY -i STBY1 -n oinfo12cdr1
srvctl add instance -d STBY -i STBY2 -n oinfo12cdr2
srvctl modify database -d STBY -n STBY -o /u01/sq/oracle/db/11.2.0.4/ -r physical_standby -s mount
srvctl modify database -d STBY -o /u01/sq/oracle/db/11.2.0.4/ -p +STBY_DATA/STBY/spfileSTBY.ora

14 . Stop and start the database using srvctl .

srvctl start database -d STBY

15. Start managed recovery

alter database recover managed standby database using current logfile disconnect from session;

16 . Check the log sync status on primary and DR . (optional)

set lines 200 pages 1000
select PROCESS,CLIENT_PROCESS,THREAD#,sequence#,status from v$managed_standby;

17. Also set the remote_listener parameter on the standby to the SCAN to ensure connectivity.

Hope this helps!

Dataguard : Network Tuning Parameters

While working on a Data Guard network performance issue, I thought I should write something on this.

The most important aspect of Data Guard is network transport. An undersized or improperly configured network may lead to redo transport issues, and production may show log file sync waits and the like.

The following parameters may be tuned if extra performance is required. Before-and-after testing should be performed to check the results prior to any live implementation. Pay attention to log file sync waits in AWR.

Properly Configure TCP Send / Receive Buffer Sizes

Gains have been realised when setting send and receive socket buffer settings up to three times the BDP. BDP is product of the network bandwidth and latency. Socket buffer sizes are set using the Oracle Net parameters RECV_BUF_SIZE and SEND_BUF_SIZE, so that the socket buffer size setting affects only Oracle TCP connections. The operating system may impose limits on the socket buffer size that must be adjusted so Oracle can use larger values. For example, on Linux, the parameters net.core.rmem_max and net.core.wmem_max limit the socket buffer size and must be set larger than RECV_BUF_SIZE and SEND_BUF_SIZE.

Set the send and receive buffer sizes to either the value you calculated or 10 MB, whichever is larger.
For example, if bandwidth is 622 Mbit/s and latency is 30 ms, then you would calculate the minimum size for the RECV_BUF_SIZE and SEND_BUF_SIZE parameters as follows: 622,000,000 / 8 x 0.030 = 2,332,500 bytes. Then, multiply the BDP 2,332,500 x 3 for a total of 6,997,500.

In this example, you would set the Oracle Net parameters as follows, since the calculated figure is less than 10 MB.
RECV_BUF_SIZE=10000000
SEND_BUF_SIZE=10000000
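The same arithmetic as a small shell sketch, in case you want to plug in your own link figures. The function names bdp_bytes and socket_buf_size are my own, the 10 MB floor is taken as 10000000 bytes to match the values above, and the shell is assumed to do 64-bit integer arithmetic.

```shell
# bdp_bytes BANDWIDTH_BITS_PER_SEC LATENCY_MS
# Bandwidth-delay product in bytes: bits/s * seconds / 8.
bdp_bytes() {
    echo $(( $1 * $2 / 8 / 1000 ))
}

# socket_buf_size BANDWIDTH_BITS_PER_SEC LATENCY_MS
# Three times the BDP, or 10000000 bytes, whichever is larger.
socket_buf_size() {
    size=$(( $(bdp_bytes "$1" "$2") * 3 ))
    floor=10000000
    if [ "$size" -lt "$floor" ]; then
        size=$floor
    fi
    echo "$size"
}

bdp_bytes 622000000 30        # prints 2332500
socket_buf_size 622000000 30  # prints 10000000 (3 x BDP is below the floor)
```

Remember that the operating system limits (net.core.rmem_max and net.core.wmem_max on Linux) must be at least as large as whatever value you end up with.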

Increase SDU Size

With Oracle Net Services it is possible to control data transfer by adjusting the size of the Oracle Net setting for the session data unit (SDU). Oracle testing has shown that setting the SDU to its maximum value of 65535 can improve performance for the SYNC transport. You can set SDU on a per connection basis using the SDU parameter in the local naming configuration file (TNSNAMES.ORA) and the listener configuration file (LISTENER.ORA), or you can set the SDU for all Oracle Net connections with the profile parameter DEFAULT_SDU_SIZE in the SQLNET.ORA file.

Note that the ASYNC transport uses the new streaming protocol and increasing the
SDU size from the default has no performance benefit.

TCP protocol stack

To preempt delays in buffer flushing in the TCP protocol stack, disable the TCP Nagle
algorithm by setting TCP.NODELAY=YES in the SQLNET.ORA file on both the
primary and standby systems. However with this, you’ll end up with a larger number of smaller packets on the network, and if latency is a problem, this will make matters worse, not better.

Hope this helps

Cluster not starting- cssd(12103)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).

Hello All,

It has been a long time! My schedule has been very busy and I could not spend much time on the blog.

Anyway, I am back now, this time with the GPnP profile.

The issue: node 2 is not starting because of a missing or mismatched GPnP profile. The following appears in the CRS alert log on node 2:


[ohasd(5347)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 0 were announced and 0 errors occurred
2014-11-16 10:14:54.881:
[gipcd(5858)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_COMM_ERR (Communication error).
[ohasd(11783)]CRS-2769:Unable to failover resource 'ora.gpnpd'.
2014-11-16 10:01:38.505:
[cssd(12103)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2014-11-16 10:01:38.506:
[cssd(12103)]CRS-1703:Initialization of the required component GPNP failed because the GPNP server daemon is not up; details at (:CSSSC00004:) in
2014-11-16 10:01:41.511:
[cssd(12103)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in
2014-11-16 10:01:44.344:
[ohasd(11783)]CRS-2771:Maximum restart attempts reached for resource 'ora.gpnpd'; will not restart.

For those who do not know about the GPnP profile: it is new in 11gR2 and is used extensively by the cluster at startup and whenever the cluster configuration is modified, for example when any of the following tools is used


    – srvctl
    – oifcfg
    – crsctl

Upon investigation, I understood that the following changes had been made

    – The private network was modified, 192.168.1 to 10.10.1 and 192.168.2 to 10.10.2
    – Node 2 was down while the modification was made using oifcfg

 

Well, this is not good, since any change to the cluster configuration has to be propagated to the GPnP profile on all nodes, located at

    $ORACLE_HOME/gpnp/profiles/peer/profile.xml --> global profile

    $ORACLE_HOME/gpnp/rac02/profiles/peer/profile.xml --> local profile

So when oifcfg is used, it updates the local profile and copies that profile to the other nodes using the gpnp agent.

In our case the GPnP profile was not in sync across the nodes, so node 2 failed to start.

To investigate, first check the node 2 profile using gpnptool.

Note: gpnptool is the only way to update your profile. Do not modify it manually: the profile is signed and tied to the wallet, so a manual edit may corrupt it.

Note: the profile is edited for clarity.

[root@rac02 gpnp]#  gpnptool get
Warning: some command line parameters were defaulted. Resulting command line:
         /u01/gi/app/oracle/12.1.0.1/grid/bin/gpnptool.bin get -o-

Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
GPnP service is not running on localhost. Found locally cached profile…

<gpnp:Network id="net1" IP="192.168.56.0" Adapter="eth0" Use="public"/><gpnp:Network id="net2" IP="192.168.1.0" Adapter="eth1" Use="cluster_interconnect"/><gpnp:Network id="net3" IP="192.168.2.0" Adapter="eth2" Use="cluster_interconnect"/></gpnp:HostNetwork></gpnp:Network-Profile><orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/><orcl:ASM-Profile id="asm" DiscoveryString="" SPFile="+OCRDG/rac-cluster/ASMPARAMETERFILE/registry.253.862511131" Mode="legacy"/ Success.

Error CLSGPNP_NO_DAEMON getting profile.

 

The error is due to the GPnP daemon not running. Also, if you look at the network entries, they show the old 192.168.* addresses rather than 10.10.*.

I then used rget, which is a remote profile get, on node 1; that profile is intact and shows the correct private network addresses, 10.10.*.

[root@rac01 peer]# gpnptool rget
Warning: some command line parameters were defaulted. Resulting command line:
         /u01/gi/app/oracle/12.1.0.1/grid/bin/gpnptool.bin rget -o-
Found 1 gpnp service instance(s) to rget profile from.

get-profile request to tcp://rac01:40082 (mdns:service:gpnp._tcp.local.://rac01:40082/agent=gpnpd,cname=rac-cluster,guid=01292fcb30feff20ff18cd0b98bb3adc,host=rac01,pid=6228/gpnpd h:rac01 c:rac-cluster u:01292fcb30feff20ff18cd0b98bb3adc):

<gpnp:Network-Profile><gpnp:HostNetwork id="gen" HostName="*"><gpnp:Network id="net1" IP="192.168.56.0" Adapter="eth0" Use="public"/><gpnp:Network id="net3" Adapter="eth2" Use="cluster_interconnect" IP="10.10.2.0"/><gpnp:Network id="net2" Adapter="eth1" Use="cluster_interconnect" IP="10.10.1.0"/></gpnp:HostNetwork></gpnp:Network-Profile><orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/><orcl:ASM-Profile id="asm" DiscoveryString="" SPFile="+OCRDG/rac-cluster/ASMPARAMETERFILE/registry.253.862511131" Mode="legacy"/

Success.
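Comparing the two profiles by eye is error-prone because the attribute order differs between them. Here is a rough sketch that pulls out only the interconnect entries; the function name interconnect_nets is my own, and it assumes the profile XML arrives on standard input in the form shown in the outputs above.

```shell
# Reads GPnP profile XML on stdin and prints "adapter subnet" for every
# cluster_interconnect network, sorted, regardless of attribute order.
interconnect_nets() {
    grep -o '<gpnp:Network [^>]*>' |
    grep 'Use="cluster_interconnect"' |
    while IFS= read -r net; do
        adapter=$(printf '%s\n' "$net" | sed 's/.* Adapter="\([^"]*\)".*/\1/')
        subnet=$(printf '%s\n' "$net" | sed 's/.* IP="\([^"]*\)".*/\1/')
        echo "$adapter $subnet"
    done | sort
}

# Usage: run on each node and compare the results.
#   gpnptool get 2>/dev/null | interconnect_nets
#   gpnptool rget 2>/dev/null | interconnect_nets
```

If the two lists differ, as they did here, the local profile on that node is stale.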

Let us proceed to modify profile.xml on node 2. Navigate to the local profile location on node 2

[root@rac02 peer]# cd /u01/gi/app/oracle/12.1.0.1/grid/gpnp/rac02/profiles/peer

Copy the profile to profile.bak.

[root@rac02 peer]# cp profile.xml profile.bak

Un-sign the profile using gpnptool unsign; this removes the signature from the profile.

[root@rac02 peer]# gpnptool unsign -p=profile.bak
Warning: some command line parameters were defaulted. Resulting command line:
         /u01/gi/app/oracle/12.1.0.1/grid/bin/gpnptool.bin unsign -p=profile.bak -o-

<gpnp:Network-Profile><gpnp:HostNetwork id="gen" HostName="*"><gpnp:Network id="net1" IP="192.168.56.0" Adapter="eth0" Use="public"/><gpnp:Network id="net2" IP="192.168.1.0" Adapter="eth1" Use="cluster_interconnect"/><gpnp:Network id="net3" IP="192.168.2.0" Adapter="eth2" Use="cluster_interconnect"/></gpnp:HostNetwork></gpnp:Network-Profile><orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/><orcl:ASM-Profile id="asm" DiscoveryString="" SPFile="+OCRDG/rac-cluster/ASMPARAMETERFILE/registry.253.862511131" Mode="legacy"/></gpnp:GPnP-Profile>
Success.

Edit the profile, overwrite it, and change the sequence as well.

[root@rac02 peer]# gpnptool edit  -net2:net_ip="10.10.1.0" -net3:net_ip="10.10.2.0" -prf_sq=7 -p=profile.bak -o=profile.bak -ovr
Resulting profile written to "profile.bak".
Success.
[root@rac02 peer]#

Sign the profile again. This adds the signature to the profile and also updates the wallet; pass the wallet location here.

[root@rac02 peer]# gpnptool sign -p=profile.bak -w=file:/u01/gi/app/oracle/12.1.0.1/grid/gpnp/rac02/wallets/peer -o=profile.new

Copy the signed profile (profile.new) back to profile.xml

[root@rac02 peer]# cp profile.new profile.xml

Verify the IP address changed to 10.10.

[root@rac02 peer]# cat profile.xml
<gpnp:Network-Profile><gpnp:HostNetwork id="gen" HostName="*"><gpnp:Network id="net1" IP="192.168.56.0" Adapter="eth0" Use="public"/><gpnp:Network id="net2" IP="10.10.1.0" Adapter="eth1" Use="cluster_interconnect"/><gpnp:Network id="net3" IP="10.10.2.0" Adapter="eth2" Use="cluster_interconnect"/></gpnp:HostNetwork></gpnp:Network-Profile><orcl:CSS-Profile id="css" DiscoveryString="" LeaseDuration="400"/><orcl:ASM-Profile id="asm" DiscoveryString="" SPFile="+OCRDG/rac-cluster/ASMPARAMETERFILE/registry.253.862511131" Mode="legacy"/>

[root@rac02 peer]#

Noticed? The private addresses have changed now; hopefully this will work.

Stop CRS; use the -f option, since only the HAS stack is running and CRS is not.

[root@rac02 peer]# crsctl stop crs -f

start the crs, normally

[root@rac02 peer]# crsctl start crs

In alert log,

[gpnpd(7017)]CRS-2328:GPNPD started on node rac02.
2014-11-16 10:58:27.978:

Check the configuration on node 2 using oifcfg

[root@rac02 rac02]# oifcfg getif -global
eth0  192.168.56.0  global  public
eth2  10.10.2.0  global  cluster_interconnect
eth1  10.10.1.0  global  cluster_interconnect

Just in case you want to edit the spfile location, the following command can be used,
[root@rac02 peer]# gpnptool edit -asm:asm_spf='+OCRDG/rac-cluster/ASMPARAMETERFILE/registry.253.862511131' -asm_dis='' -p=profile.bak -o=profile.bak -ovr -prf_sq=8
Resulting profile written to "profile.new".
Success.
[root@rac02 peer]#

-Thanks

Sureshgandhi

ACFS-9317: No ADVM/ACFS distribution media detected at location: root.sh failed

Again, an OEL 7 (UEK) and 12c Grid Infrastructure issue. OEL 7 UEK does not ship with or support ACFS yet, but we tried and stumbled :(.

Here is the log of the root.sh failure.

ACFS-9317: No ADVM/ACFS distribution media detected at location: '/u02/app/11.2.0.2/grid1/install/usm/EL5/x86_64/2.6.18-8/2.6.18-8.el5-x86_64/bin'
[root@ol5-112-rac1 ~]# /u02/app/11.2.0.2/grid1/bin/acfsroot install
ACFS-9320: Missing file: 'advmutil'.
ACFS-9320: Missing file: 'advmutil.bin'.
ACFS-9320: Missing file: 'fsck.acfs'.
ACFS-9320: Missing file: 'fsck.acfs.bin'.
ACFS-9320: Missing file: 'mkfs.acfs'.
ACFS-9320: Missing file: 'mkfs.acfs.bin'.
ACFS-9320: Missing file: 'mount.acfs'.
ACFS-9320: Missing file: 'mount.acfs.bin'.
ACFS-9320: Missing file: 'acfsdbg'.
ACFS-9320: Missing file: 'acfsdbg.bin'.
ACFS-9320: Missing file: 'acfsutil'.
ACFS-9320: Missing file: 'acfsutil.bin'.
ACFS-9320: Missing file: 'umount.acfs'.
ACFS-9320: Missing file: 'umount.acfs.bin'.
ACFS-9301: ADVM/ACFS installation can not proceed:
ACFS-9317: No ADVM/ACFS distribution media detected at location: '/u02/app/11.2.

.. and so on. The cluster did not start and the voting disk was not formatted, so the installation was left in a half-configured state :(. Trying 12c on different kernels can really take a lot of your time.

Anyway, to fix this I tweaked the code in rootcrs.pl and crsconfig_lib.pm, as this is a test environment and I am really not worrying about ACFS at the moment.

To step back a little: root.sh calls /u02/app/11.2.0.2/grid1/crs/install/rootcrs.pl, which in turn uses /u02/app/11.2.0.2/grid1/crs/install/crsconfig_lib.pm, a file that holds many of the functions that install the required components.

Now open rootcrs.pl (take a backup first, of course), search for the USM keyword, and comment out all of the following lines.

image

Open /u02/app/11.2.0.2/grid1/crs/install/crsconfig_lib.pm and remove the usminstall function from the call; remove the highlighted keyword completely.

Note: USM stands for Universal Storage Management, which installs ACFS and ADVM as part of root.sh.

image
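Before touching either file, it helps to keep an untouched copy and list the candidate lines with their line numbers. The following is only a sketch of that preparation step, not the edit itself; `list_usm_lines` is a hypothetical helper name, and the case-insensitive match on "usm" is my assumption so that both the USM keyword and the usminstall call are found.

```shell
# Hypothetical helper: back up a script and print the lines that mention USM,
# so they can be reviewed before being commented out or removed by hand.
# $1 = path to the file (for example rootcrs.pl or crsconfig_lib.pm)
list_usm_lines() {
    file=$1
    cp -p "$file" "$file.orig" || return 1   # keep an untouched copy first
    grep -in 'usm' "$file"                   # matching lines, prefixed with line numbers
}
```

The edit is still done by hand. Blindly commenting out every matching line could break a multi-line Perl statement, so I would review the list first.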

Now run root.sh again and it will work fine.

By the way, do not install 12.1 on OEL 7 UEK for now, since it seems to have a lot of issues; from what I see, 12.1.0.2 does not have the same problem.

ORA-12547: TNS: lost contact on DBCA in 12c with OEL 7 (UEK)

While running DBCA we received ORA-12547 and could no longer create a database using DBCA.

Googling and searching Metalink turned up many notes, none of which resolved the issue. The possible causes are many:

  1. Environment variables (SID, PATH, LD_LIBRARY_PATH) – Not resolved
  2. Permission on the oracle executable set to 6751 – Not resolved
  3. gcc package issue; the package was already installed – Not resolved
  4. libaio package issue; the package was already installed – Not resolved
  5. Listeners up and running, reload and tns changes – Not resolved
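The permission check in item 2 is easy to script. A minimal sketch, assuming GNU stat as found on Oracle Linux; `check_mode` is a hypothetical helper of mine, and 6751 is simply the value mentioned in the list above.

```shell
# Hypothetical helper: compare a file's octal mode with an expected value.
# $1 = file, $2 = expected mode (for example 6751 for $ORACLE_HOME/bin/oracle)
# Relies on GNU stat (-c '%a' prints the mode in octal).
check_mode() {
    actual=$(stat -c '%a' "$1") || return 2
    if [ "$actual" = "$2" ]; then
        echo "OK $1 mode $actual"
    else
        echo "MISMATCH $1 mode $actual expected $2"
        return 1
    fi
}
```

It would be called as `check_mode "$ORACLE_HOME/bin/oracle" 6751`. In our case this check passed, which is why the permissions were ruled out.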

Finally, I came to understand that the oracle binary and other executables were not relinked properly; the relink had apparently failed (we ignored some errors during installation, our bad).

Whoa, wait! We did not get any prerequisite failures when we ran cluvfy, so how could relinking be the issue when all the packages are installed?

Stuck at this point, we recollected that we had tweaked some files in rdbms/lib for 12.1.0.1 on OEL 7 while installing grid, and this looked like the same issue.

I really want to curse our fate :(, seriously, because we cannot tell whether this is a problem with the shipped software (a mistake on Oracle's side) or with our installation. Leaving that aside, the following solved our issue.

The installation log shows:

/usr/bin/ld: note: '__tls_get_addr@@GLIBC_2.3' is defined in
DSO /lib64/ld-linux-x86-64.so.2 so try adding it to the linker
command line /lib64/ld-linux-x86-64.so.2: could not read symbols:
Invalid operation

INFO: collect2: error: ld returned 1 exit status
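Since we had ignored these errors during installation, a quick scan of the log would have saved us time. A minimal sketch; `find_link_errors` is a hypothetical helper, and the three patterns are simply the linker messages quoted in this post, not an exhaustive list.

```shell
# Hypothetical helper: print linker failures found in an installation or relink log.
# $1 = log file. Exit status follows grep: 0 when failures are found, 1 when none match.
find_link_errors() {
    grep -nE 'ld returned [0-9]+ exit status|could not read symbols|file not recognized' "$1"
}
```

The same function can be pointed at the relink log later on, since the messages have the same shape.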

cd $ORACLE_HOME/rdbms/lib

cp env_rdbms.mk env_rdbms.mk.bck

    make changes in $ORACLE_HOME/rdbms/lib/env_rdbms.mk

modify line 176

LINKTTLIBS=$(LLIBCLNTSH) $(ORACLETTLIBS) $(LINKLDLIBS)

to

LINKTTLIBS=$(LLIBCLNTSH) $(ORACLETTLIBS) $(LINKLDLIBS) -lons

modify line 279 and 280

LINK=$(FORT_CMD) $(PURECMDS) $(ORALD) $(LDFLAGS) $(COMPSOBJS)
LINK32=$(FORT_CMD) $(PURECMDS) $(ORALD) $(LDFLAGS32) $(COMPSOBJS)

to

LINK=$(FORT_CMD) $(PURECMDS) $(ORALD) $(LDFLAGS) $(COMPSOBJS) -Wl,--no-as-needed
LINK32=$(FORT_CMD) $(PURECMDS) $(ORALD) $(LDFLAGS32) $(COMPSOBJS) -Wl,--no-as-needed

modify line 3041 and 3042

TG4PWD_LINKLINE= $(LINK) $(OPT) $(TG4PWDMAI) \
        $(LLIBTHREAD) $(LLIBCLNTSH) $(LINKLDLIBS)

to

TG4PWD_LINKLINE= $(LINK) $(OPT) $(TG4PWDMAI) \
        $(LLIBTHREAD) $(LLIBCLNTSH) $(LINKLDLIBS) -lnnz12
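Because the line numbers may differ between installations, it is worth confirming the edits by content before relinking. A minimal sketch; `verify_env_rdbms_mk` is a hypothetical helper of mine, and it expects the linker option to be spelled with two plain ASCII hyphens (`--no-as-needed`), which is how ld accepts it.

```shell
# Hypothetical helper: confirm that the edits described above are present in env_rdbms.mk.
# $1 = path to env_rdbms.mk; prints one line per missing edit, returns non-zero if any is missing.
verify_env_rdbms_mk() {
    mk=$1
    rc=0
    grep -q '^LINKTTLIBS=.*-lons' "$mk" \
        || { echo "missing: -lons on LINKTTLIBS"; rc=1; }
    grep -q '^LINK=.*-Wl,--no-as-needed' "$mk" \
        || { echo "missing: --no-as-needed on LINK"; rc=1; }
    grep -q '^LINK32=.*-Wl,--no-as-needed' "$mk" \
        || { echo "missing: --no-as-needed on LINK32"; rc=1; }
    # -lnnz12 must sit on the continuation line right after TG4PWD_LINKLINE=
    awk '/^TG4PWD_LINKLINE=/ { getline; if (index($0, "-lnnz12")) found = 1 }
         END { exit !found }' "$mk" \
        || { echo "missing: -lnnz12 on TG4PWD_LINKLINE"; rc=1; }
    [ "$rc" -eq 0 ] && echo "all edits present"
    return "$rc"
}
```

Run it as `verify_env_rdbms_mk "$ORACLE_HOME/rdbms/lib/env_rdbms.mk"`. A typographic dash pasted from a web page in place of `--` will show up as a missing edit.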

Next we tried to relink, and it threw the error below:

cd $ORACLE_HOME/bin

relink all

INFO: /u01/app/oracle/product/12.1.0/db_1/rdbms/lib/config.o: file not recognized: File truncated collect2: error: ld returned 1 exit status

To resolve this, remove the config.o file and relink oracle so that a new config.o is created (metalink note

 

make -f ins_rdbms.mk config.o

make -f ins_rdbms.mk ioracle

Then perform relink all again, and verify the logs in $ORACLE_HOME/install/relinkaction…log for any errors.
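Put together, the recovery sequence can be sketched as below. This is only an outline under my own assumptions: ORACLE_HOME is set, the make commands are run from $ORACLE_HOME/rdbms/lib, and the truncated config.o is moved aside rather than deleted. `run` and `rebuild_config_o` are hypothetical names, and by default the commands are only printed.

```shell
# Sketch of the recovery sequence. With DRY_RUN=1 (the default here) each command is
# only printed, so the order can be reviewed before anything is executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

rebuild_config_o() {
    run cd "$ORACLE_HOME/rdbms/lib" || return 1
    run mv config.o config.o.truncated || return 1   # move the bad object aside (my choice)
    run make -f ins_rdbms.mk config.o || return 1    # regenerate config.o
    run make -f ins_rdbms.mk ioracle || return 1     # relink the oracle binary
    run "$ORACLE_HOME/bin/relink" all                # then relink everything
}
```

Calling `rebuild_config_o` prints the five steps; `DRY_RUN=0 rebuild_config_o` would actually run them, as the software owner.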

And finally, that resolved it: DBCA and the other executables are working properly.

Conclusion: I no longer favour running 12.1.0.1 on OEL 7 (UEK), since a sequence of issues keeps coming up during installation. Rather, try 6.3 (not even 6.4) to ensure a seamless installation.

-Happy Reading

Sureshgandhi