January 14, 2019

Enterprise Manager 13c - 'Tablespace Space Used %' metric sends false alarms

Situation is ...

The 'Tablespace Space Used %' metric is used to monitor the free space of tablespaces. Sometimes you get an alarm that a tablespace has reached its critical value, although the tablespace's data files are auto-extensible and far from their max size, and the files are located on an ACFS file system which is auto-extensible as well, or better said, has an auto-resize feature. There is also plenty of space left on the underlying ASM. Nevertheless, the file system is currently close to 100% used and the current free space in the tablespace is lower than the critical value. Obviously a false alarm, isn't it?

Cause ...

The cause of this behaviour is that Oracle changed the method used to calculate a tablespace's free space for the 'Tablespace Space Used %' metric with release 13c. What is new is that the metric takes the available disk space into account in its calculation. That is correct in principle, because no space left on the disk ultimately means that a tablespace cannot auto-extend anymore. But with ACFS and its auto-resize functionality, this shouldn't be a problem at all - ACFS will auto-resize the file system if necessary.

Unfortunately, Oracle Support has no fix for this false alarm at the moment. MOS says that the view DBA_TABLESPACE_USAGE_METRICS has a problem calculating the metric's base information correctly. Well, that might be a possible reason, but IMHO the values for my particular database / tablespaces / data files are fine.

Solution / Workaround

Simply set the auto-resize parameter of the ACFS file system in question to an appropriate value other than the default of 20G to avoid the false alarm (until ASM itself runs full and ACFS can no longer auto-resize):
"acfsutil size -a <Size>G <FSName>"

Example:
acfsutil size -a 100G /u02/oradata/TESTDB
acfsutil size: ACFS-03642: successfully updated auto-resize settings


December 21, 2018

ODA Server Patch 18.3 available - and installed ...

As of today, Oracle Database Appliance software release 18.3.0.0.0 is available ... I took the chance to download and install the new release - this is the story:

Initial Situation:

I tested the update on an ODA X7-2M with system version 12.2.1.4.0, running five databases (12.1). There are three vlans configured on that system.

Documentation used:

The installation instructions were taken from docs.oracle.com, Oracle Database Appliance, Release 18.3, X7-2 Deployment and User's Guide for Linux x86-64, Chapter 7 Patching Oracle Database Appliance, subchapter 'Patching Oracle Database Appliance Using the CLI'

Remarks:

- You have to download about 15 GB just for the system update; DB 18.x is around 4 GB
- Do not copy the update software to an NFS share and do not run the update from such a directory
- If you are using NFS shares, make sure that a df -h returns immediately. On my test system, two NFS file systems were not reachable at all, so df -h never returned. This, in turn, left the server patching stuck at step 'Configuring GI'. After I unmounted those file systems (umount with the -f switch), the patching continued
- qosmserver has to be running and a CRS resource ora.net1.network has to be configured
- The patch documentation does not mention that dcsagent has to be updated first (odacli upgrade-dcsagent -v 18.3.0.0.0), but it is a prerequisite
- Remove any 'non-default' entries from oracle's .bash_profile prior to the update
- Step 'Setting AUDIT SYSLOG LEVEL' had status 'Failure' at the end of the update process; the overall status of the update was nevertheless successful (I will check the reason for that failure later)
- After the step 'apply the server update', the installation documentation says to check whether the update was successful and then to issue an 'odacli update-storage'. ATTENTION! After a successful update, the server will reboot! Wait until the system is up again before you start the update-storage command!
- The server reboot took about 10 minutes on my X7-2M
- The documentation example for 'Update the storage components' is not correct: use '--rolling' or '-r' instead of '-rolling'
- update-storage with the '--rolling' option is not supported on ODAlite
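
The hung-NFS remark above can be turned into a quick pre-check before patching. This is only a minimal sketch, assuming GNU coreutils 'timeout' is available on the ODA; the function name responds_within and the 10-second limit are my own choices and not part of the Oracle documentation.

```shell
#!/bin/bash
# responds_within SECONDS COMMAND [ARGS...]
# Returns 0 if COMMAND finishes successfully within SECONDS,
# non-zero if it fails or has to be killed after the time limit.
responds_within() {
    local limit=$1
    shift
    # SIGKILL is used because a process blocked on a dead NFS mount
    # may not react to the default SIGTERM.
    timeout -s KILL "$limit" "$@" >/dev/null 2>&1
}

# Pre-check: a df -h that does not return usually points to an unreachable NFS mount
if responds_within 10 df -h; then
    echo "df -h responded in time"
else
    echo "df -h failed or hung - check the NFS mounts before running odacli update-server"
fi
```

If the check fails, look for stale entries in the mount list and unmount them before starting the server patching.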

That's it

The whole process took slightly more than one hour - and remember: the system reboots after a successful 'odacli update-server' (!)



December 7, 2018

ODA - Update to 12.2.1.4 - issue with ora.net1.network

Yesterday, I had to update an ODA from release 12.1.2.12 to 12.2.1.4. All prerequisite checks, including the prepatchreport, were successful ... but the update itself was not!

During the update, at stage 'GI Home Cloning', the update process failed and left the system in an 'unknown' state. A detailed analysis revealed the root cause: the CRS service qosmserver was missing. So I tried to set up this missing link to a successful update:

srvctl add qosmserver
srvctl status qosmserver
srvctl start qosmserver

But qosmserver could not be started - resource 'ora.net1.network' is missing. Well, that's absolutely true - we do not have such a resource defined on our ODAs. I finally changed the first of our VLAN definitions to ora.net1.network and could then start qosmserver.

Unfortunately, because of the interrupted update, GI is now using its 12.2 home, but odacli describe-component shows that the current home is still the old 12.1 one. The result is that I cannot run update-server again (GI Home Cloning fails because the new GI home already exists).

Outcome: as an additional prerequisite, check whether the resource ora.net1.network exists on your ODA and make sure qosmserver is up and running BEFORE starting an 'odacli update-server'.
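
The two prerequisites can be checked with a small script before the update is started. This is a sketch under assumptions: the helper names are hypothetical, and the output patterns it looks for (a 'NAME=<resource>' line from 'crsctl stat res', the words 'is running' from 'srvctl status qosmserver') are what I would expect to see, so verify them against your own system first.

```shell
#!/bin/bash
# crs_resource_exists NAME
# Reads the output of 'crsctl stat res NAME' on stdin and succeeds
# if it contains the exact line NAME=<resource>.
crs_resource_exists() {
    grep -qxF "NAME=$1"
}

# precheck_update_server (hypothetical helper)
# Run as the grid owner with the GI environment set.
precheck_update_server() {
    crsctl stat res ora.net1.network | crs_resource_exists ora.net1.network \
        || { echo "ora.net1.network is missing"; return 1; }
    # Assumed output pattern: '... is running on node <host>'
    srvctl status qosmserver | grep -q "is running" \
        || { echo "qosmserver is not running"; return 1; }
    echo "prerequisites look fine"
}
```

Call precheck_update_server right before 'odacli update-server' and only continue if it reports that the prerequisites look fine.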



November 7, 2018

DBID Problem @ ODA - or Count von Count (Sesame Street) calculates DBIDs

I did some more investigation to find an explanation for the problem with the extra-long DBIDs and the resulting problems when creating new databases on an ODA ...

And here's the simple result:

Remember? In our environment, oracle's .bash_profile runs an additional script which prints all currently existing Oracle databases and listeners, their status and the corresponding ORACLE_HOMEs to STDOUT. The output looks like this:

######################################################
# Following Databases/Aliases are on the ODA oda-o1234
######################################################
================================================================================
SID/PROCESS         STATUS  E_ORACLE_HOME
--------------------------------------------------------------------------------

+ASM1               up      /u01/app/12.2.0.1/oracle
TEST1               up      /u01/app/oracle/product/12.1.0.2/dbhome_1
TEST2c              up      /u01/app/oracle/product/12.1.0.2/dbhome_1
TEST2D              up      /u01/app/oracle/product/12.1.0.2/dbhome_1
TEST2E              up      /u01/app/oracle/product/12.1.0.2/dbhome_1

LISTENER            up      /u01/app/12.2.0.1/oracle

Now, take all the digits from this output, put them on one line, append the 'real' DBID of the database - and you have the DBID which is displayed (and used) by odacli in its internal Derby database. :-)
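
The rule just described can be reproduced outside of odacli. This is a minimal sketch of my reading of the behaviour, not of how odacli works internally; the function names digits_only and fake_dbid are made up for illustration and the banner is shortened to two lines.

```shell
#!/bin/bash
# digits_only: keep only the digits 0-9 from stdin and drop
# everything else, including the newlines.
digits_only() {
    tr -cd '0-9'
}

# fake_dbid REAL_DBID
# Prints all digits found in the profile output (read from stdin)
# followed by the real DBID.
fake_dbid() {
    printf '%s%s\n' "$(digits_only)" "$1"
}

# Example with a shortened banner and the real DBID 2498916505
printf '+ASM1 up /u01/app/12.2.0.1/oracle\nTEST1 up /u01/app/oracle/product/12.1.0.2/dbhome_1\n' \
    | fake_dbid 2498916505
```

The more databases the banner lists, the more digits end up in front of the real DBID, which fits the observation that the stored value keeps growing.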

The easy test:

Add two 'echo' lines to oracle's .bash_profile so that it looks like this:

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin

export PATH
umask 022

echo "1111111111111111111111111111111111111111111111111"
echo "2222222222222222222222222222222222222222222222222"

Create a database using the 'odacli create-database' command:

odacli create-database -m -dh 665b9ce4-3a13-4a04-8ccd-548e66556def -n TEST2F -r ACFS

Once the database is created, issue an 'odacli list-databases' and then an 'odacli describe-database -i <id of that database>' and check the DBID (the concatenated echo "1"s plus the echo "2"s plus the 'real' DBID):

Database details
----------------------------------------------------------------
                     ID: 2a5426bc-8db9-4d07-ba87-b54f06dbbcc3
            Description: TEST2F
                DB Name: TEST2F
             DB Version: 12.1.0.2
                DB Type: Si
             DB Edition: EE
                   DBID: 111111111111111111111111111111111111111111111111122222222222222222222222222222222222222222222222222498916505
Instance Only Database: false
                    CDB: false
               PDB Name:
    PDB Admin User Name:
                  Class: OLTP
                  Shape: Odb1
                Storage: ACFS
           CharacterSet: AL32UTF8
  National CharacterSet: AL16UTF16
               Language: AMERICAN
              Territory: AMERICA
                Home ID: 665b9ce4-3a13-4a04-8ccd-548e66556def
        Console Enabled: false
     Level 0 Backup Day: Sunday
    AutoBackup Disabled: false
                Created: November 7, 2018 12:58:45 PM CET
         DB Domain Name: world

The database's DBID - which is the last portion in odacli's DBID:

SQL> select dbid from v$database;

      DBID
----------
2498916505

Imho, the answer to the question 'Bug or feature?' is quite clear ;-) And, by the way, Oracle Support is still investigating the problem.
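
If the status banner should stay in oracle's .bash_profile, one option is to print it only in interactive shells. This is a sketch only: whether odacli runs oracle's profile in a non-interactive shell is an assumption I have not verified, and the function name is_interactive is hypothetical.

```shell
# Fragment for ~/.bash_profile

# is_interactive FLAGS
# Succeeds if the shell option string (normally "$-") contains 'i'.
is_interactive() {
    case $1 in
        *i*) return 0 ;;
        *)   return 1 ;;
    esac
}

if is_interactive "$-"; then
    # Print banners and status listings only for interactive logins
    echo "status banner would be printed here"
fi
```

The safe route remains the one described above: keep oracle's .bash_profile free of anything that writes to STDOUT.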



October 20, 2018

Update! - Unexpected Behaviour With V$DIAG_ALERT_EXT

Oops! Error messages from a test database in my Production-DB?!?


Story behind:

I wanted to know which "ORA-" errors had occurred in my Production-DB and issued a
select
  to_char(originating_timestamp,'DD.MM.YYYY HH24:MI:SS')
  ,message_text
from v$diag_alert_ext
where message_text like '%ORA-%'
and originating_timestamp > sysdate-31
order by originating_timestamp;
to get an overview of the current - let me say - error situation. So far - so good ...

Yesterday, I found messages from a TEST database when querying the PRODUCTION db, which puzzled me:
"Errors in file /u01/app/oracle/diag/rdbms/TEST/TEST/trace/TEST_j000_45562.trc"

I checked that and discussed it with some colleagues ... the outcome is:

The view V$DIAG_ALERT_EXT does not only contain information from the current DB but from all databases on that system / in that diagnostic_dest:
select  distinct component_id,filename
from v$diag_alert_ext
order by 1,2;
COMPONENT_ID FILENAME
------------ -----------------------------------------------------------------------
apx          /u01/app/oracle/diag/apx/+apx/+APX1/alert/log.xml
asm          /u01/app/oracle/diag/asm/+asm/+ASM1/alert/log.xml
clients      /u01/app/oracle/diag/clients/user_oracle/host_4163035053_107/alert/log.xml
clients      /u01/app/oracle/diag/clients/user_oracle/host_4163035053_82/alert/log.xml
crs          /u01/app/oracle/diag/crs/host/crs/alert/log.xml
rdbms        /u01/app/oracle/diag/rdbms/dbtest1/DBTEST1/alert/log.xml
rdbms        /u01/app/oracle/diag/rdbms/dbtest10/DBTEST10/alert/log.xml
rdbms        /u01/app/oracle/diag/rdbms/dbtest11/DBTEST11/alert/log.xml
rdbms        /u01/app/oracle/diag/rdbms/dbtest12/DBTEST12/alert/log.xml
rdbms        /u01/app/oracle/diag/rdbms/dbtest13/DBTEST13/alert/log.xml
rdbms        /u01/app/oracle/diag/rdbms/dbtest14/DBTEST14/alert/log.xml
rdbms        /u01/app/oracle/diag/rdbms/dbtest2/DBTEST2/alert/log.xml
rdbms        /u01/app/oracle/diag/rdbms/dbtest3/DBTEST3/alert/log.xml
rdbms        /u01/app/oracle/diag/rdbms/dbtest4/DBTEST4/alert/log.xml
rdbms        /u01/app/oracle/diag/rdbms/dbtest5/DBTEST5/alert/log.xml
rdbms        /u01/app/oracle/diag/rdbms/dbtest6/DBTEST6/alert/log.xml
rdbms        /u01/app/oracle/diag/rdbms/dbtest7/DBTEST7/alert/log.xml
rdbms        /u01/app/oracle/diag/rdbms/dbtest8/DBTEST8/alert/log.xml
rdbms        /u01/app/oracle/diag/rdbms/dbtest9/DBTEST9/alert/log.xml
rdbms        /u01/app/oracle/diag/rdbms/rocrtest/rocrtest/alert/log.xml
tnslsnr      /u01/app/oracle/diag/tnslsnr/host/asmnet1lsnr_asm/alert/log.xml
tnslsnr      /u01/app/oracle/diag/tnslsnr/host/listener/alert/log.xml
tnslsnr      /u01/app/oracle/diag/tnslsnr/host/listener_511/alert/log.xml
tnslsnr      /u01/app/oracle/diag/tnslsnr/host/listener_576/alert/log.xml
tnslsnr      /u01/app/oracle/diag/tnslsnr/host/listener_580/alert/log.xml

Well - that was unexpected ...

Question is: bug or feature? I'll open an SR and keep you up to date ...
By the way: I ran these tests on different Oracle releases: 11.2, 12.1, 12.2.

Update (November 14th, 2018)

Opened a Service Request - and Oracle Support brought an explanation:

"We have verified it. It is an expected scenario. As stated earlier V$DIAG_ALERT_EXT displays trace file and alert file data for the current container (It means all PDB) in a CDB (Even though you connect as ALTER SYSTEM SET CONTAINER - it will show all the database because all database resides on ORACLE HOME). 

Actually, V$DIAG_ALERT_EXT read the logs of all databases and listeners from the ADR Location (ORACLE HOME Directory) So, one connection to a database is enough to see all the database alert files and listener logs registered inside the ADR structure. 

So, this is an expected one. And not a BUG! 
"
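
Given that explanation, the original query can be narrowed to the alert log of the database you are connected to. This is a sketch and not something Oracle Support suggested: it assumes that the 'ADR Home' row of V$DIAG_INFO holds the directory prefix of the FILENAME values shown above, and the function name alert_query_current_db is made up.

```shell
#!/bin/bash
# alert_query_current_db
# Prints a query that restricts V$DIAG_ALERT_EXT to log files located
# below the ADR home of the connected instance.
alert_query_current_db() {
    cat <<'SQL'
select to_char(originating_timestamp,'DD.MM.YYYY HH24:MI:SS') as ts
      ,message_text
from   v$diag_alert_ext
where  message_text like '%ORA-%'
and    originating_timestamp > sysdate-31
-- keep only rows whose log file lives below this instance's ADR home
and    filename like (select value from v$diag_info where name = 'ADR Home') || '/%'
order by originating_timestamp;
SQL
}

# Print the statement; pipe it into sqlplus to actually run it
alert_query_current_db
```

To run it, pipe the output into a session on the database in question, for example 'alert_query_current_db | sqlplus -s / as sysdba'.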


REASON FOUND: DBID Problem @ ODA

Hi There!

In my blog post "http://robertcrames.blogspot.com/2018/10/oda-bug-and-poor-support-from-oracle.html" I described a problem where creating databases became impossible because of an incredibly long DBID, and the 'poor' Oracle support in that specific case.

Well - I contacted Oracle Support and had a call with the Support Manager and then ... it worked smoothly 😏. I provided some more information and Oracle Support had an idea:

'Remove any changes from the .bash_profile and try again' ... Unfortunately, we had made no changes to root's .bash_profile (root being the 'initiator' of the create database process) :-( - but we had made some changes to oracle's .bash_profile. One of these changes was to set oracle's primary environment to the Grid Infrastructure instance +ASM1. I removed that entry, retried the 'odacli create-database' - and ... it works - the DBID is correct and plausible, everything runs fine. :-)

October 12, 2018

DBID Problem @ ODA ... Bug (?)

Long story short:


  • Oracle Database Appliance X6-2M and X7-2M
  • oak 12.1.2.11 up to 12.2.1.4

We were no longer able to create databases on one of our 15 ODAs. A detailed analysis showed that the DBID - as reported by 'odacli describe-database -i <DB identifier>' - grows and grows and, once it reaches a length of more than 255 characters, any subsequent attempt to create a database fails and leaves the database in status 'creating'. No more databases can be created, nor can you delete the database which is stuck in status 'creating'.


This is what a correct DBID (result from a describe-database) looks like: 1653395885 - btw: this is exactly the real dbid found in the database (select dbid from v$database).

An - imho - 'corrupted' DBID looks like this: 2791133100113200112102253200112102127320011210212832001121021533200112102154320011210215632001121021573200112102159320011210217323200112102177232001121021793200112102181320011210218323200112102186320011210215800112102576011210251101121021196170968 - the last 'n' numbers represent the real dbid (select dbid from v$database).

Side effect, when the database creation has failed: 
odacli list-databases
DCS-10001:Internal error encountered: could not execute query.

It would be great if you could do this:

  • If you have more than five databases on an ODA, issue an 'odacli list-databases'
  • Take the id of the most recently created database and issue an 'odacli describe-database -i <id>'
  • Post the result as a comment on this blog

Regarding oracle support:

The Service Request has now been open for weeks. It started as a Sev 2; relatively early on I changed it to Sev 1. Oracle Support updates the SR very rarely, and usually there is no answer at all when I update the SR. In the meantime I escalated the SR and asked for a callback from a manager - no result and no callback. Today I asked for another callback from another manager - we will see how long that takes.

@OracleSupport: that is very poor ... leaving customers helpless and uninformed with such severe problems. Very disappointing ...

August 8, 2018

ODA X7 – no network connect after setup / re-image


Yesterday, I prepared a brand-new X7-2M. During the setup process / network configuration (using VLAN), I ran into a problem: although the network configuration was set up absolutely correctly, every attempt to connect to the system from a jump host failed.

Long story short: if you want to use the SFP28 (fiber) ports an ODA X7 offers, you have to make some config changes to get them to work.
First of all: do a firmware upgrade to version 20.08.01.14 or above.
Next, configure the NIC / NICs to force the speed to 10000 and turn off autonegotiation.

The firmware as well as a detailed description of all necessary steps can be found in MOS Note 2373070.1 ‘Using the onboard SFP28 ports on an ODA X7-2 server node’

How to copy Files to an ODA which is not available via network - by using the network

Sounds a little weird, I know - but works ... *lol*

Problem was: 

The brand-new ODA comes with old firmware for the SFP28 interfaces, which makes it impossible to connect the system to the network. To upgrade the interfaces you need to have the firmware files on the ODA ... but how can I copy the firmware to the ODA without a network connection?

Solution:

First of all: create an ISO file from the files you want to have on a virtual CD. On a Mac, this is quite easy to do: start Disk Utility, then 'File / New Image / Image from Folder' and follow the instructions. Disk Utility will create a dmg file which has to be renamed to iso. That's it, basically.

Next steps:

Open a browser window and connect to the ILOM (https://<ILOM's hostname or IP>)
Log in using ILOM's root account

Launch console


Open KVMS/Storage

Add Storage - choose the ISO you created and click 'Connect'

Next, in a terminal session, start the console application (start /HOST/console), and mount the cdrom: 

mkdir /media/cdrom
mount /dev/cdrom /media/cdrom

Copy the files to wherever you want on the ODA's file system :-)


April 4, 2018

Enterprise Manager and the disadvantages of WLS' Dynamic Monitoring Service (lots of metricdump files)

Had an interesting problem this morning with a customer's Windows server where Enterprise Manager 12c is running: disk C: - where the EM software is installed - was full. By the way: imho, Oracle software should never ever be installed on C: ...

The Problem:

The reason for the full disk was WebLogic's Dynamic Monitoring Service (or a weak housekeeping tool ;-)).
Explained:
WebLogic has a feature called DMS, which collects some domain metrics and saves them to a file in the directory DOMAIN_HOME/servers/<ServerName>/logs/metrics for each managed server. Each file is about 600 to 800 KB in size, which is not that much - unless you have thousands of those files. The file names look like metricdump*, by the way. By default, the files are created in a 3-hour cycle.

The Solution:

For this customer's system, I decided to stop the Dynamic Monitoring Service for the AdminServer as well as for the EMGC_OMS1 managed server. To achieve this, look for the file dms_config.xml (in DOMAIN_HOME/config/fmwconfig/servers/<ServerName>) and change the value of 'enabled' to "false":

<dumpConfiguration>
  <dump intervalSeconds="10800" maxSizeMBytes="75" enabled="false"/>
</dumpConfiguration>

Restart EM to activate the change.
Another option would be to change intervalSeconds to a higher value - for example once a day (86400 seconds) or twice a day (43200 seconds). Or, if you have a properly working housekeeping tool running every day, configure it to delete all files older than 'whatever-you-might-think'.
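
The housekeeping option could look like the sketch below. It assumes a Unix-like host with a 'find' that supports -delete; the customer system described here was a Windows server, where a scheduled task would have to do the equivalent. The function name purge_old_dumps and the retention period are my own choices.

```shell
#!/bin/bash
# purge_old_dumps DIR DAYS
# Deletes metricdump* files in DIR (and below) that were last modified
# more than DAYS days ago and prints the name of each deleted file.
# Note: find's -mtime +N counts whole 24-hour periods.
purge_old_dumps() {
    local dir=$1 days=$2
    find "$dir" -type f -name 'metricdump*' -mtime +"$days" -print -delete
}

# Example call (path as described above, 14 days retention is arbitrary):
# purge_old_dumps "$DOMAIN_HOME/servers/EMGC_OMS1/logs/metrics" 14
```

Run it once a day from the scheduler of your choice, once per managed server directory.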


Copyright © Robert Crames' Oracle Database and Middleware Blog | Powered by Blogger