Start and use storage cell software to configure cell disks, flash disks, grid disks

To start the software correctly and use it to configure cell, grid, and flash disks, the number of open file descriptors must be increased (otherwise a very clear error is raised).

As found by googling, edit /etc/sysctl.conf and add/set

fs.file-max = 65536

and edit /etc/security/limits.conf and add/set

* soft nofile 65536
* hard nofile 65536
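
Both settings can be read back after editing. The sketch below is not part of the original procedure: `nofile_limit` and `sysctl_value` are hypothetical helper names, and they only parse the files. The values still have to be applied, for example with `sysctl -p` as root and a fresh login for the limits.

```shell
# Hypothetical helper: print the nofile limit of the given type (soft|hard)
# configured for domain "*" in a limits.conf-style file.
nofile_limit() {
    # $1 = limit type, $2 = path to the file
    awk -v kind="$1" '
        $1 == "*" && $2 == kind && $3 == "nofile" { value = $4 }
        END { if (value != "") print value }
    ' "$2"
}

# Hypothetical helper: print the value of a key from a sysctl.conf-style file.
sysctl_value() {
    # $1 = key (for example fs.file-max), $2 = path to the file
    awk -F= -v key="$1" '
        /^[ \t]*#/ { next }                  # skip comment lines
        {
            k = $1; gsub(/[ \t]/, "", k)     # strip blanks around the key
            v = $2; gsub(/[ \t]/, "", v)     # strip blanks around the value
            if (k == key) value = v          # last matching entry wins
        }
        END { if (value != "") print value }
    ' "$2"
}

# Example use on the real files:
#   nofile_limit soft /etc/security/limits.conf
#   sysctl_value fs.file-max /etc/sysctl.conf
#   ulimit -n        # effective limit of the current shell
```

If either helper prints nothing, the corresponding entry is missing from the file.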

To communicate over InfiniBand, Oracle uses the RDS protocol. All the rds modules must be loaded (and configured to be loaded across machine restarts)

[root@stocell1 ~]# modprobe rds
[root@stocell1 ~]# modprobe rds_tcp
[root@stocell1 ~]# modprobe rds_rdma

and make it permanent by editing/creating rds.conf

[root@stocell1 ~]# vi /etc/modprobe.d/rds.conf

insert line

install rds /sbin/modprobe --ignore-install rds && /sbin/modprobe rds_tcp && /sbin/modprobe rds_rdma

[EDIT: pay attention to the double "--" in front of ignore-install, which sometimes gets turned into a single "–" character when copied]
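
Both the dash problem and the module loading can be checked mechanically. This is only a sketch: `rds_conf_ok` and `rds_modules_loaded` are hypothetical helper names, not part of the cell software or of modprobe.

```shell
# Hypothetical helper: succeed only if the file contains an "install rds" line
# that uses the ASCII option --ignore-install (two hyphens, not a typographic dash).
rds_conf_ok() {
    # $1 = path to the modprobe configuration file
    grep -Eq '^install[[:space:]]+rds[[:space:]].*[[:space:]]--ignore-install[[:space:]]' "$1"
}

# Hypothetical helper: read lsmod-style output on stdin and succeed only if
# rds, rds_tcp and rds_rdma all appear in the first column.
rds_modules_loaded() {
    awk '
        $1 == "rds" || $1 == "rds_tcp" || $1 == "rds_rdma" { seen[$1] = 1 }
        END { exit !(("rds" in seen) && ("rds_tcp" in seen) && ("rds_rdma" in seen)) }
    '
}

# Example use:
#   rds_conf_ok /etc/modprobe.d/rds.conf && echo "rds.conf looks fine"
#   lsmod | rds_modules_loaded && echo "rds modules loaded"
```
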
Now, as the celladmin user, the storage cell software can be started

[root@stocell1 ~]# su - celladmin
[celladmin@stocell1 ~]$ cellcli -e alter cell restart services all
Stopping the RS, CELLSRV, and MS services…
CELL-01509: Restart Server (RS) not responding.
Starting the RS, CELLSRV, and MS services…
Getting the state of RS services… running
Starting CELLSRV services…
The STARTUP of CELLSRV services was not successful.
CELL-01547: CELLSRV startup failed due to unknown reasons.
Starting MS services…
The STARTUP of MS services was successful.

The error is not unknown (as stated) but well known and expected

[Required IP parameters missing]

🙂
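
In the alert log quoted by a reader further down, this condition shows up as "RS-7445 [Required IP parameters missing] [Check cellinit.ora]". A minimal sketch for checking a log for it follows. `has_missing_ip_params` is a hypothetical helper name, and the log path is passed in as an argument because it depends on the installed cell version.

```shell
# Hypothetical helper: succeed if the given log file mentions the
# "Required IP parameters" condition.
has_missing_ip_params() {
    # $1 = path to the cell alert log
    grep -q 'Required IP parameters' "$1"
}

# Example use (replace the argument with your own alert log path):
#   has_missing_ip_params /path/to/alert.log && echo "interconnect not configured yet"
```
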

So, set the interconnect (= InfiniBand connection) and …

[celladmin@stocell1 ~]$ cellcli -e create cell stocell1 interconnect1=eth1
Cell stocell1 successfully created
Starting CELLSRV services…
The STARTUP of CELLSRV services was successful.
Flash cell disks, FlashCache, and FlashLog will be created…
CellDisk FD_00_stocell1 successfully created
CellDisk FD_01_stocell1 successfully created
CellDisk FD_02_stocell1 successfully created
CellDisk FD_03_stocell1 successfully created
CellDisk FD_04_stocell1 successfully created
CellDisk FD_05_stocell1 successfully created
Flash log stocell1_FLASHLOG successfully created
Flash cache stocell1_FLASHCACHE successfully created

(I’m not sure why the Flash components are configured automatically, but this can be changed later if needed)
Configure cell disks

[celladmin@stocell1 ~]$ cellcli -e create celldisk all
CellDisk CD_DISK01_stocell1 successfully created
CellDisk CD_DISK02_stocell1 successfully created
CellDisk CD_DISK03_stocell1 successfully created
CellDisk CD_DISK04_stocell1 successfully created
CellDisk CD_DISK05_stocell1 successfully created
CellDisk CD_DISK06_stocell1 successfully created
CellDisk CD_DISK07_stocell1 successfully created
CellDisk CD_DISK08_stocell1 successfully created
CellDisk CD_DISK09_stocell1 successfully created
CellDisk CD_DISK10_stocell1 successfully created
CellDisk CD_DISK11_stocell1 successfully created
CellDisk CD_DISK12_stocell1 successfully created

and grid disks

[celladmin@stocell1 ~]$ cellcli -e create griddisk all harddisk prefix=DATA
GridDisk DATA_CD_DISK01_stocell1 successfully created
GridDisk DATA_CD_DISK02_stocell1 successfully created
GridDisk DATA_CD_DISK03_stocell1 successfully created
GridDisk DATA_CD_DISK04_stocell1 successfully created
GridDisk DATA_CD_DISK05_stocell1 successfully created
GridDisk DATA_CD_DISK06_stocell1 successfully created
GridDisk DATA_CD_DISK07_stocell1 successfully created
GridDisk DATA_CD_DISK08_stocell1 successfully created
GridDisk DATA_CD_DISK09_stocell1 successfully created
GridDisk DATA_CD_DISK10_stocell1 successfully created
GridDisk DATA_CD_DISK11_stocell1 successfully created
GridDisk DATA_CD_DISK12_stocell1 successfully created
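
With twelve disks it is easy to miss a failed line in the output. The sketch below counts the "successfully created" lines of a saved cellcli transcript. `count_created` is a hypothetical helper name, and the file name in the usage comment is only an example.

```shell
# Hypothetical helper: count the lines on stdin of the form
# "<Type> <name> successfully created" for the given object type.
count_created() {
    # $1 = object type as printed by cellcli (CellDisk or GridDisk)
    grep -c "^$1 .* successfully created" || true   # grep -c exits 1 on a zero count
}

# Example use, assuming the output was saved to griddisk.out:
#   count_created GridDisk < griddisk.out
# Afterwards "cellcli -e list griddisk" should show the same objects.
```
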

Work done!

Thanks to Steven Lee. His post on the same issue gave me the solution to 2 problems:
– the needed path /var/log/oracle, found using the method suggested by Lee
– a strange memory problem with 4GB RAM, solved simply by resizing RAM to 2GB (according to some of Lee’s answers)
Furthermore, the differences between my VM and his VM helped me fix the problem related to the rds modules (which probably evolved in the meantime)

/*+ esp */

60 thoughts on “Start and use storage cell software to configure cell disks, flash disks, grid disks”

  1. Pingback: Prepare some virtual disks (files!) « Dba Esp

  2. Pingback: Exadata Virtual Test Environment for OCE Prep – storage cell node | Dba Esp

  3. Hi ,
    when I created the cell, it didn’t create the flash components. Also, creating celldisk took all the disks, which seems logical as those are similar disks. Can you tell me how your flash disks were created?

    [celladmin@stocell1 ~]$ cellcli -e create cell stocell1 interconnect1=eth1
    Cell stocell1 successfully created
    Starting CELLSRV services…
    The STARTUP of CELLSRV services was successful.
    [celladmin@stocell1 ~]$ cellcli
    CellCLI> create celldisk all
    CellDisk CD_disk1_stocell1 successfully created
    CellDisk CD_disk10_stocell1 successfully created
    CellDisk CD_disk11_stocell1 successfully created
    CellDisk CD_disk12_stocell1 successfully created
    CellDisk CD_disk2_stocell1 successfully created
    CellDisk CD_disk3_stocell1 successfully created
    CellDisk CD_disk4_stocell1 successfully created
    CellDisk CD_disk5_stocell1 successfully created
    CellDisk CD_disk6_stocell1 successfully created
    CellDisk CD_disk7_stocell1 successfully created
    CellDisk CD_disk8_stocell1 successfully created
    CellDisk CD_disk9_stocell1 successfully created
    CellDisk CD_flash1_stocell1 successfully created
    CellDisk CD_flash2_stocell1 successfully created
    CellDisk CD_flash3_stocell1 successfully created
    CellDisk CD_flash4_stocell1 successfully created
    CellDisk CD_flash5_stocell1 successfully created
    CellDisk CD_flash6_stocell1 successfully created

    CellCLI>

    • In my env the create cell command automatically creates all the flash stuff. I cannot check, but I suppose there is a very complex algorithm to recognize flash disks from the cell/disks/raw links: FLASH uppercase 😉 in the link name.
      I’m not sure but you can simply check that.
      Drop the cell, recreate the links, recreate the cell, create the flash stuff manually if needed, and then create celldisks

      Ciao
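
The naming guess above can be checked before creating the cell. The sketch below lists the link names in a directory and marks those that contain the uppercase string FLASH. `classify_raw_links` is a hypothetical helper name. That the cell software matches on this string is the guess made in the reply above, not documented behavior.

```shell
# Hypothetical helper: print "flash <name>" for entries whose name contains
# uppercase FLASH and "hard <name>" for everything else.
classify_raw_links() {
    # $1 = directory holding the links (the cell's disks/raw directory)
    for link in "$1"/*; do
        # skip the unexpanded pattern when the directory is empty
        [ -e "$link" ] || [ -L "$link" ] || continue
        name=${link##*/}
        case $name in
            *FLASH*) echo "flash $name" ;;
            *)       echo "hard $name" ;;
        esac
    done
}

# Example use (replace the argument with your own disks/raw path):
#   classify_raw_links /path/to/disks/raw
```

A link named stocell1_flash01 or stocell1_DISK13 would be reported as "hard", which matches what the commenters in this thread observed.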

  4. The idea crossed my mind but i thought it was too stupid to even try..but now the CAPS have worked…Hail Oracle !

  5. Hi,

    When i try to create cell with interconnect1=eth1, CELLSRV service cannot startup with error

    CellCLI> create cell exapoc interconnect1=eth1
    Cell exapoc successfully created
    Starting CELLSRV services…
    The STARTUP of CELLSRV services was not successful.
    CELL-01547: CELLSRV startup failed due to unknown reasons.

    Wed Jul 30 02:39:50 2014 787 msec State dump completed for CELLSRV
    Errors in file /opt/oracle/cell11.2.3.3.1_LINUX.X64_140529.1/log/diag/asm/cell/exapoc/trace/svtrc_6248_0.trc (incident=113):
    ORA-00600: internal error code, arguments: [LinuxBlockIO::init], [], [], [], [], [], [], [], [], [], [], []
    Incident details in: /opt/oracle/cell11.2.3.3.1_LINUX.X64_140529.1/log/diag/asm/cell/exapoc/incident/incdir_113/svtrc_6248_0_i113.trc
    Wed Jul 30 02:39:55 2014
    State dump interrupted for Cellsrv by RS. It did not complete in 5 seconds.
    [RS] monitoring process /opt/oracle/cell11.2.3.3.1_LINUX.X64_140529.1/cellsrv/bin/cellrsomt (pid: 6247) returned with error: 124
    [RS] Could not start Service CELLSRV correctly. Try stopping
    [RS] Stopped Service CELLSRV

    Regards,

  6. hi,

    some line from the trace file.

    ///////////////////
    2014-07-30 22:57:18.363997 :00000005: CELLSRV needs 475 hugepages, but there are only 12 available.
    2014-07-30 22:57:18.364042 :00000006: CELLSRV trying to reserve 463 more hugepages.
    2014-07-30 22:57:18.366826 :00000007: Successfully allocated 902MB of hugepages for buffers
    LockPool name:FastFileInit::shared_pin_locks type:MUTEX POOL group:147 numLocks:256 nextLockIndex:0 totalLockRefs:0 lockArray:0x7f55dd576590
    LockPool name:FastFileInit::in_mem_MD_locks type:RWLOCK POOL group:34 numLocks:256 nextLockIndex:0 totalLockRefs:0 lockArray:0x7f55dd735e60
    Writing message type OSS_PIPE_ERR_FAILED_STARTUP_RESTART to OSS->RS pipe
    2014-07-30 22:57:20.267087 :00000009: Master: SIGUSR2 delivered by pid – 4219, uid – 0. Dumping call stack, process map, system state, and trace buffers
    Writing message type OSS_PIPE_ERR_FAILED_STARTUP_RESTART to OSS->RS pipe
    skgznp_write to RSOMT failed, retval=56822
    slos 0x7ffffbf891c0 Error Category: 56824
    Operation: send
    Location: skgznpwm2
    DepInfo: Broken pipe
    Error Code: 32
    DDE: Flood control is not active
    Incident 129 created, dump file: /opt/oracle/cell11.2.3.3.1_LINUX.X64_140529.1/log/diag/asm/cell/exapoc/incident/incdir_129/svtrc_4231_0_i129.trc
    ORA-00600: internal error code, arguments: [LinuxBlockIO::init], [], [], [], [], [], [], [], [], [], [], []

    Writing message type OSS_PIPE_ERR_FAILED_STARTUP_RESTART to OSS->RS pipe
    skgznp_write to RSOMT failed, retval=56822
    slos 0x7ffffbf95200 Error Category: 56824
    Operation: send
    Location: skgznpwm2
    DepInfo: Broken pipe
    Error Code: 32
    2014-07-30 22:57:21.679204 :0000000E: CELLSRV error – ORA-600 internal error
    /////////////////

    • ciao,
      sorry, but from that info I can’t guess anything.
      Please post some more info, such as the output of
      uname -a
      more /etc/hosts
      lsmod | grep rds
      and details of the VM (memory, …)

      • Hi,

        [celladmin@localhost ~]$ uname -a
        Linux localhost.localdomain 2.6.18-371.el5 #1 SMP Mon Sep 30 16:34:30 PDT 2013 x86_64 x86_64 x86_64 GNU/Linux
        [celladmin@localhost ~]$ more /etc/hosts
        # Do not remove the following line, or various programs
        # that require network functionality will fail.
        127.0.0.1 localhost.localdomain localhost
        ::1 localhost6.localdomain6 localhost6
        [celladmin@localhost ~]$ lsmod|grep rds
        rds_rdma 106305 0
        rds_tcp 48097 0
        rds 154921 8 rds_rdma,rds_tcp
        rdma_cm 73301 2 rds_rdma,ib_iser
        ib_core 107841 7 rds_rdma,ib_iser,rdma_cm,ib_cm,iw_cm,ib_sa,ib_mad

        VM info:
        2048MB Memory

        IDE Controller

        IDE Secondary Master (CD/DVD):Empty
        SATA Controller
        SATA Port 0:stocell1.vmdk (Normal, 25.00 GB)
        SATA Port 1:stocell1_DISK01.vmdk (Normal, 500.00 MB)
        SATA Port 2:stocell1_DISK02.vmdk (Normal, 500.00 MB)
        SATA Port 3:stocell1_DISK03.vmdk (Normal, 500.00 MB)
        SATA Port 4:stocell1_DISK04.vmdk (Normal, 500.00 MB)
        SATA Port 5:stocell1_DISK05.vmdk (Normal, 500.00 MB)
        SATA Port 6:stocell1_DISK06.vmdk (Normal, 500.00 MB)
        SATA Port 7:stocell1_DISK07.vmdk (Normal, 500.00 MB)
        SATA Port 8:stocell1_DISK08.vmdk (Normal, 500.00 MB)
        SATA Port 9:stocell1_DISK09.vmdk (Normal, 500.00 MB)
        SATA Port 10:stocell1_DISK10.vmdk (Normal, 500.00 MB)
        SATA Port 11:stocell1_DISK11.vmdk (Normal, 500.00 MB)
        SATA Port 12:stocell1_DISK12.vmdk (Normal, 500.00 MB)
        SATA Port 13:stocell1_FLASH01.vmdk (Normal, 400.00 MB)
        SATA Port 14:stocell1_FLASH02.vmdk (Normal, 400.00 MB)
        SATA Port 15:stocell1_FLASH03.vmdk (Normal, 400.00 MB)
        SATA Port 16:stocell1_FLASH04.vmdk (Normal, 400.00 MB)
        SATA Port 17:stocell1_FLASH05.vmdk (Normal, 400.00 MB)
        SATA Port 18:stocell1_FLASH06.vmdk (Normal, 400.00 MB)

  7. Everything seems to be done correctly. You should debug at a deeper level or try rebuilding the cell.
    First try rebuilding disks and links, prepare the disks with dd, and verify permissions.
    If you want to play/investigate, debug at a deeper level: try “strace cellcli -e create cell exapoc interconnect1=eth1”.
    If that gives no interesting new details, note that cellcli itself is a script that starts a Java VM, which executes your command.
    So run “strace” at a deeper level (directly on the Java VM + command).

    • Dear,

      May I know how to prepare a disk with dd? Do I need to create a partition for each cell disk, such as /dev/sdb1?

      • You must create the disks in the VM as virtual disks and link them in a specific path; just to be sure, run dd if=/dev/zero of=yourcelldiskhere count=1000
        Be careful not to dd your system disk. It is all in the post…

    • Dear all,

      I am trying to setup a exadata virtuallab according to this nice link and i get stuck at this point:

      [celladmin@exqrcel01 ~]$ cellcli -e create cell exqrcel01 interconnect1=eth1

      CELL-02625: Interface eth1 refers to device name .
      Device name must be same as Interface name.
      [celladmin@exqrcel01 ~]$

      [root@exqrcel01 config]# ifconfig -a
      eth0 Link encap:Ethernet HWaddr 08:00:27:F4:14:28
      inet addr:192.168.2.19 Bcast:192.168.2.255 Mask:255.255.255.0 –> My public IP
      ……

      eth1 Link encap:Ethernet HWaddr 08:00:27:E0:CA:FF —> should be my infiband
      inet addr:192.168.56.6 Bcast:192.168.56.255 Mask:255.255.255.0
      ……

      Any idea, how could i get forward?

      Thanks and merry Xmas
      Ousseini Oumarou

      • First of all … Merry Christmas !!
        Then if you already found the problem and the fix, please post for future help.
        Otherwise you should check ifcfg* files in /etc/sysconfig/network-scripts/
        because the error seems to be related to network configuration

      • Hello,
        I was able to configure the cell server successfully, and kfod on the dbserver also returns the expected disks. However, the grid installation has now failed for the 2nd time with ORA-15080.

        Although I selected 3 disks (ASM_OCR, normal redundancy), I can see ORA-15063 (insufficient number of disks) in the asmca logfile.
        There is also strange behavior on the storage server (exqrcel01): CELLSRV went down and the CellCLI alert history listed an ORA-00600 error.

        This is really a cumbersome task. I do not know if this could be related to my environment settings:
        OEL 6.6, CellServer software: cell-11.2.3.3.1_LINUX.X64_140708-1.x86_64.
        Virtualbox running on external Hard Disk.

        [celladmin@exqrcel01 ~]$ uname -na
        Linux exqrcel01 2.6.32-504.el6.x86_64 #1 SMP Tue Oct 14 01:47:47 PDT 2014 x86_64 x86_64 x86_64 GNU/Linux
        [celladmin@exqrcel01 ~]$
        [celladmin@exqrcel01 ~]$ rpm -qa | grep -i cell
        cell-11.2.3.3.1_LINUX.X64_140708-1.x86_64
        [celladmin@exqrcel01 ~]$

        [root@exqrdb01 Desktop]# /software/grid11204_unzip/stage/ext/bin/kfod disks=all op=disks
        WARNING: Using brute force method to determine the size of /dev/raw/rawctl.
        There will be performance issues. Please check configuration to determine the cause for the failure of ioctl
        --------------------------------------------------------------------------------
        Disk Size Path User Group
        ================================================================================
        1: 1024 Mb o/192.168.56.6/DATA_CD_DISK01_exqrcel01
        2: 1024 Mb o/192.168.56.6/DATA_CD_DISK02_exqrcel01
        3: 1024 Mb o/192.168.56.6/DATA_CD_DISK03_exqrcel01
        4: 1024 Mb o/192.168.56.6/DATA_CD_DISK04_exqrcel01
        …….

        [root@exqrdb01 Desktop]# /u01/app/11.2.0/grid/root.sh
        Performing root user operation for Oracle 11g

        The following environment variables are set as:
        ORACLE_OWNER= grid
        ORACLE_HOME= /u01/app/11.2.0/grid

        ………
        Installing Trace File Analyzer
        CRS-2672: Attempting to start ‘ora.cssdmonitor’ on ‘exqrdb01’
        CRS-2676: Start of ‘ora.cssdmonitor’ on ‘exqrdb01’ succeeded
        CRS-2672: Attempting to start ‘ora.cssd’ on ‘exqrdb01’
        CRS-2672: Attempting to start ‘ora.diskmon’ on ‘exqrdb01’
        CRS-2676: Start of ‘ora.diskmon’ on ‘exqrdb01’ succeeded
        CRS-2676: Start of ‘ora.cssd’ on ‘exqrdb01’ succeeded

        Disk Group ASM_OCR creation failed with the following message:
        ORA-15018: diskgroup cannot be created
        ORA-15080: synchronous I/O operation to a disk failed

        Configuration of ASM … failed
        see asmca logs at /u01/app/oracle/cfgtoollogs/asmca for details
        ……

        tail -100 /u01/app/oracle/cfgtoollogs/asmca/asmca-141226AM124315.log

        [main] [ 2014-12-26 00:44:10.108 CET ] [USMInstance.configureLocalASM:3041] ORA-15032: not all alterations performed
        ORA-15017: diskgroup “ASM_OCR” cannot be mounted
        ORA-15063: ASM discovered an insufficient number of disks for diskgroup “ASM_OCR”

        CellCLI> list cell detail
        name: exqrcel01
        ….
        cellsrvStatus: stopped
        msStatus: running
        rsStatus: running

        CellCLI>

        --------------------
        CELLCLI --> Alerthistory

        CellCLI> list alerthistory
        ….
        35 2014-12-25T23:44:38+01:00 critical “RS-7445 [Serv MS is absent] [It will be restarted] [] [] [] [] [] [] [] [] [] []”
        36 2014-12-26T00:44:18+01:00 critical “ORA-00600: internal error code, arguments: [StorageIdx::getOclSIRegion], [], [], [], [], [], [], [], [], [], [], []”

        CellCLI>

      • I had problems with rds on 6.5, and so did some readers of this blog. I suggest using OEL 5 with the el kernel.

      • Hello,

        the error CELL-02625: Interface eth1 refers to device name was caused by the missing entry DEVICE=eth1 in
        /etc/sysconfig/network-scripts/ifcfg-eth1.
        I do not know why it was missing; the probable cause is the clone, since I cloned the cell server from an existing VirtualBox VM. After adding the entry, the configuration ran successfully.

        I hope this information could help.

        Regards

      • Hi, Ousseini Oumarou

        I also got this error. It is network related.
        Its cause lies in the /etc/sysconfig/network-scripts/ifcfg-xxx network configuration file.
        The Exadata binaries read the data from this file and throw an error if they find something unusual.

        This can be resolved by updating the information in the file.
        1. In some cases the “NAME” field has to be changed because it differs from the device name.
        2. In some cases the “DEVICE” keyword is missing from the file and has to be added.
        3. There may be a mismatch in the MAC address of the Ethernet card.

        In my case the issue was resolved by adding the “DEVICE=eth2” value to the file, as it was not present in the OEL 6.10 network configuration file.
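
The DEVICE check described in this thread can be scripted. This is only a sketch: `ifcfg_device_ok` is a hypothetical helper name, and it covers only the DEVICE entry, not the NAME or MAC address cases.

```shell
# Hypothetical helper: succeed only if the ifcfg-<iface> file contains a
# DEVICE entry equal to <iface>, the interface name taken from the file name.
ifcfg_device_ok() {
    # $1 = path to an ifcfg-<iface> file
    iface=${1##*/}            # strip the directory part
    iface=${iface#ifcfg-}     # strip the "ifcfg-" prefix
    device=$(sed -n 's/^DEVICE=//p' "$1" | tr -d '"' | head -n 1)
    [ "$device" = "$iface" ]
}

# Example use:
#   ifcfg_device_ok /etc/sysconfig/network-scripts/ifcfg-eth1 || echo "DEVICE missing or wrong"
```
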

  8. Dear Raymond

    sysctl -w fs.aio-max-nr=50000000

    and also put into /etc/sysctl.conf

    will solve your problem.

  9. Hi
    I am unable to create cell with error connecting to MS. It complains about the port 8888, but the port is listening. Any ideas and suggestions?

    [celladmin@stocell1 ~]$ cellcli -e alter cell restart services all

    Stopping the RS, CELLSRV, and MS services…
    The SHUTDOWN of services was successful.
    Starting the RS, CELLSRV, and MS services…
    Getting the state of RS services… running
    Starting CELLSRV services…
    The STARTUP of CELLSRV services was not successful.
    CELL-01547: CELLSRV startup failed due to unknown reasons.
    Starting MS services…
    The STARTUP of MS services was successful.

    [celladmin@stocell1 ~]$ cellcli -e create cell stocell1 interconnect1=eth1

    CELL-01514: Connect Error. Verify that Management Server is listening at the specified HTTP port: 8888.
    [celladmin@stocell1 ~]$

    Below are my environment and info from the log:

    [root@stocell1 modprobe.d]# lsmod |grep rds
    rds_rdma 80877 0
    rdma_cm 36834 1 rds_rdma
    ib_core 74355 6 rds_rdma,rdma_cm,ib_cm,iw_cm,ib_sa,ib_mad
    rds_tcp 10293 0
    rds 96610 2 rds_rdma,rds_tcp

    [root@stocell1 modprobe.d]# netstat -anp|grep 8888
    tcp 0 0 127.0.0.1:34027 127.0.0.1:8888 TIME_WAIT –
    tcp 0 0 127.0.0.1:34032 127.0.0.1:8888 TIME_WAIT –
    tcp 0 0 ::ffff:127.0.0.1:8888 :::* LISTEN 6540/java

    [root@stocell1 modprobe.d]# netstat -rn
    Kernel IP routing table
    Destination Gateway Genmask Flags MSS Window irtt Iface
    192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
    127.0.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
    0.0.0.0 192.168.1.1 0.0.0.0 UG 0 0 0 eth0

    Below are from ms-odl.log

    [root@stocell1 modprobe.d]#
    [2014-10-26T11:26:53.828-05:00] [ossmgmt] [NOTIFICATION] [] [ms.core.MSCoreImpl] [tid: 13] [ecid: 127.0.0.1:24313:1414340313430:3,0] lunstat: normal changeStat: found lunname: /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/stocell1_DISK01
    [2014-10-26T11:26:53.828-05:00] [ossmgmt] [NOTIFICATION] [] [ms.core.MSCoreImpl] [tid: 13] [ecid: 127.0.0.1:24313:1414340313430:3,0] In lunFound: LUN /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/stocell1_DISK01, os devicename: /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/stocell1_DISK01
    [2014-10-26T11:26:54.007-05:00] [ossmgmt] [WARNING] [] [ms.core.MSCoreImpl] [tid: 13] [ecid: 127.0.0.1:24313:1414340313430:3,0] Tuning Block IO failed on device: /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/stocell1_DISK01
    [2014-10-26T11:26:54.008-05:00] [ossmgmt] [NOTIFICATION] [] [ms.core.MSOSSComm] [tid: 13] [ecid: 127.0.0.1:24313:1414340313430:3,0] msosscomm ctx not valid, trying to init
    [2014-10-26T11:26:54.037-05:00] [ossmgmt] [ERROR] [] [ms.core.MSOSSComm] [tid: 13] [ecid: 127.0.0.1:24313:1414340313430:3,0] Required IP parameters not configured. Err: 36
    [2014-10-26T11:26:54.039-05:00] [ossmgmt] [NOTIFICATION] [] [ms.core.MSOSSComm] [tid: 13] [ecid: 127.0.0.1:24313:1414340313430:3,0] _ms_mprotect_corrupt_buf is set to TRUE
    [2014-10-26T11:26:54.044-05:00] [ossmgmt] [NOTIFICATION] [] [ms.core.MSCoreImpl] [tid: 13] [ecid: 127.0.0.1:24313:1414340313430:3,0] Error while trying to sync diflist: oracle.ossmgmt.common.core.SageException: CELL-02627: There is a communication error between MS and CELLSRV. Configuration file cellinit.ora is malformed or does not include required information.[[
    oracle.ossmgmt.common.core.SageException: CELL-02627: There is a communication error between MS and CELLSRV. Configuration file cellinit.ora is malformed or does not include required information.
    at oracle.ossmgmt.ms.core.MSOSSComm.static_sendrecv(Native Method)
    at oracle.ossmgmt.ms.core.MSOSSComm.isValidCellDisk(MSOSSComm.java:2989)
    at oracle.ossmgmt.ms.core.MSCoreImpl.isValidCellDisk(MSCoreImpl.java:2043)
    at oracle.ossmgmt.ms.core.MSCoreImpl.isValidSageLun(MSCoreImpl.java:2070)
    at oracle.ossmgmt.ms.core.MSCoreImpl.lunFound(MSCoreImpl.java:2920)
    at oracle.ossmgmt.ms.core.MSCoreImpl.getNewDiskAdpState(MSCoreImpl.java:8320)
    at oracle.ossmgmt.ms.core.MSDiskPollTimerTask.run(MSDiskPollTimerTask.java:108)
    at java.util.TimerThread.mainLoop(Timer.java:512)
    at java.util.TimerThread.run(Timer.java:462)
    ]]
    [2014-10-26T11:30:13.997-05:00] [ossmgmt] [NOTIFICATION] [] [ms.core.MSOSSComm] [tid: 13] [ecid: 127.0.0.1:24313:1414340313430:3,0] msosscomm ctx not valid, trying to init
    [2014-10-26T11:30:13.999-05:00] [ossmgmt] [ERROR] [] [ms.core.MSOSSComm] [tid: 13] [ecid: 127.0.0.1:24313:1414340313430:3,0] Required IP parameters not configured. Err: 36
    [2014-10-26T11:30:14.000-05:00] [ossmgmt] [NOTIFICATION] [] [ms.core.MSOSSComm] [tid: 13] [ecid: 127.0.0.1:24313:1414340313430:3,0] _ms_mprotect_corrupt_buf is set to TRUE
    [2014-10-26T11:30:14.008-05:00] [ossmgmt] [NOTIFICATION] [] [ms.core.MSOSSComm] [tid: 13] [ecid: 127.0.0.1:24313:1414340313430:3,0] msosscomm ctx not valid, trying to init
    [2014-10-26T11:30:14.011-05:00] [ossmgmt] [ERROR] [] [ms.core.MSOSSComm] [tid: 13] [ecid: 127.0.0.1:24313:1414340313430:3,0] Required IP parameters not configured. Err: 36
    [2014-10-26T11:30:14.012-05:00] [ossmgmt] [NOTIFICATION] [] [ms.core.MSOSSComm] [tid: 13] [ecid: 127.0.0.1:24313:1414340313430:3,0] _ms_mprotect_corrupt_buf is set to TRUE
    [2014-10-26T11:30:33.442-05:00] [ossmgmt] [NOTIFICATION] [] [ms.core.MSOSSComm] [tid: 13] [ecid: 127.0.0.1:24313:1414340313430:3,0] msosscomm ctx not valid, trying to init
    [2014-10-26T11:30:33.449-05:00] [ossmgmt] [ERROR] [] [ms.core.MSOSSComm] [tid: 13] [ecid: 127.0.0.1:24313:1414340313430:3,0] Required IP parameters not configured. Err: 36
    [2014-10-26T11:30:33.450-05:00] [ossmgmt] [NOTIFICATION] [] [ms.core.MSOSSComm] [tid: 13] [ecid: 127.0.0.1:24313:1414340313430:3,0] _ms_mprotect_corrupt_buf is set to TRUE

    [root@stocell1 trace]# uname -a
    Linux stocell1 2.6.32-431.el6.x86_64 #1 SMP Wed Nov 20 23:56:07 PST 2013 x86_64 x86_64 x86_64 GNU/Linux

    [root@stocell1 trace]# cat /etc/hosts
    127.0.0.1 stocell1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

    Below are from Alert log:

    RS-7445 [Required IP parameters missing] [Check cellinit.ora] [] [] [] [] [] [] [] [] [] []
    Incident details in: /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/log/diag/asm/cell/stocell1/incident/incdir_169/rstrc_6529_4_i169.trc
    Sun Oct 26 12:17:47 2014
    RSBK version=11.2.3.2.1,label=OSS_11.2.3.2.1_LINUX.X64_130109,Wed_Jan__9_06:09:48_PST_2013
    [RS] Started Service RS_BACKUP with pid 6539
    [RS] Kill previous monitoring process for core RS
    Sun Oct 26 12:17:47 2014
    [RS] Started monitoring process /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/bin/cellrssmt with pid 6544
    Sweep [inc][169]: completed
    [RS] Required IP parameters not configured in cellinit.ora. Err: 36
    OS Hugepage status:
    Total/free hugepages available=12/12; hugepage size=2048KB
    [RS] Start service CELLSRV failed with error: -74.
    Sun Oct 26 12:17:47 2014
    Could not connect to MS socket. Communication with MS may be degraded. errno=115
    [RS] monitoring process /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/bin/cellrsomt (pid: 0) returned with error: 124
    [RS] Started Service MS with pid 6540

    [root@stocell1 raw]# ls -l
    total 0
    lrwxrwxrwx 1 root root 8 Oct 25 22:26 stocell1_DISK01 -> /dev/sdb
    lrwxrwxrwx 1 root root 8 Oct 25 22:27 stocell1_DISK02 -> /dev/sdc
    lrwxrwxrwx 1 root root 8 Oct 25 22:27 stocell1_DISK03 -> /dev/sdd
    lrwxrwxrwx 1 root root 8 Oct 25 22:27 stocell1_DISK04 -> /dev/sde
    lrwxrwxrwx 1 root root 8 Oct 25 22:28 stocell1_DISK05 -> /dev/sdf
    lrwxrwxrwx 1 root root 8 Oct 25 22:28 stocell1_DISK06 -> /dev/sdg
    lrwxrwxrwx 1 root root 8 Oct 25 22:31 stocell1_DISK07 -> /dev/sdh
    lrwxrwxrwx 1 root root 8 Oct 25 22:31 stocell1_DISK08 -> /dev/sdi
    lrwxrwxrwx 1 root root 8 Oct 25 22:31 stocell1_DISK09 -> /dev/sdj
    lrwxrwxrwx 1 root root 8 Oct 25 22:31 stocell1_DISK10 -> /dev/sdk
    lrwxrwxrwx 1 root root 8 Oct 25 22:32 stocell1_DISK11 -> /dev/sdl
    lrwxrwxrwx 1 root root 8 Oct 25 22:32 stocell1_DISK12 -> /dev/sdm
    lrwxrwxrwx 1 root root 8 Oct 25 22:34 stocell1_DISK13 -> /dev/sdn
    lrwxrwxrwx 1 root root 8 Oct 25 22:34 stocell1_DISK14 -> /dev/sdo
    lrwxrwxrwx 1 root root 8 Oct 25 22:34 stocell1_DISK15 -> /dev/sdp
    lrwxrwxrwx 1 root root 8 Oct 25 22:34 stocell1_DISK16 -> /dev/sdq
    lrwxrwxrwx 1 root root 8 Oct 25 22:34 stocell1_DISK17 -> /dev/sdr
    lrwxrwxrwx 1 root root 8 Oct 25 22:34 stocell1_DISK18 -> /dev/sds
    [root@stocell1 raw]#
    [root@stocell1 raw]# fdisk -l |grep "B,"
    Disk /dev/sda: 26.8 GB, 26843545600 bytes
    Disk /dev/sdb: 524 MB, 524288000 bytes
    Disk /dev/sdc: 524 MB, 524288000 bytes
    Disk /dev/sdd: 524 MB, 524288000 bytes
    Disk /dev/sde: 524 MB, 524288000 bytes
    Disk /dev/sdf: 524 MB, 524288000 bytes
    Disk /dev/sdg: 524 MB, 524288000 bytes
    Disk /dev/sdh: 524 MB, 524288000 bytes
    Disk /dev/sdi: 524 MB, 524288000 bytes
    Disk /dev/sdj: 524 MB, 524288000 bytes
    Disk /dev/sdk: 524 MB, 524288000 bytes
    Disk /dev/sdl: 524 MB, 524288000 bytes
    Disk /dev/sdm: 524 MB, 524288000 bytes
    Disk /dev/sdn: 419 MB, 419430400 bytes
    Disk /dev/sdo: 419 MB, 419430400 bytes
    Disk /dev/sdp: 419 MB, 419430400 bytes
    Disk /dev/sdq: 524 MB, 524288000 bytes
    Disk /dev/sds: 419 MB, 419430400 bytes
    Disk /dev/sdr: 419 MB, 419430400 bytes
    Disk /dev/mapper/vg_stocell3-lv_root: 23.6 GB, 23630708736 bytes
    Disk /dev/mapper/vg_stocell3-lv_swap: 2684 MB, 2684354560 bytes
    ~

    • Hi,
      Usually this kind of error is related to the host/cellinit/eth configuration. Post them or check them against the info in the blog.

      You should also fix the disk names: the storage cell sw automatically recognizes flash disks by the FLASH string inside the name… 🙂 it seems a joke but…

  10. Hi,

    Thanks for the prompt reply. I will fix the disk names shortly, but I am always confused by the network. I thought there was some network issue but couldn’t figure it out. I could ping the HOST (192.168.1.5) from stocell1 (192.168.1.52), and ping back. I tried both the localhost IP 127.0.0.1 and the static IP 192.168.1.52 for the VM, but it didn’t work. Your help is greatly appreciated.

    Here is the HOST ifcfg-eth0:

    DEVICE=eth0
    TYPE=Ethernet
    UUID=315ac4fe-111f-4542-a937-dab7c0567f68
    ONBOOT=yes
    NM_CONTROLLED=yes
    BOOTPROTO=none
    DEFROUTE=yes
    IPV4_FAILURE_FATAL=yes
    IPV6INIT=no
    NAME="System eth0"
    NETMASK=255.255.255.0
    USERCTL=no
    HWADDR=00:1F:29:DE:8B:3E
    IPADDR=192.168.1.5
    PREFIX=24
    GATEWAY=192.168.1.1
    DNS1=192.168.1.1
    LAST_CONNECT=1413740127

    HOST ifcfg-eth1:

    DEVICE=eth1
    TYPE=Ethernet
    UUID=e7d3e3c9-ad13-472b-900e-1b91486c45c0
    ONBOOT=no
    NM_CONTROLLED=yes
    BOOTPROTO=dhcp
    DEFROUTE=yes
    IPV4_FAILURE_FATAL=yes
    IPV6INIT=no
    NAME="System eth1"
    HWADDR=00:1F:29:DE:8B:42
    PEERDNS=yes
    PEERROUTES=yes

    VM stocell1: ifcfg-eth0:

    DEVICE=eth0
    TYPE=Ethernet
    UUID=e7fd04da-9ce8-4143-9f99-29ebc3372c71
    ONBOOT=yes
    NM_CONTROLLED=yes
    BOOTPROTO=dhcp
    DEFROUTE=yes
    IPV4_FAILURE_FATAL=yes
    IPV6INIT=no
    NAME="System eth0"
    HWADDR=08:00:27:38:C4:DC
    PEERDNS=yes
    PEERROUTES=yes
    LAST_CONNECT=1414337451

    VM stocell1: ifcfg-eth1

    DEVICE=eth1
    TYPE=Ethernet
    UUID=e7c64609-0f37-4c04-ad9f-984201e6bc49
    ONBOOT=yes
    NM_CONTROLLED=yes
    BOOTPROTO=none
    IPADDR=192.168.1.52
    PREFIX=24
    GATEWAY=192.168.1.1
    DEFROUTE=yes
    IPV4_FAILURE_FATAL=yes
    IPV6INIT=no
    NAME="System eth1"
    HWADDR=08:00:27:56:63:36
    LAST_CONNECT=1414349747

    On VM stocell1, the network is below:

    Adapter1 is attached to “Bridged Adapter” and the name is “eth0”
    Adapter2 is attached to “Host-only Adapter” and the name is “vboxnet0”

  11. Hi,

    I fixed the Disk name to FLASH, and fixed some errors, now the config is below, but I am still getting the same error about the HTTP port 8888.

    HOST Virtualbox Setting
    Host-only Networks vboxnet0 IPv4 =192.168.56.1,
    IPv4 Network Mask=255.255.255.0
    IPv6 = (there are some numbers, can’t remove them),
    IPv6 Network Mask Length=64
    eth0 Method: Manual
    IPv4=192.168.1.5
    Netmask=255.255.255.0
    Gateway=192.168.1.1

    VM stocell1 Setting
    Host-only Networks vboxnet0

    eth0 Method: Automatic (DHCP)
    eth1 IPv4 = 192.168.56.101
    Netmask=255.255.255.0
    Gateway=192.168.1.1

    I noticed in the Cell install log “.install_log.txt”, the installation inflated oc4jpatch to /tmp and tried to apply, but it says

    apply -jdk /usr/java/jdk1.5.0_15/ -oh /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/oc4j/ms -silent /tmp/oc4jpatch/7439847 command failed: No such file or directory

    The oc4jpatch actually does not exist in /tmp, wonder if it was deleted unapplied. Not sure if it matters.

  12. And in cellinit.ora:

    version=0.0
    HTTP_PORT=8888
    bbuChargeThreshold=800
    SSL_PORT=23943
    RMI_PORT=23791
    bbuTempThreshold=60
    DEPLOYED=TRUE
    JMS_PORT=9127
    BMC_SNMP_PORT=162

  13. OK, I figured out what happened with HTTP port 8888: after the reboot, the firewall came back up. Shutting down the firewall, SELinux, and IPv6 fixed that issue. However, I still can’t get past the cell creation step. Below is the error:

    cellcli -e create cell interconnect1=eth1

    CELL-02598: Ipaddress/Netmask attribute is not properly configured for interconnect eth1

    Here is eth1:

    Address: 192.168.56.12 (this is the fixed IP that I gave to stocell2)
    Netmask: 255.255.255.0
    Gateway: 192.168.1.1 (What should this gateway IP be: the router IP 192.168.1.1 or the host IP 192.168.56.1? It doesn’t matter either way, however.)

    Any ideas?

    • I’m not sure whether this is related to the OS version you chose. By the way, the simulated InfiniBand network should be … 56.xxx in your environment, and both machines should have an IP on that network. Correct routing can also be configured in the route-eth1 and rule-eth1 configuration files (in the same directory as the eth1 network config file), but I think routing is only a performance concern if you can already ping/ssh from one machine to the other.
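
      A minimal sketch of the "same network" check described in this reply, assuming a 255.255.255.0 netmask as in the posted settings. The helper name same_net24 is hypothetical; the addresses are the ones quoted in this thread.

      ```shell
      # Hypothetical helper: succeed when two IPv4 addresses share the same /24 network.
      # Assumes a 255.255.255.0 netmask, as in the configurations posted in this thread.
      same_net24() {
          # ${1%.*} strips the last octet, leaving the first three for comparison.
          [ "${1%.*}" = "${2%.*}" ]
      }

      # eth1 addresses of the two VMs: both on the host-only network.
      same_net24 192.168.56.12 192.168.56.101 && echo "same network"
      # eth1 address versus the configured gateway: different networks.
      same_net24 192.168.56.12 192.168.1.1 || echo "different network"
      ```

      With the posted values, the gateway 192.168.1.1 is not on the 192.168.56.x host-only network that eth1 uses.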

  14. You are 100% correct: it was indeed an OS version issue. I lowered it to 5.10 and it worked smoothly. Thanks!

  15. Hi,

    First I would like to say that this is a wonderful way of building a virtual environment for Oracle Exadata. I followed all the steps but got stuck at the following. It would be greatly appreciated if you could help me.

    [celladmin@localhost ~]$ cellcli -e alter cell restart services all

    Stopping the RS, CELLSRV, and MS services…
    The SHUTDOWN of services was successful.
    Starting the RS, CELLSRV, and MS services…
    Getting the state of RS services… running
    Starting CELLSRV services…
    The STARTUP of CELLSRV services was not successful. Error: Start Timed out
    Starting MS services…
    The STARTUP of MS services was successful.

    here is the output from alert.log file:

    Cache Allocation: BufferSize: 32768. Num buffers: 5000. Start Address: 2AE48B076000
    Cache Allocation: BufferSize: 65536. Num buffers: 5000. Start Address: 2AE494CB7000
    Cache Allocation: BufferSize: 10485760. Num buffers: 7. Start Address: 2AE4A8538000
    CELL communication is configured to use 1 interface(s):
    192.168.56.102
    [RS] Started Service MS with pid 5717
    Sun Apr 19 16:33:47 2015
    IPC version: Oracle UDP/IP (generic)
    IPC Vendor 1 Protocol 2
    Version 4.1
    Sun Apr 19 16:34:16 2015
    [RS] Process /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/bin/cellrsomt (pid: 5714) received exception [signal num: 14] [ADDR:0x0]
    Sun Apr 19 16:34:16 2015
    Sun Apr 19 16:34:16 2015State dump completed for Cellsrv
    Sun Apr 19 16:34:16 2015
    State dump signal delivered to Cellsrv by RS.
    Sun Apr 19 16:34:21 2015
    State dump interrupted for Cellsrv by RS. It did not complete in 5 seconds.
    Clean shutdown signal delivered to OSS
    [RS] monitoring process /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/bin/cellrsomt (pid: 0) returned with error: 124

      • os version: same as you described in your site

        Linux localhost.localdomain 2.6.18-371.el5 #1 SMP Mon Sep 30 16:34:30 PDT 2013 x86_64 x86_64 x86_64 GNU/Linux

        memory info: this time I bumped it up to 4 GB

        MemTotal: 4050948 kB
        MemFree: 1443844 kB
        Buffers: 132368 kB
        Cached: 840552 kB
        SwapCached: 0 kB
        Active: 987032 kB
        Inactive: 535028 kB
        HighTotal: 0 kB
        HighFree: 0 kB
        LowTotal: 4050948 kB
        LowFree: 1443844 kB
        SwapTotal: 4095992 kB
        SwapFree: 4095992 kB
        Dirty: 76 kB
        Writeback: 0 kB
        AnonPages: 549172 kB
        Mapped: 89868 kB
        Slab: 78532 kB
        PageTables: 29480 kB
        NFS_Unstable: 0 kB
        Bounce: 0 kB
        CommitLimit: 5647352 kB
        Committed_AS: 1906168 kB
        VmallocTotal: 34359738367 kB
        VmallocUsed: 47568 kB
        VmallocChunk: 34359690275 kB
        HugePages_Total: 463
        HugePages_Free: 463
        HugePages_Rsvd: 451
        Hugepagesize: 2048 kB

        cell/db version: same as you described in your site

        ip configuration: I have used a static IP for eth1, 192.168.56.50

        [root@localhost ~]# ifconfig
        eth0 Link encap:Ethernet HWaddr 08:00:27:14:B8:D1
        inet addr:192.168.1.104 Bcast:192.168.1.255 Mask:255.255.255.0
        inet6 addr: fe80::a00:27ff:fe14:b8d1/64 Scope:Link
        UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
        RX packets:22807 errors:0 dropped:0 overruns:0 frame:0
        TX packets:14265 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:1000
        RX bytes:25594517 (24.4 MiB) TX bytes:2286503 (2.1 MiB)

        eth1 Link encap:Ethernet HWaddr 08:00:27:BA:79:49
        inet addr:192.168.56.50 Bcast:192.168.56.255 Mask:255.255.255.0
        inet6 addr: fe80::a00:27ff:feba:7949/64 Scope:Link
        UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
        RX packets:1143 errors:0 dropped:0 overruns:0 frame:0
        TX packets:872 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:1000
        RX bytes:133956 (130.8 KiB) TX bytes:118559 (115.7 KiB)

        lo Link encap:Local Loopback
        inet addr:127.0.0.1 Mask:255.0.0.0
        inet6 addr: ::1/128 Scope:Host
        UP LOOPBACK RUNNING MTU:16436 Metric:1
        RX packets:22806 errors:0 dropped:0 overruns:0 frame:0
        TX packets:22806 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:0
        RX bytes:10152347 (9.6 MiB) TX bytes:10152347 (9.6 MiB)

        create cell error:

        [celladmin@localhost trace]$ cellcli -e create cell exacell interconnect1=eth1
        Cell exacell successfully created
        Starting CELLSRV services…
        The STARTUP of CELLSRV services was not successful. Error: Start Timed out

        restart cell error:

        [celladmin@localhost trace]$ cellcli -e alter cell restart services all

        Stopping the RS, CELLSRV, and MS services…
        The SHUTDOWN of services was successful.
        Starting the RS, CELLSRV, and MS services…
        Getting the state of RS services… running
        Starting CELLSRV services…
        The STARTUP of CELLSRV services was not successful. Error: Start Timed out
        Starting MS services…
        The STARTUP of MS services was successful.

        output from alert.log file:

        [celladmin@localhost trace]$ tail -50 alert.log
        [RS] Started Service RS_BACKUP with pid 22515
        [RS] Kill previous monitoring process for core RS
        Mon Apr 20 22:57:21 2015
        [RS] Started monitoring process /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/bin/cellrssmt with pid 22524
        Mon Apr 20 22:57:21 2015
        Successfully setting event parameter –
        Mon Apr 20 22:57:21 2015
        Successfully setting event parameter –
        CELLSRV process id=22517
        CELLSRV cell host name=localhost.localdomain
        CELLSRV version=11.2.3.2.1,label=OSS_11.2.3.2.1_LINUX.X64_130109,Wed_Jan__9_06:09:48_PST_2013
        OS Hugepage status:
        Total/free hugepages available=451/451; hugepage size=2048KB
        OS Stats: Physical memory: 3956 MB. Num cores: 1
        CELLSRV configuration parameters:
        version=0.0
        Physical memory on machine: 3956 MB.
        Memory reserved for cellsrv: 2356 MBMemory for other processes: 1600 MB.
        celldisk policy config read from /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/deploy/config/cdpolicy.dat with ver no. 1 and pol no. 0
        Auto Online Feature 1.3
        CellServer MD5 Binary Checksum: 4701d6c7fd467f39a4f5e0dc2a4370d2
        OS Hugepage status:
        Total/free hugepages available=463/463; hugepage size=2048KB
        MS_ALERT HUGEPAGE CLEAR
        Cache Allocation: Num 1MB hugepage buffers: 900 Num 1MB non-hugepage buffers: 0
        Cache Allocation: BufferSize: 512. Num buffers: 5000. Start Address: 2B83AD753000
        Cache Allocation: BufferSize: 2048. Num buffers: 5000. Start Address: 2B83AD9C5000
        Cache Allocation: BufferSize: 4096. Num buffers: 5000. Start Address: 2B83AE38A000
        Cache Allocation: BufferSize: 8192. Num buffers: 10000. Start Address: 2B83AF713000
        Cache Allocation: BufferSize: 16384. Num buffers: 5000. Start Address: 2B83B4534000
        Cache Allocation: BufferSize: 32768. Num buffers: 5000. Start Address: 2B83B9355000
        Cache Allocation: BufferSize: 65536. Num buffers: 5000. Start Address: 2B83C2F96000
        Cache Allocation: BufferSize: 10485760. Num buffers: 7. Start Address: 2B83D6817000
        CELL communication is configured to use 1 interface(s):
        192.168.56.50
        [RS] Started Service MS with pid 22522
        Mon Apr 20 22:57:32 2015
        IPC version: Oracle UDP/IP (generic)
        IPC Vendor 1 Protocol 2
        Version 4.1
        Mon Apr 20 22:58:01 2015
        [RS] Process /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/bin/cellrsomt (pid: 22516) received exception [signal num: 14] [ADDR:0x0]
        Mon Apr 20 22:58:01 2015
        Mon Apr 20 22:58:01 2015State dump completed for Cellsrv
        Mon Apr 20 22:58:01 2015
        State dump signal delivered to Cellsrv by RS.
        Mon Apr 20 22:58:06 2015
        State dump interrupted for Cellsrv by RS. It did not complete in 5 seconds.
        Clean shutdown signal delivered to OSS
        [RS] monitoring process /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/bin/cellrsomt (pid: 0) returned with error: 124

  16. Sorry, I didn’t see that before.
    I no longer have the simulator up and running, so I cannot check the network configuration at the moment.
    But it sounds strange to me that
    >>CELLSRV cell host name=localhost.localdomain
    This probably resolves to 127.0.0.1 or 192.168.1.104.
    You should probably configure the host name to match eth1 …
    I’m not sure; I will check as soon as I can get the VMs running again.

    • Hi,

      Thanks for your suggestion, it worked. I was able to start up cellsrv. But now the problem is that it won’t work after a restart: if I need to restart the VM I have to reinstall it, otherwise I get the same error. The firewall is disabled. Can you provide any clue?

      thanks
      Omar

      • Can you send us your network configuration?
        The hosts file, the eth* config files under /etc/… , and the output of commands like hostname, ifconfig -a, …
        Do you see errors in the logs? Do you have enough RAM?

  17. Thanks a lot for putting up a wonderful page.

    Quick question about the step you mentioned in your comments on creating the Flash Cache, where you wrote “FLASH uppercase in link name.”
    Can you please explain in detail what you meant there, and how to create the flash cache? Your help is appreciated.

    Mike

    • Hi Mike,
      If I remember correctly, it was related to the file names used to simulate all the cell disks, flash and non-flash.
      Cellsrv expects to find there a symbolic link to the real device (which we don’t have!). In our case, that symbolic link is replaced by a plain file that is used as the device.
      We found experimentally that if you use upper-case FLASH in the file name, that file is treated as a flash disk.
      Once you have flash disks (or something like that 😉 ), the flash cache can be configured, either automatically by cellsrv or by commands.
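
      The naming observation in this reply can be sketched as a tiny classifier. It only mirrors the experimental finding reported here and is not a description of how cellsrv works internally; the helper name disk_kind and the sample paths are hypothetical.

      ```shell
      # Hypothetical helper: guess how a simulated disk file would be treated,
      # based only on the observation that upper-case FLASH in the file name
      # makes cellsrv consider the file a flash disk.
      disk_kind() {
          # ${1##*/} keeps only the file name, dropping any directory part.
          case "${1##*/}" in
              *FLASH*) echo flash ;;
              *)       echo hard ;;
          esac
      }

      disk_kind /path/to/disks/FLASH01   # prints: flash
      disk_kind /path/to/disks/flash01   # prints: hard (lower case does not match)
      disk_kind /path/to/disks/disk01    # prints: hard
      ```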

  18. Thanks, it worked like a charm.
    I was finally able to create flashcache / flashlog using the tip that you provided.

  19. Great article.
    I am getting the same error that Omar reported, but I haven’t seen how Omar or you resolved this issue.
    I would appreciate an update on this.

    [celladmin@exacell trace]$ cellcli -e create cell exacell interconnect1=eth1
    Cell exacell successfully created
    Starting CELLSRV services…
    The STARTUP of CELLSRV services was not successful. Error: Start Timed out

    [celladmin@exacell trace]$ tail -f alert*
    Tue Aug 25 12:12:50 2015
    [RS] Process /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/bin/cellrsomt (pid: 9348) received exception [signal num: 14] [ADDR:0x0]
    Tue Aug 25 12:12:50 2015
    Tue Aug 25 12:12:50 2015State dump completed for Cellsrv
    Tue Aug 25 12:12:50 2015
    State dump signal delivered to Cellsrv by RS.
    Tue Aug 25 12:12:55 2015
    State dump interrupted for Cellsrv by RS. It did not complete in 5 seconds.
    Clean shutdown signal delivered to OSS
    [RS] monitoring process /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/cellsrv/bin/cellrsomt (pid: 0) returned with error: 124

    [celladmin@exacell trace]$ ifconfig
    eth0 Link encap:Ethernet HWaddr 08:00:27:FA:FF:E5
    inet addr:192.168.56.199 Bcast:192.168.56.255 Mask:255.255.255.0
    UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
    RX packets:356 errors:0 dropped:1 overruns:0 frame:0
    TX packets:229 errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:1000
    RX bytes:36370 (35.5 KiB) TX bytes:32071 (31.3 KiB)

    eth1 Link encap:Ethernet HWaddr 08:00:27:71:A7:9E
    inet addr:192.168.56.50 Bcast:192.168.56.255 Mask:255.255.255.0
    UP BROADCAST MULTICAST MTU:1500 Metric:1
    RX packets:21 errors:0 dropped:0 overruns:0 frame:0
    TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:1000
    RX bytes:1304 (1.2 KiB) TX bytes:462 (462.0 b)

    lo Link encap:Local Loopback
    inet addr:127.0.0.1 Mask:255.0.0.0
    UP LOOPBACK RUNNING MTU:16436 Metric:1
    RX packets:58998 errors:0 dropped:0 overruns:0 frame:0
    TX packets:58998 errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:0
    RX bytes:10518775 (10.0 MiB) TX bytes:10518775 (10.0 MiB)

    [celladmin@exacell trace]$ cat /etc/hosts
    127.0.0.1 localhost.localdomain localhost
    ::1 localhost6.localdomain6 localhost6
    192.168.56.199 exacell.localhost.com exacell
    [celladmin@exacell trace]$ hostname
    exacell.localhost.com

    [celladmin@exacell trace]$ cat /proc/meminfo
    MemTotal: 6074532 kB
    MemFree: 594384 kB
    Buffers: 2672 kB
    Cached: 67376 kB
    SwapCached: 8220 kB
    Active: 1671668 kB
    Inactive: 507560 kB
    Active(anon): 1644512 kB
    Inactive(anon): 468640 kB
    Active(file): 27156 kB
    Inactive(file): 38920 kB
    Unevictable: 20104 kB
    Mlocked: 7840 kB
    SwapTotal: 6094844 kB
    SwapFree: 6060300 kB
    Dirty: 336 kB
    Writeback: 0 kB
    AnonPages: 2121356 kB
    Mapped: 37240 kB
    Shmem: 1804 kB
    Slab: 69792 kB
    SReclaimable: 28164 kB
    SUnreclaim: 41628 kB
    KernelStack: 3240 kB
    PageTables: 29172 kB
    NFS_Unstable: 0 kB
    Bounce: 0 kB
    WritebackTmp: 0 kB
    CommitLimit: 7588940 kB
    Committed_AS: 4708508 kB
    VmallocTotal: 34359738367 kB
    VmallocUsed: 72944 kB
    VmallocChunk: 34359654392 kB
    HardwareCorrupted: 0 kB
    AnonHugePages: 1351680 kB
    HugePages_Total: 1507
    HugePages_Free: 1507
    HugePages_Rsvd: 1501
    HugePages_Surp: 0
    Hugepagesize: 2048 kB
    DirectMap4k: 10176 kB
    DirectMap2M: 6232064 kB

    • You point to eth1, with IP …50, but in your hosts file the host name is mapped to …199.

      I cannot check because the simulator was created for study and then destroyed, but the issue should be configuration related.
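
      A minimal sketch of how that mismatch could be spotted, assuming a standard hosts-file layout (address first, then names). The helper name hosts_ip is hypothetical.

      ```shell
      # Hypothetical helper: print the first address that a hosts-format file
      # maps to the given name (canonical name or alias); fail if there is none.
      hosts_ip() {
          awk -v name="$1" '
              /^[[:space:]]*#/ { next }    # skip comment lines
              { for (i = 2; i <= NF; i++)
                    if ($i == name) { print $1; found = 1; exit } }
              END { exit !found }          # exit status 1 when nothing matched
          ' "$2"
      }

      # Compare this output with the address shown for eth1 by ifconfig.
      hosts_ip "$(hostname)" /etc/hosts || echo "no /etc/hosts entry for $(hostname)"
      ```

      For the /etc/hosts posted in comment 19, a lookup of exacell.localhost.com gives 192.168.56.199, while eth1 carries 192.168.56.50.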

  20. I am facing the ORA-600 issue below while installing a database on Exadata. Could you please help me resolve it?

    [celladmin@cell1 cell11.2.3.3.0_LINUX.X64_131014.1]$ cellcli -e list alerthistory
    1 2015-09-12T15:21:17+05:30 critical “RS-700 [No IP found in Exadata config file] [Check cellinit.ora] [] [] [] [] [] [] [] [] [] []”
    2 2015-09-12T15:32:07+05:30 critical “RS-700 [No IP found in Exadata config file] [Check cellinit.ora] [] [] [] [] [] [] [] [] [] []”
    3 2015-09-12T15:43:08+05:30 critical “RS-700 [No IP found in Exadata config file] [Check cellinit.ora] [] [] [] [] [] [] [] [] [] []”
    4 2015-09-12T15:51:38+05:30 critical “RS-700 [No IP found in Exadata config file] [Check cellinit.ora] [] [] [] [] [] [] [] [] [] []”
    5 2015-09-12T16:02:47+05:30 critical “RS-700 [No IP found in Exadata config file] [Check cellinit.ora] [] [] [] [] [] [] [] [] [] []”
    6 2015-09-12T17:06:18+05:30 critical “ORA-00600: internal error code, arguments: [StorageIdx::getOclSIRegion], [], [], [], [], [], [], [], [], [], [], []”
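
    The RS-700 alerts point at cellinit.ora. A minimal sketch of a check for an interconnect address in that file, assuming the ipaddressN=address/prefix line format that appears in a cellinit.ora posted later in this thread. The helper name cellinit_ips is hypothetical, and the default path is an assumption that may differ per installation.

    ```shell
    # Hypothetical helper: list the ipaddressN entries of a cellinit.ora-style
    # file; fails (non-zero exit status) when the file has no such entry.
    cellinit_ips() {
        grep -E '^[[:space:]]*ipaddress[0-9]+=' "$1"
    }

    # Assumed location; override CELLINIT if your installation differs.
    CELLINIT=${CELLINIT:-/opt/oracle/cell/cellsrv/deploy/config/cellinit.ora}
    cellinit_ips "$CELLINIT" || echo "no ipaddressN entry found in $CELLINIT"
    ```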

  21. Hi

    I am having a problem creating the cell and am getting the error below in alert.log. Could you please help?

    OS: OEL 5.10 (Linux cell1.test 2.6.39-400.264.5.el5uek #1 )
    Cell Software: cell11.2.3.3.1

    [RS] Started monitoring process /opt/oracle/cell11.2.3.3.1_LINUX.X64_140708/cellsrv/bin/cellrsomt with pid 6971
    Mon Dec 14 10:34:54 2015
    Successfully setting event parameter –
    Mon Dec 14 10:34:54 2015
    Successfully setting event parameter –
    CELLSRV process id=6972
    CELLSRV cell host name=cell1.test
    CELLSRV version=11.2.3.3.1,label=OSS_11.2.3.3.1_LINUX.X64_140708,Tue_Jul__8_04:01:56_PDT_2014
    CELLSRV version md5: 32ea01399bfb4c21a7a732e1946701c3
    OS Stats: Physical memory: 5838 MB. Num cores: 2
    OS Hugepage status:
    Total/free hugepages available=1513/1501; hugepage size=2048KB
    CELLSRV configuration parameters:
    version=0.0
    Physical memory on machine: 5838 MB.
    Memory reserved for cellsrv: 4238 MBMemory for other processes: 1600 MB.
    Running on simulated hardware in production environment
    celldisk policy config read from /opt/oracle/cell11.2.3.3.1_LINUX.X64_140708/cellsrv/deploy/config/cdpolicy.dat with ver no. 2 and pol no. 0
    Auto Online Feature 1.6
    OS Hugepage status:
    Total/free hugepages available=1525/1513; hugepage size=2048KB
    MS_ALERT HUGEPAGE CLEAR
    Cache Allocation: Num 1MB hugepage buffers: 3000 Num 1MB non-hugepage buffers: 0
    Cache Allocation: BufferSize: 512. Num buffers: 5000. Start Address: 7F317A288000
    Cache Allocation: BufferSize: 2048. Num buffers: 5000. Start Address: 7F317A4FA000
    Cache Allocation: BufferSize: 4096. Num buffers: 5000. Start Address: 7F317AEBF000
    Cache Allocation: BufferSize: 8192. Num buffers: 10000. Start Address: 7F317C248000
    Cache Allocation: BufferSize: 16384. Num buffers: 5000. Start Address: 7F3181069000
    Cache Allocation: BufferSize: 32768. Num buffers: 5000. Start Address: 7F3185E8A000
    Cache Allocation: BufferSize: 65536. Num buffers: 5000. Start Address: 7F318FACB000
    Cache Allocation: BufferSize: 10485760. Num buffers: 7. Start Address: 7F31A334C000
    CELL communication is configured to use 1 interface(s):
    192.168.3.100
    CELL IP affinity details:
    NUMA status: non-NUMA system
    cellaffinity.ora status: N/A
    CELL communication will use 1 IP group(s):
    Grp 0: *192.168.3.100
    Mon Dec 14 10:35:04 2015
    IPC version: Oracle UDP/IP (generic)
    IPC Vendor 1 Protocol 2
    Version 4.1
    Mon Dec 14 10:35:34 2015
    [RS] Process /opt/oracle/cell11.2.3.3.1_LINUX.X64_140708/cellsrv/bin/cellrsomt (pid: 6971) received exception [signal num: 14] [ADDR:0x0]
    Mon Dec 14 10:35:34 2015
    Mon Dec 14 10:35:34 2015 70 msec State dump completed for CELLSRV
    Mon Dec 14 10:35:34 2015
    State dump signal delivered to Cellsrv by RS.
    Mon Dec 14 10:35:39 2015
    State dump interrupted for Cellsrv by RS. It did not complete in 5 seconds.
    Clean shutdown signal delivered to CELLSRV by pid – 4438, tid – 0
    [RS] monitoring process /opt/oracle/cell11.2.3.3.1_LINUX.X64_140708/cellsrv/bin/cellrsomt (pid: 6971) returned with error: 124

    Regards
    Amit

  22. Hi All,

    First of all, thanks for creating such a great document. I was able to create the cell storage successfully up to this step.

    I faced the same issue that others faced:
    Starting CELLSRV services…
    The STARTUP of CELLSRV services was not successful. Error: Start Timed out

    The issue was that the eth1 IP differed from the value in /etc/hosts. So I added another entry with the eth1 IP, resolving to stocell1.localhost.com stocell1, and restarted the network service. I still got the error after that.

    Then I changed the entry in /etc/hosts to stocell1.localhost.com stocell11, restarted the network service and the VMware machine, and retried with cellcli -e create cell stocell1 interconnect1=eth1. It worked! After that, all the commands above work and all services are up.

    I do not understand what the real issue with the stocell1 name is.

    Below is my current storage cell config (can you please tell me whether it is correct? I am confused here):

    CellCLI> list cell detail
    name: stocell11
    bbuTempThreshold: 60
    bbuChargeThreshold: 800
    bmcType: absent
    cellVersion: OSS_11.2.3.2.1_LINUX.X64_130109
    cpuCount: 1
    diagHistoryDays: 7
    fanCount: 1/1
    fanStatus: normal
    flashCacheMode: WriteThrough
    id: c9938562-c4f9-4bcd-8866-cd07323a3bf4
    interconnectCount: 2
    interconnect1: eth1
    iormBoost: 0.0
    ipaddress1: 192.168.19.129/24
    kernelVersion: 2.6.32-300.10.1.el5uek
    makeModel: Fake hardware
    metricHistoryDays: 7
    offloadEfficiency: 1.0
    powerCount: 1/1
    powerStatus: normal
    releaseVersion: 11.2.3.2.1
    releaseTrackingBug: 14522699
    status: online
    temperatureReading: 0.0
    temperatureStatus: normal
    upTime: 0 days, 1:06
    cellsrvStatus: running
    msStatus: running
    rsStatus: running

    CellCLI> list celldisk
    CD_01_stocell11 normal
    CD_02_stocell11 normal
    CD_03_stocell11 normal
    CD_04_stocell11 normal
    CD_05_stocell11 normal
    CD_06_stocell11 normal
    CD_07_stocell11 normal
    CD_09_stocell11 normal
    CD_10_stocell11 normal
    CD_11_stocell11 normal
    CD_12_stocell11 normal
    CD_13_stocell11 normal
    CD_14_stocell11 normal
    CD_15_stocell11 normal
    CD_16_stocell11 normal
    CD_17_stocell11 normal
    CD_18_stocell11 normal
    CD_19_stocell11 normal

    CellCLI> list griddisk
    DATA_CD_01_stocell11 active
    DATA_CD_02_stocell11 active
    DATA_CD_03_stocell11 active
    DATA_CD_04_stocell11 active
    DATA_CD_05_stocell11 active
    DATA_CD_06_stocell11 active
    DATA_CD_07_stocell11 active
    DATA_CD_09_stocell11 active
    DATA_CD_10_stocell11 active
    DATA_CD_11_stocell11 active
    DATA_CD_12_stocell11 active
    DATA_CD_13_stocell11 active

    Thanks
    Deep

  23. Hello ,
    I saw the error “ORA-00600: internal error code, arguments: [LinuxBlockIO::init]” mentioned in the blog. I was able to resolve it by adding the entry below to cellinit.ora:

    _cellrsdef_heartbeat_timeout=10

    Thanks,
    Krish
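
    A minimal sketch of adding such an entry without duplicating it on repeated runs. The helper name set_param is hypothetical, and it assumes cellinit.ora is a plain key=value file, as the excerpts in this thread suggest. The services presumably need a restart afterwards for the change to take effect.

    ```shell
    # Hypothetical helper: set key=value in a parameter file, replacing an
    # existing line for the same key or appending a new one.
    # A copy of the previous content is kept in <file>.bak.
    set_param() {
        file=$1 key=$2 value=$3
        cp -p "$file" "$file.bak"
        if grep -q "^${key}=" "$file"; then
            # Rewrite only the matching line; all other lines pass through.
            awk -F= -v k="$key" -v v="$value" \
                '$1 == k { print k "=" v; next } { print }' "$file.bak" > "$file"
        else
            printf '%s=%s\n' "$key" "$value" >> "$file"
        fi
    }

    # Example call (the path to cellinit.ora depends on your installation):
    # set_param /path/to/cellinit.ora _cellrsdef_heartbeat_timeout 10
    ```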

  24. Hi,
    Thanks for the excellent post. I tried to set this up in my lab and am running into the following errors:
    [celladmin@stocell1 ~]$ cellcli -e alter cell restart services all

    Stopping the RS, CELLSRV, and MS services…
    The SHUTDOWN of services was successful.
    Starting the RS, CELLSRV, and MS services…
    Getting the state of RS services… running
    Starting CELLSRV services…
    The STARTUP of CELLSRV services was not successful.
    CELL-01531: Unable to parse the cellinit.ora file due to incorrect parameters in the file.
    Starting MS services…
    The STARTUP of MS services was not successful.
    CELL-01531: Unable to parse the cellinit.ora file due to incorrect parameters in the file.
    [celladmin@stocell1 ~]$

    Note : My setup environment is as below
    a) VMware Pro 12
    b) Oracle Linux 7
    c) Installed cell-12.1.2.3.2_LINUX.X64_160721-1.x86_64.rpm
    d) Installed jdk1.8.0_66-1.8.0_66-fcs.x86_64.rpm

    cellinit.ora is 0 bytes.
    [celladmin@stocell1 config]$ cat /opt/oracle/cell12.1.2.3.2_LINUX.X64_160721/cellsrv/deploy/config/cellinit.ora
    [celladmin@stocell1 config]$ ls -l /opt/oracle/cell12.1.2.3.2_LINUX.X64_160721/cellsrv/deploy/config/cellinit.ora
    -rw-r--r--. 1 celladmin root 0 Jan 11 12:06 /opt/oracle/cell12.1.2.3.2_LINUX.X64_160721/cellsrv/deploy/config/cellinit.ora

    [celladmin@stocell1 config]$ cellcli -e create cell stocell1 interconnect1=eno33554984

    CELL-01514: Connect Error. Verify that Management Server is listening at the specified HTTP port: 8888.

    [root@stocell1 ]# celld status
    rsStatus: running
    msStatus: stopped
    cellsrvStatus: stopped

    Please advise.

    • I was able to proceed one step further by adding the path to the lib folder. Now I am stuck at the next command:
      [celladmin@stocell1 ~]$ cellcli -e alter cell restart services all

      Stopping the RS, CELLSRV, and MS services…
      The SHUTDOWN of services was successful.
      Starting the RS, CELLSRV, and MS services…
      Getting the state of RS services… running
      Starting CELLSRV services…
      The STARTUP of CELLSRV services was not successful.
      CELL-01531: Unable to parse the cellinit.ora file due to incorrect parameters in the file.
      Starting MS services…
      The STARTUP of MS services was successful.

      [celladmin@stocell1 ~]$ cellcli -e create cell stocell1 interconnect1=eth1

      CELL-02598: Ipaddress/Netmask attribute is not properly configured for interconnect eth1.

      -----------------------------------------------
      [celladmin@stocell1 ~]$ cat /etc/sysconfig/network-scripts/ifcfg-eth1
      TYPE=Ethernet
      BOOTPROTO=none
      DEFROUTE=yes
      IPV4_FAILURE_FATAL=no
      IPV6INIT=no
      NAME=eth1
      UUID=808d6f42-ab25-4017-b9c4-0a52dc9a42db
      DEVICE=eth1
      ONBOOT=yes
      DNS1=192.168.116.2
      IPADDR=192.168.116.161
      GATEWAY=192.168.116.2
      ---------------------------------------------------
      [celladmin@stocell1 ~]$ netstat -rn
      Kernel IP routing table
      Destination Gateway Genmask Flags MSS Window irtt Iface
      0.0.0.0 192.168.116.2 0.0.0.0 UG 0 0 0 eth0
      0.0.0.0 192.168.116.2 0.0.0.0 UG 0 0 0 eth0
      0.0.0.0 192.168.116.2 0.0.0.0 UG 0 0 0 eth1
      192.168.116.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
      192.168.116.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
      192.168.122.0 0.0.0.0 255.255.255.0 U 0 0 0 virbr0
      ---------------------------------------------------
      Please advise.

      • Hi, I deleted the simulator right after the exam, so I cannot check “my” configuration.
        Try explicitly setting NETMASK=255.255.255.0 for eth1, and also post your /etc/hosts.
        Ciao
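
        A minimal sketch of checking whether the interface file already sets a netmask, assuming the usual ifcfg key=value layout. The helper name has_netmask is hypothetical.

        ```shell
        # Hypothetical helper: succeed when an ifcfg-style file sets NETMASK or PREFIX.
        has_netmask() {
            grep -Eq '^[[:space:]]*(NETMASK|PREFIX)=' "$1"
        }

        cfg=/etc/sysconfig/network-scripts/ifcfg-eth1
        # The message is also printed when the file does not exist.
        has_netmask "$cfg" 2>/dev/null ||
            echo "no NETMASK/PREFIX in $cfg: consider adding NETMASK=255.255.255.0"
        ```

        The ifcfg-eth1 posted above sets IPADDR and GATEWAY but neither NETMASK nor PREFIX, so this check would flag it.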

  25. Thanks for the suggestion.
    By setting the netmask explicitly, I was able to move one more step forward, but CELLSRV still could not start.

    CellCLI> alter cell restart services all

    Stopping the RS, CELLSRV, and MS services…
    The SHUTDOWN of services was successful.
    Starting the RS, CELLSRV, and MS services…
    Getting the state of RS services… running
    Starting CELLSRV services…
    The STARTUP of CELLSRV services was not successful.
    CELL-01547: CELLSRV startup failed due to unknown reasons.
    Starting MS services…
    The STARTUP of MS services was successful.

    CellCLI> exit
    quitting

    [celladmin@stocell1 ~]$ cellcli -e create cell stocell1 interconnect1=eth1
    Cell stocell1 successfully created
    Starting CELLSRV services…
    The STARTUP of CELLSRV services was not successful.
    CELL-01547: CELLSRV startup failed due to unknown reasons.

    =======================================================================================
    I see that cellinit.ora has now also been populated with the entry below:
    cat /opt/oracle/cell/cellsrv/deploy/config/cellinit.ora
    #CELL Initialization Parameters
    ipaddress1=192.168.116.161/24

    =======================================================================================
    [root@stocell1 trace]# cat /etc/hosts
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    #::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

    192.168.116.151 stocell1
    192.168.116.161 stocell1-ib
    =======================================================================================
    [root@stocell1 trace]# netstat -rn
    Kernel IP routing table
    Destination Gateway Genmask Flags MSS Window irtt Iface
    0.0.0.0 192.168.116.2 0.0.0.0 UG 0 0 0 eth1
    0.0.0.0 192.168.116.2 0.0.0.0 UG 0 0 0 eth1
    0.0.0.0 192.168.116.2 0.0.0.0 UG 0 0 0 eth0
    192.168.116.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
    192.168.116.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
    192.168.122.0 0.0.0.0 255.255.255.0 U 0 0 0 virbr0
    =======================================================================================
    [root@stocell1 trace]# ifconfig -a
    eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 192.168.116.151 netmask 255.255.255.0 broadcast 192.168.116.255
    inet6 fe80::20c:29ff:fe07:2c3c prefixlen 64 scopeid 0x20<link>
    ether 00:0c:29:07:2c:3c txqueuelen 1000 (Ethernet)
    RX packets 67346 bytes 4076442 (3.8 MiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 16 bytes 1128 (1.1 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

    eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 192.168.116.161 netmask 255.255.255.0 broadcast 192.168.116.255
    inet6 fe80::20c:29ff:fe07:2c46 prefixlen 64 scopeid 0x20<link>
    ether 00:0c:29:07:2c:46 txqueuelen 1000 (Ethernet)
    RX packets 68830 bytes 4224925 (4.0 MiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 1802 bytes 1386484 (1.3 MiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

    lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
    ……
    …..
    …..
    =======================================================================================
    [root@stocell1 trace]# hostname
    stocell1
    [root@stocell1 trace]# ping 192.168.116.151
    PING 192.168.116.151 (192.168.116.151) 56(84) bytes of data.
    64 bytes from 192.168.116.151: icmp_seq=1 ttl=64 time=0.079 ms
    64 bytes from 192.168.116.151: icmp_seq=2 ttl=64 time=0.030 ms
    ^C
    --- 192.168.116.151 ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1000ms
    rtt min/avg/max/mdev = 0.030/0.054/0.079/0.025 ms
    [root@stocell1 trace]# ping 192.168.116.161
    PING 192.168.116.161 (192.168.116.161) 56(84) bytes of data.
    64 bytes from 192.168.116.161: icmp_seq=1 ttl=64 time=0.029 ms
    64 bytes from 192.168.116.161: icmp_seq=2 ttl=64 time=0.053 ms
    64 bytes from 192.168.116.161: icmp_seq=3 ttl=64 time=0.055 ms
    ^C
    --- 192.168.116.161 ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 1999ms
    rtt min/avg/max/mdev = 0.029/0.045/0.055/0.014 ms
    =======================================================================================

    Please advise.

    • Latest Errors :
      [celladmin@stocell1 ~]$ cellcli -e create cell stocell1 interconnect1=eth1
      Cell stocell1 successfully created
      Starting CELLSRV services…
      The STARTUP of CELLSRV services was not successful.
      CELL-01547: CELLSRV startup failed due to unknown reasons.

      Alert Log
      ------------
      CELL process id=12832
      CELL host name=stocell1
      CELL version=12.1.2.3.2,label=OSS_12.1.2.3.2_LINUX.X64_160721,Thu_Jul_21_09:50:44_PDT_2016
      CELLSRV version md5: 150f7acd0b05d50095223fb9399e36ee
      OS Stats: Physical memory: 5792 MB. Num cores: 1
      CELLSRV configuration parameters:
      Memory reserved for cellsrv: 2892 MB Memory for other processes: 2900 MB
      Running on simulated hardware in production environment
      Successfully allocated 256 MB for Storage Index. Storage Index memory usage can grow up to a maximum of 289 MB.
      CELL communication is configured to use 1 interface(s):
      192.168.116.161
      IPC version: Oracle UDP/IP (generic)
      IPC Vendor 1 Protocol 2
      Version 4.1
      MS_ALERT HUGEPAGE CLEAR
      ossmmap_map: mmap failed for SparseV2PhysMap len: 12800 as there is insufficient memory
      Dumping oal memory statistics (all values in bytes)
      cellsrv: total os mem: 6012474792 sga osmem: 1375731712 pga osmem: 1086888
      cellsrv: sga alloc mem: 1145246520 pga alloc mem: 510120
      group: total os mem: 0 ocl: 3145728
      Memtype: sga: cellsrv os mem 1375731712 all group os mem 0
      Memtype: pga: cellsrv os mem 1086888 all group os mem 0
      Memtype: cache: cellsrv os mem 3962249216 all group os mem 0
      Memtype: storidx: cellsrv os mem 289431552 all group os mem 0
      Memtype: heapsummary: cellsrv os mem 18022400 all group os mem 0
      Memtype: codetext: cellsrv os mem 78643200 all group os mem 0
      Memtype: malloc: cellsrv os mem 33554432 all group os mem 0
      Memtype: stack: cellsrv os mem 253755392 all group os mem 0
      Thu Jan 12 13:12:54 2017
      [RS] monitoring process /opt/oracle/cell12.1.2.3.2_LINUX.X64_160721/cellsrv/bin/cellrsomt (pid: 12830) returned with error: 161
      Errors in file /opt/oracle/cell12.1.2.3.2_LINUX.X64_160721/log/diag/asm/cell/stocell1/trace/svtrc_12832_main.trc (incident=321):
      ORA-00600: internal error code, arguments: [TODO(zutao): handle OOM gracefully], [], [], [], [], [], [], [], [], [], [], []
      Incident details in: /opt/oracle/cell12.1.2.3.2_LINUX.X64_160721/log/diag/asm/cell/stocell1/incident/incdir_321/svtrc_12832_main_i321.trc
      Sweep [inc][321]: completed
      CELLSRV error - ORA-600 internal error
      Thu Jan 12 13:12:55 2017
      CELLSRV is no longer alive before state dump completes.
      Thu Jan 12 13:12:55 2017
      [RS] Stopped Service CELLSRV

      This looks like a memory-related error, but I am not sure what to adjust.

      • Thanks, I was able to proceed by increasing the RAM of my machine. Thanks for all your help.
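
      For anyone hitting the same ossmmap_map "insufficient memory" failure: the alert log above reports "Physical memory: 5792 MB", of which 2892 MB is reserved for cellsrv. Below is a minimal sketch for confirming how much RAM the VM actually sees before and after resizing it. It assumes a standard Linux /proc/meminfo; the function name mem_total_mb is made up for this example, and the sketch does not know how much memory cellsrv needs.

      ```shell
      # mem_total_mb: read /proc/meminfo-style text on stdin and print
      # the MemTotal value converted from kB to whole MB.
      # (Hypothetical helper name, not part of the cell software.)
      mem_total_mb() {
          awk '$1 == "MemTotal:" { printf "%d\n", $2 / 1024 }'
      }

      # Usage on the storage cell VM:
      #   mem_total_mb < /proc/meminfo
      ```

      Comparing this number with the "Physical memory" line in the alert log shows whether the cell software picked up the new RAM size after the VM was restarted.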

  26. Hi, thanks a lot for sharing the info. I was able to create the cell storage successfully. Do I also need to install the Exadata software on the DB nodes? I configured 2 DB nodes and 3 cell nodes; running root.sh on node 1 failed with the output below:
    Adding Clusterware entries to inittab
    CRS-2672: Attempting to start 'ora.mdnsd' on 'qr01db01'
    CRS-2676: Start of 'ora.mdnsd' on 'qr01db01' succeeded
    CRS-2672: Attempting to start 'ora.gpnpd' on 'qr01db01'
    CRS-2676: Start of 'ora.gpnpd' on 'qr01db01' succeeded
    CRS-2672: Attempting to start 'ora.cssdmonitor' on 'qr01db01'
    CRS-2672: Attempting to start 'ora.gipcd' on 'qr01db01'
    CRS-2676: Start of 'ora.cssdmonitor' on 'qr01db01' succeeded
    CRS-2676: Start of 'ora.gipcd' on 'qr01db01' succeeded
    CRS-2672: Attempting to start 'ora.cssd' on 'qr01db01'
    CRS-2672: Attempting to start 'ora.diskmon' on 'qr01db01'
    CRS-2676: Start of 'ora.diskmon' on 'qr01db01' succeeded
    CRS-2674: Start of 'ora.cssd' on 'qr01db01' failed
    CRS-2679: Attempting to clean 'ora.cssd' on 'qr01db01'
    CRS-2681: Clean of 'ora.cssd' on 'qr01db01' succeeded
    CRS-2673: Attempting to stop 'ora.gipcd' on 'qr01db01'
    CRS-2677: Stop of 'ora.gipcd' on 'qr01db01' succeeded
    CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'qr01db01'
    CRS-2677: Stop of 'ora.cssdmonitor' on 'qr01db01' succeeded
    CRS-2673: Attempting to stop 'ora.gpnpd' on 'qr01db01'
    CRS-2677: Stop of 'ora.gpnpd' on 'qr01db01' succeeded
    CRS-2673: Attempting to stop 'ora.mdnsd' on 'qr01db01'
    CRS-2677: Stop of 'ora.mdnsd' on 'qr01db01' succeeded
    CRS-4000: Command Start failed, or completed with errors.
    CSS startup failed with return code 1
    The exlusive mode cluster start failed, see Grid Infrastructure alert log for more information
    Initial cluster configuration failed. See /oraeng/GI/cfgtoollogs/crsconfig/rootcrs_qr01db01.log for details
    /oraeng/GI/perl/bin/perl -I/oraeng/GI/perl/lib -I/oraeng/GI/crs/install /oraeng/GI/crs/install/rootcrs.pl execution failed
    [root@qr01db01 ~]# ifconfig

    The ocssd log file shows this error:

    71: [ CSSD][1080420672]clssgmDeadProc: proc 0x2a81e50
    2017-10-07 09:53:52.185: [ CSSD][1080420672]clssgmDestroyProc: cleaning up proc(0x2a81e50) con(0x1b0) skgpid ospid 10755 with 0 clients, refcount 0
    2017-10-07 09:53:52.186: [ CSSD][1080420672]clssgmDiscEndpcl: gipcDestroy 0x1b0
    2017-10-07 09:53:52.226: [ CSSD][1080420672]clssscSelect: cookie accept request 0x26a2c10
    2017-10-07 09:53:52.226: [ CSSD][1080420672]clssgmAllocProc: (0x2a81ef0) allocated
    2017-10-07 09:53:52.226: [ CSSD][1080420672]clssgmClientConnectMsg: properties of cmProc 0x2a81ef0 - 1,2,3,4,5
    2017-10-07 09:53:52.226: [ CSSD][1080420672]clssgmClientConnectMsg: Connect from con(0x200) proc(0x2a81ef0) pid(10755) version 11:2:1:4, properties: 1,2,3,4,5
    2017-10-07 09:53:52.226: [ CSSD][1080420672]clssgmClientConnectMsg: msg flags 0x0000
    2017-10-07 09:53:54.257: [ SKGFD][1099741504]ERROR: -8(OS Error 1 (bind_fail,skgxpvifconf,requested interface 10.0.0.50 failed bind. Check output from ifconfig command,Error 0)
    )
    2017-10-07 09:53:54.257: [ SKGFD][1099741504]ERROR: -10(OSS Operation oss_initialize failed with error 4 [Network initialization failed]
    )
    2017-10-07 09:53:54.258: [ CSSD][1099741504]clsssnmvDDiscThread: Unable to create clsf context
    2017-10-07 09:53:54.258: [ CSSD][1099741504]###################################
    2017-10-07 09:53:54.258: [ CSSD][1099741504]clssscExit: CSSD aborting from thread clssnmvDDiscThread
    2017-10-07 09:53:54.258: [ CSSD][1099741504]###################################
    2017-10-07 09:53:54.258: [ CSSD][1099741504](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally
    2017-10-07 09:53:54.258: [ CSSD][1099741504]

    I checked that the eth interface is up and running, and I am able to ping from both nodes.

    Please help me with this.
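
    The SKGFD error above says the bind to 10.0.0.50 failed and suggests checking the ifconfig output. A successful ping between the nodes does not by itself show that this exact address is assigned to an interface on the node where root.sh runs. Below is a minimal sketch of that check, assuming the iproute `ip` command is installed; has_addr is a made-up helper name, and 10.0.0.50 is simply the address taken from the log.

    ```shell
    # has_addr ADDRESS: read `ip -o -4 addr show` style output on stdin
    # and succeed (exit 0) only if ADDRESS is one of the listed IPv4
    # addresses. (Hypothetical helper, not an Oracle tool.)
    has_addr() {
        awk -v want="$1" '
            {
                for (i = 1; i < NF; i++) {
                    if ($i == "inet") {
                        split($(i + 1), parts, "/")   # strip the /prefix length
                        if (parts[1] == want) found = 1
                    }
                }
            }
            END { exit (found ? 0 : 1) }'
    }

    # Usage on the database node:
    #   ip -o -4 addr show | has_addr 10.0.0.50 \
    #       && echo "10.0.0.50 is assigned locally" \
    #       || echo "10.0.0.50 is NOT assigned to any local interface"
    ```

    If the address turns out not to be assigned locally, the interconnect address the node is configured to use and the address actually set on its interface are worth comparing next.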
