
Engineers working in software maintenance know what life looks like when they get a core file from a customer site for a binary built without debug symbols. The most you can get out of a release-mode core file is the function name and the offset of the faulting (coring) instruction within that function.

There is no direct way for the developer to tell which C/C++ statement triggered the core.

In customer escalations, the first thing every developer should do is find the root cause (RC) and tell the end customer that "we are working on it". To get to the RC, one has to understand what happened and where exactly the core occurred in terms of the C/C++ source.

Well, at least on Solaris, we are not left alone.

The Sun Studio tools on Solaris include a wonderful utility named er_src:

er_src - print source or disassembly with index lines and interleaved compiler commentary

More information is available at http://docs.oracle.com/cd/E19205-01/820-4180/man1/er_src.1.html

Let's take an example to understand the power of er_src.

Consider the following code snippet.

#include <iostream>
using namespace std;

int getSquare(int *pnum);

int main()
{
    int num, *pnum = NULL;
    cout << "This function generates square" << endl;
    cout << "Enter the number which you want to find the square" << endl;
    cin >> num;
    cout << "The square is " << getSquare(pnum) << endl;  // bug: pnum is still NULL (should be &num)
    return 0;
}

int getSquare(int *pnum)
{
    cout << "Entering getSquare" << endl;
    int num = *pnum;  // dereferences a NULL pointer: SIGSEGV
    return (num * num);
}

As you can probably guess, there will be a segmentation fault in getSquare at int num = *pnum; (source line 19), because the pointer is still NULL. Assume this executable was built in release mode and it cores at a customer site. When you get the core from the site and open it in dbx, all you get is this:


$dbx test core
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.6' in your .dbxrc
Reading test
core file header read successfully
Reading ld.so.1
Reading libCstd.so.1
Reading libCrun.so.1
Reading libm.so.2
Reading libc.so.1
Reading libCstd_isa.so.1
Reading libc_psr.so.1
WARNING!!
A loadobject was found with an unexpected checksum value.
See `help core mismatch' for details, and run `proc -map'
to see what checksum values were expected and found.
dbx: warning: Some symbolic information might be incorrect.
program terminated by signal SEGV (no mapping at the fault address)
0x000111ec: getSquare+0x002c:   ld       [%i0], %i5

At this point, you cannot make much sense of what is meant by:


0x000111ec: getSquare+0x002c:   ld       [%i0], %i5

This is where er_src gives a helping hand.
Look closely: the instruction at offset 0x2c from the start of getSquare is the one that caused the process to core.
Now build a debug executable from the same source and run er_src against it, as follows:


$er_src -disasm all -1 test > disasm.txt

Now look at the resulting disasm.txt:


    16. int getSquare(int *pnum)
        <Function: getSquare(int*)>
        [ 16]    11278:  save        %sp, -104, %sp
        [ 16]    1127c:  st          %i0, [%fp + 68]
    17. {
    18. cout << "Entering getSquare" << endl;
        [ 18]    11280:  sethi       %hi(0x21800), %l0
        [ 18]    11284:  bset        0, %l0 ! 0x21800
        [ 18]    11288:  sethi       %hi(0x11400), %l1
        [ 18]    1128c:  bset        385, %l1 ! 0x11581
        [ 18]    11290:  or          %l0, %g0, %o0
        [ 18]    11294:  call        std::operator<<(std::basic_ostream<char,std::char_traits<char> >&,const char*) ! 0x215ec
        [ 18]    11298:  or          %l1, %g0, %o1
        [ 18]    1129c:  sethi       %hi(0x11000), %l0
        [ 18]    112a0:  bset        840, %l0 ! 0x11348
        [ 18]    112a4:  call        std::basic_ostream<char,std::char_traits<char> >::operator<<(std::basic_ostream<char,std::char_traits<char> >&(*)(std::basic_ostream<char,std::char_traits<char> >&)) ! 0x215f8
        [ 18]    112a8:  or          %l0, %g0, %o1
    19. int num = *pnum;
        [ 19]    112ac:  ld          [%fp + 68], %l0
        [ 19]    112b0:  ld          [%l0], %l0
        [ 19]    112b4:  st          %l0, [%fp - 8]
    20. return (num * num);
        [ 20]    112b8:  ld          [%fp - 8], %l0
        [ 20]    112bc:  smul        %l0, %l0, %l0
        [ 20]    112c0:  st          %l0, [%fp - 4]
    21. }
        [ 21]    112c4:  ld          [%fp - 4], %l0
        [ 21]    112c8:  or          %l0, %g0, %i0
        [ 21]    112cc:  ret

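The disassembly of getSquare starts at address 11278 (the save instruction right under <Function: getSquare(int*)>), and we know from the core that the coring instruction sits at offset 2c from the start of getSquare.

So 11278 + 2c = 112a4, which lands at the end of source line 18. The coring instruction itself, ld [%i0], %i5, loads a word through %i0, which on SPARC holds getSquare's first incoming argument, pnum. The matching load through pnum in the debug listing is on the very next source line. Aha… we now have the coring statement, and in source form too:

    19. int num = *pnum;
        [ 19]    112b0:  ld          [%l0], %l0

Of course, it's not an exact match: the debug build is unoptimized (for example, it stores pnum on the stack at 1127c and reloads it at 112ac), so its instruction offsets differ from the release build's. Nevertheless, it narrows the search down to a line or two.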

Hopefully you now see the power of er_src.

Happy Coding and Debugging!