Engineers who work in software maintenance know what life looks like when a core file arrives from a customer site and the binary was built without debug symbols. The most you can get out of a release-mode core file is the function name, the offset within that function, and the instruction that faulted.
There is no way for the developer to tell directly which C / C++ statement triggered the core.
In a customer escalation, the first thing every developer has to do is find the root cause (RC) and tell the end customer, "we are working on it". To get to the RC, you need to understand what happened and exactly where the core occurred in terms of the C / C++ source.
Well, at least on Solaris, we are not left alone.
Solaris, through the Sun Studio tools, provides a wonderful tool named er_src.
er_src - print source or disassembly with index lines and interleaved compiler commentary
More information can be obtained from http://docs.oracle.com/cd/E19205-01/820-4180/man1/er_src.1.html
Let's take an example to understand the power of er_src.
Consider the following code snippet.
#include <iostream>
using namespace std;

int getSquare(int *pnum);

int main()
{
    int num, *pnum=NULL;   // pnum is never pointed at num
    cout << "This function generates square" << endl;
    cout << "Enter the number which you want to find the square" << endl;
    cin >> num;
    cout << "The square is " << getSquare(pnum) << endl;
    return 0;
}

int getSquare(int *pnum)
{
    cout << "Entering getSquare" << endl;
    int num = *pnum;       // NULL dereference: this is where the process cores
    return (num * num);
}
As you can probably guess, this fails with a segmentation fault at the statement int num = *pnum; in getSquare, because the pointer is still NULL. Assume this executable was built in release mode and has dumped core at a customer site. When you get the core file from the site and open it, this is all you see:
$dbx test core
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.6' in your .dbxrc
Reading test
core file header read successfully
Reading ld.so.1
Reading libCstd.so.1
Reading libCrun.so.1
Reading libm.so.2
Reading libc.so.1
Reading libCstd_isa.so.1
Reading libc_psr.so.1
WARNING!!
A loadobject was found with an unexpected checksum value.
See `help core mismatch' for details, and run `proc -map'
to see what checksum values were expected and found.
dbx: warning: Some symbolic information might be incorrect.
program terminated by signal SEGV (no mapping at the fault address)
0x000111ec: getSquare+0x002c: ld [%i0], %i5
At this point, it is hard to make sense of what this line means in terms of the source:
0x000111ec: getSquare+0x002c: ld [%i0], %i5
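This is where er_src lends a helping hand.
If you look closely, you can see that the instruction at offset 2c from the start of getSquare caused the process to core. The instruction itself, ld [%i0], %i5, is a load from the address held in register %i0, and on SPARC %i0 normally holds the function's first argument.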
Now build a debug executable from the same source and run er_src against it, as follows.
$er_src -disasm all -1 test > disasm.txt
Now look at the resulting disasm.txt:
16. int getSquare(int *pnum)
<Function: getSquare(int*)>
[ 16] 11278: save %sp, -104, %sp
[ 16] 1127c: st %i0, [%fp + 68]
17. {
18. cout << "Entering getSquare" << endl;
[ 18] 11280: sethi %hi(0x21800), %l0
[ 18] 11284: bset 0, %l0 ! 0x21800
[ 18] 11288: sethi %hi(0x11400), %l1
[ 18] 1128c: bset 385, %l1 ! 0x11581
[ 18] 11290: or %l0, %g0, %o0
[ 18] 11294: call std::operator<<(std::basic_ostream<char,std::char_traits<char> >&,const char* ) ! 0x215ec
[ 18] 11298: or %l1, %g0, %o1
[ 18] 1129c: sethi %hi(0x11000), %l0
[ 18] 112a0: bset 840, %l0 ! 0x11348
[ 18] 112a4: call std::basic_ostream<char,std::char_traits<char> >::operator<<(std::basic_ostream<char,std::char_traits<char> >&(*)(std::basic_ostream<char,std::char_traits<char> >&)) ! 0x215f8
[ 18] 112a8: or %l0, %g0, %o1
19. int num = *pnum;
[ 19] 112ac: ld [%fp + 68], %l0
[ 19] 112b0: ld [%l0], %l0
[ 19] 112b4: st %l0, [%fp - 8]
20. return (num * num);
[ 20] 112b8: ld [%fp - 8], %l0
[ 20] 112bc: smul %l0, %l0, %l0
[ 20] 112c0: st %l0, [%fp - 4]
21. }
[ 21] 112c4: ld [%fp - 4], %l0
[ 21] 112c8: or %l0, %g0, %i0
[ 21] 112cc: ret
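The disassembly of getSquare starts at address 11278 (the save instruction), and we know from the core that the coring instruction sits at offset 2c from the start of getSquare.
So 11278 + 2c = 112a4. Aha, we have now located the coring instruction in the listing, and in terms of the source too. It lands right at the boundary between source lines 18 and 19:
[ 18] 112a4: call std::basic_ostream<char,std::char_traits<char> >::operator<<(std::basic_ostream<char,std::char_traits<char> >&(*)(std::basic_ostream<char,std::char_traits<char> >&)) ! 0x215f8
[ 18] 112a8: or %l0, %g0, %o1
19. int num = *pnum;
Of course, it is not an exact match, because the debug build emits extra instructions that shift the offsets. It still narrows the search down to a couple of statements. The only load through a pointer register nearby is ld [%l0], %l0 at 112b0, which has the same shape as the coring ld [%i0], %i5 and belongs to line 19, int num = *pnum;.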
I hope you now understand the power of er_src.
Happy Coding and Debugging !