Asterisk is hanging (crashing?) and I can't stop it - corrupt database?

Discussion in 'Help' started by luckman212, Dec 29, 2010.

  1. luckman212

    luckman212 Guru

    Joined:
    Jul 7, 2010
    Messages:
    272
    Likes Received:
    0
    Hi guys. I've got a working PiaF Purple install on a little AspireRevo here, and when it works, it works great. I've really enjoyed tinkering with it over the past few weeks and learning the ins and outs. Unfortunately, it was a while before I learned that it's important to issue an amportal stop command before rebooting the server. So I happily rebooted the thing a handful of times while Asterisk was running, and now I don't know whether I've corrupted my database by doing so.

    The problem I'm having now is that every so often (anywhere from several hours to several days) the * server will just go dead. It still shows up in ps ax | grep asterisk, but I can't make or receive any calls. I can connect to the * console (CLI) during this time, and it shows some SIP channels as active that should have died long ago. Issuing amportal stop at this point just hangs for 2-3 minutes; the command eventually times out and tells me that * is stopped (which it isn't).

    So at that point I am not sure what to do-- I usually just connect to the console and issue core stop now and then reboot the box.

    What I want to know is how I can debug what's going on "under the hood" here and, hopefully, fix it. I'd like to know what * is really doing: is there a thread it's stuck on, or any kind of log file or DB I can query to find out why this thing keeps getting hung up? And is there any way to examine the databases to make sure they are intact?
     
  2. jmullinix

    jmullinix Guru

    Joined:
    Oct 21, 2007
    Messages:
    1,263
    Likes Received:
    7
    Lack of DNS to Asterisk can cause this. Did you have a DNS hiccup while this was happening?
     
  3. luckman212

    luckman212 Guru

    Thanks. Definitely not a DNS failure. I have 2 local DNS servers here and they are both monitored, no outages & everything humming along just fine. In any case, how would I even tell (by looking at some certain log, or CLI output) that this was the issue? Seems crazy that one DNS lookup failure could bring down my whole asterisk server??
     
  4. blanchae

    blanchae Guru

    Joined:
    Mar 12, 2008
    Messages:
    1,910
    Likes Received:
    9
    There's your problem - purple is experimental.
     
  5. jmullinix

    jmullinix Guru

    Blanchae:

    I don't totally agree. I have Asterisk 1.8.1.1 running stably on Ubuntu Lucid Lynx. I am working with a member of this forum that is running PIAF purple in production and it is working fairly well. There are some little bugs, but not one that shuts Asterisk off.

    Luckman:

    There is a long-known bug in Asterisk's SIP stack that causes Asterisk to stop processing all calls if it loses DNS. This bug has been around for as long as I have been installing Asterisk. That is why I asked about it.
     
  6. luckman212

    luckman212 Guru

    So if it were a DNS issue, would there be any way to confirm that this is what really happened? Some kind of log entry, etc? In general how can I find out what is making asterisk go "zombie"?
     
  7. Stewart

    Stewart Guru

    Joined:
    Sep 16, 2009
    Messages:
    604
    Likes Received:
    6
    I've been able to get around the issue with DNS (mostly) by using DNSmasq. I say mostly because it still needs a good connection to begin with so that it can cache, but then if I lose the connection it still works fine because the queries are still resolving.
     
  8. luckman212

    luckman212 Guru

    Right- I am already using DNSmasq locally. Hmm. Again I was wondering if there is any way to peek under the hood at what asterisk is doing/was most recently doing/waiting for/hung up on so as to further debug this problem.
     
  9. phonebuff

    phonebuff Guru

    Joined:
    Feb 7, 2008
    Messages:
    874
    Likes Received:
    57
    Your peek under the hood will depend on logging levels..

    Look at /var/log/asterisk/full for startup and error messages.

    Try some CLI research --
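
As a starting point for reading /var/log/asterisk/full, here is a minimal Python sketch that keeps only WARNING and ERROR lines so they are not buried under VERBOSE output. It assumes the `[timestamp] LEVEL[id] file.c: message` line layout that Asterisk 1.8 typically writes; the name `important_lines` is made up for illustration.

```python
import re

# Assumed layout: [2010-12-31 02:22:30] ERROR[20239] netsock2.c: message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\[(?P<tid>\d+)\]\s+"
    r"(?P<src>[^:]+):\s?(?P<msg>.*)$"
)


def important_lines(lines, levels=("WARNING", "ERROR")):
    """Return (timestamp, level, source, message) for lines at the given levels.

    Lines that do not match the assumed layout are skipped silently.
    """
    found = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("level") in levels:
            found.append((m.group("ts"), m.group("level"),
                          m.group("src"), m.group("msg")))
    return found


# Usage on a real box (path taken from the post above):
#   with open("/var/log/asterisk/full", errors="replace") as fh:
#       for ts, level, src, msg in important_lines(fh):
#           print(ts, level, src, msg)
```

What ends up in the file still depends on the logging levels enabled, so an empty result does not prove nothing went wrong.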
     
  10. luckman212

    luckman212 Guru

    Thanks, that's good advice. /var/log/asterisk/full looks very promising. Wish I hadn't rebooted my pbx, but will definitely be looking in there next time it happens. I also found this page at voip-info which seems to have lots of juicy debugging info.
     
  11. phonebuff

    phonebuff Guru

    Logrotate might be archiving for you..

    But a reboot does not clear this file...
     
  12. Stewart

    Stewart Guru

    Absolutely. Try using grep and looking for a particular timestamp in the /var/log/asterisk/ directory. It should point you to the right file and then you can search in that file. It may be full, full.1, etc.
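
The grep-for-a-timestamp idea above can be sketched in Python roughly as follows. This is only an illustration: the function name `files_with_timestamp` is hypothetical, and it assumes the rotated logs are plain text (a compressed `.gz` archive would need to be decompressed first).

```python
from pathlib import Path


def files_with_timestamp(log_dir, stamp):
    """Return the names of files in log_dir that contain `stamp` on any line."""
    hits = []
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        with path.open(errors="replace") as fh:
            if any(stamp in line for line in fh):
                hits.append(path.name)
    return hits


# Usage, e.g. to find the file covering the hang around 2:20 am:
#   files_with_timestamp("/var/log/asterisk", "2010-12-31 02:2")
```

Once the right file is known (full, full.1, and so on), the search can be narrowed inside that file.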
     
  13. luckman212

    luckman212 Guru

    This is weird- I checked in the /var/log/asterisk/full log and there's nothing that really indicates any severe problem. For example, last night around 2am I had * start hanging on me again, so I looked and here's a snippet of what I saw:

    Code:
    [2010-12-31 02:22:30] VERBOSE[20239] config.c:   == Parsing '/etc/asterisk/users.conf':
    [2010-12-31 02:22:30] VERBOSE[20239] config.c:   == Found
    [2010-12-31 02:22:30] ERROR[20239] netsock2.c: getaddrinfo("pbx.local", "(null)", ...): Name or service not known
    [2010-12-31 02:22:30] WARNING[20239] acl.c: Unable to lookup 'pbx.local'
    [2010-12-31 02:22:30] VERBOSE[20239] chan_sip.c:   == SIP Listening on 0.0.0.0:5060
    [2010-12-31 02:22:30] VERBOSE[20239] netsock2.c:   == Using SIP TOS bits 96
    [2010-12-31 02:22:30] VERBOSE[20239] netsock2.c:   == Using SIP CoS mark 4
    [2010-12-31 02:22:30] NOTICE[20239] chan_sip.c: The 'username' field for sip peers has been deprecated in favor of the term 'defaultuser'
    [2010-12-31 02:22:30] VERBOSE[20239] config.c:   == Parsing '/etc/asterisk/sip_notify.conf':
    [2010-12-31 02:22:30] VERBOSE[20239] config.c:   == Found
    [2010-12-31 02:22:30] VERBOSE[20239] config.c:   == Parsing '/etc/asterisk/sip_notify_custom.conf':
    [2010-12-31 02:22:30] VERBOSE[20239] config.c:   == Found
    [2010-12-31 02:22:30] VERBOSE[20239] config.c:   == Parsing '/etc/asterisk/sip_notify_additional.conf':
    [2010-12-31 02:22:30] VERBOSE[20239] config.c:   == Found
    [2010-12-31 02:22:30] VERBOSE[20239] channel.c:   == Registered channel type 'SIP' (Session Initiation Protocol (SIP))
    [2010-12-31 02:22:30] VERBOSE[20239] rtp_engine.c:   == Registered RTP glue 'SIP'
    [2010-12-31 02:22:30] VERBOSE[20239] pbx.c:   == Registered application 'SIPDtmfMode'
    [2010-12-31 02:22:30] VERBOSE[20239] pbx.c:   == Registered application 'SIPAddHeader'
    [2010-12-31 02:22:30] VERBOSE[20239] pbx.c:   == Registered application 'SIPRemoveHeader'
    [2010-12-31 02:22:30] VERBOSE[20239] pbx.c:   == Registered custom function 'SIP_HEADER'
    [2010-12-31 02:22:30] DEBUG[20239] xmldoc.c: Cannot find variable 'SIPPEER' in tree 'description'
    [2010-12-31 02:22:30] VERBOSE[20239] pbx.c:   == Registered custom function 'SIPPEER'
    [2010-12-31 02:22:30] DEBUG[20239] xmldoc.c: Cannot find variable 'SIPCHANINFO' in tree 'description'
    [2010-12-31 02:22:30] VERBOSE[20239] pbx.c:   == Registered custom function 'SIPCHANINFO'
    [2010-12-31 02:22:30] VERBOSE[20239] pbx.c:   == Registered custom function 'CHECKSIPDOMAIN'
    [2010-12-31 02:22:30] VERBOSE[20239] manager.c:   == Manager registered action SIPpeers
    [2010-12-31 02:22:30] VERBOSE[20239] manager.c:   == Manager registered action SIPshowpeer
    [2010-12-31 02:22:30] VERBOSE[20239] manager.c:   == Manager registered action SIPqualifypeer
    [2010-12-31 02:22:30] VERBOSE[20239] manager.c:   == Manager registered action SIPshowregistry
    [2010-12-31 02:22:30] VERBOSE[20239] manager.c:   == Manager registered action SIPnotify
    [2010-12-31 02:22:30] VERBOSE[20239] loader.c:  chan_sip.so => (Session Initiation Protocol (SIP))
    [2010-12-31 02:22:30] VERBOSE[20239] config.c:   == Parsing '/etc/asterisk/gtalk.conf':
    [2010-12-31 02:22:30] VERBOSE[20239] config.c:   == Found
    [2010-12-31 02:22:30] WARNING[20239] config.c: Unknown directive '#bindaddr=192.168.0.10' at line 5 of /etc/asterisk/gtalk.conf
    [2010-12-31 02:22:30] WARNING[20239] config.c: Unknown directive '#externip=122.110.124.1' at line 6 of /etc/asterisk/gtalk.conf
    [2010-12-31 02:22:30] VERBOSE[20239] rtp_engine.c:   == Registered RTP glue 'Gtalk'
    [2010-12-31 02:22:30] VERBOSE[20239] channel.c:   == Registered channel type 'Gtalk' (Gtalk Channel Driver)
    [2010-12-31 02:22:30] VERBOSE[20239] loader.c:  chan_gtalk.so => (Gtalk Channel Driver)
    [2010-12-31 02:22:30] NOTICE[20239] chan_skinny.c: Configuring skinny from skinny.conf
    [2010-12-31 02:22:30] VERBOSE[20239] config.c:   == Parsing '/etc/asterisk/skinny.conf':
    [2010-12-31 02:22:30] VERBOSE[20239] config.c:   == Found
    [2010-12-31 02:22:30] NOTICE[20277] chan_sip.c: Peer '703' is now Reachable. (50ms / 2000ms)
    [2010-12-31 02:22:30] WARNING[20239] chan_skinny.c: Unable to get our IP address, Skinny disabled
    
    The CLI was still "up" (asterisk -r works & I can still issue commands such as sip show channels etc). So * was in some state of confusion.

    One clue as to what might be causing this is that tab completion (auto-complete) causes the CLI to become "dead" as well. For example, I type "sip show channel " and then press TAB, and at that point, where * would normally present a list of active SIP channels, the CLI instead goes dead: I can no longer type anything and can't even CTRL+C. My SSH session is still up because I am running screen, and if I switch to one of my other screens everything is still working normally.

    Another anomaly is that my logs are FULL of the following (repeating every 5 min):
    Code:
    [2010-12-30 23:10:17] NOTICE[3192] chan_iax2.c: Peer 'iax-fax3' is not dynamic (from 127.0.0.1)
    [2010-12-30 23:10:17] NOTICE[3200] chan_iax2.c: Peer 'iax-fax1' is not dynamic (from 127.0.0.1)
    [2010-12-30 23:15:12] NOTICE[3194] chan_iax2.c: Peer 'iax-fax0' is not dynamic (from 127.0.0.1)
    [2010-12-30 23:15:12] NOTICE[3196] chan_iax2.c: Peer 'iax-fax2' is not dynamic (from 127.0.0.1)
    [2010-12-30 23:15:12] NOTICE[3201] chan_iax2.c: Peer 'iax-fax3' is not dynamic (from 127.0.0.1)
    [2010-12-30 23:15:12] NOTICE[3193] chan_iax2.c: Peer 'iax-fax1' is not dynamic (from 127.0.0.1)
    [2010-12-30 23:20:07] NOTICE[3193] chan_iax2.c: Peer 'iax-fax0' is not dynamic (from 127.0.0.1)
    [2010-12-30 23:20:07] NOTICE[3194] chan_iax2.c: Peer 'iax-fax2' is not dynamic (from 127.0.0.1)
    [2010-12-30 23:20:07] NOTICE[3192] chan_iax2.c: Peer 'iax-fax3' is not dynamic (from 127.0.0.1)
    [2010-12-30 23:20:07] NOTICE[3201] chan_iax2.c: Peer 'iax-fax1' is not dynamic (from 127.0.0.1)
    [2010-12-30 23:25:02] NOTICE[3201] chan_iax2.c: Peer 'iax-fax0' is not dynamic (from 127.0.0.1)
    [2010-12-30 23:25:02] NOTICE[3198] chan_iax2.c: Peer 'iax-fax2' is not dynamic (from 127.0.0.1)
    [2010-12-30 23:25:02] NOTICE[3197] chan_iax2.c: Peer 'iax-fax3' is not dynamic (from 127.0.0.1)
    [2010-12-30 23:25:02] NOTICE[3194] chan_iax2.c: Peer 'iax-fax1' is not dynamic (from 127.0.0.1)
    This goes on ad infinitum. It is related to the Hylafax script (a-fax.sh), which sets up these iaxmodem extensions, but I'm not sure what the notice indicates or whether to just ignore it or try to fix it. Google produced no results on that.
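
To get a feel for how much of a log is this kind of repeated noise, a rough Python sketch like the one below can tally identical messages. It assumes the `[timestamp] LEVEL[id] file.c: message` layout shown in the snippets above; `top_messages` is a hypothetical helper name, not part of Asterisk.

```python
import re
from collections import Counter

# Assumed layout: [2010-12-30 23:10:17] NOTICE[3192] chan_iax2.c: message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\[(?P<tid>\d+)\]\s+"
    r"(?P<src>[^:]+):\s?(?P<msg>.*)$"
)


def top_messages(lines, n=5):
    """Count identical (level, source, message) triples, ignoring timestamps and ids."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            counts[(m.group("level"), m.group("src"), m.group("msg"))] += 1
    return counts.most_common(n)


# Usage:
#   with open("/var/log/asterisk/full", errors="replace") as fh:
#       for (level, src, msg), count in top_messages(fh):
#           print(count, level, src, msg)
```

Messages that dominate the count are candidates for either fixing at the source or filtering out when searching for the real problem.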

    Also, picking through the asterisk logs I noticed some module load errors, not sure if these are significant either:

    Code:
    [2010-12-31 02:22:30] WARNING[20239] loader.c: Error loading module 'format_mp3.so': /usr/lib/asterisk/modules/format_mp3.so: cannot open shared object file: No such file or directory
    [2010-12-31 02:22:30] WARNING[20239] loader.c: Module 'format_mp3.so' could not be loaded.
    [2010-12-31 02:22:30] WARNING[20239] loader.c: Error loading module 'res_fax_spandsp.so': /usr/lib/asterisk/modules/res_fax_spandsp.so: undefined symbol: t30_set_tx_page_header_info
    [2010-12-31 02:22:30] WARNING[20239] loader.c: Module 'res_fax_spandsp.so' could not be loaded.
    [2010-12-31 02:22:30] WARNING[20239] loader.c: Error loading module 'res_pktccops': /usr/lib/asterisk/modules/res_pktccops.so: cannot open shared object file: No such file or directory
    [2010-12-31 02:22:30] WARNING[20239] loader.c: Error loading module 'chan_mgcp.so': /usr/lib/asterisk/modules/chan_mgcp.so: undefined symbol: ast_pktccops_gate_alloc
    [2010-12-31 02:22:30] WARNING[20239] loader.c: Module 'chan_mgcp.so' could not be loaded.
     
  14. rossiv

    rossiv Guru

    Joined:
    Oct 26, 2008
    Messages:
    2,626
    Likes Received:
    138
    Count me in on this one too. I have the Tab-Dead problem, as well as the stop problem. 1.8.1.1. Will check my logs and see if anything shows up strange.
     
  15. luckman212

    luckman212 Guru

    Happy new year! Well, glad I'm not the only one with this problem. I'm considering compiling the Asterisk 1.8.2-rc1 release candidate. Has anyone done this?
     
  16. blanchae

    blanchae Guru

    Check that pbx.local is in your /etc/hosts file and points to 127.0.0.1. Asterisk hates DNS problems.
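
One way to check this without eyeballing the file is a small Python sketch along these lines. It only parses hosts-file text; the helper names `hosts_entries` and `points_to_loopback` are made up for this example.

```python
def hosts_entries(text):
    """Map each hostname or alias in hosts-file text to the addresses listed for it."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        addr, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), []).append(addr)
    return table


def points_to_loopback(text, hostname):
    """True if `hostname` has a 127.0.0.1 entry in the given hosts-file text."""
    return "127.0.0.1" in hosts_entries(text).get(hostname.lower(), [])


# Usage:
#   with open("/etc/hosts") as fh:
#       print(points_to_loopback(fh.read(), "pbx.local"))
```

For a live check of what the resolver actually returns, `socket.getaddrinfo(socket.gethostname(), None)` from the standard library raises `socket.gaierror` when the name cannot be resolved.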
     
  17. blanchae

    blanchae Guru

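    In Asterisk config files, the # sign is not used for comments. The correct comment character is the ";" (semicolon). The # sign introduces a directive to the config parser (such as #include), which is why the #bindaddr and #externip lines in your gtalk.conf show up as "Unknown directive" warnings.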
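
A quick way to spot this mistake across config files is sketched below in Python. It flags lines that start with # but are not a known directive. The allow-list of `#include` and `#exec` is an assumption for this example (check your Asterisk version's documentation for the full set), and `suspicious_hash_lines` is a hypothetical name.

```python
# Assumption: these are the only '#' directives we want to allow.
KNOWN_DIRECTIVES = ("#include", "#exec")


def suspicious_hash_lines(text):
    """Return (line_number, line) pairs that start with '#' but are not a known directive."""
    found = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if line.startswith("#") and not line.lower().startswith(KNOWN_DIRECTIVES):
            found.append((number, line))
    return found


# Usage:
#   with open("/etc/asterisk/gtalk.conf") as fh:
#       for number, line in suspicious_hash_lines(fh.read()):
#           print(f"line {number}: {line}  (use ';' to comment this out)")
```

Each flagged line can then be changed to start with ";" if it was meant as a comment.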
     
  18. luckman212

    luckman212 Guru

    Blanchae,
    thank you for your help. I did indeed have a problem with the hostname on this system. The hostname was set to pbx.local but this wasn't in my /etc/hosts file (yikes). That very well might have been causing major problems. I've got that set up correctly now. Nice catch. Whether that was the cause of these hangs, only time will tell.

    As for the # chars in my gtalk.conf, those are there from Ward's default install (those aren't my IPs)-- I never touched that file and I'm not using gtalk.
     
  19. luckman212

    luckman212 Guru

    Just a (possibly premature) update on this. I've made two key changes to the PBX since my last post. One was correcting the HOSTNAME as suggested by Blanchae. The other was recompiling Asterisk using the 1.8.2-rc1 source from Digium. So far the box has been running for about 2 days and has survived many amportal stops and starts, as well as several full shutdown and reboot cycles, without a core dump or a zombie. So, FWIW, I am almost ready to declare victory. Still a bit too early, but so far so good! ;)
     
  20. wardmundy

    wardmundy Nerd Uno

    Joined:
    Oct 12, 2007
    Messages:
    14,782
    Likes Received:
    2,526
    Permission granted to remove one arrow...
