The name service cache daemon (nscd) has some serious flaws. One that I run into fairly often on Linux shows up when you use LDAP: nscd tends to crash and burn whenever LDAP is unavailable. Unfortunately, when nscd bombs, it usually takes the entire system with it. More precisely, the nscd sockets start getting broken pipes and going stale until several of them pile up, and eventually the system slows to a halt. The box hasn’t technically crashed, but it is effectively in a denial-of-service (DoS) state. If you run ‘netstat -an’ you’ll notice several entries referencing ‘/var/run/nscd/socket’. Several distros have this bug listed, but I’ve yet to see any of them address it properly.
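As a rough way to spot this condition from a script, here is a minimal Python sketch that runs ‘netstat -an’ and counts the lines mentioning the nscd socket. It assumes the net-tools ‘netstat’ binary is installed and that your distro uses the ‘/var/run/nscd/socket’ path; the function name ‘count_nscd_socket_entries’ is just illustrative. It only reports a count and makes no judgment about how many entries is "too many".

```python
import subprocess

# Assumed socket path; some distros may place the nscd socket elsewhere.
NSCD_SOCKET = "/var/run/nscd/socket"


def count_nscd_socket_entries(netstat_output: str, socket_path: str = NSCD_SOCKET) -> int:
    """Count netstat output lines that mention the nscd UNIX socket path."""
    return sum(1 for line in netstat_output.splitlines() if socket_path in line)


def main() -> None:
    try:
        # Assumes the classic net-tools `netstat` is on PATH.
        result = subprocess.run(["netstat", "-an"], capture_output=True, text=True)
    except FileNotFoundError:
        print("netstat not found on PATH")
        return
    count = count_nscd_socket_entries(result.stdout)
    print(f"{count} netstat entries reference {NSCD_SOCKET}")


if __name__ == "__main__":
    main()
```

Run it periodically (from cron, for example) and a steadily climbing count is a hint that nscd sockets are going stale.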
From what I’ve found, this issue is a design flaw in nscd itself. Several people simply recommend disabling nscd. That works, but it isn’t an acceptable solution in an enterprise environment, especially if you run Oracle: several Oracle tools and applications will freeze or crash if they can’t talk to the nscd socket. Lame, I know, but I’ve observed this with at least Oracle 10g.
There are two ways to get around this. One is to make your LDAP service load-balanced and highly available: replicate OpenLDAP with slurpd, then put something like the balance TCP proxy in front of the replicas. The other is to fix nscd so it doesn’t take your machine down when it dies.
In this article we’ll go over the second option: fixing nscd. nscd reads its configuration from ‘/etc/nscd.conf’, which tells the daemon which name services to cache and for how long; see the nscd.conf man page for what each option means. The quick fix is to tell nscd not to cache passwd and group information. That way nscd never has to go to LDAP for those lookups, so it won’t freak out when LDAP is unavailable. I also stumbled across a drop-in replacement for nscd that doesn’t suffer from this issue: unscd-033.c. The compile instructions are in the file itself. I can’t guarantee it’s safe, but I’ve used it for a while in an enterprise environment and haven’t had any nscd issues since.
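If you’d rather script the nscd.conf change than edit the file by hand, here is a minimal Python sketch. It assumes the ‘enable-cache <service> <yes|no>’ syntax documented in the nscd.conf man page, and the helper name ‘disable_caches’ is hypothetical. It only prints the modified config so you can review it before copying it over ‘/etc/nscd.conf’ and restarting nscd.

```python
import re
from pathlib import Path

# The services the quick fix stops caching, so nscd never has to reach LDAP for them.
SERVICES_TO_DISABLE = ("passwd", "group")

# Matches an active (uncommented) "enable-cache <service> <yes|no>" line.
_ENABLE_CACHE = re.compile(r"^(\s*)enable-cache\s+(\S+)\s+(yes|no)\b(.*)$")


def disable_caches(conf_text: str, services=SERVICES_TO_DISABLE) -> str:
    """Return nscd.conf text with `enable-cache <service>` set to `no` for each service."""
    out = []
    seen = set()
    for line in conf_text.splitlines():
        m = _ENABLE_CACHE.match(line)
        if m and m.group(2) in services:
            indent, service, _old_value, rest = m.groups()
            line = f"{indent}enable-cache\t\t{service}\t\tno{rest}"
            seen.add(service)
        out.append(line)
    # Add explicit "no" settings for services the file never mentioned.
    for service in services:
        if service not in seen:
            out.append(f"\tenable-cache\t\t{service}\t\tno")
    return "\n".join(out) + "\n"


def main(path: str = "/etc/nscd.conf") -> None:
    conf = Path(path)
    if not conf.exists():
        print(f"{path} not found")
        return
    # Print rather than overwrite, so the result can be reviewed first.
    print(disable_caches(conf.read_text()), end="")


if __name__ == "__main__":
    main()
```

Printing instead of writing in place is deliberate: you can diff the output against the current file before committing to the change.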