
# NFS Troubleshooting

## mount returns error in NetBSD

When trying to mount an NFS directory, if you get this error:

mount_nfs: rpcbind on server: RPC: Program not registered


Try restarting nfsd on the server (/etc/rc.d/nfsd restart). If you then get an error like this:

Cannot MNT RPC (mountd): RPC: Program not registered


Then try restarting mountd on the server as well (/etc/rc.d/mountd restart).
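
Before restarting anything, it can help to see which RPC programs the server actually has registered. The sketch below is only an illustration: missing_nfs_programs is a name I made up, not a NetBSD tool. It reads the output of rpcinfo -p and reports which daemon seems to be missing, based on the standard RPC program numbers 100003 (nfs) and 100005 (mountd).

```shell
# Hypothetical helper: reads `rpcinfo -p <server>` output on stdin and
# prints the daemon to restart for each NFS-related RPC program that
# is not registered.
missing_nfs_programs() {
    awk '
        $1 == 100003 { nfs = 1 }      # 100003 is the NFS program number
        $1 == 100005 { mountd = 1 }   # 100005 is the mountd program number
        END {
            if (!nfs)    print "nfsd"
            if (!mountd) print "mountd"
        }
    '
}

# Usage (server is a placeholder host name):
#   rpcinfo -p server | missing_nfs_programs
```

If it prints nothing, both programs are registered and the problem is probably somewhere else.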

## Making root write in NFS mounted directory in NetBSD

To let root write to directories mounted over NFS, add the -maproot=root option in /etc/exports to the line of the directory you want to give write access to. See this message for a reference.

Then make mountd reload the configuration:

/etc/rc.d/mountd reload
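
To double-check which exports carry the option, something like the following sketch can be used. It assumes BSD-style exports(5) lines such as "/export/home/user -maproot=root client", where client is a placeholder host name, and the function name is hypothetical.

```shell
# Hypothetical helper: reads an exports(5) file on stdin and prints the
# first directory of every line that carries -maproot=root.
exports_with_maproot_root() {
    awk '
        /^[[:space:]]*#/ { next }     # skip comment lines
        {
            for (i = 2; i <= NF; i++)
                if ($i == "-maproot=root") { print $1; break }
        }
    '
}

# Usage on the server:
#   exports_with_maproot_root < /etc/exports
```

It only matches the literal spelling -maproot=root, so a line using a numeric ID instead would not be listed.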


A failing mount can have several causes, such as firewall configuration or a wrong path specification. In my case it was a silly mistake:

The directory I was mounting was /home/user, but the one in the /etc/exports file on the server was /export/home/user. The former was just a symbolic link to the latter. Once I used the real directory (not the symbolic link) in the mount command, it worked.
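
One way to find the real directory behind a link is to resolve the path on the server and compare the result with /etc/exports. This is a minimal sketch; real_dir is a hypothetical name.

```shell
# Hypothetical helper: prints the physical path of a directory, with
# every symbolic link in it resolved.
real_dir() {
    ( cd -P -- "$1" && pwd -P )
}

# Usage on the server, with the path from the example above:
#   real_dir /home/user
```

The subshell keeps the cd from changing the working directory of the calling shell.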

## NFS too slow

First, if your NFS is too slow, take a look at this page, which has a lot of useful information on things that can slow down an NFS connection, like wrong firewall configuration or DNS problems.

Here are some cases I ran into:

In one case, we had some hosts connected by both InfiniBand and Ethernet. The preferred channel was InfiniBand. But after taking a look at the logs, we noticed the following lines:

Jun 26 13:07:07 r1n7 kernel: [98588.996018] nfs: server estudante-ib0 not responding, still trying
Jun 26 13:07:13 r1n7 kernel: [98594.386820] nfs: server estudante-ib0 OK


There were serious problems with the InfiniBand connections. Since we couldn't stop the cluster to investigate, we chose to disable the InfiniBand connections and work over Ethernet only until we found the real cause.
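
To see how often each server stops responding, the log lines can be counted per server name. The following is a rough sketch that assumes messages in the format shown above; the function name is hypothetical.

```shell
# Hypothetical helper: reads kernel log lines on stdin and prints, for
# each NFS server, how many "not responding" messages were logged.
nfs_timeouts_per_server() {
    awk '
        /nfs: server .* not responding/ {
            for (i = 1; i < NF - 1; i++)
                if ($i == "nfs:" && $(i + 1) == "server") {
                    count[$(i + 2)]++
                    break
                }
        }
        END { for (s in count) print s, count[s] }
    '
}

# Usage (the log location varies between systems):
#   dmesg | nfs_timeouts_per_server
```

A server name that only shows up on one interface, like the -ib0 name here, points at that link rather than at NFS itself.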

NFS problems usually look like network problems at first, but they can have another origin, such as a failing hard disk in the server. Unfortunately, this is not easy to see. When it happens, many different symptoms can show up on the server, like nfsd going to D state and never coming back. See the next section.

## nfsd going to D state and never coming back

After having problems with NFS performance (see NFS too slow) I decided to investigate further. I soon saw with ps(1) that the nfsd daemon on the server was stuck in D state.
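
A quick way to spot such processes is to filter the ps(1) output by state. This sketch assumes the procps column selection pid,stat,comm found on GNU/Linux; d_state_procs is a hypothetical name.

```shell
# Hypothetical helper: reads `ps -eo pid,stat,comm` output on stdin and
# prints the PID and command of every process whose state starts with D
# (uninterruptible sleep, usually waiting on I/O).
d_state_procs() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}

# Usage:
#   ps -eo pid,stat,comm | d_state_procs
```

A process that shows up once is normal. One that stays in the list across many runs is the kind of hang described here.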

After many hypotheses, like it being a kernel sync() bug, I tried several things, such as updating the kernel and increasing the number of nfsd instances. None of them worked, so I decided to ask on the linux-nfs mailing list.

In my question I included some information about the NFS setup, as well as the kernel call traces for nfsd (and for other processes that got stuck, like sync). Note that all the functions where it hangs are related to I/O scheduling. The answer came in good time. The conclusion? My disk was dying.

This was a very nice example of how an NFS problem can first look like a network problem but turn out to have a very different cause.

## Forcing umount of an NFS volume in GNU/Linux

In GNU/Linux you can use the -f option of the umount(8) command, but it may not work. There is also a -l (lazy) option that detaches the volume immediately and, in the words of the manual, will "cleanup all references to the filesystem as soon as it is not busy anymore".
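
The two options can be combined in a small wrapper that tries the forced unmount first and falls back to the lazy one. This is just a sketch; force_umount is a hypothetical name and /mnt/nfs is a placeholder mount point.

```shell
# Hypothetical helper: try a forced unmount first and fall back to a
# lazy unmount if that fails.
force_umount() {
    umount -f "$1" || umount -l "$1"
}

# Usage (run as root; /mnt/nfs is a placeholder mount point):
#   force_umount /mnt/nfs
```

Keep in mind that after a lazy unmount, processes that still have files open on the volume keep their references until they close them.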

## NFS mapping users to nobody/nogroup

In CentOS I had a problem because the owner information of files was getting mapped to nobody/nogroup and I didn't know why. After some investigation, in which I found this link, I discovered that the hostnames of the server and the client had different domain parts:

server.foo.com.br
client.foo.int.br


I just changed the client's hostname so that the domain parts matched, and it worked fine.
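
The comparison is easy to script. The sketch below treats everything after the first dot of a fully qualified name as the domain part; both function names are hypothetical.

```shell
# Hypothetical helper: prints the domain part of a fully qualified
# host name, i.e. everything after the first dot.
domain_of() {
    case $1 in
        *.*) printf '%s\n' "${1#*.}" ;;
        *)   printf '\n' ;;              # no dot, so no domain part
    esac
}

# Hypothetical helper: succeeds only if both host names share a domain.
same_domain() {
    [ "$(domain_of "$1")" = "$(domain_of "$2")" ]
}

# Usage with the names above:
#   same_domain server.foo.com.br client.foo.int.br || echo "domains differ"
```

On a real pair of machines, the arguments would be the fully qualified names each host reports for itself.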

## NFS changing ownership to 4294967294

If you change the ownership of files on NFS and the change doesn't stick, or the files later show up owned by nobody/nobody or by the number 4294967294 (that is, -2 read as an unsigned 32-bit ID), the likely cause is an NFS version mismatch between client and server. It seems to happen in heterogeneous environments.

In my case I had a CentOS 6 server and Debian Squeeze clients. According to this page, on the Debian machines I had to specify nfsvers=3 in the NFS mount options in /etc/fstab, because NFS 4 (the default) was not working.
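
To confirm which version a client actually ended up using, the mount table can be inspected. This sketch assumes the Linux /proc/mounts format, where the options field of an NFS mount carries a vers= entry; the function name is hypothetical.

```shell
# Hypothetical helper: reads mount table lines in /proc/mounts format
# on stdin and prints the mount point and NFS version of each NFS mount.
nfs_mount_versions() {
    awk '
        $3 == "nfs" || $3 == "nfs4" {
            vers = "unknown"
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^(nfs)?vers=/) {
                    vers = opts[i]
                    sub(/^(nfs)?vers=/, "", vers)
                }
            print $2, vers
        }
    '
}

# Usage on a Linux client:
#   nfs_mount_versions < /proc/mounts
```

If the version shown is not the one requested in /etc/fstab, the option was probably not applied to that mount.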