SPrabhu's Blog

Thursday, August 20, 2015

Investigation into the effects of server timeouts on the CIFS client.

The investigation is driven by a user request to reduce the time spent by the client waiting for a request to an unresponsive server.

The client awaits responses by the server in the kernel thread "cifsd" which runs the call path
cifs_demultiplex_thread()->cifs_read_from_socket()->cifs_readv_from_socket()->kernel_recvmsg()

We investigate 2 scenarios
a) The server is orderly shutdown ie. the server closes the socket.
b) The server is abruptly taken off the network ie. it has no chance to orderly shutdown the socket. This is simulated using iptables.

In both cases we assume that the client has a share mounted from the server.

Server is orderly shutdown

kernel_recvmsg() call in cifs_readv_from_socket() returns 0.

int

cifs_readv_from_socket(struct TCP_Server_Info *server, struct kvec *iov_orig,

                       unsigned int nr_segs, unsigned int to_read)

{

..

                length = kernel_recvmsg(server-&gt;ssocket, &amp;smb_msg,

                                        iov, segs, to_read, 0);

..

                } else if (length &lt;= 0) {

                        cifs_dbg(FYI, "Received no data or error: expecting %d\n"

                                 "got %d", to_read, length);

                        cifs_reconnect(server);

                        total_read = -ECONNABORTED;

                        break;

                }

..

}

A call is made to cifs_reconnect() where an attempt is made to continuously reconnect to the server in the "cifsd" thread every in the code path
cifs_demultiplex_thread()->cifs_read_from_socket()->cifs_readv_from_socket()->cifs_reconnect()->generic_ip_connect()

int

cifs_reconnect(struct TCP_Server_Info *server)

{

..

        do {

..

                rc = generic_ip_connect(server);

                if (rc) {

                        cifs_dbg(FYI, "reconnect error %d\n", rc);

                        mutex_unlock(&amp;server-&gt;srv_mutex);

                        msleep(3000);

                }

..

        } while (server-&gt;tcpStatus == CifsNeedReconnect);

        return rc;

}

While the server is down, generic_ip_connect() continously returns -EHOSTDOWN(111). It will sleep for 3 seconds between each reconnect attempt.

While the attempts to reconnect are made in the cifsd thread, any attempt to perform an operation on the share results in the thread calling smb_init() which calls cifs_reconnect_tcon()
ex: when calling ls -l - CIFSSMBUnixQPathInfo() -> smb_init() -> cifs_reconnect_tcon()

static int

cifs_reconnect_tcon(struct cifs_tcon *tcon, int smb_command)

{

..

        while (server-&gt;tcpStatus == CifsNeedReconnect) {

                wait_event_interruptible_timeout(server-&gt;response_q,

                        (server-&gt;tcpStatus != CifsNeedReconnect), 10 * HZ);

..

                /*

                 * on "soft" mounts we wait once. Hard mounts keep

                 * retrying until process is killed or server comes

                 * back on-line

                 */

                if (!tcon-&gt;retry) {

                        cifs_dbg(FYI, "gave up waiting on reconnect in smb_init\n");

                        return -EHOSTDOWN;

                }

        }

..

}

In the while loop above, we return after 10 seconds in case of a soft mount or continously retry in case of hardmounts.

Conclusion:

In cases where the server is shutdown orderly, the client immediately attempts to reconnect. It continues doing this through the cifsd thread keeping an interval of 3 seconds between each attempt. Any attempt to perform an operation on this mount on a soft mounted share(which is the default) will fail after waiting for 10 seconds for the reconnect attempts made by the cifsd thread to succeed.

Server is abruptly shutdown

The server tcp connection is represented by struct TCP_Server_Info. This contains a variable "unsigned long lstrp" which contains the jiffies at which the client last got a response from the server.

struct TCP_Server_Info {

..

        unsigned long lstrp; /* when we got last response from this server */

..

};

This is initialised in cifs_get_tcp_session when the first connection to the server is made.

static struct TCP_Server_Info *
cifs_get_tcp_session(struct smb_vol *volume_info)

{

..

        tcp_ses-&gt;lstrp = jiffies;

..

}

This value is set to the latest jiffies at the end of every successful read request.

static int
cifs_demultiplex_thread(void *p)
{
..
        while (server->tcpStatus != CifsExiting) {
..
                mid_entry = server->ops->find_mid(server, buf);
                if (!mid_entry || !mid_entry->receive)
                        length = standard_receive3(server, mid_entry);
                else
                        length = mid_entry->receive(server, mid_entry);
                if (length < 0)
                        continue;
                if (server->large_buf)
                        buf = server->bigbuf;
                server->lstrp = jiffies;
..
        } /* end while !EXITING */
..
}

It is this last response variable server->lstrep which is used to determine the responsiveness of the server.

Before we look at how it is used to determine the server responsiveness, we should also take a look at the echo service. This is the keepalive service used to "ping" the server at regular intervals in case of inactivity. This is done by setting up a work on the cifsiod_wq workqueue which is called ever SMB_ECHO_INTERVAL(60) seconds.

static struct TCP_Server_Info *
cifs_get_tcp_session(struct smb_vol *volume_info)
{
..
        INIT_DELAYED_WORK(&tcp_ses->echo, cifs_echo_request);
..
        /* queue echo request delayed work */
        queue_delayed_work(cifsiod_wq, &tcp_ses->echo, SMB_ECHO_INTERVAL);
..
}

After SMB_ECHO_INTERVAL(60) seconds, we call

static void
cifs_echo_request(struct work_struct *work)
{
        int rc;
        struct TCP_Server_Info *server = container_of(work,
                                        struct TCP_Server_Info, echo.work);
        /*
         * We cannot send an echo if it is disabled or until the
         * NEGOTIATE_PROTOCOL request is done, which is indicated by
         * server->ops->need_neg() == true. Also, no need to ping if
         * we got a response recently.
         */
        if (!server->ops->need_neg || server->ops->need_neg(server) ||
            (server->ops->can_echo && !server->ops->can_echo(server)) ||
            time_before(jiffies, server->lstrp + SMB_ECHO_INTERVAL - HZ))
                goto requeue_echo;
        rc = server->ops->echo ? server->ops->echo(server) : -ENOSYS;
        if (rc)
                cifs_dbg(FYI, "Unable to send echo request to server: %s\n",
                         server->hostname);
requeue_echo:
        queue_delayed_work(cifsiod_wq, &server->echo, SMB_ECHO_INTERVAL);
}

We call an echo if the session hasn't been negotiated yet and if no response was received since the last echo interval. At the end of the call, we requeue the echo work request on the cifsiod_wq.

Before each call to read on the socket, we first check to make sure that the server is not unresponsive.

int
cifs_readv_from_socket(struct TCP_Server_Info *server, struct kvec *iov_orig,
                       unsigned int nr_segs, unsigned int to_read)
{
..
                if (server_unresponsive(server)) {
                        total_read = -ECONNABORTED;
                        break;
                }
..
                length = kernel_recvmsg(server->ssocket, &smb_msg,
                                        iov, segs, to_read, 0);
..
}

Note that we send an echo call every SMB_ECHO_INTERVAL. So in case of an inactive connections, we should receive a reply atleast after every SMB_ECHO_INTERVAL(60) seconds. The timeout for receives on the socket is set to 7 seconds when setting up the socket in generic_ip_connect(). If no replies have been received for 7 seconds, kernel_recvmsg() timesout and we redo the for loop which checks if the server is unresponsive again.

static bool
server_unresponsive(struct TCP_Server_Info *server)
{
        /*
         * We need to wait 2 echo intervals to make sure we handle such
         * situations right:
         * 1s  client sends a normal SMB request
         * 2s  client gets a response
         * 30s echo workqueue job pops, and decides we got a response recently
         *     and don't need to send another
         * ...
         * 65s kernel_recvmsg times out, and we see that we haven't gotten
         *     a response in >60s.
         */
        if (server->tcpStatus == CifsGood &&
            time_after(jiffies, server->lstrp + 2 * SMB_ECHO_INTERVAL)) {
                cifs_dbg(VFS, "Server %s has not responded in %d seconds. Reconnecting...\n",
                         server->hostname, (2 * SMB_ECHO_INTERVAL) / HZ);
                cifs_reconnect(server);
                wake_up(&server->response_q);
                return true;
        }
        return false;
}

Any filesystem activity performing an operation on this share uses functions such as SendReceive(), SendReceive2() or SendReceiveBlockingLock() to send over commands to the remote server. They then wait for a response from the server by calling wait_for_response().

static int
wait_for_response(struct TCP_Server_Info *server, struct mid_q_entry *midQ)
{
        int error;
        error = wait_event_freezekillable_unsafe(server->response_q,
                                    midQ->mid_state != MID_REQUEST_SUBMITTED);
        if (error < 0)
                return -ERESTARTSYS;
        return 0;
}

It continues waiting for a response until we finally timeout in the cifs_demultiplex_thread "cifsd" in the call path cifs_demultiplex_thread()->cifs_read_from_socket()->cifs_readv_from_socket()->server_unresponsive(). Once the function server_unresponsive() confirms that the server hasn't replied for atleast 2 * SMB_ECHO_INTERVAL, it calls cifs_reconnect() to mark the TCP_Server_Info as needing a reconnect.

int
cifs_reconnect(struct TCP_Server_Info *server)
{
..
  //Mark the TCP_Server_Info as needing reconnect
                server->tcpStatus = CifsNeedReconnect;
..
 //Mark all sessions and tcons as needing a reconnect
        list_for_each(tmp, &server->smb_ses_list) {
                ses = list_entry(tmp, struct cifs_ses, smb_ses_list);
                ses->need_reconnect = true;
                ses->ipc_tid = 0;
                list_for_each(tmp2, &ses->tcon_list) {
                        tcon = list_entry(tmp2, struct cifs_tcon, tcon_list);
                        tcon->need_reconnect = true;
                }
        }
..
        spin_lock(&GlobalMid_Lock);
        list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
                mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
  //Mark each pending request as requiring a retry
                if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
                        mid_entry->mid_state = MID_RETRY_NEEDED;
                list_move(&mid_entry->qhead, &retry_list);
        }
        spin_unlock(&GlobalMid_Lock);
..
}

We then wake up all pending requests in server_unresponsive()

static bool
server_unresponsive(struct TCP_Server_Info *server)
{
..
        if (server->tcpStatus == CifsGood &&
            time_after(jiffies, server->lstrp + 2 * SMB_ECHO_INTERVAL)) {
                cifs_dbg(VFS, "Server %s has not responded in %d seconds. Reconnecting...\n",
                         server->hostname, (2 * SMB_ECHO_INTERVAL) / HZ);
  //Mark all pending requests as needing retries
                cifs_reconnect(server);
  //Wake up all tasks waiting for a response.
                wake_up(&server->response_q);
                return true;
        }
        return false;
}

For the task sending the request, we return and return -EAGAIN requesting the task to be run again.

int
SendReceive(const unsigned int xid, struct cifs_ses *ses,
            struct smb_hdr *in_buf, struct smb_hdr *out_buf,
            int *pbytes_returned, const int timeout)
{
..
 //Set request status as submitted
        midQ->mid_state = MID_REQUEST_SUBMITTED;
        cifs_in_send_inc(ses->server);
 //Send request
        rc = smb_send(ses->server, in_buf, be32_to_cpu(in_buf->smb_buf_length));
        cifs_in_send_dec(ses->server);
        cifs_save_when_sent(midQ);
..
 //Wait for response
        rc = wait_for_response(ses->server, midQ);
..
 //Check response. For a retry, this call will return -EAGAIN
        rc = cifs_sync_mid_result(midQ, ses->server);
        if (rc != 0) {
                add_credits(ses->server, 1, 0);
  //Which is returned to the called
                return rc;
        }
..
}

The call attempts to retry and subsequently attempts to reconnect to the tcon

ex: when calling ls -l - CIFSSMBUnixQPathInfo() -> smb_init() -> cifs_reconnect_tcon()

static int
cifs_reconnect_tcon(struct cifs_tcon *tcon, int smb_command)
{
..
        while (server->tcpStatus == CifsNeedReconnect) {
                wait_event_interruptible_timeout(server->response_q,
                        (server->tcpStatus != CifsNeedReconnect), 10 * HZ);
..
                /*
                 * on "soft" mounts we wait once. Hard mounts keep
                 * retrying until process is killed or server comes
                 * back on-line
                 */
                if (!tcon->retry) {
                        cifs_dbg(FYI, "gave up waiting on reconnect in smb_init\n");
                        return -EHOSTDOWN;
                }
        }
..
}

In the while loop above, we return after 10 seconds in case of a soft mount or continuously retry in case of hardmounts.

Conclusion:

When the server is abruptdly taken off the network, we will mark the server as requiring a reconnect. This can take at max a total of

timeout for recvmsg(7) + 2 * SMB_ECHO_INTERVAL(60) + time waiting for reconnect in cifs_reconnect_tcon() (10) = 137 seconds.

Sunday, June 28, 2015

10 minute Rava Chakuli

This recipe is from Shyamala Kini from Konkani Amchi food.

Boil 2 cups of water.
Once it comes to rolling boil ,add salt to taste, little jeera, a pinch of hing, pinch of chilly pwd and 1 cup of rava/sooji.
Switch off the cooker and mix well into a consistency of a dough (something like upma/kesari consistency.)
Cool it till lukewarm.
Then add 1 cup of rice flour and a spoon of butter and some sesame seeds.Knead/mix well.
Put it into moulds and deep fry them on medium flame.
Allow it to cool before tasting

P.S.:Use the same cup for the measurement.

Monday, December 15, 2014

Debugging calls to cifs.upcall

/usr/sbin/cifs.upcall is the request-key help program used to obtain certain data like kerberos keys or results of dns calls from userland processes.

It is sometimes necessary to debug the execution of this helper function for which I use the following methods.

1) Capture debug messages from cifs.upcall.

Edit /etc/rsyslog.conf and add the line

*.* /var/log/debug

Restart rsyslog service and confirm that the log file /var/log/debug has been created which will log all messages.

Attempt the call which will usually be the call to mount the cifs share. You should see the debug messages from cifs.upcall in /var/log/debug.

2) Strace cifs.upcall calls.

First move the original cifs.upcall file
# cd /usr/sbin; mv cifs.upcall cifs.upcall.orig

Create a new text file cifs.upcall with the following content.

#!/bin/bash

echo $@ >> /tmp/upcall.log
strace -fxvto /tmp/cifs.upcall.st -s1000 /usr/sbin/cifs.upcall.orig $@

Make sure this is an executable
# chmod +x cifs.upcall

Now attempt the mount process. You will see the strace output in /tmp/cifs.upcall.st.

Tuesday, June 17, 2014

Using cifs.idmap

We first setup winbind:

Make sure you have the following packages installed:

samba-winbind: Provides the winbindd daemon required.

samba-winbind-clients: Provides the libnss and pam modules required by winbind

Add winbind configuration in /etc/samba/smb.conf:

[global]
security = ads
realm = ENG1.LAB.EXAMPLE.COM
workgroup = ENG1

winbind separator = +
winbind cache time = 120
winbind enum users = yes
winbind enum groups = yes
winbind use default domain = yes

idmap backend = rid
idmap uid = 10000-20000
idmap gid = 10000-20000

template homedir = /home/%D/%U
template shell = /bin/bash
password server = vm140-52.eng1.lab.example.com

Configure /etc/krb5.conf

[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log

[libdefaults]
default_realm = ENG1.LAB.EXAMPLE.COM
dns_lookup_realm = true
dns_lookup_kdc = true
allow_weak_crypto = 1

[realms]
ENG1.LAB.EXAMPLE.COM = {
kdc = vm140-52.eng1.lab.example.com:88
}

[domain_realm]
.eng1.lab.example.com = ENG1.LAB.EXAMPLE.COM
eng1.lab.example.com = ENG1.LAB.EXAMPLE.COM

Edit /etc/nssswitch.conf and add winbind:

Make sure that the the password and group lookups use winbind

..
passwd: files winbind
shadow: files
group: files winbind
..

You now need to join the ADS:

# net ads join -D 5 -U Administrator@ENG1.LAB.EXAMPLE.COM -S vm140-52
Enable winbindd:
# systemctl enable winbindd
# systemctl start winbindd
To test: (wintest1 is a user on the ADS)
# id wintest1

At this point, winbind is setup for your machine.

To debug winbind, stop the winbind service and run winbind in the following manner on the command line.

#winbindd -F -d 3 -S |tee winbind.out

This prints debug output onto the terminal as well as the file winbind.out in the cwd.

Mounting the cifs share:

First ensure that the request-key mechanism is setup to use cifs.idmap.

Ensure that the following line exists in either /etc/request-key.conf or /etc/request-key.d/cifs.idmap.conf(default location for Fedora and RHEL)

create cifs.idmap * * /usr/sbin/cifs.idmap %k

Now mount the cifs share using the mount option cifsacl:

# mount -t cifs -o username=wintest1,password=pass1,cifsacl //192.168.140.53/exports /mnt;

You should now be able to see the usernames from the ADS

# ls -l /mnt
total 4
drwxr-xr-x 1 root domain users 4096 May 15 16:37 scratch
drwxr-xr-x 1 wintest1 domain users 0 May 13 12:20 wintest1
drwxr-xr-x 1 wintest2 domain users 0 Mar 19 13:37 wintest2

Note that the cifsacl results in an additional NT Transact call - QUERY_SECURITY_DESC to retrive the security descriptor for each file which has been stat-ed. This results in a performance penalty.

More information is available in the cifs.idmap man-page.

Friday, January 03, 2014

Systemd: Quick-start guide

Systemd: Quick walkthrough

1) List all units controlled by systemd
# systemctl
or
#systemctl list-units

2) List all _active_ services on the system
# systemctl list-units -t service

3) list all service on the system
# systemctl list-units -t service --all

4) Check service status
# systemctl status sshd.service

5) Start Service
# systemctl start sshd.service

6) Stop Service
# systemctl stop sshd.service

7) Enable Service
# systemctl enable sshd.service

8) Disable Service
# systemctl disable sshd.service

9) You can view service dependencies with the command
# systemctl list-dependencies

10) Systemd is also used to halt/reboot/hibernate/shutdown
check systemctl --help for commands which are available.

Systemd uses cgroups extensively and groups processes started by a particular service into its own groups. This means that killing a service will get all processes started by a particular service.

References:
http://www.linux.com/learn/tutorials/527639-managing-services-on-linux-with-systemd

Sunday, September 08, 2013

Grub2: Setting default kernel to boot

Now that most distros have moved to grub2, it is important to learn the recommended way to set the default boot option.

Grub2 dynamically generates the configuration file /boot/grub2/grub.cfg. This is done by calling the grub2-mkconfig command which call the scripts in /etc/grub.d/* in sequence and prints the resulting grub2 configuration file onto STDOUT.

The default kernel to boot is set based on the value for directive "GRUB_DEFAULT" in /etc/default/grub. You can set the default to a number as in the grub1 configuration. However it is recommended you set this to 'saved' so that you can use the grub2-set-default command. You use the grub2-set-default command to set the label of the kernel you would like to boot. This is a better approach when compared to using the index number of the kernel listed in the grub configuration file since this avoids problems caused by change in order of the listed kernels.

To set the default kernel,
1) Check to see that the directive "GRUB_DEFAULT" in /etc/default/grub is set to 'saved'.
2) List the kernels present in grub.cfg with the command: grep ^menuentry /boot/grub2/grub.cfg|cut -d "'" -f2
3) Select the kernel label from the list obtained from the previous kernel and then set it as default using the command: grub2-set-default
4) Verify that the correct kernel has been selected with the command: grub2-editenv list

You should then reboot the machine to load the new kernel.

Reference: https://fedoraproject.org/wiki/GRUB_2

Friday, May 24, 2013

Chicken Curry

Another method to cook a smooth textured chicken curry. I make good use of the grinder attachment on my blender.

Contents:

Chicken thigh pieces - Skinless, with each thigh cut into 3 parts.
2 large onions - Chopped.
3 green finger chillies
150 gms full fat yoghurt
1 tbsp coriander powder
1/2 tbsp cumin powder
1 tsp tumeric powder
1/2 tsp red chilly powder
4-5 cashews
5-7 cloves
approximately 5 pepper corns
1 inch of cinnamon
3 star anise
1/2 tsp garam masala
1 tbsp kasauri Methi
2 tbsp oil
Fresh coriander for garnishing
Salt to taste

Preparation:

In a saucepan gently heat the whole spices ie. cloves, pepper corns, star anise, cinnamon stick until your kitchen smells lovely. Put these spices into the grinder and grind them to a fine powder.
In the same saucepan, add oil and the chopped onions and fry for about 4 minutes. Add the tumeric and continue frying until they are transparent.
Add the coriander, cumin powder and red chilli and fry for 30 secs. Switch off the heat.
Once the onion has cooled down, add it to the same grinder which contains the ground spices. Grind them together coarsely. You don't want this fine as the sauce needs some texture.
In the saucepan used earlier, add some more oil and throw in the chicken.
Add the ground sauce over the chicken followed by the salt. Stir the sauce into the chicken, add a cup of water and let it cook for 7 minutes.
While the chicken cooks, in the same grinder, add 150 gms of yoghurt and the cashew nuts. Grind these until the cashew nuts are completely ground into the yoghurt.
Add the yoghurt cashew mixture to the sauce. This is the final use for the grinder. Also add the Kasauri methi at this stage
Continue cooking for about 7 more minutes until the chicken is cooked.
Garnish with fresh coriander and serve.

SPrabhu's Blog

Thursday, August 20, 2015

Investigation into the effects of server timeouts on the CIFS client.

Server is orderly shutdown

Conclusion:

Server is abruptly shutdown

Conclusion:

Sunday, June 28, 2015

10 minute Rava Chakuli

Monday, December 15, 2014

Debugging calls to cifs.upcall

Tuesday, June 17, 2014

Using cifs.idmap

We first setup winbind:

Mounting the cifs share:

Friday, January 03, 2014

Systemd: Quick-start guide

Sunday, September 08, 2013

Grub2: Setting default kernel to boot

Friday, May 24, 2013

Chicken Curry

Setting a frost alarm on Home assistant

Search This Blog