The client awaits responses by the server in the kernel thread "cifsd" which runs the call path
cifs_demultiplex_thread()->cifs_read_from_socket()->cifs_readv_from_socket()->kernel_recvmsg()
We investigate 2 scenarios
a) The server is orderly shutdown ie. the server closes the socket.
b) The server is abruptly taken off the network ie. it has no chance to orderly shutdown the socket. This is simulated using iptables.
In both cases we assume that the client has a share mounted from the server.
Server is orderly shutdown
kernel_recvmsg() call in cifs_readv_from_socket() returns 0.
int cifs_readv_from_socket(struct TCP_Server_Info *server, struct kvec *iov_orig, unsigned int nr_segs, unsigned int to_read) { .. length = kernel_recvmsg(server->ssocket, &smb_msg, iov, segs, to_read, 0); .. } else if (length <= 0) { cifs_dbg(FYI, "Received no data or error: expecting %d\n" "got %d", to_read, length); cifs_reconnect(server); total_read = -ECONNABORTED; break; } .. }
cifs_demultiplex_thread()->cifs_read_from_socket()->cifs_readv_from_socket()->cifs_reconnect()->generic_ip_connect()
int cifs_reconnect(struct TCP_Server_Info *server) { .. do { .. rc = generic_ip_connect(server); if (rc) { cifs_dbg(FYI, "reconnect error %d\n", rc); mutex_unlock(&server->srv_mutex); msleep(3000); } .. } while (server->tcpStatus == CifsNeedReconnect); return rc; }
While the server is down, generic_ip_connect() continously returns -EHOSTDOWN(111). It will sleep for 3 seconds between each reconnect attempt.
While the attempts to reconnect are made in the cifsd thread, any attempt to perform an operation on the share results in the thread calling smb_init() which calls cifs_reconnect_tcon()
ex: when calling ls -l - CIFSSMBUnixQPathInfo() -> smb_init() -> cifs_reconnect_tcon()
static int cifs_reconnect_tcon(struct cifs_tcon *tcon, int smb_command) { .. while (server->tcpStatus == CifsNeedReconnect) { wait_event_interruptible_timeout(server->response_q, (server->tcpStatus != CifsNeedReconnect), 10 * HZ); .. /* * on "soft" mounts we wait once. Hard mounts keep * retrying until process is killed or server comes * back on-line */ if (!tcon->retry) { cifs_dbg(FYI, "gave up waiting on reconnect in smb_init\n"); return -EHOSTDOWN; } } .. }
Conclusion:
In cases where the server is shutdown orderly, the client immediately attempts to reconnect. It continues doing this through the cifsd thread keeping an interval of 3 seconds between each attempt. Any attempt to perform an operation on this mount on a soft mounted share(which is the default) will fail after waiting for 10 seconds for the reconnect attempts made by the cifsd thread to succeed.Server is abruptly shutdown
The server tcp connection is represented by struct TCP_Server_Info. This contains a variable "unsigned long lstrp" which contains the jiffies at which the client last got a response from the server.
struct TCP_Server_Info { .. unsigned long lstrp; /* when we got last response from this server */ .. };
This is initialised in cifs_get_tcp_session when the first connection to the server is made.
static struct TCP_Server_Info * cifs_get_tcp_session(struct smb_vol *volume_info) { .. tcp_ses->lstrp = jiffies; .. }
This value is set to the latest jiffies at the end of every successful read request.
static int cifs_demultiplex_thread(void *p) { .. while (server->tcpStatus != CifsExiting) { .. mid_entry = server->ops->find_mid(server, buf); if (!mid_entry || !mid_entry->receive) length = standard_receive3(server, mid_entry); else length = mid_entry->receive(server, mid_entry); if (length < 0) continue; if (server->large_buf) buf = server->bigbuf; server->lstrp = jiffies; .. } /* end while !EXITING */ .. }
It is this last response variable server->lstrep which is used to determine the responsiveness of the server.
Before we look at how it is used to determine the server responsiveness, we should also take a look at the echo service. This is the keepalive service used to "ping" the server at regular intervals in case of inactivity. This is done by setting up a work on the cifsiod_wq workqueue which is called ever SMB_ECHO_INTERVAL(60) seconds.
static struct TCP_Server_Info * cifs_get_tcp_session(struct smb_vol *volume_info) { .. INIT_DELAYED_WORK(&tcp_ses->echo, cifs_echo_request); .. /* queue echo request delayed work */ queue_delayed_work(cifsiod_wq, &tcp_ses->echo, SMB_ECHO_INTERVAL); .. }
After SMB_ECHO_INTERVAL(60) seconds, we call
static void cifs_echo_request(struct work_struct *work) { int rc; struct TCP_Server_Info *server = container_of(work, struct TCP_Server_Info, echo.work); /* * We cannot send an echo if it is disabled or until the * NEGOTIATE_PROTOCOL request is done, which is indicated by * server->ops->need_neg() == true. Also, no need to ping if * we got a response recently. */ if (!server->ops->need_neg || server->ops->need_neg(server) || (server->ops->can_echo && !server->ops->can_echo(server)) || time_before(jiffies, server->lstrp + SMB_ECHO_INTERVAL - HZ)) goto requeue_echo; rc = server->ops->echo ? server->ops->echo(server) : -ENOSYS; if (rc) cifs_dbg(FYI, "Unable to send echo request to server: %s\n", server->hostname); requeue_echo: queue_delayed_work(cifsiod_wq, &server->echo, SMB_ECHO_INTERVAL); }
We call an echo if the session hasn't been negotiated yet and if no response was received since the last echo interval. At the end of the call, we requeue the echo work request on the cifsiod_wq.
Before each call to read on the socket, we first check to make sure that the server is not unresponsive.
int cifs_readv_from_socket(struct TCP_Server_Info *server, struct kvec *iov_orig, unsigned int nr_segs, unsigned int to_read) { .. if (server_unresponsive(server)) { total_read = -ECONNABORTED; break; } .. length = kernel_recvmsg(server->ssocket, &smb_msg, iov, segs, to_read, 0); .. }
Note that we send an echo call every SMB_ECHO_INTERVAL. So in case of an inactive connections, we should receive a reply atleast after every SMB_ECHO_INTERVAL(60) seconds. The timeout for receives on the socket is set to 7 seconds when setting up the socket in generic_ip_connect(). If no replies have been received for 7 seconds, kernel_recvmsg() timesout and we redo the for loop which checks if the server is unresponsive again.
static bool server_unresponsive(struct TCP_Server_Info *server) { /* * We need to wait 2 echo intervals to make sure we handle such * situations right: * 1s client sends a normal SMB request * 2s client gets a response * 30s echo workqueue job pops, and decides we got a response recently * and don't need to send another * ... * 65s kernel_recvmsg times out, and we see that we haven't gotten * a response in >60s. */ if (server->tcpStatus == CifsGood && time_after(jiffies, server->lstrp + 2 * SMB_ECHO_INTERVAL)) { cifs_dbg(VFS, "Server %s has not responded in %d seconds. Reconnecting...\n", server->hostname, (2 * SMB_ECHO_INTERVAL) / HZ); cifs_reconnect(server); wake_up(&server->response_q); return true; } return false; }
Any filesystem activity performing an operation on this share uses functions such as SendReceive(), SendReceive2() or SendReceiveBlockingLock() to send over commands to the remote server. They then wait for a response from the server by calling wait_for_response().
static int wait_for_response(struct TCP_Server_Info *server, struct mid_q_entry *midQ) { int error; error = wait_event_freezekillable_unsafe(server->response_q, midQ->mid_state != MID_REQUEST_SUBMITTED); if (error < 0) return -ERESTARTSYS; return 0; }
It continues waiting for a response until we finally timeout in the cifs_demultiplex_thread "cifsd" in the call path cifs_demultiplex_thread()->cifs_read_from_socket()->cifs_readv_from_socket()->server_unresponsive(). Once the function server_unresponsive() confirms that the server hasn't replied for atleast 2 * SMB_ECHO_INTERVAL, it calls cifs_reconnect() to mark the TCP_Server_Info as needing a reconnect.
int cifs_reconnect(struct TCP_Server_Info *server) { .. //Mark the TCP_Server_Info as needing reconnect server->tcpStatus = CifsNeedReconnect; .. //Mark all sessions and tcons as needing a reconnect list_for_each(tmp, &server->smb_ses_list) { ses = list_entry(tmp, struct cifs_ses, smb_ses_list); ses->need_reconnect = true; ses->ipc_tid = 0; list_for_each(tmp2, &ses->tcon_list) { tcon = list_entry(tmp2, struct cifs_tcon, tcon_list); tcon->need_reconnect = true; } } .. spin_lock(&GlobalMid_Lock); list_for_each_safe(tmp, tmp2, &server->pending_mid_q) { mid_entry = list_entry(tmp, struct mid_q_entry, qhead); //Mark each pending request as requiring a retry if (mid_entry->mid_state == MID_REQUEST_SUBMITTED) mid_entry->mid_state = MID_RETRY_NEEDED; list_move(&mid_entry->qhead, &retry_list); } spin_unlock(&GlobalMid_Lock); .. }
We then wake up all pending requests in server_unresponsive()
static bool server_unresponsive(struct TCP_Server_Info *server) { .. if (server->tcpStatus == CifsGood && time_after(jiffies, server->lstrp + 2 * SMB_ECHO_INTERVAL)) { cifs_dbg(VFS, "Server %s has not responded in %d seconds. Reconnecting...\n", server->hostname, (2 * SMB_ECHO_INTERVAL) / HZ); //Mark all pending requests as needing retries cifs_reconnect(server); //Wake up all tasks waiting for a response. wake_up(&server->response_q); return true; } return false; }
For the task sending the request, we return and return -EAGAIN requesting the task to be run again.
int SendReceive(const unsigned int xid, struct cifs_ses *ses, struct smb_hdr *in_buf, struct smb_hdr *out_buf, int *pbytes_returned, const int timeout) { .. //Set request status as submitted midQ->mid_state = MID_REQUEST_SUBMITTED; cifs_in_send_inc(ses->server); //Send request rc = smb_send(ses->server, in_buf, be32_to_cpu(in_buf->smb_buf_length)); cifs_in_send_dec(ses->server); cifs_save_when_sent(midQ); .. //Wait for response rc = wait_for_response(ses->server, midQ); .. //Check response. For a retry, this call will return -EAGAIN rc = cifs_sync_mid_result(midQ, ses->server); if (rc != 0) { add_credits(ses->server, 1, 0); //Which is returned to the called return rc; } .. }
The call attempts to retry and subsequently attempts to reconnect to the tcon
ex: when calling ls -l - CIFSSMBUnixQPathInfo() -> smb_init() -> cifs_reconnect_tcon()
static int cifs_reconnect_tcon(struct cifs_tcon *tcon, int smb_command) { .. while (server->tcpStatus == CifsNeedReconnect) { wait_event_interruptible_timeout(server->response_q, (server->tcpStatus != CifsNeedReconnect), 10 * HZ); .. /* * on "soft" mounts we wait once. Hard mounts keep * retrying until process is killed or server comes * back on-line */ if (!tcon->retry) { cifs_dbg(FYI, "gave up waiting on reconnect in smb_init\n"); return -EHOSTDOWN; } } .. }
In the while loop above, we return after 10 seconds in case of a soft mount or continuously retry in case of hardmounts.
Conclusion:
When the server is abruptdly taken off the network, we will mark the server as requiring a reconnect. This can take at max a total of
timeout for recvmsg(7) + 2 * SMB_ECHO_INTERVAL(60) + time waiting for reconnect in cifs_reconnect_tcon() (10) = 137 seconds.