Coding for Mobile Networking

Most interesting applications interact with resources outside of the application itself. They interact with the operating system, the device's local resources, and networked resources. Each level of increasing interaction brings with it increasing application power but also diminished control. Increased interaction with off-device resources brings with it the increased probability of intermittent communications failures. The most important thing to remember when you are writing code that communicates over a network is that you are no longer in full control of the outcome.

The paragraphs below give a framework for thinking about increasing levels of interdependence between your application and the resources surrounding it.

Closed system computing
When writing an algorithm to process data that the application has full ownership of, you are in full control of your application's destiny. Everything that happens in this system is happening because of code that you can inspect and can get a full and detailed understanding of. If your algorithm allocates and releases memory, it is giving up a slight bit of control to the runtime to manage that memory, but you can still have a very high degree of confidence that you are in control of the system. These situations approximate a closed system over which you have full determinacy of the outcome.

Cooperative computing with the operating system
When writing code that interacts with the runtime and operating system, you are giving up some more control to the runtime and operating system in exchange for rich services provided by them. Presenting a user interface on a modern computing device is usually an example of this; the user interface is a cooperative effort between your application's code and the operating system. The underlying operating system and runtime operate the user interface for you and send your application events and messages when interesting things occur.
In this mode of software development, you are no longer in control of a closed system; you are now in a cooperative system where your application and the runtime work together to provide a rich experience for the user. Although you cannot be sure exactly what is going on in the underlying system, you can still make very good assumptions about the behavior of the application as a whole. For example, while your application is giving up low-level details of how its user interface operates, it is a fair assumption that it is still in full charge of everything that goes into the user interface and is not sharing that resource with any other applications.

Cooperative computing with other applications running on the device
Your mobile application can assume that it has full control of its user interface because the operating system and runtime environment have set up logical boundaries between the individual resources of different applications; what's mine is mine and what's yours is yours. When your mobile application starts working with resources global to the device, such as local files and databases, it must be mindful that it is not the only potential client for these resources. The operating system serves as the honest broker of these resources, but it cannot guarantee your application exclusive and full access to any given resource at all times. Additionally, there is the possibility of running out of resources; for instance, if other applications use up all of the available file system space, your application must respond robustly to this unfortunate circumstance. When working with shared resources, it is important to code defensively and to understand that every attempt to access a shared resource can fail.
Thought must also be given to what happens when a resource is accessed simultaneously by different applications. Some types of resources lock access to a single party at a time, some allow concurrent access but make no guarantees about the coherency of the data during updates, and some guarantee atomic reads and writes of data. Understanding the concurrency behavior of any device-global resource your application uses is important for ensuring its robust behavior.

Cooperative computing with the network at large
When your application relies on services provided over a network, it is taking on two additional sources of potential unreliability. Access to networked resources may fail because the computer at the other end of the network may be uncooperative. Things may also go wrong because any of the steps in the networking chain that connects the two computers may fail to behave as expected.

Adding further complexity is the fact that networked resources may fail in the middle of communications; this very rarely happens in device-local communications. For example, although it is possible that reading a file from a local file system may fail, the chances of this happening are vanishingly small; a hard drive may occasionally crash, but this is rare, and when it does, your application has bigger problems to worry about than the file it was reading. The chance of failure is so small that many operating systems regularly perform read and write operations to local storage unbeknownst to the applications running on top of them. When reading a file over a network, the chances of failure midstream are dramatically higher, higher still if the network is wireless, and even greater if the mobile device is roaming between network cells while reading the file. Failure may not occur all the time, but it occurs often enough that it must be accounted for and dealt with robustly.
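The defensive stance described above, where any access to a shared, device-global resource may fail, can be sketched in Python (the same structure carries over to C#). The hypothetical helper `save_shared_file()` below assumes that other processes may read the file at any time and that storage may run out. It writes to a temporary file in the same directory and then swaps it into place with `os.replace()`, which the Python documentation describes as atomic on POSIX systems, so readers see either the old or the new contents, never a half-written file. Every failure is reported to the caller instead of being assumed away.

```python
import os
import tempfile


def save_shared_file(path: str, data: bytes) -> bool:
    """Hypothetical helper: replace a shared file without exposing partial writes.

    Returns True on success and False if any step fails (for example, the
    disk is full or the directory is missing or locked).
    """
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = None
    try:
        # Create the temporary file next to the target so the final rename
        # stays on the same file system.
        fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to storage before the swap
        os.replace(tmp_path, path)  # atomic replacement on POSIX systems
        tmp_path = None             # nothing left to clean up
        return True
    except OSError:
        # Out of space, permission denied, file in use: report, don't crash.
        return False
    finally:
        # Remove the temporary file if we never got as far as the rename.
        if tmp_path is not None:
            try:
                os.remove(tmp_path)
            except OSError:
                pass
```

A `False` return leaves any previous copy of the file untouched, so the application can keep working with its last good data and try again later.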
Described next are guidelines for building robust networked mobile applications.

Do Not Build a Communication-Dependent Application

Just as a group of people acting in concert can achieve more than a single person acting alone, a mobile application interacting with the networked world around it can offer a far richer experience than a mobile application that is self-contained. For a group of people in an organization to be effective in meeting goals, the organization must deal robustly with failures in communication and failures in individual people; no single interaction can be on the critical path, or the organization will fail without it. The same is true for mobile devices. A mobile device can greatly benefit from interacting with the world around it, but it should not rely on interaction at any given time. This point is both obvious and surprisingly often ignored.

Your mobile application should look at each network access as an opportunity and not a necessity. For most mobile devices, networked communication is available "approximately on demand," meaning that a user of a mobile application can usually go to a physical location where they have network access, but the device itself cannot rely on constantly being in a network-connected state. This is very different from a desktop computer that is wired to a network and also significantly different from a laptop computer that may be used while stationary in a network "hot spot." Although few people walk around train stations with their laptops open, typing, many people walk around these same stations tapping away on their mobile phones. The difference is a matter of degree, but the distinction is significant.

Thinking about network access as intermittent results in two useful design guidelines:

Download and cache useful data ahead of time when a network connection is available.
It is important to consider what kind of information will be most useful when your application is offline and to make sure it is downloaded and available on the device when needed. While it is useful to manage on-device resources by deferring allocation until needed, off-device information should be downloaded and hoarded to prepare for the application being offline. For example, a connection to an on-device database can be deferred until it is actually needed; in contrast, relevant data in an off-device database should be downloaded ahead of time when a network connection is available. Ideally, the downloading of information should be done as a background task that does not block the application's user interface. Having predownloaded information can also significantly speed up an application, especially when the data is commonly accessed by the application or user and would otherwise need to be downloaded on demand.

Queue data for upload and allow it to be uploaded when an appropriate network connection becomes available.
Users want a responsive experience when working with device-local applications. If the user updates data in a mobile phone application and the application requires an immediate server connection to process that data, the experience will be high-latency and unreliable. A far better model is to reliably queue updates on the client and offer either automated or manual batch uploads of the data to servers as connectivity allows. The data upload code should also be robust enough to defer work until later if a reliable connection cannot presently be established.

Never Block Your User Interface Thread for Any Extended Period of Time

Communication is inherently a synchronous operation; your mobile device application will have a block of data that it will want to move up to a server or down from a server, and moving this data will take some amount of time.
Because the amount of time required for transmission to an external server, desktop, or other device is inherently out of your application's direct control, it is important not to perform these kinds of operations on the user interface thread of your application.

A mistake that application developers commonly make is to first build all of their communications systems to run synchronously on the user interface thread, with the goal of moving this communication to a background thread later in the development cycle. As a general rule of thumb, this will not produce acceptable results. There are several reasons developers fall into this trap:

It is easier to design and debug synchronous communications.
This is undoubtedly true. It is much easier to design and debug application logic that runs synchronously. For this reason, it is recommended that you do design and debug your communications logic by running it synchronously with your application's user interface logic. The communications routines you write should be synchronous functions. However, as soon as these communications functions are written and their base functionality is tested, they should be made to run asynchronously to the user interface's application logic.

It is faster to write synchronous code.
This is also true. When milestone deadlines loom, it is very easy to convince yourself that the only way to make the deadline is to cut corners and make the communications logic run synchronously with the user interface.

As long as asynchronous operation is kept in mind and designed into the application, it will be easy to move operations to be asynchronous later.
This is false. Keeping the need to run asynchronously in mind when designing synchronous communications routines is undoubtedly helpful in making these systems asynchronous in the future, but it is simply not enough. The truth is that, despite all efforts to the contrary, synchronous dependencies will be built into code that uses synchronous routines.
Human beings are simply not good at tracking all of the implicit assumptions that creep into application logic, and fixing these kinds of problems is very difficult in the later stages of an application's development.

The best methodology for designing communications code that keeps the user interface responsive is to follow the four guidelines below:

Design your communications routines as discrete, well-encapsulated functions.
The best way to prepare your communications routines for running asynchronously is to make sure they are well encapsulated. A communications routine that sends data up to a server should be passed a static copy of that data. It should be the sole owner of this copy; the function should not need to access any global or shared state to get that data. Similarly, a communications routine that reads data from a server should not modify any global application state until it has read in all the data. This principle is important because having two threads work on global application state at the same time is a recipe for complexity, data corruption, and unreliability. The user interface thread should have sole access to the data that it is working with, the communications routines should have another copy, and the two systems should only interact at very cleanly defined and well-tested points.

Test your communications routines by calling them synchronously.
As noted above, it is much easier to test and debug communications code when it is running synchronously on the application's user interface thread. For this reason, there is little benefit in testing code running on a background thread until the bugs have been worked out by running the code synchronously with the user interface.

I recommend placing a big button on your form called "test Save Code" and using it to test and debug your communications routines.
The reason I recommend the "big button" approach is that it looks ugly and you will remember to remove it; if the code is hooked up to the user interface in a well-integrated way, there is a strong chance it will be left that way.

Stress test your communications routines by calling them asynchronously in difficult circumstances using a test application.
Background threaded code is always more complex than code called synchronously. Moreover, tracking down the kinds of subtle data corruption and state management bugs that can occur when code on different threads interacts in unexpected ways is tricky business, made more so by all of the rich application feature code that surrounds the code being debugged. The best way to ensure a robust system is to stress test the code in a simplified environment designed to place it into difficult states, with instrumentation to detect abnormal circumstances. It is worth considering building a stress application that attempts to run multiple streams of code asynchronously and carefully inspects the internal state to look for any unexpected results. Doing this kind of testing may point out restrictions you should place on the code being executed to prevent it from getting into potentially risky states; for instance, the code for executing a background task could actively prevent multiple instances of that task from running concurrently on different threads if this were identified as problematic and unnecessary. Communications code that is tested, hardened, and debugged in this way can be integrated into your application with a much higher degree of confidence than code that is optimistically assumed to be correct.

Once tested, immediately move the communications routines to run on a background worker thread in your application.
After your basic communications code has been debugged synchronously and tested asynchronously, it is ready to be moved into asynchronous operation in your application.
Your application should have a clean and consistent model for running code on a background worker thread, and the same model should be used for all of your asynchronous communications needs. When you integrate communications code into your application's user interface, it should always be integrated using this asynchronous execution model. Do not integrate synchronous communications calls into your user interface with the plan of changing them to run asynchronously later when time allows; the longer the code sits there, the more dependencies it will gain.

As a rule of thumb, it is far easier to take an asynchronous system and run it synchronously than it is to go the other way. To run an asynchronous system synchronously, your application simply calls the asynchronous code and then goes into a loop that waits for the asynchronous code to finish before moving on. To make synchronous communication asynchronous, you will have to identify all of the application state that the communications routines touch and make an isolated copy of that data. You will need to create a background threading model to call code from. You will need to redesign the way the communications code interacts with any user interface elements, because user interface controls often cannot be robustly accessed from other threads. (This is a common mistake and is definitely not robust when using the .NET Compact Framework; <Control>.Invoke() must be used for cross-thread communication with user interface elements.) You will need to come up with a way to notify the user interface of communications problems, you will need to deal with error conditions, and you will need to design a model for communicating commands and status between the user interface and communications code.
All of this is very difficult to build in after the fact, and attempts to take synchronous operations in mobile applications and make them asynchronous will at the very least require significant redesign and will typically introduce instability and bugs into your application. Design off-device communicating code to be asynchronous from the start and you will be much happier with the results.

See Chapter 9, "Performance and Multithreading," for a discussion of running code on a background thread.

Work at the Highest Level of Abstraction That Is Adaptable to Your Needs

As with desktop and server code, it is a good idea to work at the highest level of abstraction feasible.

For instance, when working with Internet protocols, if you can work at the level of Web services using SOAP requests and responses, it is advisable to do so; the built-in abstractions will save you a lot of time by making your Web requests appear to be simple method calls. If performance or customization needs require you to go down a level, HTTP or HTTPS requests and responses offer a fairly high degree of abstraction and also tend to be firewall friendly. Should you find that HTTP requests and responses cannot be adapted to serve your needs, you have the option of moving to sockets-level communication and using streams built on top of sockets.

Working at the sockets level of abstraction requires taking on a significant burden of complexity because you will have to design your own communications protocols to interact with servers rather than using the simple and well-tested HTTP request/response mechanisms. If your application needs to communicate with a server that requires socket-level communication, it is recommended that you look at building a server-side proxy component that communicates with the socket interface and in turn exposes an HTTP or Web services interface to your application.
Because server-to-server communication is generally more reliable than device-to-server communication, doing this may significantly reduce the complexity of your device-side code and increase the reliability of your mobile application.

Only in extreme cases does it make sense to work at a level of communications below the socket level and work directly with the TCP/IP protocol stack. If your application requires this kind of communication, you will very likely be writing a significant amount of native C code to work with protocols. The added complexity and testing this demands are almost never justified by the payoff in the end application. The higher-level protocols are well tested; an application that uses its own custom communications protocols takes on an enormous burden to reach the same level of reliability. The same goes for using non-TCP/IP communications; TCP/IP may not be the ideal communications mechanism for many tasks, but it is very difficult to match the level of testing and proven reliability that has gone into these stacks. Unless you are planning on inventing a new commercial protocol and putting the huge amount of design and testing into the effort that this requires, it is a fool's errand to reinvent the wheel to try to get a perfect protocol for your needs. An 80 percent suitable wheel that exists today and has been tested for years is much better than a new wheel built from scratch. Before you go off inventing a new communications protocol or switch to using a lower layer in the communications stack, you should explicitly prove that the existing higher-level communications protocols cannot be used creatively to meet your mobile application's needs.

Always Expect Failure

The key principle in writing communications code is dealing robustly with failure.
Traditional communications technologies are often described as a multilayered stack of increasing abstractions starting at the physical layer, ranging through link layers and protocol layers, and up to the application layer. Most of these layers work in a similar way on mobile devices. Each layer typically has some built-in robustness facilities for error detection and for dealing with small interruptions in communication. In most cases, you will not need to worry about the specifics of the lower layers of communication; this is pretty much the same as when writing desktop or server code. The only difference is that mobile networks are more subject to intermittent failures than fixed-line networks or stationary wireless networks.

When writing robust communications code, it is extremely important to keep a careful eye on how resources are cleaned up when something goes wrong. Communication can be a complex endeavor involving the establishment of multiple connections and the allocation of different system resources in a chained set of steps. When something goes wrong during communications, it is important to clean up thoroughly, discard any system resources your application is holding, and, if you are using the .NET Compact Framework, make sure to proactively call Dispose() on any resources that support this method. Calling Dispose() is important because it immediately releases the underlying system resources rather than waiting for the garbage collector to eventually close handles and release locked resources. The C# using keyword (for example, using(myObject) {…your code…}) can also be a great help here because it ensures that Dispose() is called not only under successful circumstances but also when an exception is thrown inside the using code block. It is important to note that some classes, such as the .NET System.Net.Sockets.Socket class, do not have a public Dispose() method but rather have a Close() method that must be called to release the resources held by the object.
Be sure to carefully read the available documentation for whatever communications object you are using to ensure that you fully understand its resource reclamation rules and procedures.

If you inadvertently leave resources open when dealing with error conditions, you are inviting a situation where future communication attempts will fail. Without proper cleanup, an intermittent loss of a network connection may well leave a needed local resource in an exclusive-access open state, so that it cannot be reopened. Future attempts to reestablish communications will then fail even though the physical network connection has been restored. This is no different from desktop or server code, except that intermittent network failures are more common when the communicating device is both mobile and wireless. Error conditions matter more because they occur more.

The communications classes may also expose additional cleanup functions, and it may be necessary to call these to ensure a graceful exit. If a communications channel needs to be closed manually, be sure to call its Close() or Dispose() method as appropriate and to wrap that call in error-trapping code to deal with the possibility of failure. Sometimes an error occurs in your application not because of a communications disruption but for some other reason, such as a server response that does not meet your parsing code's expectations. Depending on the application and networked service, it may be important to close down communication in an orderly way. For example, if your application is doing socket-based communication with a custom service that has the concept of logon and logoff, it may be important to try to close communications in an orderly fashion if the application gets into an unexpected state.
Rather than just calling Close(), your application may want to send whatever logout command is appropriate to the server and then call Shutdown() on the socket to finish communications before calling Close(). It is important to understand the connection and disconnection patterns of the services your mobile application is using.
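The cleanup rules above can be sketched in Python, where a `with` block plays the role of the C# `using` statement: the resource is released whether the block exits normally or through an exception. The sketch assumes a hypothetical line-based service that understands `LOGON` and `LOGOFF` commands. On the way out, `session()` attempts an orderly logoff and a `shutdown()`, traps the errors either step may raise because the link may already be gone, and always closes the socket.

```python
import contextlib
import socket
from typing import Iterator


@contextlib.contextmanager
def session(sock: socket.socket, user: str) -> Iterator[socket.socket]:
    """Log on to a hypothetical line-based service and always clean up."""
    try:
        sock.sendall(f"LOGON {user}\n".encode())
        yield sock
    finally:
        # Runs on normal exit *and* when an exception escapes the block.
        # Each cleanup step may itself fail, so failures are trapped rather
        # than allowed to mask the original error.
        with contextlib.suppress(OSError):
            sock.sendall(b"LOGOFF\n")        # orderly logoff, if still possible
        with contextlib.suppress(OSError):
            sock.shutdown(socket.SHUT_RDWR)  # finish the conversation
        sock.close()  # release the OS handle now, not at garbage collection
```

In C#, the same shape is a `try`/`finally` around the conversation, with the logoff message, `Shutdown()`, and `Close()` in the `finally` block, each guarded against its own exceptions.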
Any operation that takes place over a remote connection can fail. Any block of code that accesses off-device resources should be wrapped in exception-catching code that deals with this eventuality.

Perform your remote communications compactly and close the connection quickly. The longer you have a remote socket, file, database connection, or any other networked resource open, the greater the opportunity for something to go wrong. For this reason, it is important to cleanly encapsulate your network communication code to open a connection, do the necessary work, and close the connection before moving on to other work. It is a bad idea to leave a dangling connection open to a networked resource.

Listing 15.1. Trivial File I/O Code That Notes Local vs. Server Differences
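The open, work, close discipline and the advice to wrap every remote operation in exception handling can be sketched as a hypothetical Python helper, `fetch_once()`; it illustrates the principle and is not the code of Listing 15.1. It assumes a simple request/response protocol in which the server reads until the client stops sending and closes the connection after replying. The socket is closed before the function returns, and any network error becomes a `None` result so the caller can fall back to cached data or queue the work for later. The `connect` parameter is an assumption added so the helper can be exercised without a real server; it defaults to the standard `socket.create_connection`.

```python
import socket
from typing import Callable, Optional

# Hypothetical connection factory: called as connect((host, port), timeout=...)
Connector = Callable[..., socket.socket]


def fetch_once(host: str, port: int, request: bytes,
               connect: Connector = socket.create_connection,
               timeout: float = 10.0) -> Optional[bytes]:
    """Open, do one request/response, close. Returns None on any network error."""
    try:
        # The with-block closes the socket even if sendall() or recv() raises.
        with connect((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            sock.shutdown(socket.SHUT_WR)  # signal that the request is complete
            chunks = []
            while True:
                chunk = sock.recv(4096)
                if not chunk:              # server closed: response complete
                    break
                chunks.append(chunk)
            return b"".join(chunks)
    except OSError:
        # Timeouts, refused or reset connections all land here; the caller
        # decides whether to use cached data or try again later.
        return None
```

Because the connection never outlives the call, a failure in one request cannot leave a half-open socket behind to poison the next attempt.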
Simulate Communications Failures to Test Your Application's Robustness

It is highly advisable to test all communications error conditions by explicitly triggering intermittent communications failures and seeing how your mobile application responds to these cases. It is equally important to test how the next attempt at network access behaves after your application has attempted to clean up from a previous communications error.

Simulate Communications Failures via Client-Side Code

Listing 15.2 shows a mechanism that enables you to test your mobile application's robustness in recovering from communications failures. The code contains conditionally compiled sections that can be enabled by placing #define DEBUG_SIMULATE_FAILURES at the top of your source file. The function writeDataToSocket() in the listing is called in the normal course of communications. To test the application's response to failure in this communication, the application can set the variable g_failureCode = SimulatedFailures.failInNextWriteSocketCode at any point during its execution. When the communications code is subsequently called, it will throw an exception the first time but not on subsequent calls. This allows testing to simulate the situation where a network connection suddenly drops and causes a communications failure but is then restored. This kind of conditionally compiled testing code is a quick-and-dirty but very effective way to instrument your code to simulate real-world failure conditions.

Writing communications code that is carefully designed and reviewed by peers is a good idea but is not sufficient; the code must be tested under explicit failure conditions. There is no substitute for real-world testing, but at the same time, causing real-world communications failures for every possible case is difficult. The only real alternative is to try to simulate and explore each possible failure path.
Given that communications failures will occur and will cause unusual code paths to be exercised, the only way to gain confidence in the robustness of your code is to cause the error conditions to occur in a controlled environment and verify that your application recovers well from them.

Listing 15.2. Simulating Communications Failure to Test Your Application
Simulate Communications Failures via Server-Side Code

Just as it is possible to insert testing code on a device to simulate error conditions, it is also advisable to stress test your application's communications experience by simulating failures and delays in server-side code. By instrumenting your server code, it is possible to force servers to abruptly terminate a request or to hang indefinitely in the middle of sending a response. In these cases, your mobile application must be able to continue providing its end user a highly responsive and robust experience despite the errors. Simulating errors on the client as well as causing failures and delays in server-side code is a good way to ensure this is the case. For example, when calling a Web service, an easy way to do this is to pass an extra parameter with the Web service request that indicates which error condition the client wants to test. By default, the testing parameter can indicate normal operation, and other values for that parameter could indicate that the Web service should throw an error or cause a long delay before sending a response. The client calling the specially instrumented Web service can then intermittently make a request that will generate one of these conditions, making it possible to test its response.

Keep Data-Synchronization Progress Transparent to the User

For their peace of mind, users need to know the status of their data. Just as e-mail programs offer the user a notion of an "outbox" that contains untransmitted mail and printer queues allow the user to inspect "pending jobs," your application should offer a transparent view into the synchronization status of the user's data.

There is a balance to be reached in exposing synchronization data to users: a balance between offering users a clear view into what is going on with their data and interrupting them with status information they do not care about.
By default, when synchronization occurs smoothly, the user does not want to be interrupted by modal dialogs that pop up stating, "Data uploaded successfully!" Similarly, the user would probably prefer not to be interrupted every 30 seconds by large text flashing in front of them stating, "I am attempting to connect to the server again."

For this reason, it is often useful to have a small graphical indicator on your mobile application's screen that lets users know at a glance the status of their data's transmission. What kind of graphic to use and where to place it will depend on the user interface guidelines of the particular device you are working with. Some devices, such as Pocket PCs, offer ample room for a line of text on the screen that summarizes the communications status. On other devices, such as Smartphones, screen real estate is limited and only a few words of text or a small icon's worth of information may be possible. At a minimum, the user should be provided with the following summary information:

Knowledge of pending communication tasks
If there are communications tasks stored in the local queue, the user should be made aware of this. It should be possible for the user to drill down into this data, examine which tasks are pending, and manually force a communications attempt when they think it will succeed.

Knowledge of the completion of communication tasks
The end user wants to know when pending tasks are successfully completed. It is a good idea to have a visual mechanism for informing the user that all is well.

Knowledge of problems preventing communication
If communications tasks have been attempted on the user's behalf and have failed, the user should be made aware of this. It is possible the user may want to take action based on this information.
They may choose to attempt to remedy the communications problem themselves, for example by coming above ground from a subway station and then attempting to synchronize, or they may abort the pending work because it is no longer relevant. In either case, it is useful for end users to know that the work that was being done on their behalf was not successful. It is, of course, important to give the right level of information to the user; information such as "Patient312.xml transfer pending available socket port 8080 communication to ServerXYZ" is probably meaningless to most users, whereas the information "Upload pending: 'Bob Smith' patient evaluation" is much more useful.

The end goal of providing relevant communications status information to users is to make them active participants in the communications process. Users feel much more empowered when they have a clear conceptual understanding of what is going on. There may be things the end user can do to help the communications process along, or users may just need to content themselves with an understanding of the successes and failures that are occurring. In either case, users are less anxious when they have a transparent view into the communications processes that occur on their behalf.

Assume That Data Transmission Rates and Latencies Will Vary

Some networks operate dramatically faster than others. In addition, networks can become congested as increasing numbers of devices attempt to communicate simultaneously. Latencies for establishing connections, as well as for sending requests and receiving responses, will vary. A device operating at the periphery of a network's transmission range may need to attempt multiple sends or receives of packets of information in order to get the data through. For all these reasons, it is important for your application to deal robustly with the reality of variable bandwidth and latency.
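One way to tolerate variable latency and marginal coverage is to bound each attempt with a timeout and retry a limited number of times, pausing longer between attempts. The Python sketch below shows a hypothetical helper, `with_retries()`; the doubling backoff schedule and the default values are illustrative assumptions rather than recommendations from the text, and the injectable `sleep` parameter exists only to make the helper testable. Because it sleeps between attempts, it belongs on a background thread, never on the user interface thread.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(operation: Callable[[float], T],
                 attempts: int = 3,
                 timeout: float = 5.0,
                 first_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> Optional[T]:
    """Call operation(timeout) up to `attempts` times; return None if all fail."""
    delay = first_delay
    for attempt in range(attempts):
        try:
            return operation(timeout)   # each attempt is bounded by a timeout
        except OSError:                 # includes timeouts and dropped links
            if attempt == attempts - 1:
                break                   # out of attempts: give up quietly
            sleep(delay)                # wait before trying again...
            delay *= 2                  # ...and wait longer each time
    return None
```

A `None` result is the cue to leave the work in the local queue and report it as pending, rather than to block the user.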
Testing in a controlled environment is good and will enable you to make the greatest progress in writing your application, but it is important to plan for the reality that in real-world usage, communications rates will vary. It is important to ensure that the user's experience remains a good one despite all this variability.

Implement Needed Communications Security Early in Your Design

Because mobile devices often communicate over public networks and over wireless channels, it is important to think about what data security is required, how it will be built into your application, and what effects it will have on the application's deployment and performance. There are many ways of encrypting communications today; one of the most popular and easiest to use is HTTPS. SSL stands for Secure Sockets Layer, and HTTPS is Web communication built on top of SSL.

Fundamentally, your application will need to decide (1) whether it requires secure communication, (2) which networks it will communicate over, and (3) how any needed security will be implemented.

In most ways, secure communication works a lot like unsecured communication but with some additional setup steps and operational overhead. When secure communications are required over a public network, your application may need to attach digital certificates to Web requests and verify the authenticity of data that is sent back down to it. When communications are encrypted and decrypted, additional computation takes place on both ends of the line, and this will have some performance effects; how big these effects are must be determined by testing. Although the core communication code works very similarly to unsecured communication, the additional steps take time to design and incur some runtime performance penalty. If your application requires secure communication, it is best to design this in early and test it.
As with asynchronous communications, security code is not something you want to try to retrofit into an already completed design. When encrypted communications are required, having the code needed for secure communication built in and performance tested with your application will save you a great deal of redesign down the road.