Service Fabric To-Do
Service Fabric Documentation Done Right
When it comes to Service Fabric, the documentation can be quite light. Also, the Microsoft documentation tends to organize itself around the technology rather than the problems it solves. Since I’m a firm believer that all complaints should be productive, I’ll be maintaining this page as a jumping-off point for a series of detailed posts that explore the lower-level details of Reliable Services in Service Fabric.
Doing it Better
Making Service Fabric Approachable Through Relevant Documentation of Real-World Scenarios
If not approached with care, a Service Fabric Reliable Services implementation can quickly start to feel like a tent city rather than a thriving collection of interrelated services. I hope to address many of the things that the documentation misses in the sections below. While there is a whole basket of things for infrastructure developers to focus on, this page is mostly committed to documenting the journey of the application developer.
Why Read On?
Reliable Services set Service Fabric apart from other microservice platforms. Stateful Services provide very low latency access to storage with consistency guarantees across replicas. They can also be split into singleton, named, or ranged partitions to distribute the computation, memory, and storage loads across multiple nodes.
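To make the idea of a ranged partition concrete, here is a minimal sketch of how a record key might be mapped onto one of several equal-width ranges of a signed 64-bit key space. It is written in Python for brevity and does not use the Service Fabric API. The names partition_key and partition_index, the choice of SHA-256, and the equal-width split are all assumptions made for illustration; in a real service, the runtime resolves a key to a partition for you.

```python
import hashlib

# Bounds of a signed 64-bit key space (assumed here as the full range).
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def partition_key(record_id: str) -> int:
    """Derive a stable signed 64-bit key from a record identifier.

    Hypothetical helper: a cryptographic hash is used only because it
    spreads keys evenly and gives the same result on every machine.
    """
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def partition_index(key: int, partition_count: int,
                    low: int = INT64_MIN, high: int = INT64_MAX) -> int:
    """Return the zero-based index of the equal-width range containing key.

    Assumption: ranges are equal width and the last range absorbs any
    remainder. This illustrates the concept, not the runtime's algorithm.
    """
    if partition_count < 1:
        raise ValueError("partition_count must be at least 1")
    if not low <= key <= high:
        raise ValueError("key is outside the partitioned range")
    span = high - low + 1
    width = span // partition_count
    if width == 0:
        raise ValueError("more partitions than keys in the range")
    return min((key - low) // width, partition_count - 1)
```

Because the key is derived from the record identifier alone, any caller can recompute it and find the same partition, which is what makes a record "predictably locatable." For example, partition_index(partition_key("tenant-42"), 10) always names the same one of ten partitions for that identifier.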
The IReliableStateManager interface provides an abstraction over the storage layer that grants access to the underlying storage in a thread-safe, atomic fashion. Using the interface (an instance is provided by the StateManager property of the StatefulService base class), developers can focus on business logic, leaving the storage concerns to the Service Fabric API and runtime.
The Complete Story
Advice From the Trench
The lists in the sections below are works-in-progress, but capture the benefits and pain points of Service Fabric Reliable Services as I see them. It’s important to remember that Service Fabric is at its core an infrastructure solution with a “pretty good” API that allows developers to take advantage of the “native” functionality exposed through Reliable Services. That is to say that using Service Fabric Reliable Services properly requires developers to interact with the hosting platform at a lower level than they may be used to.
Over the following months, I’ll be filling in the bulleted items below in an order that makes sense to me. If you would like to see me prioritize a post, or would like to see me address a different item not listed, please tweet #ServiceFabricToDo to @AntiArchitect.
As Far as the Microsoft Documents Take It
After a detailed reading of Microsoft's documentation site for Service Fabric, you will have a surface-level familiarity with all of the reasons to use Service Fabric listed below:
Health reporting and monitoring that enable failover, recovery, and better deployment
Service-to-service communication using Remoting
Partitioning of data across nodes, and partition discovery using a key
Storage and retrieval of one or more records in an atomic transaction
Backup and restore of state from the local file system
Transactional and concurrency guarantees
Replication of data and (nearly) transparent failover
Orchestration of service-to-service dependencies
An event model for notifications of Actor events to other services
A single-threaded programming model that removes the need to program for concurrency
Advanced Documentation for Real-World Application of Concepts
The above is clearly the “happy path” for implementation, however. The non-happy-path scenarios below should be accounted for in any enterprise implementation, and they will often have far-reaching consequences if implemented incorrectly or not at all.
Planning data for partitions (designing data to be evenly distributed across a range while also being predictably locatable)
Planning partitions for compliance when developing multi-tenant SaaS
Making records in a Stateful Service locatable (indexed and searchable)
Managing backups using cloud storage
Designing the application for external consumption of data
Designing for robust concurrency, consistency, and transaction integrity for business transactions that span services
Deployment and Versioning
Managing multiple versions of a deployed service
Designing a service to accept configuration changes without restarting
Planning for communication using service remoting, asynchronous offloading, and actor events
Securing Service Fabric Remoting
Some Things Documentation Can't Fix
In my opinion, Service Fabric Reliable Services fall completely short of their goals in the following areas. It looks like Microsoft is making some movement toward addressing many of these, but in the absence of a finished (generally available) solution, I will maintain these items. As I document these, my goal is to propose both a short-term fix and a long-term design for each.
Replicating data between regions
Scaling out or rebalancing partitions (changing partition count)
API design that focuses on composition rather than inheritance and convention
All of the places you have to manage configuration (hint: too many)
White Papers on "Doing it Wrong"
Below is a list of things that developers new to Service Fabric Reliable Services often get wrong (the documentation provides little guidance on patterns).
A map of your service-to-service communications should not look like a plate of spaghetti; it should look like spears of asparagus.
Properly implementing CancellationToken in services is not only important, it's actually easier than you think.
As wrong as you may feel the API designers got it, don't deviate from the path, and whatever you do, don't build a framework over the Service Fabric API!
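To illustrate the CancellationToken item in the list above, here is a minimal sketch of cooperative cancellation: a work loop that checks a token before each unit of work and waits on the token instead of sleeping blindly. It is written in Python and does not use the Service Fabric or .NET API. The CancellationToken class below is a hypothetical stand-in built on threading.Event, and run, process, and delay are illustrative names only.

```python
import threading


class CancellationToken:
    """Hypothetical stand-in for a cancellation token (not the .NET type)."""

    def __init__(self) -> None:
        self._event = threading.Event()

    def cancel(self) -> None:
        """Signal that work should stop as soon as possible."""
        self._event.set()

    @property
    def is_cancellation_requested(self) -> bool:
        return self._event.is_set()

    def wait(self, timeout: float) -> bool:
        """Pause for up to timeout seconds; return True if cancelled."""
        return self._event.wait(timeout)


def run(token, work_items, process, delay: float = 0.0) -> int:
    """Process items until they run out or the token is cancelled.

    Returns the number of items that were fully processed.
    """
    done = 0
    for item in work_items:
        # Check before starting a unit of work, never in the middle of one.
        if token.is_cancellation_requested:
            break
        process(item)
        done += 1
        # Waiting on the token lets a cancel request cut the pause short.
        if token.wait(delay):
            break
    return done
```

The design choice worth noting is that the loop never sleeps unconditionally. Every pause is a wait on the token, so a request to stop is honored within one unit of work rather than one unit of work plus a full delay.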