Durable Engineering at Basic Capital
Using Temporal with Kotlin to make building joyful
Introduction
Most server applications start simple: a request comes in, the server does some work on a thread, the server sends back a response. Eventually, you run into a class of engineering problem that doesn’t fit this model: a process that spans minutes, hours, or days. A payment that needs to wait for external settlement. An onboarding flow that blocks on identity verification. A multi-step order that can fail at any point and needs to pick up exactly where it left off.
The instinct is to reach for queues, cron jobs, and state machines stitched together with database flags. We’ve all built these systems. They work, right up until they devolve into piecing together state from databases and queues as you joylessly debug why something happened twice, or why a customer got stuck in a half-completed state that your recovery logic didn’t account for. A whole class of engineer was created in the 2010s to reinvent the solution to this problem at every company.
At Basic Capital, “quality of life” for our technical staff is an important metric for success. Building at Basic Capital should inspire joy. It should feel like playing Factorio, not like a death-by-a-thousand-chores slog.
Temporal offers a different model: write your long-running process as a straightforward function, and let the infrastructure handle durability. Your code reads like a simple procedural program, but it survives process restarts, network failures, and deployments without losing progress.
Our inaugural blog post walks through how we set up Temporal in a Kotlin backend, the architectural insight that made it click for us, how it has yielded compounding returns for our engineering and ops teams, and how it has laid the foundation for agentic AI products.
The Mental Model
The key insight powering Basic Capital’s foundational systems: a Temporal workflow doesn’t need to contain business logic. It can simply coordinate it.
In our architecture, the application backend is a set of gRPC services that owns all business logic, database access, and domain rules. Temporal workflows sit alongside them as orchestrators, calling into those services, waiting for external events, and ensuring that multi-step processes complete reliably.
This means:
Application services remain the single source of truth for business logic
Workflows are lightweight coordination layers, easy to read and reason about
You get durability without polluting data models with orchestration artifacts
Setting it up with an example
Suppose you wanted to build a simple order fulfillment workflow at Basic Capital: an order is created, waits for payment confirmation from an external provider, and then gets fulfilled. If the payment succeeds, we ship it; otherwise, we cancel the order.
Step 1: Define the Workflow Interface
Workflow interfaces live in a shared library so both the application server (which starts workflows) and the worker (which executes them) can reference them.
@WorkflowInterface
interface OrderFulfillmentWorkflow : RunnableWorkflow {
    @WorkflowMethod
    fun run(orderUuid: String)

    @SignalMethod
    fun paymentCompleted(success: Boolean)
}
A few things to note:
@WorkflowInterface and @WorkflowMethod are Temporal annotations that define the entry point
@SignalMethod defines a method that external systems can call to push data into a running workflow
Signals are what make Temporal workflows genuinely reactive. Instead of polling a database for state changes, your application backend pushes events directly into the running workflow.
Step 2: Define the Activities
Activities are where side effects happen. In our architecture, they’re thin wrappers around gRPC calls:
@ActivityInterface
interface OrderFulfillmentSteps : BasicCapitalActivity {
    fun getOrder(orderUuid: String): Order
    fun reserveInventory(orderUuid: String)
    fun fulfillOrder(orderUuid: String)
    fun cancelOrder(orderUuid: String, reason: String)
}

class OrderFulfillmentStepsImpl(
    private val orderClient: OrderServiceCoroutineStub,
) : OrderFulfillmentSteps {
    override fun getOrder(orderUuid: String): Order {
        return runBlocking {
            orderClient.getOrder(
                GetOrderRequest(orderUuid = orderUuid)
            ).order
        }
    }

    override fun reserveInventory(orderUuid: String) {
        runBlocking {
            orderClient.reserveInventory(
                ReserveInventoryRequest(orderUuid = orderUuid)
            )
        }
    }

    override fun fulfillOrder(orderUuid: String) {
        runBlocking {
            orderClient.fulfillOrder(
                FulfillOrderRequest(orderUuid = orderUuid)
            )
        }
    }

    override fun cancelOrder(orderUuid: String, reason: String) {
        runBlocking {
            orderClient.cancelOrder(
                CancelOrderRequest(orderUuid = orderUuid, reason = reason)
            )
        }
    }
}
Notice what’s not here: no database queries, no business rules, no validation. All of that lives in the gRPC service implementation. The activity is just a remote procedure call.
Step 3: Implement the Workflow
Now the satisfying part: the workflow implementation reads like pseudocode.
class OrderFulfillmentWorkflowImpl : OrderFulfillmentWorkflow {
    private val steps = getActivityStub(OrderFulfillmentSteps::class)
    private var paymentSucceeded: Boolean? = null

    override fun run(orderUuid: String) {
        // Reserve inventory immediately
        steps.reserveInventory(orderUuid)

        // Wait for payment provider callback (could be seconds or hours)
        Workflow.await { paymentSucceeded != null }

        if (paymentSucceeded == true) {
            steps.fulfillOrder(orderUuid)
        } else {
            steps.cancelOrder(orderUuid, "Payment failed")
        }
    }

    override fun paymentCompleted(success: Boolean) {
        this.paymentSucceeded = success
    }
}
Workflow.await { paymentSucceeded != null } is the magic line. The workflow suspends here without burning a thread: everything it has done so far is already recorded in Temporal’s event history, so the worker is free to drop it from memory. When a signal arrives via paymentCompleted(...), Temporal replays the history if it needs to rebuild the workflow’s state, re-evaluates the condition, and resumes execution.
If the worker crashes while waiting, nothing is lost. If you deploy a new version, it picks up where it left off. The workflow could wait for five seconds, five days, or even five weeks — the outcome is the same.
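To make the replay idea concrete, here is a toy model in plain Kotlin. It is a sketch under heavy assumptions, not how Temporal is implemented: the Event and NextStep types and the replay function are hypothetical names invented for illustration. The point it shows is that when the next step is derived only from an append-only history, nothing held in memory has to survive a crash.

```kotlin
// Toy model only: hypothetical types, not part of the Temporal SDK.
sealed interface Event
data class ActivityCompleted(val name: String) : Event
data class SignalReceived(val success: Boolean) : Event

// What the workflow should do next, derived purely from the history.
sealed interface NextStep
data class RunActivity(val name: String) : NextStep
object WaitForSignal : NextStep
object Done : NextStep

// Deterministic: the same history always yields the same next step.
fun replay(history: List<Event>): NextStep {
    val completed = history.filterIsInstance<ActivityCompleted>().map { it.name }.toSet()
    val payment = history.filterIsInstance<SignalReceived>().firstOrNull()?.success

    if ("reserveInventory" !in completed) return RunActivity("reserveInventory")
    if (payment == null) return WaitForSignal // plays the role of Workflow.await
    val finalStep = if (payment) "fulfillOrder" else "cancelOrder"
    return if (finalStep in completed) Done else RunActivity(finalStep)
}

fun main() {
    val history = mutableListOf<Event>(ActivityCompleted("reserveInventory"))
    // While waiting, the history is the only state that matters.
    check(replay(history) == WaitForSignal)

    // A restarted worker sees the same history plus the new signal and resumes.
    history += SignalReceived(success = true)
    check(replay(history) == RunActivity("fulfillOrder"))
    println("same history, same decision: ${replay(history) == replay(history.toList())}")
}
```

This is also why workflow code has to stay deterministic: replaying the same history must lead to the same decisions every time.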
Step 4: Configure Task Queues
Activities and workflows are grouped into task queues, which map to pools of workers. We created our own abstraction on top of this to make it especially straightforward for engineers (or Claude Code).
You can configure multiple task queues, each with its own parallelism and rate limits.
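The post does not show this abstraction, so the following is only one possible shape for it, written in plain Kotlin. The @TaskQueue annotation, its parameters and defaults, and the lookup function are all hypothetical. The sketch declares a queue name and limits next to the workflow type and resolves the name by reflection.

```kotlin
import kotlin.reflect.KClass

// Hypothetical annotation: names, parameters, and defaults are assumptions for illustration.
@Target(AnnotationTarget.CLASS)
@Retention(AnnotationRetention.RUNTIME)
annotation class TaskQueue(
    val name: String,
    val maxConcurrentActivities: Int = 10,
    val maxActivitiesPerSecond: Double = 50.0,
)

// Stand-in for a workflow interface; Temporal annotations are omitted to keep this self-contained.
@TaskQueue(name = "order-fulfillment", maxConcurrentActivities = 20)
interface OrderFulfillmentWorkflow

// Resolves a workflow type to its queue name, failing loudly if the annotation is missing.
fun KClass<*>.getTaskQueue(): String =
    java.getAnnotation(TaskQueue::class.java)?.name
        ?: error("${java.simpleName} is missing @TaskQueue")

fun main() {
    println(OrderFulfillmentWorkflow::class.getTaskQueue()) // order-fulfillment
}
```

Keeping the queue declaration on the type means the code that starts a workflow and the worker that executes it cannot drift apart on the queue name.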
Step 5: Wire it all up
On the application side, starting a workflow is a one-liner wrapped in a thin client we created for convenience:
class BasicCapitalTemporal(private val workflowClient: WorkflowClient) {
    // Idempotent workflow creation
    fun <T : RunnableWorkflow> startNewWorkflow(
        workflowId: String,
        workflowProcessor: KClass<T>,
        workflowInvocation: (T) -> Unit,
    ): WorkflowMetadata {
        val workflow = workflowClient.newWorkflowStub(
            workflowProcessor.java,
            WorkflowOptions.newBuilder()
                .setTaskQueue(workflowProcessor.getTaskQueue())
                .setWorkflowId(workflowId)
                .setWorkflowIdConflictPolicy(USE_EXISTING)
                .build(),
        )
        val execution = WorkflowClient.start { workflowInvocation(workflow) }
        return WorkflowMetadata(execution.workflowId, execution.runId)
    }

    // Signaling existing workflows
    fun <T : RunnableWorkflow> withExistingWorkflow(
        workflowId: String,
        workflowProcessor: KClass<T>,
        fn: (T) -> Unit,
    ) {
        // Uses the same client that was injected through the constructor
        val workflow =
            workflowClient.newWorkflowStub(
                workflowProcessor.java,
                workflowId,
            )
        fn(workflow)
    }
}
The USE_EXISTING conflict policy makes workflow creation naturally idempotent. Additionally, as a convention, we use our business objects’ unique identifiers as their workflow IDs.
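The idempotency property can be illustrated with a toy in-memory registry. This is a sketch of the semantics only, assuming the conflict policy applies to executions that are still running. ToyWorkflowRegistry and Run are hypothetical names and not part of the Temporal SDK.

```kotlin
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap

// Toy stand-in for the server's view of running workflows (hypothetical).
data class Run(val workflowId: String, val runId: String)

class ToyWorkflowRegistry {
    private val running = ConcurrentHashMap<String, Run>()

    // Mirrors USE_EXISTING: a second start with the same ID returns the existing run
    // instead of creating a duplicate.
    fun startOrGetExisting(workflowId: String): Run =
        running.computeIfAbsent(workflowId) { id -> Run(id, UUID.randomUUID().toString()) }
}

fun main() {
    val registry = ToyWorkflowRegistry()
    val first = registry.startOrGetExisting("order-123")
    val retried = registry.startOrGetExisting("order-123") // e.g. a retried gRPC request
    println(first == retried) // true
}
```

Because the workflow ID is the order's own identifier, a client that retries the same request lands on the same run rather than fulfilling the order twice.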
From a gRPC endpoint, kicking off the workflow looks like:
// Inside your gRPC service handler
val order = orderDao.createOrder(request)
temporal.startNewWorkflow(
    workflowId = order.uuid,
    workflowProcessor = OrderFulfillmentWorkflow::class,
) {
    it.run(order.uuid)
}

// Inside the event handler for an external payment completion webhook event
temporal.withExistingWorkflow(
    workflowId = orderUuid,
    workflowProcessor = OrderFulfillmentWorkflow::class,
) {
    it.paymentCompleted(success = true)
}
That’s it. The webhook handler sends a signal, and the workflow wakes up and continues.
Configuring Retry and Timeout Behavior
The Temporal SDK allows you to configure retry policies and timeouts:
private val retryOptions = RetryOptions.newBuilder()
    .setInitialInterval(Duration.ofSeconds(5))
    .setMaximumInterval(Duration.ofDays(1))
    .setBackoffCoefficient(2.0)
    .setMaximumAttempts(18)
    .build()

private val defaultActivityOptions = ActivityOptions.newBuilder()
    .setRetryOptions(retryOptions)
    .setStartToCloseTimeout(Duration.ofMinutes(5))
    .build()
The appropriate configuration depends on the situation, but the arrangement that works for us today is an exponential backoff from 5 seconds to 1 day with 18 max attempts. A transient failure gets retried aggressively at first and then backs off gracefully, covering roughly 4 days of retry window.
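The size of the retry window can be checked with a little arithmetic. The sketch below assumes each wait is the initial interval times the coefficient raised to the number of retries so far, capped at the maximum interval, and that the maximum attempt count includes the first attempt, so 18 attempts means 17 waits. It ignores the time each attempt spends running. The function names are hypothetical helpers, not SDK calls.

```kotlin
import java.time.Duration
import kotlin.math.pow

// Wait before retry number `retry` (1-based): initial * coefficient^(retry - 1), capped at max.
fun backoffInterval(
    retry: Int,
    initial: Duration = Duration.ofSeconds(5),
    coefficient: Double = 2.0,
    max: Duration = Duration.ofDays(1),
): Duration {
    val uncapped = initial.seconds * coefficient.pow(retry - 1)
    return Duration.ofSeconds(minOf(uncapped, max.seconds.toDouble()).toLong())
}

// Total time spent waiting across `waits` consecutive retries.
fun totalBackoff(waits: Int): Duration =
    (1..waits).fold(Duration.ZERO) { acc, retry -> acc.plus(backoffInterval(retry)) }

fun main() {
    println(backoffInterval(15).seconds) // 81920, the last wait below the 1-day cap
    println(backoffInterval(16).seconds) // 86400, capped at 1 day
    println(totalBackoff(17).seconds)    // 336635, about 3.9 days
    println(totalBackoff(18).seconds)    // 423035, about 4.9 days
}
```

The first fifteen waits add up to under two days, and every wait after that costs a full day, so each extra attempt beyond that point extends the window by exactly one day.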
Why This Architecture Works
The pattern of “Temporal as orchestrator, gRPC services as executors” has a few properties that compound over time:
Workflows stay simple - Because they only coordinate calls and manage state transitions, most of our workflows are under 50 lines. They’re easy to review, easy to test, and easy to reason about when something goes wrong.
Business logic doesn’t scatter - Every domain operation lives in exactly one place: the gRPC service that owns it. Temporal never needs to know how inventory reservation works; it just knows to call something like reserveInventory() and handle success or failure.
The system is observable - Temporal’s UI shows every running workflow, its current state, its full event history, and every signal it’s received. When a workflow is stuck, you can see exactly which activity failed and how many times it’s been retried.
Signals replace polling - Instead of workflows polling a database every 30 seconds to check if payment completed, our webhook handlers can simply send a signal directly to the workflow.
The Bigger Picture
The value of Temporal isn’t that it does something you can’t do with queues and state machines. It’s that it lets you express complex, long-running processes in code that looks boring: sequential, readable, and obviously correct. In production systems that handle real money, boring is exactly what you want.
We run 70+ workflow types in production, covering everything from account onboarding to trade settlement to SPV incorporation. Each one follows the same pattern: a workflow interface with signals, an activity interface that wraps gRPC calls, and an implementation that reads like a step-by-step runbook.
Think of it like automating a human who sits and strings together API calls to run your business. This mental model is particularly powerful, as it has laid the foundation for us to build agentic operations, allowing us to use natural language to orchestrate complex research and processes on top of our foundational platform. We’re excited to share more about this very soon.
At Basic Capital, we’re building a vertically integrated 401(k) and IRA product, a native recordkeeping system, brokerage & trading systems, credit origination systems, the world’s only Retirement Mortgage, and agentic plan administration. Our engineering challenges span financial systems, UI/UX, third-party integrations and partner APIs, data engineering, and AI infrastructure. Our bedrock makes building all of these things feel downright joyful, and we’re just getting started. If you’re interested in joining, we’re hiring.



