Status — stale. The Quartz provider was removed in Phase 0. A Hangfire-based replacement (`Jaina.BackgroundJobs.Hangfire`) lands in Phase 1; this post will be rewritten against that provider.
Background jobs that survive 1M-row reprocessing
The Story
Wednesday 2 AM. A migration script needs to recompute prices for every product in the catalog (1.2M rows). Naive code: a single await foreach on the DB cursor. Halfway through, the SQL connection times out. The job crashes. You restart and it starts from row 1. Three retries later, the on-call engineer just SSH'd into a worker box and is running it manually. Bad night.
The fix: break the work into idempotent chunks scheduled by a durable job runner that resumes after a crash.
Naive approach
public class RecomputePricesJob : IHostedService
{
    private readonly AppDb _db;
    public RecomputePricesJob(AppDb db) => _db = db;

    public async Task StartAsync(CancellationToken ct)
    {
        await foreach (var p in _db.Products.AsAsyncEnumerable().WithCancellation(ct))
            p.Price = ComputeNewPrice(p);
        await _db.SaveChangesAsync(ct); // 1.2M rows in one transaction!
    }

    public Task StopAsync(CancellationToken ct) => Task.CompletedTask;
}
What breaks:
- One transaction → DB locks → other writes time out
- One process → no parallelism, no resume on crash
- `IHostedService` runs once per process; deploying a new version kills the in-flight run, and nothing reschedules it
Jaina solution
Two layers:
- Outbox to fan out one job per chunk (using the existing transactional outbox)
- `Jaina.BackgroundJobs.Quartz` to consume each chunk with retry + crash recovery
// Trigger: schedule a kickoff that fans the work into 1,000-row chunks
public sealed class ChunkRecomputeJob : IBackgroundJob<ChunkPayload>
{
    private readonly AppDb _db;
    private readonly IBackgroundJobScheduler _scheduler;

    public ChunkRecomputeJob(AppDb db, IBackgroundJobScheduler scheduler)
        => (_db, _scheduler) = (db, scheduler);

    public async Task ExecuteAsync(ChunkPayload payload, CancellationToken ct)
    {
        // Keyset pagination: cheap to resume, safe to re-run.
        var batch = await _db.Products
            .Where(p => p.Id > payload.AfterId)
            .OrderBy(p => p.Id)
            .Take(payload.ChunkSize)
            .ToListAsync(ct);

        foreach (var p in batch) p.Price = ComputeNewPrice(p);
        await _db.SaveChangesAsync(ct);

        // A full batch means there may be more rows: queue the next chunk.
        if (batch.Count == payload.ChunkSize)
            await _scheduler.ScheduleAsync<ChunkRecomputeJob, ChunkPayload>(
                new ChunkPayload(AfterId: batch[^1].Id, ChunkSize: payload.ChunkSize));
    }
}
// Kick it off:
await scheduler.ScheduleAsync<ChunkRecomputeJob, ChunkPayload>(
new ChunkPayload(AfterId: lastProcessedId, ChunkSize: 1000));
Source: IBackgroundJobScheduler.cs, QuartzBackgroundJobScheduler.cs.
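The post never shows `ChunkPayload` itself; any serializable record with the two fields the job reads would do. An assumed definition:

```csharp
// Assumed shape — the post doesn't show ChunkPayload's actual definition.
public sealed record ChunkPayload(long AfterId, int ChunkSize);
```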
DI:
services.AddJainaQuartzBackgroundJobs();
services.AddTransient<IBackgroundJob<ChunkPayload>, ChunkRecomputeJob>();
Recurring? Cron expression:
await scheduler.ScheduleRecurringAsync<NightlyDigestJob, NightlyArgs>(
name: "nightly-digest",
cronExpression: "0 0 2 * * ?", // 2 AM daily
payload: new NightlyArgs(...));
Happy path
[02:00:00] queued chunk afterId=0 size=1000
[02:00:02] processed chunk: 1000 rows updated
[02:00:02] queued chunk afterId=1000 size=1000
[02:00:04] processed chunk: 1000 rows updated
...
[02:38:51] queued chunk afterId=1199000 size=1000
[02:38:53] processed chunk: 982 rows updated (last batch — < 1000)
[02:38:53] no more rows; recompute complete
1.2M rows in ~38 minutes, never holding more than 1k rows in a transaction.
Error scenarios
1. Worker crashes mid-chunk
Quartz with a persistent JobStore (the ADO.NET SQL store; the in-memory store won't survive this) marks the job as not-acked. Another worker (or the same one after restart) picks it up. The chunk is idempotent because we filter `Id > AfterId` — re-running it from the same payload either finishes the unfinished portion or no-ops.
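Why re-running from the same payload is safe can be shown self-contained, with a dictionary standing in for the table and a deterministic stand-in for `ComputeNewPrice`:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkIdempotencyDemo
{
    // Deterministic stand-in: the new price derives from stable inputs,
    // never from the row's current price.
    static decimal ComputeNewPrice(long id) => id * 2m;

    // Re-runnable chunk: select rows by key window, overwrite with a computed value.
    public static void RunChunk(IDictionary<long, decimal> prices, long afterId, int chunkSize)
    {
        var ids = prices.Keys
            .Where(id => id > afterId)
            .OrderBy(id => id)
            .Take(chunkSize)
            .ToList();
        foreach (var id in ids)
            prices[id] = ComputeNewPrice(id);
    }
}
```

Running the same payload twice leaves the table in the same state, because the update is an overwrite computed from stable inputs. If the recompute read the current price (say `p.Price * 1.1m`), a retry would double-apply it — that variant is not idempotent no matter how you chunk.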
2. One row in the chunk has bad data, throws
Default Quartz behaviour: retry the job. Configure MaxRetries then move to a poison queue (manual cleanup). Or write the chunk handler to skip the bad row and continue (always log; never silently swallow).
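The skip-and-continue variant can be sketched without any library (names are illustrative, not a Jaina API): record failures per row instead of letting one bad row poison the whole chunk.

```csharp
using System;
using System.Collections.Generic;

public static class SkipBadRowsDemo
{
    // Process every row; record failures rather than aborting the chunk.
    public static (int Updated, List<long> FailedIds) ProcessChunk(
        IEnumerable<long> ids,
        Func<long, decimal> compute,
        IDictionary<long, decimal> prices)
    {
        var updated = 0;
        var failed = new List<long>();
        foreach (var id in ids)
        {
            try
            {
                prices[id] = compute(id);
                updated++;
            }
            catch (Exception)
            {
                // In real code: log the id and the exception — never swallow silently.
                failed.Add(id);
            }
        }
        return (updated, failed);
    }
}
```

The failed IDs then go to a dead-letter table or an alert; the rest of the chunk still commits.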
3. Two workers grab the same chunk
Quartz uses a row lock in the JobStore — only one worker executes at a time. With the in-memory store, run a single worker; with the SQL JobStore, multi-instance is safe.
4. Schedule drift after time zone change
Use UTC cron expressions. Always.
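If you drop below the Jaina wrapper to raw Quartz.NET, the time zone can be pinned explicitly on the trigger (standard Quartz.NET API):

```csharp
using System;
using Quartz;

// Pin the cron schedule to UTC so a host time-zone change can't shift the trigger.
var trigger = TriggerBuilder.Create()
    .WithIdentity("nightly-digest")
    .WithCronSchedule("0 0 2 * * ?", x => x.InTimeZone(TimeZoneInfo.Utc))
    .Build();
```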
5. Recurring job overlaps with itself
The previous tick is still running when the next fires. Quartz's default is to let the executions run concurrently. Add [DisallowConcurrentExecution] to the job class so executions serialize instead of overlapping.
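On a raw Quartz `IJob` that looks like the following (the attribute is standard Quartz.NET; how Jaina surfaces it for `IBackgroundJob` implementations is an assumption here):

```csharp
using System.Threading.Tasks;
using Quartz;

[DisallowConcurrentExecution] // Quartz won't start a second execution while one is running
public sealed class NightlyDigestQuartzJob : IJob
{
    public Task Execute(IJobExecutionContext context)
    {
        // ... long-running work ...
        return Task.CompletedTask;
    }
}
```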
6. Job leaks DI scope
Quartz job instances are constructed per-trigger. Use IBackgroundJobScheduler's scope-aware adapter (handled by Jaina.BackgroundJobs.Quartz automatically) so each ExecuteAsync runs in a fresh DI scope. Don't keep a static DbContext — that's the leak.
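Roughly what such a scope-aware adapter does under the hood (a sketch under assumptions, not the actual Jaina source):

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;

// Sketch: resolve the job from a fresh scope per execution, so scoped services
// like a DbContext live exactly as long as one ExecuteAsync call.
public sealed class ScopedJobRunner<TPayload>
{
    private readonly IServiceScopeFactory _scopes;
    public ScopedJobRunner(IServiceScopeFactory scopes) => _scopes = scopes;

    public async Task RunAsync(TPayload payload, CancellationToken ct)
    {
        using var scope = _scopes.CreateScope();
        var job = scope.ServiceProvider.GetRequiredService<IBackgroundJob<TPayload>>();
        await job.ExecuteAsync(payload, ct); // scope (and its DbContext) disposed afterwards
    }
}
```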
What you'd see in production
Useful metrics:
- `jaina.backgroundjobs.execute.duration` (histogram, by job name)
- `jaina.backgroundjobs.failed` (counter, by job name + exception type)
- Quartz built-in metrics: `quartz.jobs.scheduled`, `quartz.triggers.fired`
Alert on jaina.backgroundjobs.failed{job="recompute-prices"} > 0 for > 5m.
Trade-offs & gotchas
- Idempotency is non-negotiable. A retried chunk must not double-update rows. Filter by `> lastId` or store an "already processed" flag.
- Cron resolution is one minute in most JobStore configs. For sub-minute jobs use one-shot scheduling.
- Persistent JobStore is required for multi-instance. RAM JobStore loses everything on restart.
- Long-running jobs block other jobs in the same trigger group. Configure thread pool size, or use separate triggers per concern.
- Outbox + BackgroundJobs is a powerful combo: write the kickoff message in the same transaction as the domain change, then fan out chunks asynchronously. No coordination, no dual-write.
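In code, that last combo looks roughly like this (the outbox call shown is a hypothetical API sketched from the companion post, not verified against the library):

```csharp
// Same transaction: the domain change and the kickoff message commit or roll back together.
await using var tx = await _db.Database.BeginTransactionAsync(ct);

_db.PriceRules.Add(newRule);                                        // domain change
_outbox.Enqueue(new RecomputeKickoff(AfterId: 0, ChunkSize: 1000)); // hypothetical outbox call

await _db.SaveChangesAsync(ct);
await tx.CommitAsync(ct);
// After commit, the outbox dispatcher fans the kickoff into chunk jobs asynchronously.
```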
Try it yourself
dotnet run --project samples/JainaShop/JainaShop.Notifier
# Queue an SMS dispatch (deduped via Inbox + scheduled via Quartz)
curl -X POST http://localhost:5104/events/order-placed \
-H "Content-Type: application/json" \
-d '{"orderId":"d4ad","sku":"WIDGET","quantity":3,"customerPhone":"+15551234"}'
# Watch logs for the SMS dispatch ~1s later
# Repeat the same call — Inbox dedups, no second SMS
Further reading
- Source: `Jaina.BackgroundJobs.Quartz`
- Companion posts: Outbox, Inbox dedup
- Quartz.NET docs