Spark Streaming checkpointing and Write Ahead Logs

Write-ahead logging

SQL Server maintains a buffer cache into which it reads data pages when data must be retrieved.

SQL Server Transaction Log Architecture and Management

When you enable write-ahead logs, everything within the foreachRDD method needs to be serializable, which wasn't well documented. As a result, classes were serialized that I didn't expect to be. Write-ahead logging provides a way of implementing the Atomicity and Durability rules of ACID.

To overcome this data loss scenario, Write Ahead Logging (WAL) was introduced in Apache Spark. With WAL enabled, the intention of the operation is first noted down in a log file, so that if the driver fails and is restarted, the operations noted in that log file can be applied to the data.
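As a rough illustration of how this might be switched on, the sketch below assumes the DStream-based Spark Streaming API with PySpark installed. The setting name `spark.streaming.receiver.writeAheadLog.enable` is Spark's receiver WAL switch. The application name, batch interval, checkpoint path, and the helper `create_context` are placeholders invented for this example.

```python
# Minimal sketch: enabling the receiver write-ahead log in Spark Streaming.
# Assumes PySpark is installed; it is imported lazily inside the function.

WAL_SETTINGS = {
    # Receiver WAL switch; the log is stored under the checkpoint directory.
    "spark.streaming.receiver.writeAheadLog.enable": "true",
}

# Hypothetical path; in practice this should be fault-tolerant storage.
CHECKPOINT_DIR = "hdfs:///tmp/streaming-checkpoint"


def create_context(batch_seconds=10):
    """Build a StreamingContext with WAL enabled (illustrative helper)."""
    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext

    conf = SparkConf().setAppName("wal-example")
    for key, value in WAL_SETTINGS.items():
        conf.set(key, value)

    sc = SparkContext(conf=conf)
    ssc = StreamingContext(sc, batch_seconds)
    # The WAL needs a checkpoint directory to write into.
    ssc.checkpoint(CHECKPOINT_DIR)
    return ssc
```

On restart, a driver would typically rebuild its context with `StreamingContext.getOrCreate(CHECKPOINT_DIR, create_context)`, so that state is restored from the checkpoint when one exists.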

The write-ahead log is a sequence of append-only files containing all the write operations that were executed on the server. It is used to run data recovery after a server crash, and can also be used in a replication setup when slaves need to replay the same sequence of operations as on the master.
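The idea of an append-only log that can be replayed is easy to sketch in a few lines. The example below is a toy under stated assumptions, not any server's actual format: it uses a single file rather than a sequence of files, one JSON record per line, and the names `WriteAheadLog`, `apply_op`, and `rebuild_state` are invented for illustration.

```python
import json
import os
import tempfile


class WriteAheadLog:
    """Toy append-only log of write operations (illustrative only)."""

    def __init__(self, path):
        self.path = path

    def append(self, op):
        # One JSON record per line; fsync so the record survives a crash.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(op) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self):
        # Yield every logged operation in the order it was written.
        if not os.path.exists(self.path):
            return
        with open(self.path, "r", encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)


def apply_op(state, op):
    """Apply a single logged operation to an in-memory dict."""
    if op["type"] == "set":
        state[op["key"]] = op["value"]
    elif op["type"] == "delete":
        state.pop(op["key"], None)
    return state


def rebuild_state(wal):
    """Recreate state from scratch by replaying the whole log."""
    state = {}
    for op in wal.replay():
        apply_op(state, op)
    return state


with tempfile.TemporaryDirectory() as tmp:
    wal = WriteAheadLog(os.path.join(tmp, "ops.log"))
    wal.append({"type": "set", "key": "a", "value": 1})
    wal.append({"type": "set", "key": "b", "value": 2})
    wal.append({"type": "delete", "key": "a"})
    # The same replay serves crash recovery and a replica catching up.
    recovered_state = rebuild_state(wal)
    replica_state = rebuild_state(wal)
```

Because every reader replays the same records in the same order, the recovered server and the replica end up with identical state.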

In computer science, write-ahead logging (WAL) is a family of techniques for providing atomicity and durability (two of the ACID properties) in database systems.

The changes are first recorded in the log, which must be written to stable storage before the changes are written to the database.
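The ordering rule (log first, data second) can be illustrated with a small key-value store. This is a minimal sketch under simplifying assumptions: the class `DurableStore`, its file names, and the `crash_before_apply` flag are hypothetical, the log is never truncated, and torn (partially written) log records are not handled.

```python
import json
import os
import tempfile


class DurableStore:
    """Toy key-value store that logs each change before applying it."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "wal.log")
        self.data_path = os.path.join(directory, "data.json")
        self.data = {}
        if os.path.exists(self.data_path):
            with open(self.data_path, "r", encoding="utf-8") as f:
                self.data = json.load(f)
        self._recover()

    def _recover(self):
        # Re-apply logged changes that may not have reached the data file.
        # Replaying a "set" twice is harmless, so no bookkeeping is needed.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    record = json.loads(line)
                    self.data[record["key"]] = record["value"]
        self._write_data()

    def _write_data(self):
        # Write to a temp file, then rename, so the data file is never half-written.
        tmp_path = self.data_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.data_path)

    def put(self, key, value, crash_before_apply=False):
        # Step 1: record the change in the log and force it to stable storage.
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        if crash_before_apply:
            raise RuntimeError("simulated crash after logging")
        # Step 2: only now apply the change to the data file.
        self.data[key] = value
        self._write_data()


with tempfile.TemporaryDirectory() as tmp:
    store = DurableStore(tmp)
    store.put("a", 1)
    try:
        store.put("b", 2, crash_before_apply=True)
    except RuntimeError:
        pass
    with open(store.data_path, "r", encoding="utf-8") as f:
        on_disk_after_crash = json.load(f)
    # A fresh instance replays the log and picks up the lost change.
    recovered_data = DurableStore(tmp).data
```

The simulated crash leaves the data file without the second change, yet the change is not lost, because the log record reached disk before the crash.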

