Lightroom Classic 9.2: "Lightroom encountered problems reading or writing from disk when attempting to repair..."

  • Updated 1 month ago
LR Classic 9.2 (Build 202001311240-2d026470) on Windows 10 (up-to-date).

Steps:

1) Worked on many photos and closed LR with no issues (~400MB catalog with 60,000 photos)

2) Following day Lightroom refused to open catalog ("LR catalog cannot be opened because it is not valid")

3) Selected "Choose a different catalog", re-selected the catalog file, enabled the "Test integrity" checkbox and then [Open]

4) Message box: "The Lightroom catalog is corrupt and cannot be used or backed up until it is repaired". Clicked on [Repair Catalog]

5) After a few seconds, message box: "LR encountered problems reading or writing when attempting to repair catalog"

6) A curious message more applicable to medicine than deterministic software programming appears, something along the lines of "Keep trying, it may work next time"?!

I read several similar reports and I can safely exclude:
- Disk drive failure (no disk errors reported - moved lrcat file from HD to SSD within same machine, and on a second machine: same error)
- Disk drive lacking space (240 GB free)
- Messed up LR folder configuration 
- Accidental shutdown during LR closure 
- Virus or other malware
- Read/write permission issues (user account already had full control on lrcat file, folder and sub-folders, added file and folder ownership, extended Full Control to the Administrator Group, ran as Administrator, etc.)

I realize LR is quite a shaky piece of software with serious architectural issues.

I also realize that the error message about "encountering problems reading or writing" is at best poorly designed exception handling (returning a generic R/W failure when the exception may be caused by a more specific error) and is most likely hiding more serious issues when attempting to repair catalog files.

I also appreciate that I may get the quite common: "you should have an effective backup strategy in place, just use a backup catalog", but that should be neither acceptable nor ethical (for Adobe) as an answer. Yes I do have relatively recent backups and yes unexpected things can happen with computers, but the number of users complaining about weird catalog corruption cases should trigger someone high enough in Adobe to re-architect at least that portion of the product. After all, it is not free software. 

I only hope that someone in Adobe can look into the lrcat file that I can make available, analyze what is corrupted, understand how it happened, and fix the corresponding code.

Thanks
John Georges

Posted 2 months ago

Dan Hartford Photo

Many people suggest letting LR rebuild the preferences files on start up as a solution to a plethora of strange problems.  Give this a try as it is the least painful.  

By any chance, at the end of step 1, when you shut down LR, did you allow it to take a backup?

If yes, rename your current catalog to something else (e.g. Lightroom Catalog BAD.lrcat), then unzip the backup catalog, place it in the same folder as the bad catalog, and retry.

If no, do you have another, 3rd-party backup of the catalog taken after step 1 but before the failed attempt to open the catalog? If so, try restoring one of those. It's a long shot because, barring anything not disclosed in your post, the corruption most likely occurred at shutdown of LR, so a backup taken later would also be corrupted. But it's worth a shot.

If no backups from that time period are available, or none of them work, corrupted catalogs can sometimes be repaired outside of LR. I don't have the specifics, but look at the www.LightroomQueen.com forums and search for "corrupt catalog" for steps. If you're a member, you can sometimes send the catalog to them and they can run some fix tools on it.

Other than that, I'm out of ideas.

John Georges

Thank you Dan, I've followed your suggestion and with some luck I turned the corrupted catalog into a new working one, with no data loss. 

Although not explaining the root cause, I hope the steps below may be useful to others:

1) Read that LR uses SQLite as internal DBMS, hence .lrcat files are SQLite DBs; 

2) To check the content of the corrupted .lrcat file I installed the ODBC driver for SQLite and configured an ODBC data source. Using Microsoft Access I opened the .lrcat data source and linked all tables. 

3) Browsed the 100+ LR tables, the vast majority of which were readable. For a couple of them Microsoft Access would list the presence of the records but not their content, throwing a data inconsistency error. Having nothing to lose I tried deleting the unreadable records, and the Access error disappeared. In the process I also noticed two system tables ("sqlite_stat1" and "sqlite_stat3") containing an error label about data inconsistency. Saved the changes and closed Access.

4) Installed SQLite including the 32-bit SQLite3 command-line interface. Searched for a SQLite3 command to check data integrity but only found [.dbinfo] and [.clone]. I opened the .lrcat file in SQLite, ran [.dbinfo] and then ran the [.clone] command, generating a new lrcat file.

5) The size of the cloned .lrcat file is 40 MB less than the original ~400MB. To my partial surprise it was opened without issues by LR, and all my latest changes were there. 

A portion of the SQLite3 command window below. 
Hope this helps.

--
E:\Pictures>sqlite3 BAD.lrcat
SQLite version 3.31.1 2020-01-27 19:55:54
Enter ".help" for usage hints.

sqlite> .dbinfo

database page size:  4096
write format:        2
read format:         2
reserved bytes:      0
file change counter: 58
database page count: 109283
freelist page count: 0
schema cookie:       974
schema format:       4
default cache size:  0
autovacuum top root: 0
incremental vacuum:  0
text encoding:       1 (utf8)
user version:        0
application id:      0
software version:    3030001
number of tables:    115
number of indexes:   262
number of triggers:  20
number of views:     0
schema size:         51757
data version         2

sqlite> .clone newBAD
Adobe_variablesTable... done
Adobe_variables... done
AgFolderContent... done
Adobe_AdditionalMetadata... done
AgMetadataSearchIndex... done
[...]
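
For anyone repeating steps 4 and 5: SQLite does have a built-in SQL statement for checking a database, PRAGMA integrity_check. The sketch below is not part of the original procedure; it uses Python's standard sqlite3 module to run that check and then copy every readable row into a fresh file, roughly in the spirit of .clone. The function names (integrity_report, salvage_copy) are made up for illustration, the code assumes ordinary tables only, and it should only ever be pointed at a copy of the .lrcat file.

```python
import sqlite3


def integrity_report(db_path):
    """Return the messages from PRAGMA integrity_check (['ok'] for a healthy file)."""
    con = sqlite3.connect(db_path)
    try:
        return [row[0] for row in con.execute("PRAGMA integrity_check")]
    finally:
        con.close()


def salvage_copy(src_path, dst_path):
    """Copy all readable rows from src_path into a new database at dst_path.

    Returns the names of tables that could not be read to the end.
    Hypothetical helper: assumes plain tables (no virtual tables).
    """
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    damaged_tables = []
    try:
        # User tables only; SQLite's internal sqlite_* tables are skipped.
        tables = src.execute(
            "SELECT name, sql FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        ).fetchall()
        for name, create_sql in tables:
            dst.execute(create_sql)
            cursor = src.execute(f'SELECT * FROM "{name}"')
            placeholders = ", ".join("?" * len(cursor.description))
            insert_sql = f'INSERT INTO "{name}" VALUES ({placeholders})'
            while True:
                try:
                    row = cursor.fetchone()
                except sqlite3.DatabaseError:
                    # Stop reading this table at the first unreadable row.
                    damaged_tables.append(name)
                    break
                if row is None:
                    break
                dst.execute(insert_sql, row)
        # Recreate indexes, triggers and views once the data is in place.
        extras = src.execute(
            "SELECT sql FROM sqlite_master "
            "WHERE type IN ('index', 'trigger', 'view') AND sql IS NOT NULL"
        ).fetchall()
        for (sql,) in extras:
            dst.execute(sql)
        dst.commit()
    finally:
        src.close()
        dst.close()
    return damaged_tables
```

A call would look something like salvage_copy("BAD-copy.lrcat", "recovered.lrcat"), where both file names are placeholders. Recent versions of the sqlite3 shell also offer a .recover command aimed at the same job.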

Bill

You are blaming Adobe for what is a hardware error.
 
If a write to a hard drive garbles the data, there is no way to know that it happened (SMART is basically a joke) unless Adobe were to implement a read back after write and compare. This used to be an optional feature in Windows but almost nobody used it because of the performance hit. There is a simple parity written with the data but it is not ECC so that all it can do (at best) is report a hardware read error. It cannot recover from the error. This is what Lightroom is reporting to you, garbled data from the drive. 
 
There are high integrity SSD drives available in enterprise configurations but they can't be used in a typical desktop or laptop computer due to having a different interface and costing as much as a really nice used car. And they still need at least RAID-5 for minimal data integrity. 
 
Adobe does not need to re-architect the catalog unless we want Adobe to implement write-read-compare after all writes AND we already complain non-stop about Lightroom performance. 
 
What can we as users do?
  1. Now: use at least a RAID-5 disk configuration so that bad data can be recovered from the other disks. It is normal that when a write error occurs, it only happens to a single drive in a RAID array. Upon reading back, the RAID device will see the error, mark that drive as failed and recover the data from the other drives. A relatively inexpensive solution is to use a MOBIUS 5-drive external case which is populated with a minimum of 3 drives and set to RAID-5. 

  2. Long term: we can ask Adobe to implement a write-read-compare option. Almost nobody will use it and folks will still complain that Adobe hasn't implemented any magic to recover from the hardware errors. 
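
For readers unfamiliar with the term, a write-read-compare step could look roughly like the sketch below. This is only an illustration in Python, not anything Lightroom or Windows is known to do; write_and_verify is a hypothetical name. One caveat: the read-back may be served from the operating system's cache rather than from the device, so a match here is weaker evidence than a true hardware-level verify.

```python
import hashlib
import os


def write_and_verify(path, data):
    """Write bytes to path, force them out to storage, then read back and compare."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device
    with open(path, "rb") as f:
        read_back = f.read()
    # Compare digests of what was intended and what came back.
    return hashlib.sha256(read_back).digest() == hashlib.sha256(data).digest()
```

The extra read of everything just written is where the performance cost mentioned above comes from.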


Gary Rowe

If I didn't see LR mess up the copying, moving and renaming of files almost every day I use it, I might believe you ... and then I see how you have avoided his point and gone down a rabbit hole about read-after-write and compare, saying that is the only thing that could be done. What a small mind.

Bill

Then you are having serious hardware errors. I've been using Lightroom since Version 2 and have never had a file error. You are really terribly ignorant about how hardware works, just wanting to blame Adobe for your mistakes. 

Gary Rowe

Who says I'm having *any* 'serious hardware errors'? That is just an assumption.

I've been using LR since the first beta, across all versions and on many different PCs over that period, and attempts at move and rename have always been a chance for LR to mess it up, but I never thought there were any hardware errors occurring (not even un-serious ones ;) ). I've always thought it was probably something related to poor error handling of IO requests, possibly related to other applications running on my systems.

And yes, I'm not that well up on hardware, I rely on drivers etc. to handle that.
Hardware has kept changing over my career, and I now happily leave that area of expertise to others; I find the interfaces more interesting, and I know only too well how complex it can be to get a program to accurately report the errors it has encountered and even more so to handle all of them gracefully.
But, when things often go pear-shaped with a piece of software I'm using, I don't just go assuming it's a hardware error - it is so much more likely to be a software issue, and it has almost always turned out to be just that.

(Also, in my book, properly-written software should protect the user from their 'mistakes')

John Georges

Thank you for all your replies, esp. the one about recovery outside of LR using SQLite which I just found. Will give it a try.

And no, I didn't ask for a backup at step 1 because I had one from some days ago. 

Regarding HW, this happened on a 2-week old 1GB Samsung SSD which otherwise appears to work perfectly and reports no issues with CHKDSK. But I should have added that since I started using LR six months ago (on two other machines and reinstalling LR twice) this was the third corrupted catalog (same ~60,000 NEF, TIFF and JPG photos in a flat ~700 folder hierarchy). The probability that 3 cases of corrupted catalogs in 3 months on 3 different disks were all due to HW I/O is so small as to be negligible. Hence my considerations.

I agree RAID-5 would mitigate potential I/O errors, but would also confirm to many that LR is a troublesome member of the software family. Imagine if in the last 35 years 1 billion MS Office users had been told to use RAID-5 to overcome HW I/O errors when saving PPT or DOC on their PC or Mac. I don't even recall when I last had to recover a file due to HW I/O, probably decades ago. Perhaps worth noting that my first lines of code were in Assembler on an IBM PC in 1982.

One last comment: as a new LR user I am surprised by the number of forum posts, sites, YouTube channels and people making a living out of teaching how to fix errors or just barely "use" LR. It is certainly powerful and comprehensive, and most users would adapt to the weirdest UX to get things done. Regardless, shortcomings like the lack of accessibility considerations (tiny little Develop cursors anyone? or the menu chaos, so 1990s, causing so many "where to find x in LR" questions) or the poor integration with the underlying operating system, with basic file mgmt and window operations being overridden for no apparent reason and causing a UX nightmare, make me think that there may be something more than a quite unlikely HW I/O error. I may be wrong.

Cheers and again, thanks everyone. 

Victoria Bampton - Lightroom Queen, Champion

Most people never have a catalog get corrupted in their life. While you will find cases on the web, almost every single case I've ever seen has been due to a hardware fault, and even then, they're very rare compared to the number of Lightroom users.

3 different catalogs all corrupted in 3 months definitely suggests hardware, just maybe not the disk.

John Georges

>>3 different catalogs all corrupted in 3 months definitely suggests hardware

Sorry, not if the 3 instances occurred on 3 machines not sharing any HW component. 

Assuming the failures are independent, the probability of an HW I/O issue happening on 3 different machines is the product of the 3 individual probabilities, each already very low.
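
To put a rough number on that argument, here is a tiny Python sketch. The per-machine figure of 0.001 is purely an assumption for illustration, not a measured failure rate, and the multiplication only holds if the three failures really are independent (no shared cause such as the same files, the same software or the same workflow).

```python
import math


def joint_probability(per_machine_probabilities):
    """Probability that independent events all occur: the product of each one."""
    return math.prod(per_machine_probabilities)


# Purely illustrative figure, not a measured failure rate.
assumed_p = 0.001
print(joint_probability([assumed_p] * 3))
```

With those assumed inputs the result is on the order of one in a billion.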


Bill

RAID-5 = no problem
SLED = occasional corruption
 
Software has no way of knowing which it is writing to. 
 
Proof that this is a hardware problem. 
 
That you wrote a few lines of Intel assembler does not make you a hardware expert. In fact your whole diatribe proves that you are unskilled in this area. I, on the other hand, have been addressing hardware reliability and performance issues since 1965 and have written IOS software for computers that didn't even have an operating system.
 
The reason that enterprise class systems have very complex and expensive RAID storage systems is just because I/O is the least reliable component of a system. It is not just the final device, HD or solid state storage, that is the problem; the whole I/O stream (including the operating system) is the cause of data corruption errors. Companies with Petabyte size storage systems get data errors on a daily basis, sometimes hundreds to thousands if they are large enough. They don't blame the software because the software CAN'T cause the problems. 
 
Sure data corruption can be caused by software but NOT where errors are reported by the hardware when trying to read the data. A piece of software can damage the application based logical consistency of the data but this CANNOT result in a hardware error. So if the hardware is reporting that something cannot be read from a drive or array then the hardware failed when writing to the device. If the software was supposed to write XYZ and instead wrote ABC, that will not result in a hardware read error. All bit strings are completely valid to the hardware. 
 
So can Lightroom corrupt its catalog? Of course it can. But that will show up as garbled or lost information inside Lightroom, not as the hardware reporting read failures. And when pressed, it will usually be found that the computer was powered off without a clean shutdown of the application. It's called User Error.
 

Todd Shaner, Champion

>>The reason that enterprise class systems have very complex and expensive RAID storage systems is just because I/O is the least reliable component of a system. It is not just the final device, HD or solid state storage, that is the problem; the whole I/O stream (including the operating system) is the cause of data corruption errors.

Bill is correct. Enterprise systems (i.e. servers & workstations) also use uninterruptible power supply (UPS) backup. AC voltage fluctuations (higher or lower) are common and can cause memory and disk data errors. In addition, DRAM-based system memory is prone to soft errors, which can be partially prevented by using error checking and correction (ECC) memory systems. The vast majority of users most likely do not have UPS backup power or ECC memory, so it's a disaster waiting to happen!

While at Interdata back in the 1970s I designed one of the first MOS memory systems, so I am well aware of these system volatilities. ECC memory can only correct single-bit data errors, so it is still prone to double-bit errors. To help prevent this we implemented memory scrubbing, which is a low-priority background operation that reads all memory locations and corrects single-bit errors. Since most memory data corruption is due to correctable soft errors, this background operation significantly improved system reliability.
(Edited)

Todd Shaner, Champion

I'd like to add a few suggestions to help prevent LR catalog and other data corruption due to system memory volatility.


You can implement a very simple form of memory scrubbing by shutting down your system every night and disconnecting it from the AC outlet using the switch on a power strip. This helps prevent data errors and system damage by removing AC power voltage spikes, dropouts, brownouts, and total power outages for ~12-16 hours. This also refreshes the OS and all applications in system memory every day, which helps prevent soft memory errors. Even if you have a UPS on your system it is still vulnerable to voltage spikes and most have no more than 15-30 minutes of backup time. Long power outages will cause a system power failure with potential of corrupting both system memory and disk storage.

I've been doing this for over 20 years and have experienced very few instances of data corruption or other system issues.
(Edited)