Lightroom Classic: Needs tools to validate catalog and all content

  • 3
  • Idea
  • Updated 9 months ago
Lightroom was designed as a tool for gathering, organizing, and storing large collections of photographs.  Many photographers, like me, use it to store their entire collection. It is thus imperative to check the collection integrity from time to time, perhaps monthly.

Lightroom Classic CC has menu items to "Validate DNG files" and to "Find all missing photos".  These are a good start - but are not a complete solution.

LR really needs "Validate all photos" and "Validate Catalog" menu items. 

The "Validate all photos" should check the availability and integrity of all items in the catalog - not just dng files. My catalog includes jpg, tif, png, mov, mp4 in addition to dng files.  This feature is simple to implement - just compute an MD5 hash of the file and record it in the catalog; when later validating, recompute the hash and compare with what is recorded in the catalog.
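A minimal sketch of the record-then-recheck idea described above, in Python. It is only an illustration, not anything Lightroom does: the function names are made up for this example, and the recorded hashes live in a plain dictionary rather than in the catalog.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in chunks so large videos never have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_hashes(paths):
    """First pass: build {path: digest} to keep alongside the collection."""
    return {str(p): file_digest(p) for p in paths}


def validate_hashes(recorded):
    """Later pass: return {path: problem} for missing or changed files."""
    problems = {}
    for path, expected in recorded.items():
        if not Path(path).is_file():
            problems[path] = "missing"
        elif file_digest(path) != expected:
            problems[path] = "changed"
    return problems
```

An empty result from `validate_hashes` means every recorded file is still present and byte-for-byte identical. Files that are edited on purpose would need their recorded hash refreshed after each edit.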

The "Validate Catalog" feature should run internal consistency checks on the catalog itself.  Presently (v8.1) there is no such feature, unless it is a precursor to "Optimize catalog...".

DAVID KOTZ

  • 12 Posts
  • 1 Reply Like

Posted 10 months ago

  • 3

dmeephd

  • 300 Posts
  • 79 Reply Likes
Agree completely.  Optimization does NOT perform most of the tasks of a database integrity and data validation utility.  These do indeed exist for SQLite, but they are typically not open source and therefore not free.  So don't hold your breath expecting Adobe to add them any time soon.

There are commands built into SQLite, such as "sqlite>PRAGMA integrity_check;"; however, I am unsure if Optimization performs this task or if this is all that Optimization does.  I suspect the latter.

That pragma does quite a lot.  PRAGMA integrity_check runs an integrity check of the entire database.  It looks for out-of-order records, missing pages, malformed records, and corrupt indices.  If any problems are found, then strings are returned (as multiple rows with a single column per row) which describe the problems.  At most N errors will be reported before the analysis quits, where N is an optional integer argument.  The default value for N is 100.  If no errors are found, a single row with the value "ok" is returned.  If no "ok" is returned, then other commands must be used to reindex the database.
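For anyone who wants to see what the check returns, here is a rough Python sketch using the standard `sqlite3` module. It assumes the file opens as an ordinary SQLite database, and it is safest pointed at a backup copy rather than the live catalog. The function names are invented for this example.

```python
import sqlite3
from pathlib import Path


def integrity_report(db_path, max_errors=100):
    """Run PRAGMA integrity_check on a read-only connection and return its rows."""
    # mode=ro opens the file read-only, so the check itself cannot modify it.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    connection = sqlite3.connect(uri, uri=True)
    try:
        query = f"PRAGMA integrity_check({int(max_errors)})"
        rows = connection.execute(query).fetchall()
    finally:
        connection.close()
    return [row[0] for row in rows]


def is_healthy(report):
    """A healthy database yields exactly one row containing 'ok'."""
    return report == ["ok"]
```

Anything other than a single "ok" row is a list of problem descriptions, capped at `max_errors` entries.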

Pretty sure LR Optimization does none of this.  It certainly cannot check for referential integrity as SQLite does not support any level of Referential Integrity beyond the lowest level; i.e. primary keys.  Hence all the catalog issues users report, especially as the catalogs grow larger.

However, obtaining these tools on your own and using on your LRcat is a recipe for disaster—I know, I've tried—as LR adds some proprietary header info to the database file and the returned and corrected catalog cannot be recognized or opened by LR.  Oops.


Roelof Moorlag

  • 212 Posts
  • 63 Reply Likes
Yes, I'm with you. Lightroom should get better validation options, but there are some workarounds:

Linwood Ferguson wrote a plugin (LRValidate) that does validation on all files in Lightroom: https://archive.codeplex.com/?p=lrvalidate

And for checking the integrity of the Lightroom database itself I use SQLite Expert Personal. I wrote a (Dutch) blog post about it: https://digitaalfotobeheer.blogspot.com/2018/05/lightroom-back-up-testen.html




DAVID KOTZ

  • 12 Posts
  • 1 Reply Like
Thanks!  LRValidate looks useful, assuming it is still compatible.  (It is 4 years old and may be limited to Windows.)

Still, this should be a basic function built into LR itself.

Jim Wilde, Champion

  • 389 Posts
  • 154 Reply Likes
I don't know if you guys are aware of the catalog "Test Integrity" option that is in the catalog backup dialog and also in the start up dialog if you launch Lightroom using the "Prompt Me" option or you hold down Ctrl/Cmd when launching.




Obviously I don't know how comprehensive that option is, but there are sufficient reports in the various Lightroom forums of LR reporting that the "Catalog may be corrupt and needs to be repaired" to make me think that it's doing something.

DAVID KOTZ

  • 12 Posts
  • 1 Reply Like
Thanks Jim.  Yes, I recall the "Test integrity before backing up" option, but I no longer use LR's catalog backup feature.  (I back up my entire laptop every day and that covers the catalog and everything else, so LR backup was extraneous.)  That said, I could envision running LR backup once in a while, if only to get the catalog integrity checked.

I did not know about the startup option for integrity check - which, on the Mac, is obtained by holding the option key (not command key) on startup.  That's great!  It finished silently, for me, so I assume my catalog is ok.

Now we just need a "Validate all photos" (really, Validate all content) feature.

Jim Wilde, Champion

  • 389 Posts
  • 154 Reply Likes
Sorry, yes it is Opt on Mac, though it seems that either Alt or Ctrl can be used on Windows.

Linwood Ferguson

  • 29 Posts
  • 18 Reply Likes
Jim, that works for the catalog itself and internal consistency, but it does nothing to test the images.  The DNG check option does some of that, (a) if you use DNG, and (b) for image bits only not metadata stored in the DNG.  (a) is a killer for many people.  In addition, the LR support for just checking that links all work (i.e. that the link in the catalog points to an image, and that all images in your LR folders are in the catalog) exists, but is not terribly comprehensive, and is not part of the "integrity" check either.
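The two link checks mentioned above (every catalog entry points at a real file, and every file in the watched folders is in the catalog) can be sketched in a few lines of Python. This is only an illustration: how the list of catalog paths is obtained is left out, and the suffix list and function names are assumptions made for the example.

```python
from pathlib import Path

# Assumed set of file types worth tracking; adjust to taste.
IMAGE_SUFFIXES = {".dng", ".jpg", ".tif", ".png", ".mov", ".mp4"}


def files_on_disk(root_folders, suffixes=IMAGE_SUFFIXES):
    """Collect every file under the given folders with a suffix of interest."""
    found = set()
    for root in root_folders:
        for path in Path(root).rglob("*"):
            if path.is_file() and path.suffix.lower() in suffixes:
                found.add(path.resolve())
    return found


def compare_links(catalog_paths, root_folders):
    """Return (catalog entries with no file, files with no catalog entry)."""
    cataloged = {Path(p).resolve() for p in catalog_paths}
    broken = sorted(p for p in cataloged if not p.is_file())
    orphans = sorted(files_on_disk(root_folders) - cataloged)
    return broken, orphans
```

Both lists being empty means the catalog and the folders agree on which files exist; it says nothing about whether the file contents are intact.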

To me a DAM should "own" the content, it should have redundant ways to identify it and know whether it is changing behind the scenes, whether from bit rot or user error.  At present Adobe does a mediocre job of protecting the catalog (witness the year or so of corrupt catalog backups), and pretty much nothing to protect the images themselves.  Well, unless you buy into DNG entirely.  Maybe that's their rationale for not addressing it.

Jim Wilde, Champion

  • 389 Posts
  • 154 Reply Likes
Linwood, I was simply addressing ONE of the two "deficiencies" from the OP.

dmeephd

  • 300 Posts
  • 79 Reply Likes
As David noted, if one stops using the built-in Lightroom backup feature (which is dodgy at best and dangerous at worst and eventually resorts to producing zip files once the catalog attains a certain size), then the ability to test catalog integrity is lost.

At a minimum, Adobe should provide a separate command for Test Integrity as it does for Optimize Catalog.  That shouldn't cost them much more work than adding an undo modal box for quitting LR as they did in the last update.

However, what Adobe really needs to do is place Lightroom on a full-featured relational database platform such as MS SQL or Oracle and stop peeing with the puppies on SQLite.   So many of our performance issues would be instantly addressed, albeit the incremental cost of Lightroom would increase slightly as neither of the real DBs are free.

Linwood Ferguson

  • 29 Posts
  • 18 Reply Likes
I'd love to see them switch, but I suspect (I have no real knowledge of their code) that they have capitalized on SQLite features that may require rewrites of a lot of code.  Maybe.  But cost shouldn't be an issue -- MySQL and PostgreSQL are good free examples, and if you don't like Oracle's association with MySQL then use MariaDB, which is pretty much a drop-in replacement.  But... unlike SQLite, which is easy to just embed in the program, these typically require separate installs and running services, which would substantially complicate the install/upgrade process and might also interfere with other software on the machine that uses them.  Basically it has the potential to be a very messy changeover.  And neither is really Windows-native software (I realize there are versions for Windows, but there's a difference between "runs there" and "well supported there").

But I think all this is relatively irrelevant, other than with respect to multi-users.  SQLite is a darn solid database, it just is not very full featured.  I doubt seriously that it is any inherent weakness in SQLite that causes issues with Lightroom.  And certainly none of this addresses image verification.

dmeephd

  • 300 Posts
  • 79 Reply Likes
Actually, it all revolves around the image.  How would you expect LR to 'verify' an image unless there are database tables with records containing fields with certain keys, primary and foreign, which Lightroom would use to verify the image by ascertaining that the data entries in the tables are correct with respect to data contained within the image?

Neither Lightroom nor any other program, for that matter, can verify an image (or other file type) without some baseline data against which it can make a comparison.  This is called checking referential integrity at its minimum, and table verification at the higher level.

There's no magic wand a program can wield in order to verify any file from whole cloth.  There has to be some baseline data; (e.g., checksum, hash count, previously verified value, etc.) to make a comparison to derive the absolute truth: that the image file is correct.

And before the database program can make this verification, it has to have higher-level referential integrity than SQLite possesses.

Linwood Ferguson

  • 29 Posts
  • 18 Reply Likes
I would maintain that what I did in the program mentioned up above handles most of that. I create a separate table in the catalog and do checksums of all images that LR is aware of. I can then check (a) that the image still exists where expected, (b) that it has not changed (obviously mostly relevant for raw) by comparing the checksum, and (c) that LR still points to it (since my pointer is independent of LR's).  Now where this breaks down is that purposeful changes (e.g. editing a JPG in Photoshop, or writing metadata into a JPG) will change the image checksum, but since I do 99% raw I did not care.  LR, on the other hand, could deal with those by updating the checksum when appropriate, and could check it on each open or en masse.  Also, this starts a bit too late -- the checksum really should be captured on the card/camera, then checked after ingestion as well to ensure a valid copy from card/camera.  So what I did is a half-baked solution; only Adobe can really do a fully baked one.  I just wish they would.
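To make the three checks concrete, here is a small Python sketch of the same general idea. It is not LRValidate's code. The `file_checksums` table name, the use of SHA-256, and keeping the table in a separate SQLite database (rather than inside the catalog) are all assumptions made for illustration.

```python
import hashlib
import os
import sqlite3


def checksum(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(connection, paths):
    """Record a checksum per file in a side table (hypothetical name)."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS file_checksums "
        "(path TEXT PRIMARY KEY, digest TEXT NOT NULL)"
    )
    connection.executemany(
        "INSERT OR REPLACE INTO file_checksums (path, digest) VALUES (?, ?)",
        [(p, checksum(p)) for p in paths],
    )
    connection.commit()


def verify(connection, catalog_paths):
    """Return {path: status} covering checks (a), (b) and (c)."""
    known = set(catalog_paths)
    rows = connection.execute(
        "SELECT path, digest FROM file_checksums"
    ).fetchall()
    results = {}
    for path, digest in rows:
        if not os.path.isfile(path):
            results[path] = "missing"            # (a) file no longer where expected
        elif checksum(path) != digest:
            results[path] = "changed"            # (b) contents differ from snapshot
        elif path not in known:
            results[path] = "not in catalog"     # (c) catalog no longer points to it
        else:
            results[path] = "ok"
    return results
```

As noted above, "changed" is only a warning sign for files that are never meant to be rewritten; deliberately edited JPGs or TIFFs would need a fresh snapshot.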

I've also wondered if all the cloud level work with CC (non-classic) even maintains such integrity checks, or do we just trust that no bit rot ever occurs in the cloud?   Or to or from the cloud?  I personally think Adobe gives short shrift to image integrity -- at least in what they say externally.  Maybe it's all really solid under the covers, but a white paper on "how we protect your image" would sure be an interesting read in CC, and explicit checks in Classic would be very welcome.

DAVID KOTZ

  • 12 Posts
  • 1 Reply Like
Linwood, thanks for your detailed replies.  I agree it would be great to have end-to-end integrity checks (beginning in the camera, thence to the card, thence to the computer, etc.) but for now I'd be happy with a checksum computed and recorded when LR imports the image, and reconfirmed later when needed.  It sounds like you crafted such a solution with LRvalidate.  But it appears that plug-in is now out of date. 

I'm writing a little shell script to do this sort of validation outside LR.

Linwood Ferguson

  • 29 Posts
  • 18 Reply Likes
Actually it is not a plugin, it is a standalone program, and it still works.  But it is an incomplete solution, and I am not pushing it really.  In particular, if you have a lot of TIFFs (or JPGs) that you edit, it gives too many false positives, as it warns you if they change.  I also do not push it because it does change the catalog (though the changes are new tables, not changes to existing ones), and a lot of people are uncomfortable with that.

dmeephd

  • 300 Posts
  • 79 Reply Likes
Linwood, when you say "I create a separate table in the catalog..." what do you mean, precisely?  I assume you're NOT talking about a database table with records and fields...

Otherwise, what you're doing is probably correct, and I too wish that LR would do this internally, but that will never happen as it is currently architected.  It is simply beyond the capability of SQLite.  One can't run a Yugo in the Grand Prix of Monaco.


Linwood Ferguson

  • 29 Posts
  • 18 Reply Likes
No, I meant exactly that: I create a table (I think two, actually).  It's safer to create new structures while co-existing with LR's code than to add a new column to an existing table; the latter can cause errors depending on how LR inserts or modifies rows, but generally speaking LR will just never see my tables.  Bear in mind I did not create a plugin; it's a standalone program you run when LR is not in use.  One advantage of doing it that way is that I can access the database and files more directly, and I can create multiple threads to do the checksum validation without being limited by the speed (i.e. slowness) and features of Lua.
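As a rough illustration of the multi-threaded checksum point, here is a Python sketch using the standard thread pool. It is not how LRValidate does it; the function names and the default worker count are arbitrary choices for the example.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def checksum(path, chunk_size=1 << 20):
    """SHA-256 of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_many(paths, workers=4):
    """Hash several files concurrently and return {path: digest}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input paths.
        return dict(zip(paths, pool.map(checksum, paths)))
```

Threads tend to help here because much of the time goes to file reads, and Python's `hashlib` releases the interpreter lock while hashing larger buffers. How much is gained depends on the disk.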

David Roberts

  • 1 Post
  • 2 Reply Likes
Hi, I was having LR Classic sync issues (crash when sync started each time LR was run - OK if sync paused)

LRValidate found 3 corrupt JPEG files, which I fixed from original pre-import backups, and now sync seems to be working!  So many thanks for that, and I will continue to use it regularly.

I did have to make one small change to the software to handle a filename with a single quote in it; the quote needs to be doubled (two single quotes) in SQLite SQL.

What would you like me to do with this small fix?
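For readers who have not hit the single-quote problem before, here is a small Python illustration of both the quote-doubling fix and the alternative of bound parameters. It is not LRValidate's code, and the `files` table is made up for the example.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE files (name TEXT)")  # hypothetical table

filename = "Dad's birthday.jpg"

# Hand-built SQL: every single quote inside the literal must be doubled.
escaped = filename.replace("'", "''")
connection.execute(f"INSERT INTO files (name) VALUES ('{escaped}')")

# Bound parameter: the driver passes the value separately, no escaping needed.
connection.execute("INSERT INTO files (name) VALUES (?)", (filename,))

stored = [row[0] for row in connection.execute("SELECT name FROM files")]
```

Both rows come back with the original filename. The bound-parameter form is usually the sturdier choice because it covers other awkward characters as well.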

Johan Elzenga, Champion

  • 2320 Posts
  • 949 Reply Likes
Here's a plugin that works with Mac and Windows. Old too, but it still works fine. http://bayimages.net/blog/lightroom/validator/

DAVID KOTZ

  • 12 Posts
  • 1 Reply Like
Thanks, I'll check out that old plugin.  Ironically, I have been writing something very much like it, but only as a separate shell script - not a plugin. 

Linwood Ferguson

  • 29 Posts
  • 18 Reply Likes
@David Roberts, you can put it on GitHub with the project https://github.com/Linwood-F/LRValidate or, if it is very short, post it here. I don't see a way to PM on this site, and I hesitate to post my email, but I am very interested in getting the bug fixed.