Corrupt Ancestors tag in XMP causing giant file sizes in Photoshop

  • Problem
  • Updated 6 months ago
Users have reported enormously bloated files in which the problem appears to be spurious entries in the photoshop:DocumentAncestors XMP tag. I was given some sample files and found that tens of megabytes of data disappeared from each file when I removed that tag.
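To check whether a given file is affected, before and after stripping, one option is to measure the raw size of the DocumentAncestors block directly. The Python sketch below is a hypothetical helper (`measure_ancestors` and `report` are names made up for illustration). It assumes the XMP is stored as uncompressed UTF-8 text, as it normally is in TIFF and PSD files. A file can hold more than one XMP packet, so it sums every block it finds. It cannot see anything inside compressed layer data, and a JPEG whose XMP is split across several segments may be miscounted.

```python
import re

# Whole photoshop:DocumentAncestors element, including its rdf:Bag of entries.
# Assumes uncompressed UTF-8 XMP text, as stored in TIFF and PSD files.
ANCESTORS_RE = re.compile(
    rb"<photoshop:DocumentAncestors>.*?</photoshop:DocumentAncestors>",
    re.DOTALL,
)
# Each ancestor is one <rdf:li> item inside the bag.
ENTRY_RE = re.compile(rb"<rdf:li[\s>]")


def measure_ancestors(data: bytes) -> tuple[int, int]:
    """Return (total bytes, entry count) over every DocumentAncestors block."""
    total_bytes = 0
    entries = 0
    for match in ANCESTORS_RE.finditer(data):
        block = match.group(0)
        total_bytes += len(block)
        entries += len(ENTRY_RE.findall(block))
    return total_bytes, entries


def report(path: str) -> str:
    """Read a file and summarize its DocumentAncestors footprint."""
    with open(path, "rb") as f:
        size, count = measure_ancestors(f.read())
    return f"{path}: {count} ancestor entries in {size:,} bytes"
```

For example, `print(report("yourFile.tif"))` prints the entry count and byte size of the block; comparing that byte count with the file's total size gives a rough idea of how much of the bloat lives in the visible XMP itself.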

The file that I was given was created in Photoshop CC 2017 for Windows.

While I'm not absolutely certain, the evidence suggests that the corruption is being produced in Photoshop itself, under conditions I haven't been able to reproduce, rather than, for example, traveling around the internet in infected files (which may well be happening as well).

See this thread in the Adobe user forums: https://forums.adobe.com/message/10371400 (my comment is at the bottom as I write this). See also this thread (look near the bottom): https://forums.adobe.com/thread/2382524?start=40&tstart=0 and this blog post: http://prepression.blogspot.com.au/2017/06/metadata-bloat-photoshopdocumentancestors.html

I have the files and would be happy to send them to you, along with any help I can provide, if you'd like.

(Note that in both of those forum threads, users discovered the problem when the bad files were incorporated into InDesign or PDF files, but the issue began in the image files; the ones I looked at were TIFFs.)

-Carl

Carl Seibert


Posted 7 months ago


Max Johnson, Champion

Why prioritize this? This *directly* affects @Adobe's bottom line. It is a drain on @Adobe resources, not just a customer issue. Read on.

This is a huge problem for shared Creative Cloud Library assets. Any library you subscribe to is saved to a hard-coded directory on your system drive. Any object you add to your library performs a save-as to that directory and then syncs to the cloud. Now you can have an object that is literally just a vector box with a gradient applied, and it is 25 MB. Multiply that by 50 for a shared UI library, then by a few different libraries, and it adds up fast.

@Adobe, how much does it cost to pump an extra 10-30 MB of junk data to and from your cloud every time a user saves a file?

For users, this is compounded by the fact that newer versions of Photoshop routinely eat 40+ GB of scratch disk. A contractor working from a laptop with a 500 GB drive is going to start feeling the pinch.

It also floods our source control with useless bytes for every version archived.

Max Johnson, Champion

Excerpt from this thread for more information:

"The Photoshop document ancestors tag is written every time a file is "saved as", or has an element from a different file placed in it. Ancestors data is inherited from, well, ancestors. But still, it would be pretty tough for a human being to rack up tens of MB worth of that kind of log data.

 There is an option in Photoshop to turn on writing yet more log data - logging every save of the file, or optionally logging every single action that is done in Photoshop. That can lead to giant blobs of metadata. But by "giant", I mean tens of KB, not megabytes.

 I would think that it would take a malfunctioning machine to generate the kind of bloat we're talking about here.

 UPDATE: The OP (to whom I am married, for the sake of full disclosure) provided me with the files in question and I examined them.

 Two of the images placed in the final PDF had excessive Document Ancestor tags. One was the Tiff Bonnie mentioned in the original post; the other was a JPEG. The Tiff should have been about 25 MB but was 56.5 MB. The JPEG was 19.8 MB when it should have been 1.7 MB. The stock photo that was the base image for the bloated Tiff was free of corruption; it had very tidy metadata.

 That suggests that Bonnie's copy of Photoshop, or another one in her company, may be the offender making the corrupted files. That's not certain, though, because the corruption may have been inherited from an ancestor of one of the bad files.

 In the case of the Tiff, ExifTool displayed some 37 KB of Document Ancestor entries, over 1,000 of them. But when that tag was removed from the file's metadata, the file size dropped by 30 MB! Very curious. I suspected that the ancestors data might be a symptom rather than the problem, but given that blanking that single tag brought about the reduction in file size, that seems doubtful.

 I tried to find a clever GUI-based fix for these files without much success. The fact that the subject file was a Tiff and further that it was in CMYK eliminated most of the software I had at hand from consideration.

 Opening and re-saving the file in various formats in Photoshop didn't work. (Ancestor data survives copying and pasting onto a blank document; that's the whole idea of it.) Nor did opening the file in Lightroom and exporting it.

 I was able to use Photo Mechanic to strip all XMP metadata from the file. Oddly, there was no immediate impact on file size, but if I then opened the file in Photoshop and did a save-as, the file shrank to a normal size. IIM metadata was preserved, but of course, not XMP.

 If I simply did "Save photo as" in Photo Mechanic, the corruption was removed, but the file was saved in RGB, which wasn't so good. Ditto for the JPEG. A simple "Save photo as" removed the bloat, but the resulting file was RGB. All useful metadata was preserved in this case.

 XnView, ON1 RAW, and Apple Preview all failed to be of assistance.

 The only program I had at hand that was effective was ExifTool. It's free and it doesn't take up much disk space, but it's command line, so it may not appeal to everybody. The advantage of using ExifTool to fix this issue is that you can remove just the offending tag and leave all the rest of the file's metadata intact. The ExifTool command you'll need is simply

 exiftool -XMP-photoshop:DocumentAncestors=  yourFile.tif

 (Note that there are two spaces after the equals sign - one to tell ExifTool to blank the tag and one before the next argument)

 I didn't try Stephen Marsh's Photoshop script."
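For batches of files, the same ExifTool command can be scripted. The sketch below is a hypothetical Python wrapper (`build_exiftool_args` and `strip_ancestors` are illustrative names); it uses only the `-XMP-photoshop:DocumentAncestors=` argument quoted above plus ExifTool's standard `-overwrite_original` option. Because the arguments are passed as a list rather than through a shell, spacing is not an issue; the empty value after `=` is what tells ExifTool to delete the tag. By default ExifTool keeps the untouched file as a `_original` backup, which is worth leaving on until you have checked the results.

```python
import shutil
import subprocess

# An empty value after "=" tells ExifTool to delete the tag entirely.
DELETE_ANCESTORS = "-XMP-photoshop:DocumentAncestors="


def build_exiftool_args(paths, overwrite_original=False):
    """Build the ExifTool argument list that strips DocumentAncestors."""
    args = ["exiftool", DELETE_ANCESTORS]
    if overwrite_original:
        # Without this option ExifTool keeps a "<name>_original" backup.
        args.append("-overwrite_original")
    args.extend(str(p) for p in paths)
    return args


def strip_ancestors(paths, overwrite_original=False):
    """Run ExifTool on the given files; raise if it is missing or fails."""
    paths = list(paths)
    if not paths:
        return  # nothing to do; ExifTool would complain about no input files
    if shutil.which("exiftool") is None:
        raise FileNotFoundError("exiftool was not found on PATH")
    # A list argument bypasses the shell, so no quoting or spacing concerns.
    subprocess.run(build_exiftool_args(paths, overwrite_original), check=True)
```

For example, with `from pathlib import Path`, `strip_ancestors(Path("scans").glob("*.tif"))` would process every TIFF in a hypothetical `scans` folder and keep ExifTool's backups.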


Carl Seibert

Max - Have you seen this happening with library assets? Can you isolate the instance of Photoshop that is doing the deed?

I'm trying to chase my examples back through their lifecycle to find the "patient zero" machine, but the path is proving to be long, running from company to company. 

-Carl

Max Johnson, Champion

I just tested this in "Adobe Photoshop Version: 19.1.4 20180507.r.32"

Test 1
  1. Opened a long-lived legacy file, created in CC 2014 and re-saved many times in each version up to the current one.
  2. Dragged and dropped a single layer with some pixel data into a cloud library.
  3. The new library object is 489x38 px and 14 MB.
Test 2
  1. Made a new file.
  2. Made a new layer and painted a splotch.
  3. Dragged and dropped that single layer with some pixel data into a cloud library.
  4. The new library object is 81 KB.
Test 3
  1. Dragged and dropped the layer from the new file into the old file.
  2. Dragged and dropped the copied layer into a cloud library.
  3. The new library object is 14 MB.
Test 4
  1. Scrubbed the ancestor data out of the old file with a script.
  2. Dragged and dropped the copied layer into a cloud library.
  3. The new library object is 92 KB.
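The script used in Test 4 isn't shown. As a rough stand-in, here is a minimal Python sketch that removes the element from XMP text you have already extracted; `scrub_ancestors` is a hypothetical name. It is a plain regular-expression edit, not a real XMP parser, and writing a shortened packet back into a binary TIFF or PSD is a separate problem, which is where a dedicated tool such as ExifTool is the safer choice.

```python
import re

# The element plus any whitespace before it, so no blank line is left behind.
# Handles both the full form and an empty <photoshop:DocumentAncestors/>.
ANCESTORS_ELEMENT = re.compile(
    r"\s*<photoshop:DocumentAncestors(?:\s*/>|>.*?</photoshop:DocumentAncestors>)",
    re.DOTALL,
)


def scrub_ancestors(xmp_text: str) -> tuple[str, int]:
    """Remove every photoshop:DocumentAncestors element.

    Returns the cleaned text and the number of elements removed.
    """
    return ANCESTORS_ELEMENT.subn("", xmp_text)
```

The returned count makes it easy to confirm that something was actually removed before saving the result.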

Max Johnson, Champion

The oldest affected cloud items I can reliably trace list the application as CC 2017 (Mac), with creation/modification dates in the Nov-Dec 2016 range.

Carl Seibert

Interesting. The sample that I'm looking at was created in July 2016. It was saved by a few different versions over that same period, including CC 2017 (Mac).

Are your bad files all TIFFs?

[XMP-xmpMM]     History Software Agent          : Adobe Photoshop CS5 Macintosh, Adobe Photoshop CS5 Macintosh, Adobe Photoshop CC 2015.5 (Macintosh), Adobe Photoshop CC 2015.5 (Macintosh), Adobe Photoshop CC 2017 (Macintosh), Adobe Photoshop CC 2017 (Macintosh), Adobe Photoshop CC 2017 (Macintosh), Adobe Photoshop CC 2017 (Macintosh)
[XMP-xmpMM]     History When                    : 2016:07:27 12:44:58+09:00, 2016:07:27 12:44:58+09:00, 2016:10:28 12:42:56-05:00, 2016:10:28 12:42:57-05:00, 2017:11:07 11:48:41-06:00, 2017:11:07 11:48:42-06:00, 2017:11:07 11:49:22-06:00, 2017:11:07 11:49:22-06:00
[XMP-xmpMM]     History Parameters              : from image/jpeg to application/vnd.adobe.photoshop, converted from image/jpeg to application/vnd.adobe.photoshop, from application/vnd.adobe.photoshop to image/tiff, converted from application/vnd.adobe.photoshop to image/tiff, from image/tiff to application/vnd.adobe.photoshop, converted from image/tiff to application/vnd.adobe.photoshop, from application/vnd.adobe.photoshop to image/tiff, converted from application/vnd.adobe.photoshop to image/tiff
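ExifTool prints each xmpMM:History field as a separate comma-separated list, so the nth entries of Software Agent, When, and Parameters together describe one history event. Here is a minimal Python sketch that pairs them up, assuming (as holds for the sample above) that no value itself contains ", "; `parse_history` is a hypothetical helper. ExifTool's `-sep` option or `-json` output would avoid relying on that separator.

```python
def parse_history(agents: str, whens: str, params: str) -> list[tuple[str, str, str]]:
    """Pair ExifTool's parallel History lists into (agent, when, params) events."""
    # Assumes ", " only ever appears as ExifTool's list separator.
    columns = [field.split(", ") for field in (agents, whens, params)]
    if len({len(column) for column in columns}) != 1:
        raise ValueError("History lists have different lengths")
    return list(zip(*columns))
```

Looping over the result, e.g. `for agent, when, action in parse_history(a, w, p): print(when, agent, action)`, gives a per-save timeline, which makes it easier to see which Photoshop version touched the file at each step.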