Lightroom 4.2 RC: Very poor performance on high-end system - partial solution discovered.

Lightroom 4 has been basically unusable for me; I gather I am one of the small number of people with powerful systems who have an unresolved performance issue.

I have installed Lightroom 4.2 RC and it still exhibits pathologically slow performance on my high-end system. After some experimentation I have discovered that this can be ameliorated, at least on my system, by disabling hyper-threading on my processor (a Core i7 950). (Configuring Windows to use only a single core also fixes the issue, although it tends to slow everything else down, so it is not really practical.)

After making this change LR4 actually becomes usable - I can touch a noise reduction slider without fear now.
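For anyone who wants to experiment without a BIOS change, a rough software approximation is to restrict Lightroom's CPU affinity to one logical processor per physical core. The sketch below is untested here and is not equivalent to disabling hyper-threading - the core's shared resources remain exposed - and it assumes Windows, the common layout where logical CPUs 0/1, 2/3, ... are hyper-thread siblings, and the third-party psutil package; the process-name match is a guess:

    import psutil

    physical = psutil.cpu_count(logical=False)   # e.g. 4 on a Core i7 950
    logical = psutil.cpu_count(logical=True)     # e.g. 8 with HT enabled

    if logical == 2 * physical:
        # 0, 2, 4, 6: one hyper-thread sibling per physical core
        one_per_core = list(range(0, logical, 2))
        for proc in psutil.process_iter(['name']):
            name = proc.info['name'] or ''
            if name.lower().startswith('lightroom'):
                proc.cpu_affinity(one_per_core)
                print('pinned PID %d to CPUs %s' % (proc.pid, one_per_core))

Unlike the BIOS route, this resets when Lightroom restarts, so it is easy to compare behaviour with and without.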

jafpix

  • 9 Posts
  • 1 Reply Like

Posted 6 years ago


Aleksander Eriksen

  • 16 Posts
  • 0 Reply Likes
Must be something with the framework... I have a very high-end system and LR4 isn't usable at all. Disabling hyper-threading might work, but as you said, it's just not practical since it slows the rest of the system down. I work with video as well, so "12" cores help a lot more than 6 (i7 3960X Extreme). It's either Adobe's bad programming or the framework they built LR in.

Rob Cole

  • 4831 Posts
  • 371 Reply Likes
It may not slow the rest of the system down if you run Lightroom at below-normal priority - that's what I do.
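A hedged sketch of scripting that suggestion (assumes Windows, the third-party psutil package, and guesses at the Lightroom process name - not necessarily how Rob does it):

    import psutil

    # BELOW_NORMAL_PRIORITY_CLASS is only defined on Windows builds of psutil.
    for proc in psutil.process_iter(['name']):
        name = proc.info['name'] or ''
        if name.lower().startswith('lightroom'):
            proc.nice(psutil.BELOW_NORMAL_PRIORITY_CLASS)
            print('lowered priority of PID %d' % proc.pid)

The same effect is available interactively from Task Manager (right-click the process, then Set priority).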

jafpix

  • 9 Posts
  • 1 Reply Like
I believe you misread me. I claimed that configuring Windows to only run on a single core is not practical.
Disabling hyper-threading is practical and has solved the issues that I was experiencing.
The performance gain from hyper-threading is on the order of 30%, but it would only be noticeable in an ideal situation with enough CPU-bound threads to fully utilize all the logical cores of the system. More often than not, real-world threads are constrained by memory, disk, and other hardware.

Rob Cole

  • 4831 Posts
  • 371 Reply Likes
I wonder if this "fix" applies to Mac as well.

jafpix

  • 9 Posts
  • 1 Reply Like
I guess we would need a Mac user experiencing the problem to test it out. Perhaps work-around is a better term than fix though ;-)

Nikos Vlasiadis

  • 4 Posts
  • 0 Reply Likes
I will be very happy to help if somebody will explain to me how to test it on a Mac platform.

jdv, Champion

  • 728 Posts
  • 55 Reply Likes
Hyper-threading is an ancient thread management technique, and should be deprecated. I would recommend never buying hardware that depends on it. It is unfortunate that the affordable "high-end" systems still depend on it.

Hyper-threading is sort of like taking a muscle car engine and selectively choking it of fuel only when you need that torque the most.

People have been disabling it since it was released nearly two decades ago because it is much more likely to cause pathological contention cases than any other threading technique.

That all being said, assuming you have done all the other obvious stuff related to the ACR cache and the preview sizes, this might be the best solution for some. Lightroom is not actually terribly CPU-bound (most field tests indicate it is much more memory- and I/O-bound), so even a modest number of cores running at their full, un-hyper-threaded speed is probably more than sufficient.

Ron Hu

  • 45 Posts
  • 2 Reply Likes
Sir, you are an idiot. And if hyper-threading was in the consumer space 20 years ago, what CPU had it? In '92 we didn't even have Windows NT yet, though the beta program might have been around close to that time frame.

There are many "sub-components" to a 'core'; depending on the application, threads can run different instruction streams or the same stream on each thread.

Different routines executing on each virtual CPU mapped to a single core can benefit from faster performance, as each thread is really using different sections of the core. Applications that tend to improve without HT are those whose threads all run the same instructions, say math routines; the two exposed CPUs (HT) per core then stall as each thread contends for the shared floating-point and integer units.

Since SO many threads are executing in the system, from applications to the OS kernel, it is best to leave it on. You easily gain 50% performance for free on average.

Windows 7 and Windows 8 are more hyper-threading aware in their schedulers than any previous OS, keeping threads from stalling. Also, parking cores (a low-power state) and running as many threads as possible on the fewest cores saves power - yes, more important in phones and laptops. But still, HT is not going away any time soon.

There are also other OS CPU scheduling things to consider. Windows 7 would park (low power) half the CPUs, so until there was real application/OS need to fire them up, you were running as if HT was off: 6 CPUs active. This appears to have changed in Windows 8, where power saving appears to power down entire cores and leave the active cores running in HT (two virtual CPUs per core).

There are other trade-offs. Cache is shared between HT virtual CPUs. Some applications can benefit from being on the same core because of cache locality; others can take a perf hit because of data-access collisions.

Each application writer NEEDS to understand these trade-offs and assign their threads to CPUs (affinity) if they don't want the OS to do it.
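As an illustration only, this is roughly what per-thread affinity looks like from application code on Windows: a minimal ctypes sketch around the Win32 SetThreadAffinityMask call, pinning just the calling thread. A real application would do this per worker thread, based on the topology it detects:

    import ctypes

    kernel32 = ctypes.windll.kernel32
    thread = kernel32.GetCurrentThread()   # pseudo-handle for the calling thread
    # Mask 0b0001 = logical CPU 0 only; returns the previous mask, or 0 on failure.
    previous = kernel32.SetThreadAffinityMask(thread, 0b0001)
    if previous == 0:
        raise ctypes.WinError()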

General users shouldn't comment on OS design or CPU Architecture.

Ron

jafpix

  • 9 Posts
  • 1 Reply Like
From what I can gather reading various threads on the internet, there have been lots of complaints about the speed of LR4, and these can be divided into two groups.
The first group consists of those who are, largely, expecting more performance than their hardware is capable of, given the extra complexity that LR4 brings. For this group the standard "optimizing your system for LR" steps, or upgrading hardware, are all that can practically be done.
The second group consists of people with systems more than capable of running LR who are experiencing outlandishly poor performance. I was one of those people, and I can assure you that it was orders of magnitude worse than what might be expected. Although I don't fully agree with your views on hyper-threading, I certainly agree that LR is not generally CPU-bound. I suspect that the LR code has some undiscovered thread synchronisation issues which are causing the particular problems that I, and perhaps others in this second group, were/are experiencing. The nature of these sorts of thread synchronisation issues makes them very hard to track down: simply running a debug rather than a release build, or otherwise instrumenting the code, can alter the timing of thread execution enough that the issues never arise under test conditions.
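To make the "hard to reproduce" point concrete, here is a contrived toy example (not Lightroom's code): a timing-dependent lost update, where the forced yield stands in for an ill-timed context switch. Moving or removing that yield - which is exactly what instrumentation tends to do - changes whether the bug ever shows up:

    import threading
    import time

    counter = 0

    def worker():
        global counter
        for _ in range(1000):
            current = counter        # read
            time.sleep(0)            # yield: mimics an ill-timed context switch
            counter = current + 1    # write: may clobber another thread's update

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)  # expected 4000, but typically far less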

I am not claiming that disabling hyper-threading is a solution for everybody; I am merely pointing out that it has worked for me. If you are experiencing a similar problem then you are no doubt as frustrated and disappointed as I was. If so, you may be interested in trying this work-around for yourself. It does not really have any significant negative effect; if it works, it should be patently obvious that it has done so, and if it doesn't, you can easily reverse the change.

Christian Riedel

  • 10 Posts
  • 0 Reply Likes
I have the same problem. i7-2720QM, 8GB RAM, SSD. And Lightroom is super slow. I just had to process some 6000 images - a horribly time-consuming experience given the current performance of LR 4.

Rob Cole

  • 4831 Posts
  • 371 Reply Likes
Lr4.2-final any better?

Robert Peters

  • 39 Posts
  • 15 Reply Likes
I am very interested.

Julie Kmoch, Sr. Development Manager

  • 97 Posts
  • 32 Reply Likes
We are currently studying some changes to the way we account for hyperthreaded processors. However, we weren't confident enough in the fix yet to include it in 4.2. For those of you having problems on high performance machines, I'd appreciate a volunteer or two to test a private drop once we have it ready. Please reply here if you're interested.

Ron Hu

  • 45 Posts
  • 2 Reply Likes
You can always ask me. System specs: 6C/12T 4.7GHz i7-3930K; catalog and OS on a SATA III SSD; CR2s+XMP on a 2TB SATA III HDD.

If it's not fast on my system, it won't be on anyone else's. And I too wasn't pleased with 4.x performance on Windows 7 and now Windows 8. But I will not turn off hyper-threading for a single incorrectly written application.

nick kessler

  • 2 Posts
  • 0 Reply Likes
I would volunteer as well. I am running a MacBook Pro Retina, 2.7GHz, 16GB RAM, 768GB SSD. Nick@nickkessler.com

jw stephenson

  • 34 Posts
  • 1 Reply Like
Julie,

Here are the specs of my machine. If you think I would be a good candidate, I would be happy to help. I have had hyper-threading disabled since LR3.

2x Intel Xeon E5520 Nehalem 2.26 GHz 4 x 256KB L2 Cache 8MB L3 Cache LGA 1366 80W Quad-Core CPU
SUPERMICRO MBD-X8DAL-i-O Dual LGA 1366 Intel 5500 ATX Dual Quad-Core Server Motherboard
12GB(6 x 2GB) 240-Pin DDR3 SDRAM ECC Unbuffered 1333(PC3 10666) Triple Channel Memory
4x (Raid10) 1TB 7200 RPM 16MB Cache SATA 3.0Gb/s 3.5" Internal Hard Drives (Hitachi HDT721010SLA36) for Data
2x (Raid1) VelociRaptor 300GB 10000 RPM 16MB Cache SATA 3.0Gb/s Internal Hard Drives for OS
2x (Raid0) VelociRaptor 300GB 10000 RPM 16MB Cache SATA 3.0Gb/s Internal Hard Drives for Cache/Scratch Files
2x Adaptec 2240900-R PCI Express 4-Lane 2.5Gb/s SATA 1430SA
Radeon HD 4770 512MB 128-bit GDDR5 PCI Express 2.0 x16 HDCP Ready Crossfire Supported Video Card
COOLMAX RM-1000B 1000W EPS12V Active PFC Power Supply
Windows 7 Ultimate

Jeff (jw at tami.com)

Robert Peters

  • 39 Posts
  • 15 Reply Likes
Julie:

My previous comment was placed in the wrong location. I would be pleased to help.

Bo Bickley

  • 2 Posts
  • 0 Reply Likes
Julie,

I would be happy to help. No hyper-threading = no joy

I have an rMBP: 2.7GHz, 16GB RAM, 512GB SSD + Thunderbolt Display

Thanks

Nikos Vlasiadis

  • 4 Posts
  • 0 Reply Likes
Julie, I am also happy to help.
MacBook Pro
i7 2.8GHz
16GB memory
512GB SSD
500GB secondary disc, 7200rpm

The MacBook is the 2012 model with super boost, not the Retina one.

Nikos

Bo Bickley

  • 2 Posts
  • 0 Reply Likes
Julie,

Quick test on RC3 shows no difference in the use of all cores on the rMBP. During develop the system still shows 83% or so idle, with four cores active and the other 4 showing only traces of activity. Looks like the hyper-threading change didn't make it into this one either?

The forum is going to go nuts over this! Get your flak jackets out. Just sayin'.

If there is a special build available it would be very nice to test it soon.

Thanks!

Ron Hu

  • 45 Posts
  • 2 Reply Likes
As of last night, I am running the 4.3 RC.

John Margaretten

  • 5 Posts
  • 0 Reply Likes
Longstanding performance issues with LR4 (4.1, 4.2RC, 4.2); haven't tried 4.3RC yet. Switching between images in preview is slow, and switching between images in Develop can take 10-30 seconds. I have tried all fixes/workarounds to date, including, recently, a clean re-install of the OS and program - no material improvement.

OSX 10.8
2008 MacPro
2 x 2.8 Xeon 4 Core
10GB RAM
2560x1600 Display
Boot drive: Internal 7200
Internal Raid5: 3 x 1TB 7200
External Storage (eSata): Drobo 5 (5 x 2TB 7200)

LR on Boot.
Catalog and Cache on Internal RAID
Library/RAW on Drobo

Ron Hu

  • 45 Posts
  • 2 Reply Likes
Of these slow systems, especially those with multiple cores and HT: how many are on NVIDIA or AMD GPUs?

My OS and catalog are on a 6Gb/s, 500MB/s SSD and my current set of camera pics is on another 3Gb/s SSD, with 6C/12T, so it's not my CPU or disk I/O.

I think it's time for some heavy ETL tracing in Windows to see where the CPU or I/O delay (if any) is coming from.
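For anyone wanting to try that, a hedged sketch: the Windows Performance Recorder CLI (wpr.exe - built into Windows 8, part of the Windows Performance Toolkit on Windows 7) can capture an ETL trace around a slow operation. It needs an elevated prompt:

    import subprocess

    # Start CPU and disk I/O profiling, reproduce the slowness, then stop.
    subprocess.run(['wpr', '-start', 'CPU', '-start', 'DiskIO'], check=True)
    input('Reproduce the slow operation in Lightroom, then press Enter...')
    subprocess.run(['wpr', '-stop', 'lightroom-trace.etl'], check=True)
    # Open lightroom-trace.etl in Windows Performance Analyzer (wpa.exe).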

shura.shum

  • 12 Posts
  • 0 Reply Likes
Is there any progress on this one?
4.3 is not really slow here on a MacBook Pro Retina 2.6GHz with 16GB RAM (OS X 10.8.2).
However, judging from Activity Monitor, only the 4 real cores are busy when, e.g., rendering previews, with the other 4 HT virtual cores showing only minor bumps on the activity graphs - which suggests that LR can be MUCH FASTER if it took real advantage of all processor cores.

Cad Yellow

  • 2 Posts
  • 0 Reply Likes
Is there any progress? Although I've purchased the LR 4 upgrade, I've been holding off moving to LR 4 (from LR 2) for months because of all the problems I've read about that a subset of users have had. How can one find out the current status? There seem to be threads about these issues spread around various forums, but most are not current.

Ron Hu

  • 45 Posts
  • 2 Reply Likes
You didn't specify your current OS, CPU, RAM, HDD or SSD, so there's nothing to base any insight on. From what I've heard about old LR, your hesitation leaves me doubting that you really want an answer.

My answer is, Yes, upgrade.

Robert Frost

  • 392 Posts
  • 51 Reply Likes
< that LR can be MUCH FASTER if it took real advantage of all processor cores.

I believe LR only uses the real cores, not the virtual ones. If it did use the virtual ones, it might speed up a little, or it might slow down, because the virtual cores can only fill in gaps in the real cores' processing. Everything has to go through the real cores in the end; just the first part of the core's pipeline is duplicated, so that a virtual core can insert work into gaps in the real core's processing. This is not going to be the solution you want, and in some cases hyper-threading is slower. It also uses more power, so I turn hyper-threading off. It was a good marketing wheeze of Intel's, but even Intel only claimed an increase of up to 15% with careful programming. Note the "up to".

Bob Frost

Ron Hu

  • 45 Posts
  • 2 Reply Likes
Bob, that's not correct about virtual versus real CPUs. The "CPUs" the OS and applications see are not CPUs; they are register sets that are set up to look like the CPUs of yesteryear.

The back-end execution units - integer and floating-point units, pre-fetcher, branch predictor, etc. - are all 'core' pieces of the processor. One or two (with HT enabled) copies of the front-end, OS-visible register sets of yesteryear share that back end. When two HT CPUs are exposed to the OS, both run threads executing code.

That code is broken up into tiny sub-assembly instructions that are fed to individual execution units. Whether or not HT helps depends on the instructions entering the CPU from each thread.

If the breakdown of the instructions into u-ops (micro-ops) needs the same execution unit, one thread has to wait. If different code sequences use the two CPUs (C0 and C1), then each micro-op can go to a different execution unit per clock tick (or per X of them).

15% is a bit low for HT CPUs, but there is power efficiency in there too. There are many stalls, and while one execution unit is busy, others are idle but still consuming power - so you might as well feed them some instructions (or u-ops).

In multi-threaded applications, binding a thread to a single CPU per core (no matter which CPU, as I explained above) can be beneficial or harmful. It depends on the application and the data being processed.

The major drawback with HT is that the caches are mostly shared, so each HT CPU effectively has half the L1 or L2 cache. This can hurt performance, as the reduction pushes often-accessed data further away from the CPU, adding latency.

You can write multi-threaded applications that run very close to 2x per core, where each thread helps the other without hindering performance.

You can also write multi-threaded applications so bad that all you do is stall both CPUs in the core.

In reality it is most often better to enable HT, as the instruction streams will move from CPU to CPU and from core to core as needed, directed by the OS - the master thread/CPU scheduler. Until a thread is set to the running state on some CPU, that application isn't going anywhere.

Now, Windows 7 and 8 - both with schedulers that handle HT cores efficiently - park one of the two CPUs until using it would pay off. So most often YOU ARE running as if HT were off until you NEED the other virtual CPU in that core; other cores may still leave one of theirs parked.

R
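One way to settle the 15%-versus-50% argument for a particular machine is to measure it. A rough, CPU-only benchmark sketch - it uses processes rather than threads to sidestep Python's GIL, and a stand-in busy-loop rather than anything Lightroom-like:

    import time
    from multiprocessing import Pool

    def burn(n):
        # Stand-in CPU-bound work.
        total = 0
        for i in range(n):
            total += i * i
        return total

    def timed(workers, jobs=48, n=2_000_000):
        start = time.perf_counter()
        with Pool(workers) as pool:
            pool.map(burn, [n] * jobs)
        return time.perf_counter() - start

    if __name__ == '__main__':
        # Adjust for your CPU, e.g. 6 physical / 12 logical on an i7-3930K.
        print('one worker per physical core:', timed(6))
        print('one worker per logical CPU:  ', timed(12))

If the one-worker-per-logical-CPU run is not meaningfully faster, HT is buying little for this kind of load.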

Ron Hu

  • 45 Posts
  • 2 Reply Likes
Now for those with really slow systems, try these changes.
Win7/Win8
Control Panel -> Power Options.
Show additional plans (if needed) and select "High performance". This WILL, by default, run your CPU at 100% all the time. You can edit the plan (most of the time) and reduce the minimum CPU % to, say, 10% to get back power saving when the CPU isn't loaded.

By default the plan is Balanced: the CPU down-clocks, and some other CPU features are not enabled aggressively unless needed, in order to save power.

Control Panel -> System -> (left side) "Advanced system settings".
In the Performance box click "Settings", then the "Advanced" tab.
Most will see "Programs" selected by default; you might want to try "Background services".

What this does is change CPU scheduling: with "Programs", UI (foreground) threads get 3x as much CPU time before the scheduler switches to another ready thread. This may or may not help LR, depending on how it generates previews - if they are rendered by a background EXE without a UI, that process will not get the bumped quantum (scheduling time).

This change requires a reboot.
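Those power-plan steps can also be scripted with the built-in powercfg tool; a sketch (SCHEME_MIN is the documented alias for the High performance plan, i.e. minimum power saving, and the second call just confirms the switch):

    import subprocess

    subprocess.run(['powercfg', '/setactive', 'SCHEME_MIN'], check=True)  # High performance
    subprocess.run(['powercfg', '/getactivescheme'], check=True)          # print active plan

The processor-scheduling setting, by contrast, lives in the registry (Win32PrioritySeparation) and is easier to change through the dialog described above.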

Robert Frost

  • 392 Posts
  • 51 Reply Likes
< Bob, that's not correct about virtual versus real CPUs. The "CPUs" the OS and applications see are not CPUs; they are register sets that are set up to look like the CPUs of yesteryear.

......

Well, that's the story I read some years ago. Perhaps it was a simplified version of the truth for non-programmers like me! Here's an example:

"HyperThreading is a feature introduced by Intel, and is exclusive to Intel processors. It splits a real CPU (a core) into 2. One is the real core, called the physical core. The other is just a secondary core, called the logical core. This logical core can't do much, but it does provide a little increased parallism. It is far from being a real core. In fact, it offers 10-20% (est., likely less) the performance of a real physical core. That's right, barely any computing power. Its purpose was simply to increase parallelism in a world dominated by I/O bound (non-CPU intensive) processes (actually threads, but we won't split hairs here). When a CPU intensive (CPU bound) thread is switched to one of these cores, its performance will substantially degrade. Therefore, in some situations, it is appropriate to use Process Lasso's HyperThreaded Core Avoidance, or disable HyperThreading all-together. Although the Windows Scheduler has become increasingly aware of HyperThreading, this is still a factor since the Scheduler is no AI, and it is especially important in XP and below where the Scheduler is even less aware of HyperThreading."

from
http://bitsum.com/pl_when_hyperthread...

This same article goes on to say that AMD's version of hyper-threading gives more processing resources to each of the duplicated parts of the core, but they still share some of the processing units.

Bob Frost