Quote:
An important thing to understand about Working Set (or better, Working Set Private) is that it is only trimmed (i.e. reflects recent process usage) if you are already low on physical memory.
If you have enough physical memory, Windows may never eject any pages from your Working Set, so you may think your process is constantly running through 90MB of physical memory when it is really only using 1 or 2MB continuously.
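If you want to watch this happen, Windows lets a process empty its own working set on demand via the documented SetProcessWorkingSetSize call. A minimal sketch, assuming a Windows build environment, with error handling kept short:

    /* Minimal sketch: ask Windows to trim (empty) this process's working
       set.  Afterwards, the Working Set counter reflects only pages the
       process actually touches again, not pages Windows never bothered
       to evict. */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* Passing (SIZE_T)-1 for both limits means "empty the working set". */
        if (!SetProcessWorkingSetSize(GetCurrentProcess(),
                                      (SIZE_T)-1, (SIZE_T)-1))
            fprintf(stderr, "trim failed: error %lu\n", GetLastError());
        return 0;
    }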
I'm not talking about the Windows cache -- I'm talking about the L2 or L3 cache on your CPU chip.
A 6-core Xeon has 12MB of L3 processor cache, while a 4- or 6-core Core i7 has 8MB of L3 cache or less. The exact numbers vary by CPU generation and type, but as an example:
The Xeon X5660 and X5680 both have 6 cores (they differ by clock speed). Each core has its own 32KB L1 data and 32KB L1 instruction caches and its own 256KB L2 cache, and all cores share a 12MB L3 cache; beyond that, accesses go to main memory. When the CPU needs data, the ballpark latencies are:
L1 cache - about 4 CPU cycles (at 1GHz a cycle is 1 nanosecond, so at 2-4GHz, 4 cycles is roughly 1-2ns)
L2 cache - about 10ns
L3 cache - about 15-35ns
main memory - about 85-100+ns, depending on the memory speed
A modern chip yields different numbers (more L3, but slower L3 access, and faster main memory as well), but this gives you an idea:
http://www.anandtech.com/show/8423/intel-xeon-e5-version-3-up-to-18-haswell-ep-cores-/11
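You can get a feel for these numbers yourself with a pointer-chasing loop: walk a randomly permuted cycle of pointers (so the hardware prefetcher can't help) and vary the buffer size. A rough sketch -- illustrative only, the exact figures depend on your CPU:

    /* Pointer-chase probe: each load depends on the previous one, so the
       average time per hop approximates the latency of whatever level of
       the hierarchy the buffer fits in.  At 16MB (bigger than a 12MB L3)
       you should see main-memory-like latencies; shrink n to fit in
       L3/L2/L1 and the ns/hop figure drops accordingly. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define HOPS 10000000L

    int main(void)
    {
        size_t n = (16u << 20) / sizeof(void *);    /* 16MB buffer */
        void **buf = malloc(n * sizeof(void *));
        size_t *idx = malloc(n * sizeof(size_t));
        size_t i;

        /* Fisher-Yates shuffle to build a random cyclic permutation
           (two rand() calls combined, since RAND_MAX may be only 32767). */
        for (i = 0; i < n; i++) idx[i] = i;
        srand(1);
        for (i = n - 1; i > 0; i--) {
            size_t j = (((size_t)rand() << 15) ^ (size_t)rand()) % (i + 1);
            size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
        }
        for (i = 0; i < n; i++)
            buf[idx[i]] = &buf[idx[(i + 1) % n]];

        /* Chase the cycle: every iteration is a dependent load. */
        void **p = &buf[idx[0]];
        clock_t t0 = clock();
        for (long h = 0; h < HOPS; h++)
            p = (void **)*p;
        clock_t t1 = clock();

        printf("%.1f ns/hop (sink %p)\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / HOPS, (void *)p);
        free(idx); free(buf);
        return 0;
    }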
So if you are running a single program executing a tight loop that fits in < 512K of code and data, it can take 10 times as long if it is interrupted and has to refetch all of its data from main memory (~100ns vs. ~10ns per access). Even just executing out of the on-chip L3 cache is about 3x faster than executing from main memory.
These caches are managed automatically by the chip, with almost no options for OS control at this time. The only way your program stays in the L3 cache is to make sure it uses < 12MB during its runtime. Every interruption by another program eats away at that 12MB, which is why benchmarks are timed on dedicated machines with nothing else running.
You can force a memory trim to see how fast a process's working set grows back in a given time. I don't know if Process Explorer has that option (I used to use Process Explorer, but one of the users on the PE forum developed Process Hacker -- not a hacking tool, despite the name -- which is a superset of Process Explorer's functionality, and it does have the trim option). If you force a trim, you can see how much total new memory a process touches in a certain time. For DF, that was 3MB of new memory per second -- meaning a 12MB L3 cache won't hold it. Ever. Period.

The only way to keep DF from affecting all your other programs is for it to *sleep* when it isn't doing anything. Instead, it is busy waiting (see https://en.wikipedia.org/wiki/Busy_waiting). Busy waiting is generally considered harmful to OS and CPU performance and to energy usage.
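The difference looks roughly like this -- have_work(), do_work(), and the 10ms poll interval are hypothetical placeholders, not DF's actual code:

    /* Busy waiting vs. sleeping -- a sketch, not DF's real loop. */
    #include <windows.h>

    extern int have_work(void);   /* hypothetical: anything to do? */
    extern void do_work(void);    /* hypothetical: do it */

    void busy_wait_loop(void)     /* burns CPU and cache even when idle */
    {
        for (;;) {
            if (have_work())
                do_work();
            /* no yield: re-polls at full speed, evicting other
               programs' data from the shared L3 the whole time */
        }
    }

    void sleeping_loop(void)      /* near-zero CPU while idle */
    {
        for (;;) {
            while (!have_work())
                Sleep(10);        /* yield the CPU between polls */
            do_work();
        }
    }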
Quote:
I am using Process Explorer and I see DF running about 94MB WS Private and 0.4 to 0.8 CPU%. PE appears to follow Task Manager and cap total CPU at 100%, so given my 4 core HT processor, DF is using up to 0.1% of a single (HT) core, which doesn't seem unreasonable considering I have both a secondary taskbar and window buttons enabled, as well as global triggers.
---
Sorry, I must have been confusing, because that's exactly backwards. If you are using 0.4-0.8% of 4 cores (4 cores provide 4 CPU-seconds per real second), then you are using 0.4-0.8% of 4 CPU-seconds/real-second, i.e. 0.016-0.032 CPU-seconds per real (wall clock) second. If you had a quarter as many cores (1 core), that same work would be 4 times the percentage of that one core, right? So you multiply by the number of cores: the single-core usage would be 1.6-3.2% of one core.
In the memory-latency article I linked above, the new Haswell CPU has up to 18 cores. On that machine, CPU usage expressed as a percentage of 18 cores would be 18 times smaller than the same usage expressed as a percentage of 1 core. So your 1.6-3.2% of one core would show up as 0.09-0.18% on the current higher-end CPUs.
That's why I promote the idea of monitoring programs expressing CPU% as a percentage of one core: if they don't, you won't have any idea what a single-threaded, one-core program is doing once core counts get insanely high. Using the one-core number, you can compare a program's usage across different machines with some meaning. A program that can drive 4 cores at 100% each would show 400% single-core usage, with a 6-core machine able to provide up to 600% -- six times the CPU of a 1-core machine. The conversion itself is a single multiplication, sketched below.
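    /* Toy example of the conversion described above: "% of the whole
       machine" times the core count gives "% of one core", and dividing
       goes the other way.  The numbers are the ones from this thread. */
    #include <stdio.h>

    static double pct_of_one_core(double pct_of_machine, int cores)
    {
        return pct_of_machine * cores;
    }

    int main(void)
    {
        printf("0.8%% of a 4-core machine = %.1f%% of one core\n",
               pct_of_one_core(0.8, 4));        /* 3.2 */
        printf("3.2%% of one core = %.2f%% of an 18-core machine\n",
               3.2 / 18.0);                     /* 0.18 */
        return 0;
    }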
Did my explanation make anything clearer? I know it was dipping into technobabble, but I tried to make the concepts more concrete.