There are rumors that entry-level workstations will use an Alder Lake-based Xeon chip. Not for power workstations, though.

I'll fiddle with CVData using big.LITTLE. I have some old code that avoids loops: it executes fewer instructions at run time, while having more instructions in the binary. This was speedier on older architectures. It occurs to me that with better branch prediction, this may no longer be an efficient use of the L1 cache. I'm focused more on cache usage and wish we could see cache hit ratios. CVData keeps a lot of data, which keeps a lot of memory busy when running 20 threads.
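To make the loop-versus-no-loop tradeoff concrete, here is a rough sketch in C. It is not the CVData code; the function names, the element type, and the fixed size of 8 are all made up for illustration. The looped version is small but pays for a counter update, a compare, and a branch on every pass. The straight-line version skips that overhead but takes up more room in the instruction cache.

```c
#include <assert.h>
#include <stddef.h>

/* Looped version: small code footprint, but every iteration also runs
   the counter increment, the compare, and the conditional branch. */
long sum_looped(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Hand-unrolled version for exactly 8 elements (hypothetical fixed size):
   no loop overhead at run time, but more instruction bytes to cache. */
long sum_unrolled8(const int *v)
{
    long total = 0;
    total += v[0];
    total += v[1];
    total += v[2];
    total += v[3];
    total += v[4];
    total += v[5];
    total += v[6];
    total += v[7];
    return total;
}

/* Sanity check: both versions must produce the same sum. */
void check_sums_agree(void)
{
    static const int sample[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    assert(sum_looped(sample, 8) == sum_unrolled8(sample));
}
```

Keep in mind that an optimizing compiler may unroll or vectorize the looped version on its own, so the only way to know which form wins on a given chip is to time both with the real data.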