This is one of those things where -- as I call it -- a little knowledge goes a 'wrong way.'

Short version: Enabling SMT can make a processor slower in some applications. And yes, both the OS and applications can mitigate that with time and experience. But it's not a simple 'fix the OS scheduler' answer.
Details ...
In a modern, superscalar, pipelined processor, at any given time less than half of the stages in a pipeline are doing anything. Yes, over half of your processor's sub-units are doing nothing at any given moment (that's a mega-oversimplification, but it represents the end-result equivalent). It's even worse in the x86[-64] CISC (Complex Instruction Set Computing) architecture.
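To make that concrete, here's a toy back-of-the-envelope model -- my own illustrative numbers, not measurements from any real chip -- of how stall bubbles leave most stage-cycle slots of a classic 5-stage pipeline empty:

```python
# Toy model of a 5-stage in-order pipeline (IF, ID, EX, MEM, WB).
# Each entry in stalls_per_instr is the number of bubble cycles that
# instruction waits (e.g., on a missed load). Empty stage/cycle slots
# are the idle capacity SMT tries to fill. Purely illustrative.

STAGES = 5

def occupancy(stalls_per_instr):
    """Fraction of stage-cycle slots doing real work for one thread."""
    n = len(stalls_per_instr)
    # Ideal in-order pipeline: n + STAGES - 1 cycles, plus stall bubbles.
    total_cycles = n + STAGES - 1 + sum(stalls_per_instr)
    busy_slots = n * STAGES        # each instruction occupies 5 slots total
    return busy_slots / (total_cycles * STAGES)

# 8 instructions; two of them wait 3 cycles each on, say, missed loads.
print(round(occupancy([0, 0, 3, 0, 0, 3, 0, 0]), 2))   # 0.44
```

Even this generous model (single-issue, modest stalls) lands under half occupancy; real wide superscalar designs waste proportionally more slots.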
Clocked Boolean Logic (CBL), the instruction set, and CISC are all 1960s Computer Science (CS) "constructs" that Electrical Engineers (EEs) were unable to eradicate in time before the x86 'took hold' -- long story.
Short version: CBL and assembler are not how processors actually work -- not at all, and definitely not in the 21st century. You'll hear otherwise from the 500+ colleges teaching CS and programming in the US alone, but at the ~75 that have actual, real EE/semiconductor programs, it's the truth.
I.e.,
1) Boolean logic comes from the pre-integrated-circuit era, when mathematicians and computer scientists threw switches to make logic decisions. They stupidly replaced the switch-thrower with a clock, which is the worst thing from an EE perspective. Boolean logic should never have been used in computers, because it lacks an inherent, self-synchronizing control. But here we are, in a CBL world. The solution EEs have taken is to make sub-units completely CBL-free and asynchronously timed, sometimes not using Boolean logic at all but another form of logic, like Karl Fant's Null Convention Logic (NCL), which is very EMI-resistant (think space and military operations).
Disclaimer: I worked professionally with Karl Fant and Steve Furber (co-inventor of the original ARM -- yes, the architecture in your smartphones and tablets) in the late '90s, when there was a short-lived 'Silicon Valley' based in Orlando (Lockheed Martin's Real3D, which became Intel's GPU lineage to this day; both ATI and nVidia still have fabless design teams in Orlando as well). My degree focus is in semiconductor materials and layout -- the only time I actually used my degree specialty in my career.
2) RISC didn't come about until 3rd-generation languages (the C compiler) became commonplace, which allowed EEs to 'remove the root cause' at the assembler mnemonics -- i.e., assembler is a human form of generating machine instructions. As long as programmers used it, code would only be as efficient as the coder. But the instruction set itself was a math/CS artificial construct. EEs think in MEAGs/'one-hot' (long story), which means if it can boil down to a trace, it's easier to lay out and, more importantly, to time. Ergo ... RISC, Reduced Instruction Set Computing: more direct traces.
Even today, the variable-length 8-to-128+ bit (and that's just for 32-bit mode) x86[-64] 'word' is often decoded into what's known as RISC86 (NexGen designed it in 1986; it's been at the heart of AMD's ALUs since '94), a fixed 32-bit (or newer 40+ bit) 'word'.
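To illustrate the decode problem: the x86 byte counts below are standard encodings, but the fixed-word side is a generic stand-in for a fixed-width internal format, not AMD's actual RISC86 layout.

```python
# Variable-length x86 vs a fixed-width internal word. The x86 byte
# lengths are real encoding sizes; the 4-byte "micro-op" is a made-up
# stand-in, not AMD's actual RISC86 format.

x86_lengths = {
    "ret":                     1,   # C3
    "add eax, ebx":            2,   # 01 D8
    "mov rax, imm64":          10,  # REX.W B8 + 8-byte immediate
    "add [rip+disp32], imm32": 10,  # 81 + ModRM + disp32 + imm32
}

# A fixed-width decoder just slices: word i lives at bytes [4*i, 4*i+4).
def fixed_word(stream: bytes, i: int) -> bytes:
    return stream[4 * i : 4 * i + 4]

# With x86, you cannot find instruction i without decoding 0..i-1 first:
def x86_offsets(lengths):
    off, out = 0, []
    for n in lengths:
        out.append(off)
        off += n
    return out

print(x86_offsets(list(x86_lengths.values())))   # [0, 1, 3, 13]
```

That serial dependency in finding instruction boundaries is exactly what a fixed-width internal format eliminates: every decoder lane knows where its word starts.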
Since the '90s, programmers have not been able to out-smart the EEs who wrote the optimizing C compiler in assembler; at best, only those in-line assembler instructions in the programmer's manual they wrote should be used. It takes years to understand the underlying layout and quirks of any design, which is why assembler has become quite useless since the '90s (other than in-lining, where the EEs tell programmers exactly what to use).
3) Intel's 'CS Einsteins' learned this first-hand when EPIC (Explicitly Parallel Instruction Computing) and Predication (removing branch prediction to regain a lot of silicon) failed spectacularly, just as Digital Semiconductor said they would. You've never heard of EPIC/Predication, only Itanium. The problem with Itanium wasn't lack of x86 compatibility; it was that EPIC/Predication worked even worse than CISC. It tried to optimize the chip at the instruction set level, which every EE major told every CS major would fail -- and it did.
EPIC/Predication was like watching a construction worker design a bridge.
They had no understanding below the digital level -- layout and timing, the relativistic effects of electromagnetic fields, and the limits of the speed of light (yes, as early as 1989, the clock couldn't reach the other side of the chip in a single cycle) -- kinda like a construction worker doesn't know the first thing about elementary engineering statics (much less dynamics), all of which requires a deep understanding of differential equations to even begin. I.e., in the case of semiconductor signals, materials, etc., this is calculus beyond just the first year, whereas civil engineers (bridge builders) can get by with just two semesters for statics (dynamics and EM are another story).
It was like watching a bad joke for 5+ years. The design world today fully admits the greatest RISC design is Alpha -- purposely designed to be the most anal about timing and avoiding bottlenecks, it destroyed x86 (let alone Itanium) in performance. But Digital decided to break itself apart in the late '90s (they designed almost everything at the time -- from chipset logic to network ASICs) to make a huge amount of money for its stockholders (which it did, and very well). At least AMD gained most of their knowledge, especially when it absorbed the spin-off API Networks (API stood for Alpha Processor, Inc.).
Simultaneous Multithreading (SMT) itself is a technique -- a layer of register/microcode-backed abstraction in the processor -- which attempts to fill unused stages of a pipeline with another, albeit faux ('virtualized', sort of), pipeline of stages. In other words, you only have X cores, and you are running n*X threads through them -- where n is usually 2 (any more with today's x86 legacy designs would be far more inefficient).
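A toy issue model makes the trade-off visible. This is nothing like a real core -- one issue slot, no renaming or partitioning cost -- but it shows both why SMT helps stall-heavy threads and why it gains nothing (and, once real overhead is counted, can lose) for threads that already keep the pipeline full:

```python
# 'I' = instruction ready to issue, 'W' = a stall cycle (cache miss,
# dependency, etc.). One shared issue slot per cycle; stall cycles for
# all threads elapse in parallel. Toy model only.

def cycles_to_finish(threads):
    ptrs = [0] * len(threads)
    cycles = 0
    while any(p < len(t) for p, t in zip(ptrs, threads)):
        cycles += 1
        issued = False
        for k, t in enumerate(threads):
            if ptrs[k] >= len(t):
                continue                  # this thread is done
            if t[ptrs[k]] == 'W':
                ptrs[k] += 1              # stall elapses regardless
            elif not issued:
                ptrs[k] += 1              # one issue slot per cycle
                issued = True
    return cycles

stally = list("IWWIWWII")                 # thread that stalls a lot
busy   = list("IIII")                     # thread with no stalls

print(cycles_to_finish([stally]))          # 8 cycles alone
print(cycles_to_finish([stally, stally]))  # 10 for both: big win vs 16
print(cycles_to_finish([busy, busy]))      # 8 vs 4 alone: zero win
```

The last line is the gaming case: two compute-bound critical-path threads just take turns, and any SMT bookkeeping overhead makes that a net loss.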
Sun's, now Oracle's, T-series ('T' essentially for 'thread') actually does far more threads per core. It traces its lineage back to the RISC design of SPARC from Berkeley.
But with x86-64, even using RISC86 microcode, that's impossible. It's a complex beast, and Intel and AMD have saddled it with all sorts of extensions that make wider SMT impractical, limiting it to 2 threads per core.
Side Trivia: Ironically, Sun (whose name was literally taken from the "Stanford University Network") used the University of California at Berkeley's SPARC design instead of sticking with MIPS from ... Stanford.
At the basic level ... this should have nothing to do with the OS, as the processor presents n*X threads as logical cores -- e.g., 8 cores as 16 threads, so the OS schedules 16 'cores'. However, the OS can factor in.
E.g., Windows isn't exactly known for shipping a flexible, modular kernel and library set versus ... say ... Linux (don't get me started).
Because ... the main problem?
There's overhead in SMT. There is register renaming, stack management, and even pipeline stalls involved, depending on the stage and what it was doing -- things that can be really self-defeating if not handled efficiently. Just because the OS doesn't deal with it doesn't mean it's not going on.
And this happens on the Intel i-series cores, just like the AMD Ryzen.
But because Intel has been doing SMT since the NetBurst (Pentium 4) architecture, Windows has learned how to optimize for it. Still, there's a reason many of us disable SMT even on Intel processors for gaming and some other applications -- things that are thread-ignorant and have critical paths on just a thread or two.
SMT is good for either multi-user or well-threaded applications. It's not so good for most traditional gaming engines, which have critical paths on 1-2 threads.
Even worse? Virtually all compilers not only optimize for specific instruction sets and extensions (which are usually loadable), but more importantly, they optimize for scheduling. The classic example is the old, in-order (non-out-of-order) Atom. If you optimized for a full Intel i-series, the code would suck (like 20%+ degradation) on the in-order Atoms. But if you optimized for the Atom, there was only a 1-2% hit on the i-series.
This is AMD's first SMT design. There are bound to be use cases where it degrades performance ... just like the Pentium 4's SMT did. Over time, they will be addressed via AMD microcode updates and then newer processors, as well as at the OS level. You'll see the upstream Linux kernel publish these first, then FreeBSD will use them, which Apple (Darwin and the other codebases of MacOS X, iOS, etc. are based in part on FreeBSD) will pick up, and the NT team at Microsoft will track them as well.
It goes the other way as well, but usually not in a way consumers see.
E.g., on Wall Street in 2008, we had the very first Intel Nehalem processors, and multi-socket systems were crashing left and right due to coherency issues. I'm under NDA, but let's just say there were errata published that you can read about in the Linux kernel.
The root cause: when Intel finally upgraded its legacy 36-bit platform addressing limitation at the i486-designed, i686-extended TLB (Translation Lookaside Buffer) from 1989 to 38-bit/256GiB support in 2008, they had all sorts of issues maintaining coherency between sockets. AMD had gone 40-bit/1TiB with the EV6 bus (from the Alpha -- again, AMD really was way ahead of Intel, thanks to the API acquisition) with the original 32-bit Athlon, and learned a lot with the Athlon MP (multiprocessor), which was then directly applied to the Opteron and its HyperTransport design. Bam! By 2004, AMD had addressed everything from a 1TiB+ RAM-capable TLB to a full, complete I/O MMU in the Opteron from the get-go.
Intel didn't, and not until the 2nd revision of the i-series in the '10s, well over a half-decade later, including the I/O MMU on the new QuickPath Interconnect (QPI) implementations. Again, most people don't see these because they are multi-socket considerations, although they can be multi-core considerations if QPI is in the same socket (e.g., 14-18 core E5 processors). It's why we preferred AMD until the i-series Core processors.
Furthermore, when AMD had a similar issue when they bumped the TLB to support the full limits of the x86-64 'Long Mode' design (which is limited to 48-bit/256TiB, or 52-bit/4PiB paging), they decided to hold off on releasing the multi-socket versions. The IT enthusiast sites lambasted AMD for this, but they never knew about Intel's massive screw-up.
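For reference, all the capacities quoted above fall straight out of the address widths -- a quick sanity check:

```python
# 2**bits bytes, rendered in binary units; checks the figures in the text.
def capacity(bits):
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    size, i = 2 ** bits, 0
    while size >= 1024 and i < len(units) - 1:
        size //= 1024
        i += 1
    return f"{size} {units[i]}"

for bits in (36, 38, 40, 48, 52):
    print(bits, capacity(bits))
# 36 -> 64 GiB (old PAE limit), 38 -> 256 GiB, 40 -> 1 TiB,
# 48 -> 256 TiB, 52 -> 4 PiB
```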
Ironically, it was those of us doing cutting-edge, high-speed trading applications on Linux who saved a lot of even the Windows Server world from the same fate with Intel's original 'Core' Nehalem designs.