
I've spent the past month or so at work using a box that seems cursed.

Backstory: We develop a piece of very graphics-heavy software, which uses lots of OpenCL, CUDA, and so on. Hence, we place high demands on our graphics hardware. This also means that I can't use the Nouveau drivers, since we can't run our software with them.

When I started at this job, I had a 12.10 box with an AMD FirePro V5900. After leaving the computer on for a few weeks, this problem started to show, and it was soon followed by a host of other issues. These other issues (flickering screen, black windows, and more) are outside the scope of this question, as they've been documented elsewhere. From what I can tell, though, my problem is unique: several times I have found pictures of what seemed to be other instances of the issue, but closer study (loading the page on another computer) showed that the glitches were on my end. What I see, whatever I might be doing on the computer, is this:

[Screenshot: two open windows, each with patches of holes revealing what is behind them]

These patches (of 2-pixel tall holes) flicker with every redraw, many of them changing positions. I've spent a month working with IT trying to fix the issue, and so far we have tried (in order):

  • Upgrading to 13.10
  • Booting into old kernel
  • Switching to GNOME 3 (it was worse)
  • Switching to GNOME with Metacity (same as Unity)
  • Wiping the drive and installing 14.04
  • Booting into old kernel
  • Swapping out the card for an NVIDIA Quadro 5000, rewiping drive and installing 14.04 again for good measure
  • Tried different monitor and cable
  • Tearing out all the guts, new mobo, new RAM (a weekend-long memtest turned up fine, but we were taking no chances), everything except the case new; again rewiping drive and reinstalling Trusty.
  • Testing with a beautiful new high-dpi monitor

Nothing worked -- every time, the fresh install would look nice, but it would be useless for our work, so we'd try a driver switch; after switching to any other driver, the madness returned. The parts are all new (now, anyway), and nothing is, or ever has been, overclocked.

What now? In a few hours we're wiping my drive again, this time to try Precise. Honestly, I've lost enough productivity already, so if that doesn't work I'm switching to Mint or Arch or Windows. Right now, I'd rather just document an odd bug, and perhaps get some help putting together a bug report (and filing it in the right place).

However, I may have just kept screwing up in subtle ways with my driver configuration. Since I find that quite likely, I've decided it would be best to ask here. So, any ideas?

Moop
  • I am assuming you also switched out cables? It is not mentioned, and I doubt you would forget this step, but anything could be the issue. The link you have (of the one other case) involves overclocking the graphics; I assume you do not overclock, or have used standard settings. If you just want to report a bug, you would use Launchpad to report it, as in this question, and this link – No Time Jun 18 '14 at 19:11
  • Thanks, yes we've tried switching cables, no overclocking. That link was simply the only other place I could find evidence of the same issue. I'll update the question. – Moop Jun 18 '14 at 19:17
  • I just realised something -- the glitching I saw on Jeff's link was an issue with my computer displaying his picture. I'll remove that link and add a note about these glitches appearing stuck to images also. – Moop Jun 18 '14 at 19:22
  • I think it's a hardware issue. Can you test the graphics outside of Ubuntu? If the problem persists outside of Ubuntu, then it's definitely a hardware issue, and maybe your GPU (or other parts of the graphics card) needs to be replaced or repaired. Or maybe something else is damaged in your computer. – nastys Jun 18 '14 at 19:44
  • I'll ask IT to try another OS, good idea. – Moop Jun 18 '14 at 19:57
  • This question is starting to look more and more like it fits SU.SE better -- EM problems, hardware swapping, etc. All that's left is to rule out OS-specificity and the question won't belong here any more. – Moop Jun 18 '14 at 20:48

2 Answers


OK, seeing that you have ruled out almost everything:

What about strong electromagnetic interference near the machine?
Possibly also near the power cables, in case they have issues like an incomplete local ground connection?

You say "everything except the case [is] new" - that could be a hint in this direction, as keeping these problems out is one of the main functions of the case.

The other component relevant to handling electromagnetic interference is the power supply. It actually spends a lot of its complexity on providing well-filtered current, as opposed to just strong current.

Volker Siegel
  • Oh wow there's something I've completely missed -- due to an ugly floor layout problem, I'm at the end of a chain of two power bars, sharing an outlet with 3 other computers. I would not at all be surprised if there's a ground connection issue. Going to grab IT again, see what we can do about it. – Moop Jun 18 '14 at 19:36
  • Moving the computer to a different power outlet; I will let you know if the issue persists. – Moop Jun 18 '14 at 19:39
  • Moving the computer to another outlet (and a different monitor) resulted in much worse graphics problems, even on login screen. The room it went to is the IT room, which may have more EM noise. They're currently finding a new drive to install 12.04 on, to handle all variables. – Moop Jun 18 '14 at 19:56
  • Hey! So it's related to EM interference! If a change in the environment can make it worse, then either a different change can make it better, or... Wait, when you say "case", do you mean the case including the power supply? So it's the PSU that is broken in a way that makes it no longer filter out noise and EM interference, I suppose? – Volker Siegel Jun 18 '14 at 20:01
  • We've swapped out PSUs already (so no, I only meant the box of metal), but that's a very good point. IT just switched in a new graphics card, a GeForce 8800 this time. No issues so far. – Moop Jun 18 '14 at 20:46
  • @Filipq Hi, it's the first time I've asked this, but if it works now, you owe me an accept, I'd say :) – Volker Siegel Jun 19 '14 at 14:37
  • I agree completely, I'm just not sure the issue is fixed. Shall we take it to chat? I have some ideas why the issue hasn't appeared yet (after the graphics card downgrade), you might be able to help explain. Ah whatever don't worry about it, here's your accept. I'll consider reopening if the issue returns. – Moop Jun 19 '14 at 15:02
  • @Filipq :) Oh, I'm still curious, so let's chat - if we find out how to do that :) – Volker Siegel Jun 19 '14 at 15:20
  • @Filipq for chat, see http://chat.stackexchange.com/rooms/15213/discussion-between-volker-siegel-and-filipq – Volker Siegel Jun 19 '14 at 15:39

[Note: I noticed that this solution was already ruled out for the specific case at hand (the RAM was replaced), but I am keeping this answer in case it helps readers working on similar issues]

To me, that looks very much like a memory problem, assuming things like loose connectors were already checked.
Now, you ruled out the graphics card by changing it, right?

That would leave the main memory of the machine. Could you exchange, or at least test it?

Your note about "glitches appearing stuck to images" for sure sounds even more like a memory problem.

Volker Siegel
  • That was also the first thing that one of the senior programmers suggested -- hence the weekend memtest. Good thought, though -- it's what I'm suspecting too. Well, either that or driver issues. It's always driver issues. – Moop Jun 18 '14 at 19:31
  • But you used two different graphics cards with two different Linux versions, implying different graphics drivers, right? That could not rule out a driver problem, just make it less probable. – Volker Siegel Jun 18 '14 at 19:35
  • Very true. Anyway, must move computer. – Moop Jun 18 '14 at 19:40
  • Oh, you even tried both an NVIDIA and an AMD graphics card? Each (or at least one) with the native/closed-source drivers? So that almost rules out driver issues... – Volker Siegel Jun 18 '14 at 19:45
  • That's right -- all available drivers (xorg, open-source, proprietary-updates, and the version from the vendor's website) for each card; with NVIDIA, I was careful to reinstall fresh before each driver retry. – Moop Jun 18 '14 at 19:54