
Ethereum: Diagnosing a bad 1070 GPU in a SimpleMining SMOS rig (remotely)



In Windows, one of my 1070 GPUs kept crashing. I believe it's a Strix that's been mining for ~18 months, but it might be a newer (~6 months) ASUS Dual card.

Either way, all 4 cards would hash in Windows for hours or maybe days before the card would crash and I'd have to re-initialize all the GPUs with NVIDIA Inspector.

I thought maybe it'd play better in SMOS, or restart itself more effectively. However, the rig is still averaging 1 restart every 20 minutes and rebooting every few hours.

The readout doesn't give much info about which card is bad, since the console only shows a few lines:

10xxMIX ZEC – ON(34) Rig is rebooting due to system 56℃

25% 708 734 734 708 987 974 (95,80,80,95,95,100)

810 810 810 810 3802 3802

Miner Console

[ 349.078582] ? _nv007129rm+0x160/0x180 [nvidia]

[ 349.078728] ? _nv001140rm+0x84/0xe0 [nvidia]

[ 349.078865] ? rm_execute_work_item+0x49/0xc0 [nvidia]

[ 349.079006] ? os_free_mem+0x10/0x30 [nvidia]

[ 349.079146] ? os_execute_work_item+0x46/0x70 [nvidia]

[ 349.079147] ? process_one_work+0x156/0x3f0

[ 349.079148] ? worker_thread+0x4b/0x410

[ 349.079150] ? kthread+0x109/0x140

[ 349.079150] ? process_one_work+0x3f0/0x3f0

[ 349.079152] ? kthread_create_on_node+0x40/0x40

[ 349.079153] ? ret_from_fork+0x25/0x30

[ 349.079154] handlers:

[ 349.106517] [<ffffffffab6428d0>] usb_hcd_irq

[ 349.122009] Disabling IRQ #16

Rig is rebooting due to system error. It could be caused by too much overclock/undervolt.

This is occurring with no overclocking, decent ventilation, and a power limit of about 100 W.

AND

10xxMIX ZEC – OFF(35) Rig is rebooting due to system 57℃

24% 1506 1531 1531 1506 1582 936 (95,80,80,95,95,100)

4006 4006 4006 4006 4006 4006

Miner Console

[ 313.456995] nvidia-modeset: WARNING: GPU:3: Unable to read EDID for display device DVI-D-0

[ 314.109269] nvidia-modeset: Allocated GPU:4 (GPU-a9e959a8-8dfb-292f-8d86-742d6eed7cf8) @ PCI:0000:07:00.0
[ 314.154525] nvidia-modeset: WARNING: GPU:4: Unable to read EDID for display device DVI-D-0

[ 314.867960] nvidia-modeset: Allocated GPU:5 (GPU-a3d5362d-c5b9-ea29-5a18-78b31bfc2da0) @ PCI:0000:08:00.0

[ 314.918729] nvidia-modeset: WARNING: GPU:5: Unable to read EDID for display device DVI-D-0

[ 333.237759] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 243

[ 444.393308] NVRM: GPU at PCI:0000:06:00: GPU-5d295a74-00f0-b96b-e06c-455bfe263711

[ 444.393311] NVRM: GPU Board Serial Number:

[ 444.393313] NVRM: Xid (PCI:0000:06:00): 79, GPU has fallen off the bus.

[ 444.414544] NVRM: GPU at 0000:06:00.0 has fallen off the bus.

[ 444.414545] NVRM: GPU is on Board .

[ 444.414936] NVRM: A GPU crash dump has been created. If possible, please run

NVRM: nvidia-bug-report.sh as root to collect this data before

NVRM: the NVIDIA kernel module is unloaded.

Rig is rebooting due to system error. It could be caused by too much overclock/undervolt.
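
The second log does name a PCI address (0000:06:00) in the Xid 79 "fallen off the bus" line, so one way to narrow down the bad card is to match that address against the "Allocated GPU:N ... @ PCI:..." lines. Below is a rough Python sketch of that idea, not anything SMOS provides. The function names (`normalize_pci`, `find_xid_events`) and the sample string are made up for illustration, and the regexes assume the exact line wording pasted above, which may differ on other driver versions.

```python
import re

# Line formats copied from the console output above (assumption: other
# driver versions may word these lines differently).
ALLOC_RE = re.compile(
    r"nvidia-modeset: Allocated GPU:(\d+) \(GPU-[0-9a-f-]+\) @\s*PCI:([0-9a-fA-F:.]+)"
)
XID_RE = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+),\s*(.*)")


def normalize_pci(addr):
    """Drop the function suffix so '0000:06:00' and '0000:06:00.0' compare equal."""
    return addr.split(".")[0].lower()


def find_xid_events(log_text):
    """Return (gpu_index_by_pci, xid_events) parsed from kernel log text."""
    gpu_by_pci = {}
    events = []
    for line in log_text.splitlines():
        alloc = ALLOC_RE.search(line)
        if alloc:
            gpu_by_pci[normalize_pci(alloc.group(2))] = int(alloc.group(1))
            continue
        xid = XID_RE.search(line)
        if xid:
            events.append({
                "pci": normalize_pci(xid.group(1)),
                "xid": int(xid.group(2)),
                "message": xid.group(3).strip(),
            })
    # Resolve indices after the full pass so line order does not matter.
    for event in events:
        # Stays None when no "Allocated" line for that address was captured.
        event["gpu_index"] = gpu_by_pci.get(event["pci"])
    return gpu_by_pci, events


# Hypothetical sample built from three lines of the log above.
SAMPLE_LOG = """\
[ 314.109269] nvidia-modeset: Allocated GPU:4 (GPU-a9e959a8-8dfb-292f-8d86-742d6eed7cf8) @ PCI:0000:07:00.0
[ 314.867960] nvidia-modeset: Allocated GPU:5 (GPU-a3d5362d-c5b9-ea29-5a18-78b31bfc2da0) @ PCI:0000:08:00.0
[ 444.393313] NVRM: Xid (PCI:0000:06:00): 79, GPU has fallen off the bus.
"""

if __name__ == "__main__":
    gpus, events = find_xid_events(SAMPLE_LOG)
    for e in events:
        print(f"Xid {e['xid']} at PCI {e['pci']} (GPU index: {e['gpu_index']}): {e['message']}")
```

On the excerpt above the GPU index cannot be resolved, because the "Allocated" line for 0000:06:00 had already scrolled off the console. GPU:4 and GPU:5 sit at 07:00 and 08:00, so 06:00 is plausibly GPU:3, but that is a guess. With shell access, `nvidia-smi --query-gpu=index,pci.bus_id,name --format=csv` lists the bus ID of each card for comparison, as long as the card is still visible to the driver.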





Author: klondike_barz
