Comments
|
You mean they should FIX the drive-killer bug.
|
|
|
@Endolith
That's just the thing. There is a lot of confusion wrt whether this bug is a 'bug', what its background is and if a fix has already been implemented.
I agree with you, that if this bug hasn't been fully fixed, it should definitely be a priority for the developers (both upstream and downstream).
|
|
|
|
Totally agree, I'm waiting for this bug to get fixed before I'm buying a new notebook. I'm a bit disappointed such a critical bug hasn't been fixed properly yet.
|
|
|
|
I put Ubuntu on my HP laptop. Are you saying it could kill the drive?!?
|
|
|
@acer5050
Well then that's really good, but you have to think about why there's so much back-and-forth talk about whether it has actually been fixed... there's still a lot of controversy and doubt. Just take a look at some of the comments on the bug's page.
Which is precisely why this idea was born.
There are lots of hardware lists out there detailing specific laptop and hard disk models that are known to be affected. Some of them include:
http://ata.wiki.kernel.org/index.php/Known_issues
https://wiki.ubuntu.com/DanielHahler/Bug59695
|
|
glotz
wrote on the 2 Nov 08 at 19:48
|
|
|
|
Since we're talking about a kernel question I don't see how Canonical or Dell could issue an official statement about it.
|
|
|
|
Canonical or Dell could easily issue an official statement because this is a bug that affects systems they are involved in distributing. This is not a matter of fixing the bug, although that is also important; this is a matter of Canonical having an explicit, clear explanation and position on the problem that people can post links to in forums, or find on their own, to get the real scoop. The point here is to clear up the uncertainty and confusion about the current state of the problem. (In my experience, this sort of uncertainty and confusion is one reason why many people prefer to stick with tried-and-true proprietary software, even when that software is drastically lower quality.)
|
|
|
@hunt.topher
thanks for agreeing with me on this!
|
|
|
lol I was actually surprised they marked the idea as implemented. But I still have issues with my HDD =(
Dell XPS M1530
Please don't mark ideas/bugs as implemented if they aren't; that's the same as lying.
|
|
|
|
I have a Compaq HP 6820s, and it is on that list. How do I know if it's affecting me, and how can I solve it? I'm new to Linux.
|
|
|
@acer5050
Please check the bug description to help diagnose the problem. Basically, you need to install the smartmontools package and then run 'sudo smartctl -A /dev/sd*', where * is your drive letter (for example, /dev/sda). Then calculate Load_Cycle_Count/Power_On_Hours based on the numbers listed in the RAW_VALUE column. Ideally "there should be fewer than ~15 load cycles per hour, except during heavy usage while on battery".
As for a fix to the issue, there still is a lot of confusion and controversy.
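The Load_Cycle_Count/Power_On_Hours check described above can be sketched in Python. This is only a rough sketch: it assumes the output of 'smartctl -A' has been saved as text, that each attribute row has ten columns ending in RAW_VALUE, and that the raw values are plain integers (some drives report Power_On_Hours in other formats). The function names and the Power_On_Hours figure in the sample are made up for illustration.

```python
def raw_value(smart_output, attribute):
    """Return the RAW_VALUE (tenth column) of the named SMART attribute."""
    for line in smart_output.splitlines():
        fields = line.split()
        # Assumed row layout:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[1] == attribute:
            return int(fields[9])
    raise ValueError(f"{attribute} not found in SMART output")


def load_cycles_per_hour(smart_output):
    """Compute Load_Cycle_Count / Power_On_Hours from the raw values."""
    cycles = raw_value(smart_output, "Load_Cycle_Count")
    hours = raw_value(smart_output, "Power_On_Hours")
    if hours == 0:
        raise ValueError("Power_On_Hours is zero, cannot compute a rate")
    return cycles / hours


# Sample text: the Load_Cycle_Count row matches the format quoted in this
# thread; the Power_On_Hours value (150) is invented for the example.
SAMPLE = """\
  9 Power_On_Hours   0x0032 100 100 000 Old_age Always - 150
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 2293
"""

rate = load_cycles_per_hour(SAMPLE)
```

With these sample numbers the rate comes out at about 15.3 load cycles per hour, which would sit right at the ~15 per hour guideline quoted from the bug description.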
|
|
|
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 2293
|
|
|
|
And have you guys actually checked if the same thing happens on Windows or not? That's where it gets interesting...
|
|
|
Ok.. Enough of this crap.
Read: http://ubuntudemon.wordpress.com/2007/10/30/ubuntu-is-not-causing-aggressive-power-management/?referer=sphere_related_content/
It's not always the fault of the OS. It can be OS independent, and especially if you have an old Western Digital JS (not AAJS) drive, it wouldn't surprise me if it's a problem with the firmware. We found out the reason they added the AA models was that the older models failed more easily, and the cause may have been this.
So once again, have you guys tested in Windows AND Linux?
And it's not a bug, it's a behavior, and technically the behavior makes sense (it's done to save battery), but due to the design of drives (which it doesn't take into account), it doesn't work out that well.
|
|
|
|
Btw, my opinion is that it's the harddrive manufacturers' fault. They should embed policies in the firmware of the drive to ensure that a drive can't cycle continuously.
|
|
|
I just bought a new drive for my computer a month ago and after seeing this page I decided to check out the SMART data on my drive.....
193 Load_Cycle_Count 0x0032 092 092 000 Old_age Always - 16709
That seems a bit excessive. I can hear it happen every 30 seconds now that I'm listening for it. I hope my new drive is up to snuff or it seems like I'll be in the market for drives again real soon.
I don't know enough about the issue to have an opinion but it should certainly be dealt with in the proper manner which I assume is being done at the moment.
|
|
|
|
What's the result with Windows on the same computer?
|
|
|
"There is a lot of confusion wrt whether this bug is a 'bug'"
No there isn't.
If installing Ubuntu on your computer prematurely kills your hard drive, then it is a bug and it needs to be fixed ASAP.
Trying to pin the blame on someone else doesn't solve the problem. It's Ubuntu Brainstorm, not Blamestorm.
|
|
|
@Endolith. It is, but it's with poor harddisk/hardware design. In the true sense of the word, it's not a bug (because it's performing properly, and properly designed harddisks will work fine). I have posted details why below.
1) This is a known issue WITH HARDDISKS, and HAS been for over 8 years now. Micromat brought up the concern in 1999, and then everyone jumped on board and started blaming Linux for some reason recently (DIGG/SLASHDOT crowd). I QUOTE: "Some drives auto-park by default every 10 seconds. Most UN*X type systems (I think including MacOS X) sync things to the disk every 30 or 60 seconds. This means that the drive parks and then very soon unparks, repeatedly".
2) Don't design hardware which can easily exceed its maximum operating parameters (like how Intel CPUs shut down if they get too hot). So that means don't allow the drive to cycle at a rate that will kill it in 30 days. Instead, ensure that in the worst case scenario it will last much longer! Notice how your car has fuses that blow? That's to stop things stepping out of parameters, a policy. The harddisks should have a policy too, embedded in their firmware.
3) Even if Ubuntu fixes this "linux bug", Vista still suffers from it in many cases, and so do many other OSes. It's not because they are designed wrong. It's not an OS's job to know how often the drive cycles down if there is no data running to it. Yes, they should fix it, but that's not fixing the root of the problem. It will just reduce the number of poorly designed harddisks which suffer from it. Please explain to me why Ubuntu should know that one harddisk cycles itself every 2 mins, whilst another does so every 15? I doubt they even write that in the specs.
4) Good drive:
- Monitors the cycle rate and guarantees a maximum cycle rate that ensures the drive will last a few years. By cycling at most once every 5 mins, for instance, the drive may power down much less (less power saving, runs warmer), but at least the minimum lifespan until cycles run out is enormous.
Dumb drive:
- Cycles every 10 secs, so in potentially 3 million seconds (or within 40 days) the drive is dead. Drive manufacturer prays that the user's OS syncs constantly so it never kicks in.
5) It may be partially Dell's fault; they should be testing stuff like this. And as Micromat knew about it in 1999, so should have the Linux distros. However, if the harddisk designers made planes, they would only test them in perfect conditions, without telling pilots explicitly that they will fall apart in bad weather (instead of testing in bad weather and ensuring they will survive).
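The good drive versus dumb drive arithmetic in point 4 can be checked with a few lines of Python. This is a sketch under one loud assumption: the 300,000-cycle budget is not a quoted spec, it is simply what the figures above imply (3 million seconds at one park every 10 seconds). Real drives state their own rated load/unload cycles, and the function name here is made up.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: inferred from "3 million seconds" at one cycle per 10 seconds.
ASSUMED_RATED_CYCLES = 300_000


def days_until_budget_spent(rated_cycles, seconds_per_cycle):
    """Days of continuous running before the rated load cycles are used up."""
    return rated_cycles * seconds_per_cycle / SECONDS_PER_DAY


# "Dumb drive": parks every 10 seconds.
worst_case_days = days_until_budget_spent(ASSUMED_RATED_CYCLES, 10)

# "Good drive": firmware refuses to park more than once every 5 minutes.
limited_days = days_until_budget_spent(ASSUMED_RATED_CYCLES, 5 * 60)
```

Under that assumed budget, parking every 10 seconds uses it up in roughly 35 days of continuous running, which matches the "within 40 days" estimate, while a 5-minute floor stretches the same budget to a little under three years of continuous running.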
At the end of the day, the best way to solve this is:
1) Hardware database: identify the drives and the OSes they are suicidal on.
2) Petition the harddisk designers to implement controls in their firmware to keep these drives within parameters (or change their current power settings).
3) If a statement is made by Ubuntu, it should be made jointly with other OSes, and mention why the drives are failing. And it should demand updated firmware for drives. Fix the problem at the source.
4) Fix it on a case by case basis. Fixing it on any OS is hmm..
5) Code a testing suite that can automatically track the SMART parameters and tweak the kernel to prevent the load count increasing so quickly.
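The tracking half of that last point could look something like the Python sketch below. It only covers the measurement side: it compares two Load_Cycle_Count readings taken some time apart and flags a rate above a limit. The Sample type, the function names and the example readings are all hypothetical, the default limit of 15 per hour is borrowed from the guideline quoted earlier in this thread, and nothing here tweaks the kernel.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    seconds: float    # time the reading was taken, in seconds
    load_cycles: int  # raw Load_Cycle_Count at that time


def cycles_per_hour(earlier, later):
    """Load cycle rate between two readings, in cycles per hour."""
    elapsed = later.seconds - earlier.seconds
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    return (later.load_cycles - earlier.load_cycles) * 3600 / elapsed


def is_excessive(earlier, later, limit_per_hour=15):
    """True if the drive parked more often than the assumed limit."""
    return cycles_per_hour(earlier, later) > limit_per_hour


# Hypothetical readings taken 30 minutes apart: 60 new cycles in 1800 s.
first = Sample(seconds=0, load_cycles=16709)
second = Sample(seconds=1800, load_cycles=16769)
flagged = is_excessive(first, second)
```

Working from the difference between two readings, rather than the lifetime total, shows what the drive is doing right now under the current OS, which is what matters when comparing Ubuntu against Windows on the same machine.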
|
|
|
why speculate? why don't we let Canonical and Dell let us know the whole truth, and nothing but the truth?! we CAN handle the truth! :D
cast your votes of support please :)
|
|
|
There.. Generic solution that fixes the symptoms of the problem. Happy?
You aren't? Doesn't surprise me. Seems many of you Slashdiggers think that there is a magical patch that can be applied to fix the problem, and that it's a bug in the kernel. There isn't, because "it's a feature". The only other way is to not park the heads at all (which means your laptop HDD will die anyway). Too many of you run with the crowd, and one thing I have noticed is that if the title says it's a bug, you guys assume it's a bug, even if you don't quite understand the problem.
I only ask you to take my advice. Harassing developers is neither productive, nor will it get anything done! You'll just scare them off. Developers working for free don't want to wake up and hear idiots run their mouths on Slashdot/Digg about "bugs", and how bad they are. Instead, they'd rather hear solutions, and ways to fix the root of the problems (which is at the hardware level, NOT the software level). I reckon half the people whinging about load cycles didn't even bother to understand its purpose (and probably still have no idea what parking the heads means).
Be the leader, don't join the herd. If Digg calls something a bug in an OS, question it.
As I have pointed out, the people here with the problem still haven't tested it in Vista.. They just assume.
|
|
|
|
I think my drive died last night. Awesome.
|
|
|
@Endolith
That just sucks :( .
|
|
|
Mark Shuttleworth responded to a question from somebody who obviously was aware of this idea on the #ubuntu-classroom IRC channel on Freenode today. [jcastro aka Jorge Castro was moderator; sabdfl is Shuttleworth.] Pasting the transcript here for commentators:
--
15:39 jcastro QUESTION: Does Canonical ever make "Official Policy Announcements" on contentious issues? Two recent controversies were the hard drive wear issue and the issue with the Intel network cards being bricked. Are there official guidelines on what should be done when Ubuntu can possibly damage hardware?
15:40 sabdfl yes, we have a process for handling emergencies and screwups
15:40 sabdfl including making sure that we communicate clearly about what the situation is
15:40 sabdfl unfortunately we have that because there have been emergencies, and we have in the past occasionally screwed up
15:40 sabdfl but i think the policies are good
15:41 sabdfl i don't think such an issue is contentious - if we make a mistake, we need to sort it out, and keep people briefed
--
I think we should be pretty optimistic about hearing an official statement soon! wuhoo! :)
PS: couple of other hardware compatibility questions were relevant too such as:
--
16:25 jcastro QUESTION: The hardware database mentioned earlier seems limited to certifying whole machines. It seems like it would be more useful for most of us if we had a listing of individual hardware products that were known to work (or not work), particularly video and wireless cards...
16:25 jcastro QUESTION: .. Yet few wireless vendors would see the point in submitting their hardware for certification unless there were already a database to be added to. Is there a plan to get that sort of certification?
16:25 sabdfl i think jcastro pointed to the hardware database earlier
16:26 sabdfl we try to aggregate the information folks send us about their hardware
16:26 sabdfl it's difficult to do component-level certification
16:26 sabdfl because often, something breaks at the system level
16:26 sabdfl we do work with component manufacturers, though, if there is a machine that needs to be enabled
--
I had missed the earlier questions, so asked this crucial question just in case:
--
16:55 jcastro QUESTION: how robust is the laptop certification process between Canonical and its partners? should customers expect 0% system breakage (in terms of hardware or software)?
16:56 sabdfl they should expect it, and we strive to deliver it
16:56 sabdfl see above for how we handle emergencies and screwups :-)
--
cheers everybody! #ubuntu-classroom is a great place to hang out! :D
|
|
|
Hmmm... I had woken up to an unresponsive black screen with a mouse pointer and nothing else, which then listed a bunch of gibberish about I/O errors on the Ctrl+Alt+F1 screen, but now after reboot and fsck it's working again. Probably the first signs of failure, though, right? What kind of symptoms would the head parking problem cause?
@AndrewLuecke:
"It is, but its with poor harddisk/hardware design."
If a piece of software doesn't work on certain hardware, then it's a bug with the software. Stop trying to shift the blame. Sure, hardware can have poor design decisions, but that's no excuse for letting problems like this go unfixed. If it can be made to work correctly, and it doesn't, then it's a bug.
Open source software has faults, too, and we need to take responsibility and fix them, not get caught up in pride or risk damaging users' computers for the sake of saving face. If you want to deny problems and spread misinformation, work for Microsoft.
"As I have pointed out, the people here with the problem still haven't tested it in vista.. They just assume."
I was getting few to no load cycles per hour in Windows XP (I measured 0.75 per hour), and 200+ per hour in Ubuntu.
Of course, I didn't become aware of this problem until after I'd racked up over 1 million load cycles, because people were too busy telling me there's nothing to worry about to actually fix it. It still hasn't been fixed, despite the status of the bug report.
I'm glad to see Mark's response. Please fix the problem, and if you're not going to, post an official statement explaining why.
|
|
|
@AndrewLuecke:
When I had Windows XP (dual boot), I heard it clicking 2 times a minute or so. So Windows XP has the same problem.
My laptop is built for Windows Vista, and it had Vista installed as the default OS. I don't know if the problem is the same in Vista because I didn't use it =/
|
|
|
@Endolith. You are either very ignorant, or very stubborn.... When you have to make exceptions for specific types of hardware (which have been standardised), it's because they are buggy..
- If it was a software bug, it would affect all harddisks.. It doesn't. (Only ones that are designed to park the heads more often than they can safely survive.)
- If it was a software bug, it wouldn't affect Windows, Linux and OSX. It does..
- You honestly can't tell us that everyone's code is buggy? And that harddisks are perfect?
- "If it can be made to work correctly, and it doesn't, then it's a bug.". Last I checked, compilers could work around the Pentium FDIV hardware bug. Because an exception had to be made, it is a hardware bug. NCQ on many Samsung harddrives never seems to work anywhere, but probably can be worked (so it's a software bug).
The only fix is to write to it a bit more often so it doesn't happen. However, again, this just minimises the problem. If it doesn't happen to you in XP, it's because XP syncs differently; however, that doesn't make it impervious either.
To the best of my knowledge, any fix for this is messy and is basically an exception for different types of hardware. When you need to make exceptions for specific types of STANDARDISED hardware, it's because the hardware isn't following standards, and is buggy. In this case, the SATA/IDE standards say you can write to harddisks whenever the f*** you want. So harddisks should allow it.
It's a pity this is Digg news, because any worthy news site would have placed the blame on hardware manufacturers, who would have fixed it by now. Instead though, harddisk manufacturers seem to be avoiding all scrutiny, with the blame being placed on operating systems by people like you.
I don't think I can make it any clearer, but I'll say it again. We SHOULD be able to write to harddisks whenever the hell we want. The SATA/IDE standards never said that we need to sync to the HDD every second to be within parameters. Why the hell are YOU saying we have to?
|
|
Endolith
wrote on the 12 Nov 08 at 15:23
|
|
|
Al-Sahhaf, is that you?
Did you also propose that the FDIV bug was just mass hysteria and that there's no reason to write a workaround for it?
If software physically damages hardware, then the software needs to be fixed.
Swearing and screaming and blaming everyone else just makes you look bad, while the users suffer.
|
|
|
Endolith, You are proposing that the FDIV bug should have remained in the processors...
I bet my life savings you haven't contacted your harddisk manufacturer at all to let them know your drive is having a problem? Maybe you should, because with a simple HDD firmware update, they could fix the problem on ALL operating systems. Without needing hacks.
Hell, the best way would be drive manufacturers providing a utility & API to set the max head parking rate. Then we can optimise the parking rate for your OS so that you receive the best protection.
It SHOULD be fixed via hardware. Yes we can work around it (in a messy way), but you will simply keep encountering the problem again and again in other environments.
But with a firmware update, they can guarantee the safety of your drive. Fix the problem, not the symptoms.
|
|
|
So again.. Have ANY of you guys complaining contacted your harddisk manufacturers and asked them for a firmware fix or utility to adjust the rate at which the heads are automatically parked?
|
|
|
Sorry i forgot to mention:
I use Ubuntu 8.10 and obviously the problem does exist. I also tried using openSUSE and my load cycle count hardly increased whilst using it for an hour (I have no idea why). Also my LCC does increase quite a fair bit in Windows Vista too, just not as much as in Ubuntu.
|
|
|
Smittynotts, it's worth telling WD then that it did not fix your issue.
|
|
|
|
Also, request that they develop a means of allowing you to set the rate that the heads are automatically parked. Otherwise, you will simply potentially run into similar problems later.
|
|
|
Hi Andrew,
I have just called them back and explained that the update didn't work. After around 30 minutes of him fumbling about, he found something on his system saying that some laptop manufacturers that use WD hard drives ship them with the Reduced Power Spinup jumper setting enabled. He also said this can cause strange head parking problems. I will have to call them back later as I do not have my laptop at work with me. I will check all jumper settings, as well as trying to update the drive with different jumper settings, to see if this fixes or improves the situation.
|
|
|
Definitely try to get them to officially document the problem. Part of the reason people are getting into this mess is because people aren't standing up and demanding better from hardware companies. Instead, the hardware companies have simply ignored the problem, have ignored the use of firmware policies for protection, and hoped the software simply didn't suffer from it.
EVERYONE WITH THIS ISSUE should post a bug report to their harddisk manufacturer ASAP. There is no reason a user should be limited from using their hardware in specific OSes.
It's like selling X-ray machines that will happily dose the person with 10x the safe amount of radiation when the software crashes (which could be prevented with a hardware safety shutoff). Yes, the software crashed and the problem was caused by the software, but they shouldn't be selling rubbish hardware that makes this problem possible.
|
|
Endolith
wrote on the 20 Nov 08 at 01:03
|
|
|
'I keep seeing people say "it's the hard drive (manufacturers) fault." No it's not. You don't ask a hard drive to go into ultra low power mode if you are planning on coming back to it in just a few seconds. Ubuntu needs to pull it's head out of it's backside and stop and think about how often it hits the drive after it suggests to the drive that it's not going to be used with any frequency.'
|
|
|
Endolith.. What part of the problem don't you understand?
For many people, the problem is that the drive is parking the heads automatically when there isn't any activity. It isn't Ubuntu telling the drive to go ahead and park the heads. That's in the drive's FIRMWARE! The only way to fix that problem is to write more often.
And as stated, read my analogy. Stop spouting crap about how Ubuntu needs to make exceptions for harddisks which have the power savings in their firmware set too aggressively. They shouldn't be capable of destroying themselves in the first place. Furthermore, they damn well shouldn't be shipping with such aggressive settings in the first place.
WHY THE HELL CAN'T YOU JUST ACCEPT THAT MANY MANUFACTURERS HAVE MADE A MISTAKE!!! Is it because it's easier for you to complain to Ubuntu than to WD?
FFS, go speak to your harddisk manufacturer. Slashdot users are mostly idiots. Just because Slashdot says it's a software problem never means it was. It just means that you have a bunch of newbies who see part of the problem, and jump to conclusions.
|
|
|
@Endolith. Want to play "quotey quotey"? Here's a quote for you that Linus made in regards to the e1000 bug..
"The _real_ bug is clearly in the hardware design that allows you to brick those things without apparently even having a lock bit. I'm hoping Intel doesn't treat this as just a software bug. Some hw designer should be thinking hard about which orifice they put their head up in."
I don't agree with Linus much, but in this case I do. The real bug is that harddisk manufacturers have made it possible for system assemblers to jumper these drives for aggressive power savings that look great for marketing (2hours battery vs 2.5? what would you choose), however, in certain conditions this can kill the drive.
Yes, you can work around the drives and hope things work, however, sooner or later, you will switch OS, or try out an OS that will have the same probs. Fix the problem, not the symptoms.
Even if you work around the bug on every drive in Linux, the moment you change to another OS, you might have the same problem.
|
|
|
|
UPDATE: As of 21 November 2008 1239 UTC, the status of this bug has been changed from "Fix Released" to "In Progress" on its tracker at Launchpad.
|
|
Endolith
wrote on the 24 Nov 08 at 20:18
|
|
|
No one cares if it's the hard drive manufacturer's fault. Which part of this don't you understand?
They care that their hard drives are dying while using Ubuntu and not in any other OS, including other Linux distros.
Ubuntu needs to take responsibility and work around the problem, instead of trying to pin the blame on everyone else.
|
|
|
Endolith. I'm no longer going to respond after this. Because frankly, I feel your method should be implemented, but users get screwed in the end anyway, and it will always be ongoing. Especially seeing as new harddisks are released all the time (so we can never keep up with the exceptions required).
Go get a software workaround for your broken firmware. I understand that you are too lazy to talk to your harddisk manufacturer to get things fixed properly, so let's just hack around it.. After all, open source developers are easier to blame. And you can harass them more easily on places such as Digg. Oh, and who cares that you are so passionate about it but aren't going to fix it yourself. That's ok. Your time is simply better spent than on probing source code or sending bug reports to harddisk manufacturers.
But don't be surprised if your harddisk ends up dying when you install another OS (such as Windows 7, or any other OS) and potentially run into the same problem. After all, as a user, you don't care. Just force every OS manufacturer to make exceptions for thousands of drives. But if some miss a few, that's ok. After all, they will simply blame the harddisk manufacturer anyway when it fails.
So after your specific harddrive model is patched in the kernel (or wherever) using a dodgy hack, you'll be safely armed with the knowledge that your harddisk works (at least temporarily) without killing itself. But you better not change OS, or maybe not even change Linux distribution. Because then you will potentially kill your harddisk. Also, if your harddisk fails, make sure you buy a model at least 8 months old to guarantee that Ubuntu has made an exception for that type of harddisk to work without killing itself.
Meanwhile, I suggest everyone else speaks to their harddisk manufacturer to try to get hardware policies into place that prevent the problem occurring at all.
|
|
Endolith
wrote on the 18 Jan 09 at 18:18
|
|
|
|
This has mostly been fixed by the Ubuntu developers, though some are still reporting trouble.
|
|
Endolith
wrote on the 18 Jan 09 at 18:18
|
|
|
|
By which I mean the fundamental bug has been fixed, not the official statement. An official statement would be nice, but I doubt we'll ever see one.
|
ziroday
(Brainstorm moderator)
wrote on the 20 Jan 09 at 12:19
|
|
|
|
This thread has gotten out of hand. The bug has been diagnosed; it's a hardware issue. A patch has been written to work around the hardware. If you are still having high load counts, contact your hard drive manufacturer.
|