Cascade of problems

It is unbelievable how badly things can go starting from a small number of problems. This afternoon, I was overwhelmed by several hurdles, but there were only two main root causes: keyboard instability and a network bandwidth issue.

Everything started with the delivery of my new AZIO keyboard and Razer Taipan mouse. Well, these are low-risk plug-in devices that won’t disturb my work too much, so let’s plug them in and see how that goes. The mouse worked fine. It seems sturdy and the scroll wheel works well, far better than the Microsoft Comfort mouse I used a while ago. The pointer moves smoothly and there are several extra buttons that could be useful. Sensitivity can be adjusted with the touch of a button, so it can be decreased for office work, where the pointer should move at a reasonable speed, and increased in games like Fract, where I had to move the mouse five meters to manipulate some controls!

The keyboard, on the other hand, didn’t work too well. Keys are large and the LED backlight is very nice, but the keyboard has a tendency to skip keys when I type. It skips randomly, especially the e and the s. This is a real pain when trying to do anything other than looking at emails or analyzing code or data. I tried to give it some time, but the problem persisted, up to the point where I got fed up and put my old keyboard back.

Some time after I put back the old keyboard, my NX connection to my company’s remote server dropped. I had to reconnect and then got the DPI bug again. The remote desktop was running in a low resolution as if DPI scaling wasn’t disabled for the NX client anymore. I tried to restart the client, checked the DPI setting, everything was fine. I had to reboot the whole system to get my resolution back at 1680×1050 in the remote desktop.

That worked for some time, then things became laggy. Ok, we need a plan B: a VirtualBox guest running Linux and accessing the files remotely using sshfs. I already had a VM on my home PC, so I wanted to copy it to my company’s ultrabook as a first step. That was intended to run in the background and not disturb anything, but the file transfer became unstable, slowed down and then stopped completely. I had to initiate the file transfer from my home PC: it wouldn’t work the other way round, again because of Windows. File transfer from Ubuntu failed, because the VirtualBox image was on my Windows partition and Ubuntu refused to mount it: Windows 8 no longer unmounts partitions cleanly, so NTFS-3G will eventually have to adapt and implement a very patchy and ugly workaround for this!

That forced me to switch back and forth between the two PCs, and I was having trouble finding the buttons to switch the KVM and the HDMI switch. I don’t have a large enough desk for two displays, two keyboards and two mice, so I am stuck with that stupid KVM and HDMI switch.

Things came to a total dead end when the network connection of the ultrabook stopped completely. Windows was unable to interact with my router and thus connect to the Internet. All I could do was turn wi-fi back on. I had turned it off this morning, because Windows was stubbornly trying to use wi-fi instead of wired Ethernet! It took a while for wi-fi to come up, it didn’t connect automatically to my router, and when it finally connected, the connection was limited.

If I remember well, I could connect back to my NX server, but everything hung and I had to terminate the NX client. Nothing would work: no ALT-F4, no right-click+Close program; I had to use the task manager. Then any attempt to connect back to the server returned me to the hung NX session. I would have had to dig up a long and complicated command on the company’s wiki to reset the X server. No way! Tired of all this, I rebooted the whole server instead.

I ended up copying my files locally so as not to use the remote server at all, and transferring the VirtualBox image using an external hard drive. That counter-productive end of afternoon totally drained me, and I was quite exasperated and overwhelmed after that. I have been fighting for weeks against the NX Client and the grid infrastructure I was connecting to, with nothing other than patchy workarounds that sometimes apply, sometimes fail. I felt I had reached a dead end at this point. I needed a solution.

But everything was working fine during the morning. Why did things go south all of a sudden? It all started with network issues: the ultrabook preferred wi-fi over wired Ethernet, the network connection to my NX server dropped all of a sudden, file transfer was unstable, Windows couldn’t access the network anymore, etc. So let’s act on the network first, before fixing the NX client again! Maybe that USB network interface is flaky and I will have to try a new one.

But first, let’s remove it from that USB hub and plug it directly into the ultrabook. That hub worked for a while, when I was using the network just for sparse file transfers, but higher bandwidth is needed for a full remote desktop connection. It is still good for the keyboard and mouse, and necessary since that ultrabook has just two USB ports!

So I tried this, and that seemed to help! Connection to wired Ethernet happened almost instantly, Windows didn’t fall back to wi-fi as this morning, and I worked for half an hour, remotely connected, without any issue. As a final test, I transferred an Ubuntu ISO over the network and that worked without a flaw.

That hub can transfer only a theoretical 480 Mbit/s. It is used to carry data from small devices like keyboards and mice, plus the occasional transfer from a USB stick or external hard drive, but how about something requesting 100 Mbit/s constantly? That may well overload the poor little hub.
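A quick back-of-the-envelope check of those nominal figures, as a minimal sketch (the variable names are mine, and integer arithmetic rounds the result down):

```shell
# Nominal figures from the paragraph above, in Mbit/s.
usb2_mbit=480       # theoretical USB 2.0 rate of the hub
ethernet_mbit=100   # sustained load from the USB Ethernet adapter

# Share of the hub's theoretical bandwidth taken by the adapter alone.
share_percent=$(( ethernet_mbit * 100 / usb2_mbit ))
echo "Ethernet alone: about ${share_percent}% of the theoretical USB 2.0 rate"
```

On paper the adapter fits, but the 480 Mbit/s is a theoretical maximum shared by everything plugged into the hub, and real-world throughput is lower than that, so these numbers neither prove nor rule out that the hub was the weak link.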

If problems persist, I will give a shot to the Gigabit Ethernet adapter I have in the office. If that one fails as well, I will probably have to give up on this ultrabook and start carrying the heavier laptop from office to home.

Why do I suddenly need to use source to call ANY Bash script?

This week, I ran into a somewhat weird and annoying Bash issue that took a couple of minutes to solve. It was a very simple problem, but it caused quite a few headaches. A colleague wrote a script that set up some configuration variables. The script, named config.sh, was intended to be called using source config.sh. The source command tells Bash to run the script in the current interpreter rather than spawning a new process to run it. Any variable assigned in a sourced script is set in the current Bash process, so it can be used after the script has finished.
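The difference between executing and sourcing can be seen with a small experiment. This is only a sketch: the temporary file stands in for config.sh, and GREETING is a made-up variable name.

```shell
# Hypothetical stand-in for config.sh, written to a temporary directory.
demo_dir="$(mktemp -d)"
printf 'GREETING=hello\n' > "$demo_dir/config.sh"

# Executing the script runs it in a child process: the variable is set
# there and lost when that process exits.
unset GREETING
bash "$demo_dir/config.sh"
after_exec="${GREETING:-unset}"

# Sourcing runs it in the current shell, so the variable survives.
# "." is the portable spelling of Bash's "source".
. "$demo_dir/config.sh"
after_source="${GREETING:-unset}"

echo "after executing: $after_exec"
echo "after sourcing:  $after_source"
rm -r "$demo_dir"
```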

The script contained variables of the form

TOOLS_DIR=<some path>
TSVTOOLS_DIR=<some path>
...

I tried to refer to these variables in one of my scripts, using $TOOLS_DIR, for example. However, each time I called the script, Bash acted the same way as if the variable wasn’t defined! Yet the variable was accessible from the Bash process that had sourced config.sh. Why?

There were two workarounds:

  1. Call my script using source.
  2. Modify my script to call source config.sh in it. I didn’t like this, because this was adding an extra step to all my scripts and running config.sh was taking several seconds.

My colleague and I looked at this to no avail. Then I found the culprit.

The config.sh script was declaring variables local to the Bash process. For the variables to be passed on to child processes such as invoked scripts, they needed to be exported as environment variables! So the solution was as simple as modifying config.sh to prefix every variable declaration with export! For example,

export TOOLS_DIR=<some path>
export TSVTOOLS_DIR=<some path>
...

After this very simple change, I was able to use the variables in my script, without sourcing the scripts or invoking config.sh locally.
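The effect of export can be reproduced in isolation. In this sketch, the file names, variable names and paths are hypothetical, and a bash -c child process stands in for my script:

```shell
demo_dir="$(mktemp -d)"
unset PLAIN_DIR EXPORTED_DIR

# Two hypothetical config files: a plain assignment and an exported one.
printf 'PLAIN_DIR=/opt/plain\n' > "$demo_dir/plain.sh"
printf 'export EXPORTED_DIR=/opt/exported\n' > "$demo_dir/exported.sh"

# Source both into the current shell ("." is the same as "source").
. "$demo_dir/plain.sh"
. "$demo_dir/exported.sh"

# A child process only inherits the exported variable.
child_sees_plain="$(bash -c 'echo "${PLAIN_DIR:-unset}"')"
child_sees_exported="$(bash -c 'echo "${EXPORTED_DIR:-unset}"')"

echo "current shell:  PLAIN_DIR=$PLAIN_DIR EXPORTED_DIR=$EXPORTED_DIR"
echo "child process:  PLAIN_DIR=$child_sees_plain EXPORTED_DIR=$child_sees_exported"
rm -r "$demo_dir"
```

An alternative that avoids editing every line would be to run set -a before sourcing the file and set +a afterwards, which marks every variable assigned in between for export.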

Bumpy Android upgrade

I recently joined the club of unfortunate Galaxy Nexus owners whose phone has started down the path of death. Many people told me bad things about these Nexus phones and about other Android smartphones in general. My brother’s device is slow and, for some obscure reason, has mixed up its sounds altogether. As an example, the device emits the sound of a photo camera when locked and unlocked! My sister’s phone is slow as hell, putting her through torture each time she opens an application. One of my friends’ phones has no working mic anymore; he has to leave headphones plugged in all the time to answer calls. A colleague at my workplace had issues with the USB port: the device was not charging anymore.

My problem is sporadic reboots, several times a day, and sometimes boot loops. I thought my phone was dying, but I found something that may give it a second life. I will have to see in the long run, but this was nevertheless an interesting adventure.

The symptoms of my Galaxy Nexus

This started a few months ago, on Thursday, March 27, 2014. The phone entered a boot loop and could not do anything other than reboot like crazy. A colleague and friend of mine managed to remove some applications in a hurry, before the next reboot, and that seemed to stabilize the monkey for a few minutes, but that just increased the length of the boot cycles. The device was rebooting like an old, dying 486 computer overloaded with Windows 98! As a last resort, I tried a factory reset, which helped… until last week. Yes, the device started to reboot again!

I woke up on Thursday, July 24, 2014, and noticed that my phone was stuck on the Google logo. Nothing would unblock it, except removing the battery and putting it back. I did that, rebooted the device and it got stuck again. Argghhhh!!! I removed the battery once more, left the device and battery on my desk and searched for a solution, to no avail, except for reports that in some cases a bug in Android 4.2 caused the phone to boot loop and that it would get unstuck after a few attempts. I put the battery back and tried again: this worked. Maybe removing the battery for a few minutes discharged some capacitors and reset the hardware to a cleaner state, maybe I was lucky, maybe both. But the device remained unstable and was prone to reboot, sometimes twice in an hour. The Sunday after, I got fed up and did a factory reset, then didn’t install any application until I could find a longer-term fix for the issue. The device then worked without any reboot, so a hardware defect is less likely, although still possible. I need to keep in mind that I dropped the phone a couple of times, including once on my outdoor concrete balcony.

That means at least one installed application is interfering with the OS and causing it to reboot! This is unacceptable in a Linux environment where each process should be well isolated from the others and from the critical system components. A process should not be able to reboot the device unless it runs as root, but my device was not rooted, so no installed application could run a root process! That led me to the conclusion that something in the OS itself was flawed, opening a hole that applications can exploit, intentionally or not, to harm the device!

An average user cannot do much about that, other than refraining from installing any application, factory resetting the phone every now and then or contacting his phone service provider and getting whatever cheap replacement the provider will be kind enough to grant him until the end of his agreement. I didn’t want to hit the same wall as my brother and get something with a smaller display and bloated with branded applications. If I really have to get a new phone, that will be a Nexus free of crapware or, if I cannot get a Nexus, I am more and more ready to take a deep breath, give up on whatever I will need to give up and go for an iPhone.

First upgrade attempt: not so good

However, I had the power and will to do something more about this! This was a bit unfortunate for my spare time, my level of stress and maybe my device and warranty, but I felt I had to try it. If the OS has a flaw, why can’t I upgrade it to get rid of the flaw and go past this issue? Well, not all Galaxy Nexus phones are equal. US models have the Yakju firmware from Google, but Canadian models have a special firmware from Samsung instead! The Google firmware is the one that gets updated more often, up to Android 4.3. Samsung’s philosophy differs from Google’s: if you want to get an upgraded Android version, replace your phone.

That led me to the next logical step: can I flash the Yakju firmware on my Canadian Galaxy Nexus phone? Any phone provider, any reseller, any technical support guy, will tell you no, but searches on Google will tell you YES! For example, How to: Flash your Galaxy Nexus Takju or Yakju To Android 4.3 is the guide I started from.

The first thing I had to do was to install Google’s Android SDK on my Windows 8.1 PC. Yep, you need the full blown SDK! The simplest solution is to get the Eclipse+SDK bundle, so at least you don’t have to mess around with the SDK Manager to get the full thing. Then I had to set up my PATH environment variable to include the tools and platform-tools subdirectories, so adb and fastboot would be accessible from the command line. I also had to download the Yakju firmware from Factory images for Nexus devices.

The second step is easy to forget when recalling the exact sequence I performed to reach my goal. It is as simple as plugging the phone into a USB port of a computer. That requires a USB cable and, of course, a free USB port. Any port will do, provided it works. When in doubt, test with a simple USB key.

The next step was to put my device in USB debugging mode. I searched and searched for developer options to no avail! Googling around, I found Android 4.2 Developer Mode. Bottom line, I had to go into the phone’s settings, tap on About Phone, then tap seven times on the Build Number! This is just shockingly crazy: how was I supposed to find this out? Fortunately, after I unlocked the developer mode options, I was able to turn on USB debugging. Without USB debugging, ADB cannot communicate with the device.

This was necessary for a simple and nevertheless crucial step: running adb reboot bootloader. This reboots the device into the boot loader, a kind of minimal OS from which it is possible to flash stuff on the device’s memory. I read about procedures involving pressing power and volume up/down buttons, but that never worked for me. This is probably like booting the iPhone into DFU required to jailbreak or recover from very nasty failures: you have to watch tens of videos, try it fifty times and get it by luck once in a while. These kinds of patience games are getting on my nerves and making me mad enough to throw the phone away. Fortunately, adb reboot bootloader while device was plugged into my computer and in USB debugging mode did the trick.
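Before running adb reboot bootloader, it can help to confirm that ADB actually sees the phone. The adb devices command prints a header line followed by one line per device, with a serial number and a state. The helper below is a small sketch of mine (the function name and the serial number are made up); it reads that output on standard input, so it can be tried without a phone plugged in.

```shell
# Count the devices that adb reports in the ready "device" state.
# Other states, such as "unauthorized" or "offline", are not counted.
count_ready_devices() {
    awk '$2 == "device" { n++ } END { print n + 0 }'
}

# Try it on canned output; the serial number is hypothetical.
printf 'List of devices attached\n0123456789ABCDEF\tdevice\n\n' | count_ready_devices

# With a real phone plugged in, this would be:
#   adb devices | count_ready_devices
```

If the count is zero, the usual suspects are the cable, the USB port, or USB debugging not being enabled.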

Once in the bootloader, you can use Fastboot to interact with the minimal OS. Like ADB, Fastboot comes with the Android SDK. However, Fastboot wasn’t working for me: I was stuck at the “Waiting for device” prompt. I started Googling again and found awful things: a driver to download from obscure places and install, a driver that may differ for Samsung devices compared to other Nexus phones, a driver that doesn’t work on Windows 8 without a complicated tweak to disable driver signature validation, rootkits that could simplify my life if I installed yet another hundred megabytes of applications onto my PC, etc. Flooded with all of this, I gave up and just let my phone run as is. Getting out of the bootloader is easy: just hit the power button and the phone will reboot as normal.

The Penguin saved the deal!

However, one week later, an idea was born in my mind, and it was begging to be tested! Linux may have the needed driver built in, so it would be worth trying from my Ubuntu box. That’s what I did on Friday evening, August 1, 2014, and it was a success after a couple of hurdles.

First, I had to install the Android SDK there as well. Once adb and fastboot were accessible, I switched my phone into the bootloader once again, using adb reboot bootloader. Then I tried fastboot devices to get, again, this stupid “Waiting for device” message. I don’t know exactly how I got to that point, but that command finally gave me a permission denied message. Ok, now I know what to do! sudo fastboot devices. Well no, cannot find fastboot! I had to give the absolute path of fastboot for it to work, but I finally got a device ID. Yeah, the data path between my Ubuntu box and my phone was established!
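A likely explanation for sudo not finding fastboot, which I did not verify on that machine, is that sudo is often configured to replace the user’s PATH with a fixed one, so directories added in the user’s shell are not searched. A minimal sketch of a workaround follows, assuming the SDK lives in ~/android-sdk-linux; the abs_path helper name is mine.

```shell
# Assumed SDK location; adjust to wherever the SDK was unpacked.
SDK_DIR="$HOME/android-sdk-linux"
export PATH="$PATH:$SDK_DIR/tools:$SDK_DIR/platform-tools"

# Resolve a command to its absolute path using the current shell's PATH.
abs_path() {
    command -v "$1"
}

# Hand sudo the absolute path instead of relying on its own PATH.
if FASTBOOT="$(abs_path fastboot)"; then
    echo "would run: sudo $FASTBOOT devices"
else
    echo "fastboot not found in PATH"
fi
```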

Next incantation: sudo fastboot flash bootloader bootloader-maguro-primemd04.img. That gave me a failure, AGAIN! Ok, that’s great, my phone will definitely not accept commands from Fastboot! Maybe it is factory locked to deny these? But before thinking too much, I should have read the error message more carefully and completely. It was saying the following:

FAILED (remote: Bootloader Locked - Use "fastboot oem unlock" to Unlock)

It even gave the incantation needed to go one step further. I thus ran the command, prefixed with sudo. That popped a message on the phone’s screen asking me for confirmation. I moved the cursor to Yes with the volume up/down buttons, pressed power button and voilà, boot loader unlocked!

Why did I have to unlock the boot loader? This was probably because I was switching to a different kind of firmware. If I had a US phone, I would probably be able to install Yakju without unlocking the boot loader. The unlock operation is not without consequences: it wipes out all data on the device! This was a minor issue at this stage, since I had refrained from installing anything or doing extensive configuration until I found a way to improve the stability of my device. I thus wiped without asking myself any question about important data to back up.

Then, feeling like a wizard gathering all the components to cast a spell, I entered the following command and looked at the output.

eric@Drake:/media/data/yakju$ sudo ~/android-sdk-linux/platform-tools/fastboot flash bootloader bootloader-maguro-primemd04.img 
sending 'bootloader' (2308 KB)...
OKAY [  0.258s]
writing 'bootloader'...
OKAY [  0.277s]
finished. total time: 0.535s

Victory! Not really… That was just the first step! The next step was to reboot the device, using sudo fastboot reboot-bootloader. My phone screen went black for a couple of seconds, long enough to fear a heart attack, then the boot loader came back again! Phew!

Ok, now the radio: sudo fastboot flash radio radio-maguro-i9250xxlj1.img. That went well, similar to the boot loader. Then I had to reboot again: sudo fastboot reboot-bootloader.

Now the main thing: sudo fastboot -w update image-yakju-jwr66y.zip. That took almost two minutes, then my device rebooted automatically, this time in the new firmware. Done!

After these manipulations, I was able to set up my phone normally. Once in the Android main screen, I accessed the phone settings and confirmed I was now on Android 4.3! At least I reached my goal.
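For reference, the whole sequence can be recapped as a script. This is a sketch rather than something I ran as is: the run helper, the flash_yakju function and the DRY_RUN switch are mine, and by default the script only prints the commands. It is meant as a readable checklist, not something to run unattended, since the unlock step asks for confirmation on the phone and wipes all data.

```shell
# Path to fastboot; sudo needed the absolute path on my machine.
FASTBOOT="${FASTBOOT:-fastboot}"
# Set DRY_RUN=0 to really execute the commands instead of printing them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Same steps and file names as in this post; stop at the first failure.
flash_yakju() {
    run adb reboot bootloader || return 1
    # Unlocking wipes all data on the device.
    run sudo "$FASTBOOT" oem unlock || return 1
    run sudo "$FASTBOOT" flash bootloader bootloader-maguro-primemd04.img || return 1
    run sudo "$FASTBOOT" reboot-bootloader || return 1
    run sudo "$FASTBOOT" flash radio radio-maguro-i9250xxlj1.img || return 1
    run sudo "$FASTBOOT" reboot-bootloader || return 1
    run sudo "$FASTBOOT" -w update image-yakju-jwr66y.zip || return 1
}

flash_yakju
```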

What can I do next?

There are a couple of things I will try if the device starts rebooting again. Here they are.

  1. Install a custom ROM providing Android 4.4. Besides upgrading to the latest Android, this should extend battery life, since 4.4 greatly improved on that, as I experienced with my tablet, which recently got a custom 4.4 ROM. I will also be able to return to baseline Yakju 4.3 if needed. Unfortunately, I had no way to back up my 4.2 firmware, so I cannot go back to it.
  2. Shop for a new phone. I will try to get a Nexus 5 and if I cannot without switching provider, I will shop for an iPhone. Maybe I will find a store in Montreal providing unlocked phones including Nexus, maybe I will have to wait patiently for my next trip to the United States to buy an unlocked Nexus 5 there, maybe I will be able to convince someone from a US office of my company to buy the phone for me and ship it to me (if I send him a check for the price of the device, obviously!), maybe I will find something to make me happy on a web site I don’t know about yet. We’ll see.
  3. If all else fails, I will give up on installing any application and will use the Galaxy Nexus just as a phone and for casual Internet access with the stock browser. After my agreement with Fido ends next November, I will consider other, hopefully better, options.