New Year, New Blog
This blog has moved!
I look forward to welcoming you over at:
http://www.databasesandlife
Christmas holidays
It has been suggested that this blog does not contain enough personal content.
This Christmas I visited my parents in Seefeld, Austria. I am now in the UK with my parents and Christina, and will be going to Paris for New Year. You can see the progress of this voyage on the following Maptales story. It contains locations, text and photos.
Domain-name search Firefox feature
Installing this into Firefox (takes < 1 minute and doesn't require a download) allows you to check for the availability of domain names straight from the browser.
It was created by a colleague of mine at easyname.eu; it was his idea, and I genuinely think it's a cool feature!
Next PHP uselessness of the day
There is the option "php -l" which checks the validity of a source file. Obviously it doesn't do a wonderful job as it doesn't detect misspelt variable or method names; but I suppose it's better than nothing.
So I apply this recursively to all the files under a certain directory. For reasons I won't go into here, there are PostScript fonts checked into the source directory. For these files, "php -l" outputs:
I assert "php -l" is not very useful.
Unbelievable PHP limitation of the day
If one defines a class with the member variable:
then that's fine. However if one defines it as:
then that gives a compile error:
I know of no other language I've ever programmed in (and that includes BASIC and C) where you can write a value but not an expression.
How broken is that!
Putting spaces around the *, or adding brackets around the whole expression, does not help.
(PHP 5.2.0)
Austrian mobile working again
My Austrian mobile phone works again.
1) Originally I asked them to send the new SIM card to Macau, but they didn't want to do that. Thankfully I am not the sort of business person who relies heavily on their phone while also travelling a lot. If you are that sort of person, you had probably better not choose Telering.
2) Then they said they would send the new SIM card to Vienna. 1.5 months later when I arrived back from Macau, it wasn't there with my post.
3) I went to my normal Telering shop at Stephansplatz, but it was closed.
4) So I went into a different shop today, and they gave me a new SIM card. The girl was quite confused by the fact that there was a "lock" on my account (presumably from when I rang up and told them I'd lost the phone?). But her colleague told her to ignore that. The SIM card she gave me didn't work.
5) Just now I rang up, and they told me "it doesn't work because your account is locked". They have now unlocked it, and it works.
Beispiel
Christina and I have a cool HDD/DVD recorder in Macau. One can set up timed programs and it records them from the TV when one is away. You can even copy them to removable media (DVD).
I wish I had something similar in Vienna! Ah yes I do, I realize, I own a VHS recorder :)
To set up a timed program, I first have to set the device's date and time. I'd never done that (the blinking "--:--" situation). I looked at the instructions, and there was the "Beispiel" (example) of setting the clock to December 1998. Ah yes, I remember I bought the device in the summer of 1999.
And only today, 2nd December 2007, for the first time, do I set the clock and create my first timed recording :)
P.S. The summer of 1999 is so far in the past that, when I bought the recorder, I was seriously stumped by the fact that the manual was only in German: I didn't speak German back then.
Back to Vienna
I am at Hong Kong airport right now preparing for the journey home to Vienna. I shall be in Vienna tomorrow (Sunday) and will be working for the next two weeks.
I have my Macau mobile with me, but it doesn't roam to Vienna. I lost my Vienna mobile and they wouldn't send the replacement SIM card abroad (to Macau), so that phone won't work until I've collected my post, which should be on Sunday. So the best way to reach me from now on is email or Skype.
Vector graphics in the browser
I've taken a look at the state of vector graphics in the browser.
Since the beginning of the web, people have been using GIFs to display their headline texts; tables and CSS to do layout; and GIFs for rounded corners etc. A lot of this would become a lot simpler if the browser was able to display vector graphics.
There are numerous systems for displaying vector graphics but the two I looked at in detail were the <canvas> tag and DOJO. Initially I realized that the <canvas> tag could not plot text, and I thought the DOJO system could. But it turns out it can't either. So I concentrated on the <canvas> tag.
The <canvas> tag works in Firefox, Safari and Opera, and with a one-line JavaScript include of excanvas it also works in Internet Explorer. It works well for trivially simple things, but a few things don't work as I expected.
(1) No callback for painting. Essentially a <canvas> is just a bitmap (like an <img>) which you can draw into programmatically using JavaScript. This contrasts with traditional vector programming, where you register some kind of callback, and the system calls you whenever the part of the window under your control needs to be redrawn.
The traditional way is certainly more complex, but it has a few advantages. If you are doing any kind of document editing, e.g. you are programming Word, then documents get big, but only a small part is displayed on the screen at once. This becomes more marked if you zoom in to 500%. With the callback mechanism, your application only gets asked to draw the small part of the window on display. With the <canvas> approach, you have to draw everything, all the time. Imagine a grid which needs lines drawn every 5 pixels. That's a lot of lines on a 1,000 x 1,000 pixel screen, but it's a lot more if you need to draw everything in a 10,000 x 100,000 pixel workspace that the user could scroll around in.
It simply takes too long to draw everything the user could possibly see if you have a large workspace, or view a large document at a higher zoom. On a 5k x 5k pixel <canvas> with such a grid, Firefox complains "script seems to be unresponsive", whereas on a 1k x 1k grid (which is all you can see) it works fine.
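To put rough numbers on the grid example, here is a small Java sketch of the arithmetic (not browser code). It assumes a grid line at every multiple of 5 pixels and a paint routine that, like a Swing paintComponent reading Graphics.getClipBounds(), only has to draw what falls inside the clip rectangle it is given. The class and method names are made up.

```java
// Hypothetical sketch: how many grid lines a paint routine has to draw
// when it only draws inside a given clip rectangle.
class GridRedrawSketch {
    // Number of multiples of `spacing` in the half-open range [start, start + length).
    static long linesInRange(long start, long length, long spacing) {
        long first = Math.floorDiv(start + spacing - 1, spacing); // ceil(start / spacing)
        long last = Math.floorDiv(start + length - 1, spacing);    // last multiple inside the range
        return Math.max(0, last - first + 1);
    }

    // Vertical plus horizontal grid lines intersecting the clip rectangle.
    static long linesToDraw(long clipX, long clipY, long clipW, long clipH, long spacing) {
        return linesInRange(clipX, clipW, spacing) + linesInRange(clipY, clipH, spacing);
    }

    public static void main(String[] args) {
        // Callback style: only the visible 1,000 x 1,000 viewport is drawn,
        // wherever the user has scrolled to.
        System.out.println(linesToDraw(0, 0, 1_000, 1_000, 5));          // 400
        System.out.println(linesToDraw(5_000, 50_000, 1_000, 1_000, 5));  // 400
        // <canvas> style: the whole 10,000 x 100,000 workspace is drawn up front.
        System.out.println(linesToDraw(0, 0, 10_000, 100_000, 5));       // 22000
    }
}
```

Wherever the user scrolls, the callback version draws about 400 lines; drawing the whole workspace up front means 22,000.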
(2) A large <canvas> takes up lots of memory. As a <canvas> is just a bitmap, if you create a large one, a large amount of memory is allocated immediately. So if you are writing something like Word using this technique and the user scales the document to 500%, their computer will immediately slow down, as the browser needs to allocate a huge amount of memory just to store all the pixels the user could potentially scroll around to see. And the Internet Explorer control and the Opera <canvas> seem to have a size limit somewhere around 5k pixels.
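The memory cost is easy to estimate. This Java sketch assumes 4 bytes per pixel (8-bit RGBA), which is a common layout for browser bitmaps but an assumption here, and an arbitrary 800 x 1,000 pixel page; the names are hypothetical.

```java
// Back-of-the-envelope sketch of a <canvas> backing store's size.
// Assumption: 4 bytes per pixel (8-bit RGBA); real browsers may differ.
class CanvasMemorySketch {
    static final long BYTES_PER_PIXEL = 4;

    static long bitmapBytes(long width, long height) {
        return width * height * BYTES_PER_PIXEL;
    }

    // Zooming scales both dimensions, so memory grows with the square of the zoom.
    static long zoomedBitmapBytes(long width, long height, long zoomPercent) {
        return bitmapBytes(width * zoomPercent / 100, height * zoomPercent / 100);
    }

    public static void main(String[] args) {
        System.out.println(bitmapBytes(1_000, 1_000));           // 4000000
        System.out.println(bitmapBytes(5_000, 5_000));           // 100000000
        System.out.println(zoomedBitmapBytes(800, 1_000, 500));  // 80000000
    }
}
```

Because zooming scales both dimensions, a 500% zoom needs 25 times the memory of the unzoomed page.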
(3) You can't draw text. This is surely the most amazing limitation. Look at the vector graphics examples on the web and you'll see e.g. beautiful clock examples, but they use a JPEG, numbers included, as the clock's background. That's hardly the way I had imagined vector graphics would be used!
But there is a "solution", as we see in this nice windowing example. As the browser can already display text, and the background of a <canvas> is transparent, you can layer one <canvas> on top of another and put <div> objects in between.
If one is to display e.g. a Word document, with various vector graphics and various pieces of text, one would have to create lots of <canvas>es and lots of <div>s and absolute-position them on top of one another. That seems to me like a lot of programming work to create this structure, and to maintain it. And it can't be easy or quick for the browser to render such a document.
There is a solution in sight though. There will be a <canvas> drawText command in Firefox 3. Currently nobody has Firefox 3 and none of the other browsers support it. But that will no doubt be different in a few years time.
(4) There is no way to query text metrics. If you want to have vector graphics and text on the same page, you need a way to find out the size of text in pixels. For example you want to center text on the screen. Or fit text into a weirdly shaped object. Thankfully this is supported with the Firefox 3 drawText command.
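Centering shows why metrics matter: the position depends on the rendered width, which only the text engine knows. In this Java sketch the measure parameter stands in for a real metrics call (in Java itself that would be something like java.awt.FontMetrics.stringWidth); the fake 10-pixels-per-character font and the names are purely illustrative.

```java
import java.util.function.ToIntFunction;

// Sketch: centering a label requires the text's rendered width,
// i.e. exactly the measurement that <canvas> could not provide.
class CenterTextSketch {
    // `measure` stands in for a real text-metrics call (hypothetical here).
    static int centeredX(String text, int boxX, int boxWidth, ToIntFunction<String> measure) {
        int textWidth = measure.applyAsInt(text);
        return boxX + (boxWidth - textWidth) / 2;
    }

    public static void main(String[] args) {
        // Pretend every character is 10 pixels wide (a fake monospaced font).
        ToIntFunction<String> fakeMonospace = s -> s.length() * 10;
        System.out.println(centeredX("Hello", 100, 200, fakeMonospace)); // 175
    }
}
```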
(5) Opera zooming doesn't work. Maybe this is just an implementation issue, but as the vector graphics are turned into a bitmap at the time the Javascript commands to draw into the canvas are executed, if you say "scale to 200%" in Opera, it scales the generated bitmap, as opposed to scaling the original vectors and re-rendering them.
I always object to the name SVG on the grounds that it stands for "Scalable Vector Graphics": vector graphics are scalable by their nature, so I don't understand why the file format wasn't just called VG. But it seems someone has indeed managed to implement what I thought was impossible: "non-scalable vector graphics".
Conclusion
So what is one to do to implement a client-side app manipulating vector-based documents? It is clearly the way of the future.
As far as I can see, it's still not very easy. Which may explain why there are none of them around. The only one I can think of is Gliffy - and that's written in Flash.
(a) An alternative scrolling technique will have to be found. If the user is manipulating a large document, or a small document at a high zoom, then creating one big <canvas> and using the browser's scrolling isn't going to work. One will have to implement one's own document-navigation (i.e. scrolling) system. This will not be what users expect, but it's the technique Google Maps uses: it doesn't have windowing-system scrollbars to let you pan the viewable area within the original document (the world map); it has its own navigation system.
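A minimal sketch of such a home-made navigation system, assuming the document is cut into fixed 256 x 256 pixel tiles (an arbitrary size) and the application tracks its own viewport offset. Each visible tile could then be a small <canvas>; everything here, including the names, is hypothetical and written in Java only to show the arithmetic.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of self-managed navigation: keep a viewport offset in document
// coordinates and render only the tiles that intersect it.
class TileViewportSketch {
    static final long TILE_SIZE = 256; // pixels; an arbitrary choice for this sketch

    // {column, row} pairs of tiles overlapping [viewX, viewX+viewW) x [viewY, viewY+viewH).
    static List<long[]> visibleTiles(long viewX, long viewY, long viewW, long viewH) {
        List<long[]> tiles = new ArrayList<>();
        long firstCol = Math.floorDiv(viewX, TILE_SIZE);
        long lastCol = Math.floorDiv(viewX + viewW - 1, TILE_SIZE);
        long firstRow = Math.floorDiv(viewY, TILE_SIZE);
        long lastRow = Math.floorDiv(viewY + viewH - 1, TILE_SIZE);
        for (long row = firstRow; row <= lastRow; row++) {
            for (long col = firstCol; col <= lastCol; col++) {
                tiles.add(new long[] { col, row });
            }
        }
        return tiles;
    }

    public static void main(String[] args) {
        // A 1,000 x 800 viewport at the top-left of the document...
        System.out.println(visibleTiles(0, 0, 1_000, 800).size());   // 16
        // ...and after the user has dragged 300 pixels to the right.
        System.out.println(visibleTiles(300, 0, 1_000, 800).size()); // 20
    }
}
```

The number of tiles depends only on the viewport size, so the cost stays roughly constant however large the document or zoom level.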
(b) Displaying text is going to be a pain. No more myDocument.display(graphicsContext) - where the code to display the document is decoupled from the particular drawing implementation.
The code to display a document is going to have to be quite tightly coupled with the display system (creating and maintaining <canvas>es and <div>s). And when making modifications, I'm not sure it's going to look nice to delete those <canvas>es and <div>s and recreate them each time. Maybe one will actually have to modify them, e.g. during a drag operation, which will make the code particularly front-end specific.
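For contrast, this is roughly what the decoupled myDocument.display(graphicsContext) style from (b) looks like, sketched in Java. GraphicsContext, MyDocument and RecordingContext are made-up names; the point is that the document only talks to an interface, so any back end (screen, printer, or a test recorder as here) can be plugged in.

```java
import java.util.ArrayList;
import java.util.List;

class DisplaySketch {
    public static void main(String[] args) {
        RecordingContext g = new RecordingContext();
        new MyDocument("Hello").display(g);
        g.calls.forEach(System.out::println);
    }
}

// Hypothetical drawing abstraction: documents draw through this and nothing else.
interface GraphicsContext {
    void drawLine(int x1, int y1, int x2, int y2);
    void drawText(String text, int x, int y);
}

// Hypothetical document: a rectangle with a caption inside it.
class MyDocument {
    private final String caption;

    MyDocument(String caption) { this.caption = caption; }

    void display(GraphicsContext g) {
        g.drawLine(0, 0, 100, 0);
        g.drawLine(100, 0, 100, 50);
        g.drawLine(100, 50, 0, 50);
        g.drawLine(0, 50, 0, 0);
        g.drawText(caption, 10, 30);
    }
}

// One possible back end: it just records the calls. A screen, printer or PDF
// back end could implement the same interface without MyDocument changing.
class RecordingContext implements GraphicsContext {
    final List<String> calls = new ArrayList<>();

    @Override
    public void drawLine(int x1, int y1, int x2, int y2) {
        calls.add("line " + x1 + "," + y1 + " -> " + x2 + "," + y2);
    }

    @Override
    public void drawText(String text, int x, int y) {
        calls.add("text \"" + text + "\" at " + x + "," + y);
    }
}
```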
COUNT(*) vs COUNT(pk_col)
A while back I was doing some performance tuning on MySQL 5 for a customer. A SELECT was counting the number of rows in a table. I always use COUNT(*) for that, but I know a lot of people, including the customer, use COUNT(pk_col). The query was taking a long time (a few minutes). My analysis showed that the problem came from using COUNT(pk_col) instead of COUNT(*): with COUNT(*) it was instantaneous.
I didn't know that there was a difference between the two. There is no difference in their semantics, therefore it didn't occur to me that there might be a difference in the way they were executed.
Just to recap the SQL syntax:
- COUNT(DISTINCT col) counts the number of distinct values that are in "col"
- COUNT(col) counts the number of rows where col is not null.
- COUNT(*) counts the number of rows in a table
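The three forms can be modelled in a few lines of Java over an in-memory column, one list entry per row with null standing for SQL NULL. This is a sketch of the semantics only, not of how MySQL executes them; the class and method names are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// In-memory model of the COUNT forms over one column (null = SQL NULL).
class CountSemanticsSketch {
    // COUNT(*): every row, regardless of its contents.
    static long countStar(List<Integer> column) {
        return column.size();
    }

    // COUNT(col): rows where col IS NOT NULL.
    static long countCol(List<Integer> column) {
        return column.stream().filter(Objects::nonNull).count();
    }

    // COUNT(DISTINCT col): distinct non-NULL values.
    static long countDistinct(List<Integer> column) {
        return column.stream().filter(Objects::nonNull).distinct().count();
    }

    public static void main(String[] args) {
        List<Integer> nullable = Arrays.asList(1, 1, null, 2);
        System.out.println(countStar(nullable) + " " + countCol(nullable) + " " + countDistinct(nullable)); // 4 3 2

        // A primary-key column is never NULL (and is unique), so all three agree.
        List<Integer> pk = Arrays.asList(1, 2, 3);
        System.out.println(countStar(pk) + " " + countCol(pk) + " " + countDistinct(pk)); // 3 3 3
    }
}
```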
However, the database in question was executing a COUNT(pk_col) query and a COUNT(*) query differently.
- For the COUNT(*) queries it was simply counting the number of rows in the table (taking constant time and not requiring reading the rows from disk)
- For the COUNT(pk_col) it was going through all the rows, presumably checking if the pk_col was in fact null, and counting the number of rows where it wasn't, thus requiring a lot of disk access.
So the conclusion is, one should always use COUNT(*) and never COUNT(pk_col).
Sony support: Day 35 (approx)
I realized I never finished the story about Sony laptop support.
The laptop was returned after about 5 weeks, and it did work. Since then, some of the plastic around the screen (one of the parts they replaced) has started to come a bit detached, but if you handle it gently, it's OK.
It was returned to the wrong place. I had asked them to call me before returning it: they had picked it up from one of the offices I work in, but I work in lots of different offices, so I wasn't sure I would be in that particular office whenever they delivered it back. But of course they didn't call. I just got an email from the boss of the company in the office they had picked it up from, telling me it had arrived back there.
Then about 2 weeks later, they called me to ask me how satisfied I was with the service, on a scale of 1 to 5. That was wonderful. I just told him everything that'd gone wrong. Then he repeated: "no sir, I need to know your satisfaction on a scale of 1 to 5". He wasn't actually interested in what'd gone wrong; he was only interested in this single metric! I also told him that I was happy to be contacted by Sony if they were interested in improving their processes. So far they haven't contacted me, so I suppose they aren't interested.
Mr O'Reilly still uses "vi"
Cool
http://radar.oreilly.com/archives/2007/10/iphone_blackberry_excel.html
The CD Saga gets worse
On my stereo at home it plays fine (although it doesn't work on my Windows computer) but put it into my girlfriend's DVD player and it reports itself to have 17 tracks even though the cover says it only has 15. I hadn't looked closely at the reported track count on the DVD player and just played the disk. After the peaceful ending of the last track on the CD i.e. the 15th, the speakers just erupted in loud white noise. I suppose that was the Windows "autorun" software being played...
Playing a CD on a computer
... is not as easy as one might imagine.
I'm using Windows, and I seem to have discovered yet another area in which Windows just doesn't work. (Or maybe it's the CD that doesn't work?)
Ideally I would have put the CD in the computer's drive and it would have just played it. This was Bill Gates' vision once. I think that prior to Windows 95's launch he said "I imagine a day when you can just put a Beethoven CD in the drive and Windows will play the song". (Although I couldn't find that quote on the Internet, so maybe he didn't say it.)
So I put the CD into the drive, and a pop-up appeared inviting me to do all sorts of things. This was software on the CD, I think. It had a big friendly "Play the CD" button, so I clicked on that, but alas an error appeared asserting that I needed to upgrade to a newer Windows Media Player. I should do that and then run "autorun" again, it instructed me.
I tried opening my old version of Windows Media Player and playing the CD. Then an amazing thing happened: it required me to enter a "license". It opened a small pop-up window for me to do so. At the bottom were two buttons, "Play" and "Cancel", but "Play" was greyed out. In the middle of the small window was a web browser window displaying some corporate homepage. I had no idea what I was supposed to do. Nor, even if it had worked, what I would have done if I'd wanted to play the CD while not connected to the Internet.
So I tried downloading iTunes. I didn't want to do that, as my notebook has physical buttons for "play", "pause" etc., and they only work with Windows Media Player. But iTunes simply didn't acknowledge the presence of a CD or CD drive at all. Possibly it thought it was a data CD rather than an audio CD, as Windows had probably mounted it as such.
So I figured, well, I'll have to install the latest Windows Media Player then, as per the original error message. So I went to Microsoft's site and downloaded it. But it wouldn't install, on the grounds that I hadn't certified my Windows as Genuine. I don't really want to do that: if Microsoft software doesn't recognize the CD as genuine (and it once didn't recognize a DVD as genuine, so I had to use some software other than WMP to play it), I don't fully trust it to consider my computer genuine.
But it is genuine. All of it. I have a genuine computer with genuine Windows and a genuine purchased music CD in its original case with original cover art etc. And I indeed cannot just "play the song". I mean it all just doesn't work.
So I guess I'll have to listen to it on my stereo then.
Macau until 2nd December
I am going to London on Friday 21st September and will be working from there until Wednesday 26th September whereafter I shall be in Macau, working, until Sunday 2nd December when I'm back in Vienna.
EU Countries, ISO 3166-1 Alpha-3
For various reasons I needed a list of 3-letter country codes for all the EU countries. (These are "ISO 3166-1 Alpha-3 codes"). It would have been much better if this software had used 2-letter country codes like everyone else.
And because I couldn't find this list on the Internet anywhere I had to make it myself from some huge list of 3-letter country codes for all countries in the world.
In case anyone else ever needs this (including but not limited to me), here it is.
EU Countries, ISO 3166-1 Alpha-3
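For anyone who would rather have the codes in code form, here is a Java sketch. It assumes the 27 member states as of 2007 (i.e. after Bulgaria and Romania joined), which may not exactly match the original list; the class and method names are made up.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

class EuCountriesSketch {
    // ISO 3166-1 alpha-3 codes of the 27 EU member states as of 2007 (assumed membership date).
    static final Set<String> EU_ALPHA3 = Collections.unmodifiableSet(new TreeSet<>(Arrays.asList(
            "AUT", "BEL", "BGR", "CYP", "CZE", "DEU", "DNK", "ESP", "EST",
            "FIN", "FRA", "GBR", "GRC", "HUN", "IRL", "ITA", "LTU", "LUX",
            "LVA", "MLT", "NLD", "POL", "PRT", "ROU", "SVK", "SVN", "SWE")));

    static boolean isEuCountry(String alpha3) {
        return EU_ALPHA3.contains(alpha3);
    }

    public static void main(String[] args) {
        System.out.println(EU_ALPHA3.size());    // 27
        System.out.println(isEuCountry("AUT"));  // true
        System.out.println(isEuCountry("CHE"));  // false (Switzerland is not in the EU)
    }
}
```

If you only have 2-letter codes, Java's Locale.getISO3Country() can convert them, e.g. new Locale("", "AT").getISO3Country() gives "AUT".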
Unit testing and configuration files
I used to think of a function as something which would convert some input value into some output value (potentially with some side-effects). And thus unit testing a function would involve passing particular inputs into the function and checking that the results were as expected (potentially setting up some database rows or something to test that the side-effects were executed properly).
But sometimes a function relies on a particular piece of global configuration. For example the tax rate.
return (int) Math.round(cents * vat);
Initially I would just test the function with the current settings of the config file.
assertEquals(20, obj.calculateVat(100));
However, that's obviously not a great solution, as the test will break when the config file changes. And after all, configuration files exist to separate the things that are likely to change from the domain logic, which is often very long but hopefully reasonably static.
So the solution I use now is to extend such configuration accessing classes with methods such as "setValueForTesting". The "forTesting" part of the name indicates clearly its purpose is for test programs only.
assertEquals(20, obj.calculateVat(100));
That code feels much better. There are actually two advantages:
- Obviously the test code will not break if the config file changes
- But also there is more locality. Everything you need to understand about that test is there in the test program's source file, in two easy-to-read lines.
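A self-contained Java sketch of the pattern. Config, getDouble, VatCalculator and the "vat" key are hypothetical stand-ins (the real classes aren't shown); only the setValueForTesting naming convention comes from the text above.

```java
import java.util.HashMap;
import java.util.Map;

class VatTestSketch {
    public static void main(String[] args) {
        // With JUnit these would be the two lines of the test:
        // Config.setValueForTesting("vat", 0.2); assertEquals(20, obj.calculateVat(100));
        Config.setValueForTesting("vat", 0.2);
        System.out.println(new VatCalculator().calculateVat(100)); // 20
    }
}

// Hypothetical configuration accessor. Real code would load values from the
// config file; here the map starts empty so the sketch is self-contained.
class Config {
    private static final Map<String, Double> values = new HashMap<>();

    static double getDouble(String key) {
        Double value = values.get(key);
        if (value == null) {
            throw new IllegalStateException("No configuration value for " + key);
        }
        return value;
    }

    /** For test programs only: overrides the value that would come from the config file. */
    static void setValueForTesting(String key, double value) {
        values.put(key, value);
    }
}

// The domain logic reads the rate from configuration rather than hard-coding it.
class VatCalculator {
    int calculateVat(int cents) {
        double vat = Config.getDouble("vat");
        return (int) Math.round(cents * vat);
    }
}
```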
Java 5 enums can be compared with ==
Java Enum instances are singletons. This doesn't seem to be clearly documented by Sun (at least I found it difficult to find), but it's the case.
What this means is that it's possible to compare enumerated types by identity, which is cool for readability. (And it means that the switch statement works.)
You don't have to write this:
You can write:
This is documented here in the "discussion" section.
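A small Java sketch of the identity comparison and the switch statement; the Colour enum and the method names are made up for illustration.

```java
class EnumIdentitySketch {
    enum Colour { RED, GREEN, BLUE }

    // Each enum constant exists exactly once, so == is safe,
    // and it simply returns false (rather than throwing) if c is null.
    static boolean isRed(Colour c) {
        return c == Colour.RED; // instead of Colour.RED.equals(c)
    }

    // The switch statement works directly on enum values.
    static String temperature(Colour c) {
        switch (c) {
            case RED:
                return "warm";
            case GREEN:
            case BLUE:
                return "cool";
            default:
                throw new IllegalArgumentException("Unknown colour: " + c);
        }
    }

    public static void main(String[] args) {
        // valueOf returns the same singleton instance, not a copy.
        System.out.println(Colour.valueOf("RED") == Colour.RED); // true
        System.out.println(isRed(Colour.GREEN));                 // false
        System.out.println(temperature(Colour.BLUE));            // cool
    }
}
```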
Interesting Oracle/MySQL locking difference
I know the rules for Oracle row locking well. A row can be locked for write if one updates it, or if one "select for update"s it.
- create table a (x number); (and equivalent in MySQL for InnoDB)
- Session A: insert into a values (9);
- Session A: commit;
- Session A: start transaction (in mysql)
- Session A: select * from a where x=9 for update;
- Session B: start transaction (in mysql)
- Session B: select * from a where x=9 for update;
- Session B hangs, waiting for Session A to release the row-level lock
- Session A: update a set x=4;
- Session A: commit;
- At this point, Session B returns no rows. The lock has been released, and the row no longer conforms to the WHERE clause, so it is not returned.
- Session A: update a set x=5;
- Oracle completes the statement in session A immediately. The "for update" in session B did not return any rows, so no rows were locked, so session A has nothing to wait on.
- MySQL (version 4.1.18) blocks session A, waiting for the transaction in session B to end. That means A requires a lock owned by B. But what is this lock? Is it a row-level lock on the row which was not selected? Some other type of lock?
- Firstly, the WHERE clause is evaluated to determine which rows to return. A lock is requested for those rows, which can involve waiting until the lock is released if it is owned by some other transaction.
- After the lock has been acquired, each row is checked again to see if it still conforms to the WHERE clause. If not, it is not returned (even though the session blocked waiting for it).
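The two steps above can be sketched as a toy Java model: rows are in-memory objects with a ReentrantLock each, and the two sessions are two threads. This illustrates the described behaviour under those assumptions only; it is not how Oracle or InnoDB actually implement locking, and all names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.IntPredicate;

// Toy model of a two-step "select ... for update".
class SelectForUpdateSketch {
    static final class Row {
        volatile int x;
        final ReentrantLock lock = new ReentrantLock();
        Row(int x) { this.x = x; }
    }

    static List<Row> selectForUpdate(List<Row> table, IntPredicate where) {
        // Step 1: evaluate the WHERE clause to find candidate rows.
        List<Row> candidates = new ArrayList<>();
        for (Row r : table) {
            if (where.test(r.x)) candidates.add(r);
        }
        // Step 2: lock each candidate (blocking if another session holds it),
        // then re-check the WHERE clause; rows that no longer match are released.
        List<Row> locked = new ArrayList<>();
        for (Row r : candidates) {
            r.lock.lock();
            if (where.test(r.x)) {
                locked.add(r);
            } else {
                r.lock.unlock();
            }
        }
        return locked;
    }

    public static void main(String[] args) throws Exception {
        List<Row> table = new ArrayList<>();
        table.add(new Row(9));

        // Session A: select * from a where x=9 for update;
        List<Row> heldByA = selectForUpdate(table, x -> x == 9);

        // Session B runs the same query on another thread and blocks on A's lock.
        ExecutorService sessionB = Executors.newSingleThreadExecutor();
        Future<List<Row>> resultB = sessionB.submit(() -> selectForUpdate(table, x -> x == 9));
        Thread.sleep(100); // demo only: give B time to reach the lock

        // Session A: update a set x=4; commit;  (commit = release the locks)
        for (Row r : heldByA) {
            r.x = 4;
            r.lock.unlock();
        }

        // B wakes up, re-checks x=9, finds it false and returns no rows.
        System.out.println(resultB.get().size()); // 0
        sessionB.shutdown();
    }
}
```

The sketch releases a row that fails the re-check, matching the Oracle behaviour above. MySQL 4.1.18 evidently left session B holding some lock, which this toy model makes no attempt to reproduce.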
3-dimensional photo organization
I have just viewed some photos on Facebook. They were of a friend's trip to Malaysia.
- Facebook has a limit of 60 photos per album; meaning you have to split photos up into albums with names like "Malaysia 1", "Malaysia 2" etc if you want to upload more than 60 photos in total.
- Each album, as is current practice in web design, is divided into pages with "page next" buttons to get to the next page.
- Each page of each album, as was introduced with windowing systems, has a scroll bar (vertical only, unless one makes the window really small)
The scroll bar is quite a good device. It was well thought through. It was specifically developed to solve the problem of "you have more data than can fit on the screen". You can move slowly up or down using the arrows at the end which are deliberately easy to understand even for novices unfamiliar with windowing systems. You can see how far down the available data you are. You can drag the bar with your hand/mouse to move either fast or slow in a natural motion.
I have heard that some web novices find "next page" easier to use than the scroll bar. But this wouldn't be the case if there were no "next page" links. And knowing how to use scroll bars is not optional if you want to use any system other than photo-browsing websites. For example, when using the compose interface of an email website, there is no "next page" button once you've typed text equal in length to the size of the window the user interface designers assume you are using.
Scroll bars are so much better than "next page" links, and even if they weren't, displaying 1-dimensional data using 1 data navigation tool is better than displaying 1-dimensional data using 3 different navigation tools.
Nextstep wins
This article references this article which asserts that:
- When you copy a directory into a place which already has a directory of this name, Windows 95+ asks you if you want to "replace" the directory. If you say yes, it replaces the individual files, i.e. merges the new directory into the old
- On Mac OS X it also asks you if you want to replace the directory, but this actually deletes the old directory first
- That the Windows behaviour is better as it's less destructive, and other reasons
- Regardless of whether the Windows behaviour is better, the word "replace" implies the Mac behaviour. I have been confused by this before (assuming that if I click "yes" to the "replace" question, it will delete the contents of the destination first, i.e. replace them)
- There are times when you want to merge (merging photos from a digital camera) and times when you want to replace (replacing one source tree with another)
- Nextstep would ask you if you want to replace the destination, or merge (or cancel)
Just one more example of how technology gets worse with time. Or at least not better anyway, on average.
Tax
This guy living in Somalia, a country without a government and thus by definition in anarchy, does his job but has to give half of what he produces to a gunman, who "protects" him. He isn't very happy with this situation, but what is one to do?
From http://news.bbc.co.uk/2/hi/africa/4040889.stm#mahamut:
I get about 20 rods a day but I have to give half of them to the gunman who controls the area I work.
...
This hammer is very heavy and if I had a choice, I would do something else.
But if I could not go to school and had to carry on doing this, at least if there were a government, I would not have to give half the rods to the gunman.
Right. This guy has not tried living in a country with a government recently.
On all of my without-VAT income I pay 47% tax (average) to the government. Then there's the 20% VAT charged on top of the without-VAT income.
And now I have just got a bill because I paid some tax late. I've no idea which tax I paid late or why. I pay tax bills as soon as I get them. I am not a fool. (= I am scared of the government.)
This late penalty is €104.47, which is quite a lot, I think, for paying tax late (which I don't remember doing anyway). I mean, what's this money for? I wish I earned so much that the interest the government lost on my monthly tax, because I paid something a few days late, came to over a hundred euros. But alas, unless interest rates have gone up dramatically without my noticing, that's not the case.
Or maybe it was the money they had to pay to process my lateness. In which case I wish I was a computer. Processing some database row, collect €104.47. That's a well-paid computer.
So it doesn't matter where you are, or what the style of leadership is in the country you happen to do your work. There's stuff to be paid and there's little you can do about it. And if they decide you've done something wrong and have to pay even more stuff, there's nothing you can do about that either.
Insert documentation here
Ah, I really hate opening code and seeing the following:
/**
 * Insert class or interface description here.
 */
This is created by the IDE's helpful "create new class" (and similar) menu options.
I wish people would actually write documentation. Even a single sentence to describe what the class is modeling would be helpful if it's not obvious from the name. Or object invariants (e.g. boughtCount <= offeredCount).
To find a class without documentation is annoying. But to see such an IDE-generated phrase is a slap in the face!
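For contrast, a Java sketch of the kind of class comment described above: one sentence saying what is being modelled, plus the invariant. The Offer class is hypothetical, built around the boughtCount <= offeredCount example.

```java
class OfferSketch {
    public static void main(String[] args) {
        Offer offer = new Offer(10);
        offer.buy(3);
        System.out.println(offer.getBoughtCount() + " of " + offer.getOfferedCount()); // 3 of 10
    }
}

/**
 * Models a fixed number of identical items put up for sale, some of which
 * may already have been bought.
 * <p>
 * Invariant: {@code 0 <= boughtCount <= offeredCount}.
 */
class Offer {
    private final int offeredCount;
    private int boughtCount;

    /** Creates an offer of {@code offeredCount} items, none of them bought yet. */
    Offer(int offeredCount) {
        if (offeredCount < 0) throw new IllegalArgumentException("offeredCount < 0");
        this.offeredCount = offeredCount;
    }

    /** Records the purchase of {@code n} items; rejects purchases that would break the invariant. */
    void buy(int n) {
        if (n < 0 || n > offeredCount - boughtCount) {
            throw new IllegalArgumentException("Cannot buy " + n + " items");
        }
        boughtCount += n;
    }

    int getOfferedCount() { return offeredCount; }

    int getBoughtCount() { return boughtCount; }
}
```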
Fancy website advertisements
We all know that advertisements on websites are annoying; for example those banner ads with bright flashing colours. It makes us want to use the website in question less. We all know that the way Google does its advertisements—non-intrusively—is much better for all concerned: the website (users are less annoyed), the users (they are less annoyed) and the advertising customers themselves.
Just now my computer was incredibly slow. I looked at the task manager and saw Firefox was at 70%. I had no idea why, but I looked at all the windows and one tab of one window was a consumer website with a big moving banner advertisement. I wasn't using the tab, but I like to have that website open (as I do many others).
Suspecting that was the reason Firefox was using so much CPU, I closed the tab with the task manager open, and indeed Firefox's CPU usage thereafter dropped to 0%.
So that means that website is effectively preventing me from having its window open, as having its window open prevents me from working. So now I have gmail, Facebook, BBC News and a few other websites open, but not that one. I wonder if that's what the people running the website really want?
P.S. Safari reduces the speed of Flash animations on windows which do not have the focus, to save CPU consumption.
P.P.S. In addition to just making my computer slower, it would deplete my laptop battery faster.
Copying the contents of one directory into another: not as easy as it looks!
Task: You want to copy the contents of one directory into another existing directory. On Linux.
I.e. if the source directory is "x" and the destination directory is the already-existing directory "y", if there are files "x/1" and "x/2" then files "y/1" and "y/2" should be created. If "x" is an empty directory then no files in "y" should be created.
Now, this is not as easy as it sounds.
The command "cp -r x y" copies "x" into "y", meaning that the resulting files end up being "y/x/1" etc.
The command "cp -r x/* y" copies files like "x/1" into "y/1" correctly, but if "x" is an empty directory, it gives an error that the file "x/*" cannot be found.
Surely this should be easy! I even considered firstly deleting the directory y, and then copying x as y.
cp -r x y
This is rather inelegant as you have to set the permissions on "y" again if they're non-standard, and it doesn't work if "y" isn't empty.
I came up with the following solution.
cd x
cp -r --parents . ../y
This copies the "current directory" and all its children (i.e. all files) into "y", but the "--parents" option recreates, inside the destination, the path leading to the source. So if you copy "a/b/c" into "d", it creates "d/a/b/c" as opposed to just "d/c", which it would normally create.
In my case the "hierarchy" is just "." so it copies e.g. "./1" into "y/./1" i.e. "y/1".
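For comparison, the same task in Java, sketched with java.nio.file.Files.walk: it merges the contents of one directory into an existing one, copes with an empty source directory, and overwrites same-named files. The class and method names are made up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

class CopyContentsSketch {
    // Copies everything inside `source` into the existing directory `target`,
    // so source/1 becomes target/1 (not target/source/1). An empty source copies nothing.
    // Files already in target are kept; same-named files are overwritten (a merge).
    static void copyContents(Path source, Path target) throws IOException {
        try (Stream<Path> paths = Files.walk(source)) { // pre-order: directories before their contents
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path dest = target.resolve(source.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(dest); // a no-op for target itself
                } else {
                    Files.copy(p, dest, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path x = Files.createTempDirectory("x");
        Path y = Files.createTempDirectory("y");
        Files.write(x.resolve("1"), "one".getBytes());
        Files.write(x.resolve("2"), "two".getBytes());
        copyContents(x, y);
        System.out.println(Files.exists(y.resolve("1")) && Files.exists(y.resolve("2"))); // true
    }
}
```

Like the cp approach, this doesn't copy permissions. Passing StandardCopyOption.COPY_ATTRIBUTES to Files.copy would preserve file attributes where the file system supports it, though directories created by createDirectories still get default attributes.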
Concorde Fallacy
This is a good term for a commonplace management error.
http://www.answers.com/topic/concorde-fallacy
Java: List<X> or X[] ?
Since the creation of Java 1.5, one's been able to parametrize classes using generics, with a syntax similar to C++ templates.
Before Java 1.5, I would always return simple list data structures as arrays. This was
- Type-safe (e.g. User[] as opposed to List: with the former one knows what's in the collection, with the latter one doesn't)
- One could find out the length of the collection with array.length (in contrast to C arrays)
Perhaps it's because I don't like change, but I would still advocate using arrays as opposed to Lists:
- The generic information is thrown away at compile-time, so a List<X> and List<Y> look the same at run-time, whereas X[] and Y[] do not. Introspection, and getting exceptions at the time of a wrong array cast, and not later, are the benefits here.
- You can easily create an array declaratively. int[] x = new int[] { 1, 2 }; You can't do the same with the collections frameworks.
- I'm sure arrays are faster
- Arrays are also simpler. I think one should, given two solutions to the same problem, nearly always take the simpler unless there's a clear benefit of the more complex solution (which I don't see in this case)
- To iterate over the collection, with List<X> you need to create an Iterator. This also happens if you use the "foreach" construct.
- To get a particular element, or to get the length, you have to call methods, like aList.get(3).
- One can iterate over an array or collection with the Java 1.5 "foreach" keyword: so in this case the source code looks the same.
- The code "for (i=0; i<array.length; i++)", i.e. non-iterator code, is not really difficult to write or difficult to read.
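The run-time difference from the first point can be demonstrated in a few lines of Java. This sketch (names made up) shows that List<String> and List<Integer> share one run-time class, whereas String[] and Integer[] do not, and that a wrong array cast fails at the cast while a wrong unchecked List cast only fails later, when an element is read.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ArraysVsListsSketch {
    // Erasure: List<String> and List<Integer> share one run-time class.
    static boolean listTypesIndistinguishable() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        return strings.getClass() == ints.getClass();
    }

    // Arrays keep their component type at run time.
    static boolean arrayTypesDistinguishable() {
        return !new String[0].getClass().equals(new Integer[0].getClass());
    }

    // A wrong array cast fails immediately, at the cast itself.
    static boolean arrayCastFailsAtCast() {
        Object numbers = new Integer[] { 1, 2 };
        try {
            String[] strings = (String[]) numbers;
            return strings == null; // not reached
        } catch (ClassCastException e) {
            return true;
        }
    }

    // A wrong (unchecked) List cast succeeds; the failure only surfaces on access.
    @SuppressWarnings("unchecked")
    static boolean listCastFailsLater() {
        Object numbers = new ArrayList<>(Collections.singletonList(1));
        List<String> strings = (List<String>) numbers; // no exception here
        try {
            String first = strings.get(0); // ClassCastException thrown here
            return first == null; // not reached
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(listTypesIndistinguishable()); // true
        System.out.println(arrayTypesDistinguishable());  // true
        System.out.println(arrayCastFailsAtCast());       // true
        System.out.println(listCastFailsLater());         // true
    }
}
```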
Sony Vaio Support (Day 12)
Sony rang on Friday. They wanted to know the guarantee number (the number that I don't have, which should have been on the invoice; the invoice which originated from them, and I've sent them back twice already).
I took the opportunity to ask how the repair was going (as their website where you can track the repairs doesn't work). They told me the battery was broken and they had to send off for a new one, which hadn't arrived yet. There are two things wrong with that:
- They are a Sony repair shop. They are Europe's Sony repair shop (as far as I know). Why don't they have any batteries available in stock? Maybe this is the first time ever that a notebook has needed a new battery, but I doubt it.
- The laptop didn't work without the battery (i.e. with just the power cable connected). So I doubt it's the battery which is broken.
Hibernate / Boolean Fields / MySQL 5.0
There's a problem persisting boolean fields using Hibernate 3.2.2 with MySQL 5.0 if you let Hibernate generate your schema in the default way. It works fine on MySQL 4.1, and it makes no difference whether you use boolean (primitive) or Boolean (object) types for the fields.
with a class such as:
private boolean myField;
public boolean getMyField() { return myField; }
public void setMyField(boolean x) { myField = x; }
and allow Hibernate to generate the schema on startup, e.g. by writing the following in the "hibernate.cfg.xml" file:
org.hibernate.exception.DataException: could not insert: [com.company.MyObject]
at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:43)
....
The solution is to change the Hibernate mapping for the field to this:
The field is then generated as tinyint(1), and it all works fine again.
Generate Javadoc HTML only for public members
In Java there are four protection levels which members (fields and methods) can have:
- Private
- Protected
- Package-level
- Public
But when one generates the Javadoc, which protection levels should be included?
Generated Javadoc is read by humans. These humans are probably not you, and are therefore probably clients of your classes, either within or outside your organization. It's possible, although unlikely, that they can access package-level members. It's possible they may need to subclass your class, although in (nearly) all cases I can conceive of, they won't do that without looking at your source code.
Javadoc should be simple to understand. A class contains a lot of potentially documentable material which reduces that simplicity: for example setters which only Hibernate needs to see (private), or which only the factories in your package need to see (package-level).
Javadoc should therefore be generated only for public members. That's what Sun's JDK docs do as well (for example you don't see any protected or private stuff here). An additional benefit for simplicity is that, if public is the only level for which Javadoc is generated, the summary doesn't even state the protection level, so you see "int getX()" in the method list as opposed to "public int getX()".
This can be achieved with the "-public" option to the Javadoc generation program. In Netbeans 5.5, right-click on the project in the "projects" tab, select the menu item "properties", go to the "documentation" entry under the "build" entry, and enter "-public" in the "additional javadoc options" field.
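As a rough illustration, the hypothetical class `Point` below has one method at each of the four protection levels. Run through `javadoc -public`, only `getX()` among those methods would be documented. The `publicMethodNames` helper mimics that filter with reflection (`Modifier.isPublic`). It is only an approximation of what the Javadoc tool does, not the tool itself.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PublicOnlyDemo {
    /** Hypothetical class with one method at each of Java's four protection levels. */
    public static class Point {
        private int x;

        /** Public: the only one of these methods that "javadoc -public" documents. */
        public int getX() { return x; }

        /** Protected: visible to subclasses. */
        protected void resetX() { x = 0; }

        /** Package-level: e.g. for a factory in the same package. */
        void initX(int value) { x = value; }

        /** Private: e.g. a setter only Hibernate needs to see. */
        private void setX(int value) { x = value; }
    }

    /** Approximates the -public filter: sorted names of the public methods declared on a class. */
    public static List<String> publicMethodNames(Class<?> type) {
        List<String> names = new ArrayList<String>();
        for (Method m : type.getDeclaredMethods()) {
            if (Modifier.isPublic(m.getModifiers())) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(publicMethodNames(Point.class)); // [getX]
    }
}
```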
Sony Vaio Support (Day 5)
After being picked up on Wednesday, I see the computer arrived at the repair place in Germany at about 10am today. (The UPS tracking website works, in contrast to Sony's).
That's 2 days to transport a laptop from the capital of Austria to Germany! That's hardly quick, nor is it the overnight delivery they always advertise on TV!
So presumably Sony chose to save some money when selecting which UPS delivery speed to use.
Sony Vaio Support (Day 3)
OK, it's all gone well. It's 14:43 and the UPS guy has come and collected my notebook.
He didn't seem to have any packaging materials with him (in contrast to what the woman said on the phone). So he took my unscratched laptop and just carried it off without packing it. Presumably he has some packing stuff in his van? Or maybe he'll just not pack it so it gets really scratched? Well, I'll see when I get it back.
Sony Vaio Support (Day 2)
OK Tuesday was much better.
The girl from Monday said she'd (try to) ring me, even if nothing happened. She did ring me, at about 5pm (by which time I was a little worried that she wasn't going to call), and said that the laptop could be collected the next day. They hadn't resolved the warranty issue, but they were prepared to accept that I had one (on the grounds of the invoice that I faxed to them, which they had originally sent me to confirm my purchase), so they were prepared to pick up and repair the laptop for free.
Concerning the pick-up:
- Obviously they couldn't give me an exact time, only 9am-5pm.
- So I chose my office address. It's not easy to find me in this office, so I told her to tell them to go to the address and give me a call. She said they have a limit on how many calls they can make, so they might not be able to call me.
- So I gave an incredibly detailed and complex description of how to find me. Then she said she only had a small text field to type the instructions into, so she would abbreviate them and hope the guy understood.
So here I am, Wednesday, Day 3, in the office, at 9am, waiting for the laptop to be picked up. How do people do this every day?
Sony Vaio Support (Day 1)
So my laptop is broken. No problem: when I bought it, about a year and a half ago, I paid about €150 extra to get the warranty extended from 1 year to 3 years, and to have them pick it up if something goes wrong. I'm glad I did that; now it's time to use it.
Hmm, not as simple to use the warranty as it was to order it, it turns out. (Although, to be honest, ordering it wasn't very easy either.)
Here's what happened:
- At 11am, I call Sony for the first time. I could only find their sales number on the web and on the invoice, so I called that. Helpful sales person informed me "I'm only a sales person" and gave me a support number.
- I called the number. Automated system wanted the serial number of the Laptop. No problem. Then it informed me I had to "register" the laptop either over their website or via a 0900 premium number. I tried the website.
- Nowhere could I find how to register.
- Trying the "site map" on the website, I found some register page. I enter my data.
- Then it wants an (optional) "warranty number". I typed in all the numbers I found on the invoice; none of them was accepted. I was worried there might have been some extra document which I'd lost. I searched everywhere. Couldn't find it. So I left the field blank.
- Rang the number again, it said the same as it said last time: I wasn't registered, I should either go to the website or use the premium number.
- Tried to register again.
- This time I saw some link "open a support issue online". I tried that. It informed me I was not registered. It had a big friendly "register" button. Clicking it went to the same registration page I'd been on before, with all my data filled out. So the registration page knew I was registered, but the "open a support issue" page could not continue as I was not.
- So I called the 0900 number. Got through to a human. Very helpful. He also noted I was not "registered" and tried to register me on his computer system. It didn't really work, he noted. But no matter.
- He wanted the guarantee number. He told me where it should be on the invoice (thankfully there was no lost extra document!). But it wasn't there. I jokingly said I could scan the invoice and email it to him. He said no he believed me; we chatted some more.
- He said he needed the guarantee number. He suggested I scan it and send it to him. Once that had been done, courier people would ring me in the afternoon to arrange a pick-up. That call was 17 minutes long.
- I emailed it to him. It was a central email address, but he assured me if I put the case number in the subject line, it would get to him.
- No reply. And no telephone call in the afternoon. And no way to contact him.
- I go online and find some "information on your support case" page. I enter my name, the laptop's serial number and the case number. Click submit. Results page is an advertisement for Windows Vista and nothing else.
- Thinking I must have done something wrong, I go back and do it again, same result.
- I ring the telephone number (the non-premium one). Automated voice asks me for my case number. I enter it. Automated voice informs me that I'm outside the warranty period (presumably the free one) and hangs up on me.
- What to do now? Only option is the 0900 number. I ring it. Wait listening to music. Then it hangs up on me. That was 5 minutes.
- What to do now? Only option is the 0900 number. I ring it. This time get through to someone after about 10 minutes of waiting. (I put it on speakerphone so that my colleagues know I am wasting my time listening to canned music while paying nearly €1/minute, to access the support I paid €150 for).
- Finally I get through to someone. They say that the email with the invoice wouldn't have worked (so I'm glad I rang and didn't just wait for them to ring me). I should fax the invoice to them. She gives me a fax number. I ask her to ring me in about an hour to tell me if she got it or not (as I have no way to contact her). That call was 20 minutes.
- I fax her the document, and write her name on it, to try and maximize the chance she gets it.
- Note that this document, which I am trying to communicate to them, is the invoice. This document comes from them, not me. I am trying to fax them back their own document.
- She calls me back (I was pleased about that). Tells me she got the fax, and that she has to speak to a colleague. I ask her when that will be. Well, she says, she posted a post-it note on her monitor.
- I ask her to call me back tomorrow in any case, even if there is no progress. I have no way to contact them, and if they just don't contact me, I'm stuck.
- She freely admitted that, until the warranty issue was resolved, I had no option other than to use the 0900 phone number. The other, cheaper number was only for laptops "in warranty". In her opinion, the situation was not much altered by the fact that the laptop is in warranty: the only sense in which it isn't is that their system seems to have forgotten it, which is neither my fault nor anything over which I have any control.
- She said she'd try to call me tomorrow. I said "what do you mean try?". She said she can try to contact me if I'm available. I said I'm available. OK, she said, then she'll try.
- I have spent 42 minutes on a premium phone line, costing €0,71/minute, that's €29.82.
- I spent €150 on a warranty (that was a while ago)
- I spent over €2000 on the laptop in the first place (that was a while ago)
- The laptop is still in my possession, broken.
- One day has passed, so the duration of me not being able to use my laptop has, thanks to Sony, been increased by (at least) one day.
Sony is not doing well, especially in the laptop market. They must be wondering why. I bet the top management don't know that this sort of thing is going on. It's simply because their company, apart from producing great products, is completely broken.
Let's see what happens tomorrow...
I am really sorry for you.
good luck, and keep me updated, it sounds like you're having fun..
cheers, manu
Back in Vienna
I spent a lovely 3 weeks with my girlfriend, doing work at her flat during the day.
Working at home in a different timezone has its advantages and disadvantages, but for developing really complex modules without getting distracted, and for reflecting on what one's doing, it's really great.
My laptop bag's really rubbish. It has a habit of falling over in unexpected ways. (But I bought it for about €150 at Harrods so I expected it to be really good. I suppose one should never buy the cheapest of a set of products, even if the cheapest is still really expensive.)
By falling over in unexpected ways, I mean it's like one of those objects one studies in Applied Mathematics like Weebles which just don't do what one expects. So I placed it propped against a wall on a shelf at 30 degrees to the vertical, and it promptly fell over in the other direction, i.e. rotated 120 degrees including going through being vertical. But the shelf not being the floor meant the bag then fell off the shelf onto the floor, dropping about 50cm.
The laptop was completely unscratched. Anyone who knows me knows how much care I take of my laptop. Despite being over 15 months old, having been taken daily to all manner of different offices, having been in Thailand, Italy, Macau, Hong Kong, China, Dubai, Doha and the UK, and having been used as my main computer more or less every day of those 15 months, it still gets comments that it looks essentially new and unused. So you can imagine my joy at seeing that the fall didn't scratch it.
However, the laptop now no longer works. I would say, from the sound it makes (or rather doesn't make) that the hard disk isn't spinning up.
Amazingly, I actually have a warranty for the computer. I took more than the minimum duration, and chose the more expensive they-collect option as opposed to the default you-send-to-base; this can only be a sign of my feeling of decadence at the time I bought the laptop (other signs being that I opted for a €100 upgrade of blue reflective paint on the lid, which one can hardly see, and bought the matching leather Vaio bag, which I never use).
And my Chello internet at home doesn't work. I think the cable modem's broken. It wouldn't be the first time (nor probably the last). So despite having two computers (home Mac and laptop) and two internet connections (Chello and UMTS for the notebook) with the explicit goal of never being without a computer and internet, I'm writing this from an internet cafe.
how long are you staying in vienna this time? i'll be back home on july 7th for 10 days or so - let's have a few drinks or go to a movie or so!
cheers,
balazs
Paper jams
Why does my printer always assert it has a paper jam? Why do other (personal) printers actually have paper jams the entire time?
Most cheap lasers, and now cheap inkjets (like the one I have at home in Macau), seem not to be able to handle paper correctly. More expensive lasers (like those in offices) and more expensive inkjets (like the one I have at home in Vienna) seem not to have this problem.
In fact, with the inkjet printers, I must observe that the two printers are from the same manufacturer and are essentially the same printer (this was not by accident). The differences are that the design isn't as nice on the cheap one, it feels cheaper when you open the lid, and it has a single-digit LCD display, whereas the expensive one has a colour pixel LCD display which shows error messages in a language of the user's choice. But the print quality is the same (according to the specifications and in reality), and the software one installs on one's PC is the same.
The paper jam isn't even really a paper jam. After printing about 1 or 2 sheets, it claims to have a paper jam (although everything is physically fine), and instructs you to press the "ok" button. Once the "ok" button is pressed, it continues printing. I mean this paper jam is essentially a software paper jam.
Surely tractor feeds are a better solution? Why has the world chosen to have paper jams instead?
It's really, really annoying.
Email Boxes need to be stored in DB, but also call IMAP, APIs, etc.
I often find myself modelling the situation where there are rows in the database (e.g. "email boxes" for a user), and these rows represent things that also exist elsewhere (e.g. the IMAP accounts backing these email boxes). There can be multiple ways of accessing these external resources: e.g. to delete an email box one deletes files on some server, while to find out how much space is used there is an HTTP-based protocol. And in the case of creation and deletion (and changing of password), these operations should not be done synchronously from the web interface, but queued. This is not a contrived example; I am programming exactly this right now. All of the above are givens.
To avoid stuffing all the various API clients and other functionality into one huge class, there need to be different objects representing:
- The "email box" row in the database (and a persistence mechanism)
- A "filesystem" object to represent operations on the filesystem such as "delete email box". This object knows the directory layout used. This object can be shared between other objects which need to perform filesystem operations, such as a filestore accessible via FTP accounts (in this case). It's convenient to program all these filesystem operations in one object.
- A client for the HTTP-based protocol, to find out the box's used size. In this case the protocol can do other functions, such as finding the space used in the FTP filestore. Again, it's convenient to put all these operations in one class: one can create private methods to connect to the server, or for common API requirements such as response parsing which will be the same for all the commands, etc.
- Persistable Queue objects, and QueueProcessor objects representing the programs or tasks to change the password, create/delete the boxes, etc.
- Some Facade object to simplify access to all the above?
Questions then arise, such as:
- When one asks the HTTP protocol client object to find out the space used for a box, should one pass the parameter (which box) as a Box object, or pass the name of the box and password as Strings?
- Should an application program (e.g. the web interface) instantiate and use the HTTP protocol client object directly to find the space used? Or should it call a method on the Box object, which calls the HTTP protocol client object? Should both possibilities be available?
However, I have found time and time again that the following solution works best:
- Not have multiple ways of performing the same action.
- Have a main "Box" object, which acts as a Facade. This represents a particular box. (i.e. not a BoxService stateless facade object, which each time takes a BoxId as a parameter to every function.)
- Optionally have other objects to delegate to, concerning persistence of the box and its attributes to the database (although I prefer not)
- A Box object knows the life cycle of a Box, and knows when to write things to queues etc. This will also need to be exposed in its interface (e.g. addCreationRequestToQueue) and explained in the class Javadoc. If this lifecycle changes (e.g. queue introduced for a certain operation) the interface will change and clients will have to be updated. But that's OK, as probably there will be a requirement in the front-end to display "performing..." as long as the operation is in the queue. So lots will have to change if you change the life cycle.
- This object also knows how to perform the operations which are normally queued, e.g. "delete", in terms of simply calling the "filesystem" object. It may also need to update some internal flags to note that the filesystem no longer exists. These methods are normally only called from QueueProcessor objects, but are also handy to call from JUnit test scripts (e.g. in case of "create"), to put the system in some state that is necessary for further tests. The QueueProcessor does not do much, apart from just call the methods on the Box to perform the operation.
- Applications call Box for all their requests and never call Filesystem. That way, if the implementation changes (no longer a direct "rm" but now via the HTTP API), the application does not need to change (note that such changes are ones which do not affect the life cycle of the Box or introduce extra states such as "in queue but not done yet"). But more importantly, I just think it's a lot more readable to say "Box b = getBox(); b.getUsedSizeBytes(); b.deleteFromFilesystem()".
- The individual objects such as the "filesystem" object take Strings not Boxes as parameters. This makes those classes marginally simpler. More importantly one doesn't feel right when there's a two-way dependency, i.e. Box needs Filesystem (to call it to implement "delete" methods) and Filesystem needs Box (in its method signatures). And the only place that the Filesystem is going to be called is from Box instance methods, and the Box has all the information such as username, password, and any other information, within its instance variables.
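A minimal Java sketch of this arrangement, under heavy assumptions. `BoxSketch`, `Filesystem`, `UsageClient`, `DeletionQueueProcessor`, `createOnFilesystem` and `addDeletionRequestToQueue` are hypothetical names; only `getUsedSizeBytes` and `deleteFromFilesystem` appear in the text above. The queue is an in-memory list rather than a persistent table, and the filesystem and HTTP clients are stubs. What it tries to show is the shape: the Box facade holds the box's own state, the helper classes take plain Strings, and the queue processor only calls back into Box.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BoxSketch {
    /** Filesystem operations; knows the directory layout, takes Strings, never a Box (stub). */
    public static class Filesystem {
        private final Set<String> mailboxDirs = new HashSet<String>();
        public void createMailbox(String username) { mailboxDirs.add(username); }
        public void deleteMailbox(String username) { mailboxDirs.remove(username); }
        public boolean exists(String username) { return mailboxDirs.contains(username); }
    }

    /** Client for the HTTP-based usage protocol; also takes Strings (stub with a fixed answer). */
    public static class UsageClient {
        public long getUsedSizeBytes(String username, String password) { return 1024L; }
    }

    /** Facade for one particular box (not a stateless BoxService taking a BoxId everywhere). */
    public static class Box {
        private final String username;
        private final String password;
        private final Filesystem filesystem;
        private final UsageClient usageClient;
        private final List<Box> queue; // stand-in for a persistent queue table
        private boolean existsOnFilesystem = false;

        public Box(String username, String password, Filesystem filesystem,
                   UsageClient usageClient, List<Box> queue) {
            this.username = username;
            this.password = password;
            this.filesystem = filesystem;
            this.usageClient = usageClient;
            this.queue = queue;
        }

        /** Synchronous: delegates to the HTTP client using the box's own credentials. */
        public long getUsedSizeBytes() { return usageClient.getUsedSizeBytes(username, password); }

        /** Called by the web interface: only records the request. */
        public void addDeletionRequestToQueue() { queue.add(this); }

        /** Normally called by a queue processor; also handy for setting up unit tests. */
        public void createOnFilesystem() {
            filesystem.createMailbox(username);
            existsOnFilesystem = true;
        }

        public void deleteFromFilesystem() {
            filesystem.deleteMailbox(username);
            existsOnFilesystem = false;
        }

        public boolean existsOnFilesystem() { return existsOnFilesystem; }
    }

    /** Does little more than call the corresponding method on each queued Box. */
    public static class DeletionQueueProcessor {
        public void processAll(List<Box> queue) {
            for (Box box : queue) {
                box.deleteFromFilesystem();
            }
            queue.clear();
        }
    }

    public static void main(String[] args) {
        Filesystem fs = new Filesystem();
        List<Box> queue = new ArrayList<Box>();
        Box box = new Box("alice", "secret", fs, new UsageClient(), queue);
        box.createOnFilesystem();

        System.out.println(box.getUsedSizeBytes()); // 1024
        box.addDeletionRequestToQueue();
        System.out.println(fs.exists("alice"));     // true: only queued so far
        new DeletionQueueProcessor().processAll(queue);
        System.out.println(fs.exists("alice"));     // false
    }
}
```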
Macau in June
I have just arrived at Hong Kong airport. I will be in Macau (working from home) until around the 23rd June, when I shall be back working in Vienna.
I am contactable via my normal Austrian telephone number (I can get voicemail), gmail email address, skype, and so on.
When is a software project done?
A software project is defined, for the purposes of this blog entry, as a set of people working to produce a new software system, or to modify an existing software system.
The result (exit condition) of a software project is a set of artifacts and other assertions:
- Document (or wiki etc) describing what the software should do, i.e. requirements. This will include subtle details, about what the system does, that will not immediately be obvious by looking at the front-end, or reading software design documentation. This should be a complete description, which is useful for the future, not just a "delta" from the last version.
- Software architecture documentation, in words. Simply looking at 1,000 Javadocs will not enable a new team member to understand the system. Documentation should also include which other options were evaluated and not chosen, and why, to avoid future teams considering the same things.
- (Obviously) the source code for the software. Including the front-end, back-end, any HTML, etc.
- Unit test scripts for all back-end classes needing them.
- Front-end tests. Either a document (simple statements such as "Click on Submit without enough money on account. See error message"), or configuration of a front-end testing program.
- Performance tests done and the software to perform them, if appropriate.
- Configuration (or creation) of a monitoring system to monitor the system once it's live, if it's a service (e.g. web site).
- Administration system for customer care, if it's a service.
- Management reporting. Especially just after a system goes live, management are always very curious about key statistics, such as number of users, number of items sold etc. That needs to be analyzed in advance, and the reporting must be in place when the system goes live.
- Class diagrams
- Javadoc to describe the purpose of individual classes and methods (where this is not obvious from the names). For scripting languages: parameter and return types (as these cannot be deduced from the source code).
- If this is not the first version of a system, migration concept including scripts to install software, migrate schema, filesystems containing user data, and anything else.
- System uses an appropriately international character set such as UTF-8. (This is not particularly modern; the WinNT team decided to do this in 1988.) Java does this out of the box, but it's more than just the programming language. This includes any database, any data stored in flat files, any APIs (within the system or to/from external systems), and so on.
- All of the above under version control
- Not only the software installed on a live system, but also the existence of test and staging systems. If one uses the live systems for testing, then, once one's gone live, one has no way to fix bugs in a testing environment. And bugs will happen, and they need to be fixed fast, so one had better have thought of this in advance.
- Bug tracking, or wiki system, or some way that the team is trained and rehearsed in using, to track and assign errors as they occur.
- Understood and tested data backup and recovery process. (What happens if the live DB crashes? Better have thought about recovery before that happens.)
- The team must get enough sleep in the days (e.g. 2 days) before a release. The period after a release (bug fixing) is the most stressful time of a project, and the one where the team must be at its most alert (as fixes are time-critical). It's important to sleep beforehand, and not e.g. work 7 days a week, then finally release in the evening and go to bed. (You can be certain that 1 hour after you've gone to sleep the site will be offline due to some problem, and you won't be there to fix it.)
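On the UTF-8 item: in Java, strings are Unicode out of the box, but every byte boundary (files, sockets, APIs) still needs an explicit encoding. The sketch below is a minimal example, assuming a throwaway temporary file, that writes and reads text with an explicitly named UTF-8 charset rather than the platform default. The sample text is "Grüße, 澳門" (Macau), written with escapes so the source file stays ASCII.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8RoundTrip {
    /** Writes text to a temporary file as UTF-8 and reads it back, never using the platform default. */
    public static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("utf8-demo", ".txt");
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.delete(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // German and Chinese characters written as escapes so the source file itself stays ASCII.
        String text = "Gr\u00fc\u00dfe, \u6fb3\u9580";
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length); // 15
        System.out.println(roundTrip(text).equals(text));                 // true
    }
}
```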
Memorable URLs
One thing I really like about uboot is that I can always remember the URLs to the various places in my nickpage. I can just type them into an IM conversation without even clicking on them to make sure I got them right. (I don't have to go to the nickpage and copy-paste the URL.)
- My Macau gallery is called http://adrian.uboot.com/gallery/macau
- My Strassburg gallery is called http://adrian.uboot.com/gallery/strassburg
- My Database blog can be found under http://adrian.uboot.com/blog/databases
http://adrian.uboot.com/blog/databases is wrong. it should either read http://adrian.uboot.com/blog or http://adrian.uboot.com (e.g. with the blog being the frontpage and the guestbook being a sidebar feature)
gettext is so broken
Working on a PHP project recently, there was the requirement for text localization. The standard way to do this in PHP is to use the standard way to do this in C, which is gettext.
I've worked with various translation systems, including one I built myself for uboot, involving a hierarchy of languages going from most specific to most "international", and with each string having a hierarchical id such as "myprogram.errors.disk-full".
Java Properties files are simple but also work well (simplicity being a positive thing in this case). The lines are key-value pairs, and using a convention such as "myprogram.errors.disk-full" the key is almost as good as if it actually were a key hierarchy. The file is in Latin1 but Unicode characters can be used via an escape syntax, and there are many editors where one can just type Unicode text and which take care of the escaping.
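For comparison, a small Java sketch of that Properties approach. The keys and German strings are made up for illustration. The file contents are inlined as a String so the example is self-contained, and the umlaut is written with the backslash-u escape that `Properties.load` decodes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class PropertiesDemo {
    // Contents of a hypothetical messages_de.properties file: one key=value pair per line.
    // Inside the file, the umlaut is written as a backslash-u escape (00fc).
    static final String FILE_CONTENTS =
        "# dotted keys give an informal hierarchy\n"
        + "myprogram.errors.disk-full=Kein Speicherplatz verf\\u00fcgbar\n"
        + "myprogram.labels.more-info=Mehr Informationen\n";

    public static Properties load() {
        Properties messages = new Properties();
        try {
            messages.load(new StringReader(FILE_CONTENTS)); // also decodes the escapes
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory reader
        }
        return messages;
    }

    public static void main(String[] args) {
        Properties messages = load();
        System.out.println(messages.getProperty("myprogram.labels.more-info")); // Mehr Informationen
        // A missing key falls back to the supplied default.
        System.out.println(messages.getProperty("myprogram.errors.unknown", "myprogram.errors.unknown"));
    }
}
```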
So I was looking forward to using gettext. This format was created by GNU, the creators of GCC (a highly respected program). gettext is itself well respected and authors of systems such as PHP have chosen it as their localization system.
But alas, it is broken in so many ways.
(1) The file format. Whereas Java's file format is to have lines such as "key=value", gettext's ".po" format (where did that extension come from?) has two lines for every string, like:
msgid "key"
msgstr "value"
(2) Compilation (for performance reasons). I work with scripting languages, where there is no compiler. This can be a good or a bad thing; but independent of that, it is a fact. However the editable ".po" files of gettext have to be converted into binary ".mo" files before they work. Thus I have to introduce a compilation step into my otherwise compilation-free edit-and-that's-it test environment.
In fact I don't understand this compilation requirement at all. According to the gettext manual, gettext was developed in 1994. Surely computers were fast enough back then to parse the gettext format and store the whole lot in a hash?
And what I further don't understand is how/if GNU programs were localized before then. I suppose they just weren't.
(3) What about Unicode? I have no idea how to introduce Unicode characters into the editable ".po" files of gettext. The manual doesn't help me. Supporting only 8-bit characters, and assuming/hoping that the encoding of the ".po" file is the same as the encoding that the user is using in viewing the output of your program, is simply a terrible solution. Microsoft designed Windows NT to use Unicode internally in 1988. Java uses only Unicode since its inception in 1991.
Unbelievably there is a reason given for not using Unicode.
This sounds nice, but I don't like having English text (or, in our case, German text) as the keys for translation files. If the text is e.g. "Click here for more info" and the new style guideline for the site then becomes "More Information", you end up having:
echo gettext("Click here for more info"); // prints "More Information"
# mypage.po
msgid "Click here for more info"
msgstr "More Information"
I dunno, that's just confusing for me. I'd much rather have a text-neutral key such as "more-info".
Update: This article also shows why you can't use English-language text as translation keys.
(5) Referencing usages from the translation file. The "xgettext" utility writes lines such as the following into the ".po" file:
#: mypage.php:47
msgid "Click here for more info"
I don't in any way like having the source file name and line number in the translation files. In principle it looks like it helps you to find the usage of a particular string, but in fact:
- It is not hard to find all the usages of the key "myprog.error.disk-full". That string is hardly going to appear in a non-translation context by accident. A recursive search will tell you where its usages are.
- What if I change "mypage.php"? (which is pretty likely). For example inserting some lines before line 47. Then the information is not only irrelevant, but in addition wrong.
(6) Parameters. We all need strings such as "The file '$FILE' has been successfully deleted". It seems that the standard way to do this in gettext is to use sprintf-type placeholders (e.g. "%s"). However as soon as you have more than one of those, and you translate the string into French, you'll find you need the parameters the other way around. Oops. That didn't work. So gettext is only suitable a) for Western European languages (due to character set constraints) and b) only for the subset of those languages which have grammars where placeholders will be needed in the same order.
The first thing I did was write a wrapper around gettext to accept $0, $1 style parameters, so one could swap their order on a per-translated-string basis. (Although $FILE named parameters might have been better; but that would have made the calling code longer.)
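The wrapper itself was written in PHP and isn't shown here. Below is a minimal Java sketch of the same idea: a hypothetical `format` method replaces `$0`, `$1`, ... with the corresponding argument, so each translated string can use the placeholders in whatever order its grammar needs. The second template is an illustrative reordering, not a real translation from the project.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Placeholders {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$(\\d+)");

    /** Replaces $0, $1, ... with args[0], args[1], ...; unknown indexes are left untouched. */
    public static String format(String template, Object... args) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int index = Integer.parseInt(m.group(1));
            String value = index < args.length ? String.valueOf(args[index]) : m.group();
            // quoteReplacement stops "$" or "\" inside the value being read as regex syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Same call, two templates: the second uses the parameters in the opposite order.
        System.out.println(format("The file '$0' was moved to '$1'", "a.txt", "Archive"));
        System.out.println(format("'$1' now contains the file '$0'", "a.txt", "Archive"));
    }
}
```

The calling code never changes; only the translated template decides the order.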
So nice one, they managed to invent, for the purposes of translation, a system which has a file format more difficult to use than a simple key-value pair, yet offering no advantages. It can't handle Unicode. Good work.
Making progress with introduction of unit tests to Uboot
The old uboot code had, amazingly enough, 21k lines of unit tests. But they were not useful unit tests, as one had to run each program individually, and they each had a bunch of (different) prerequisites, such as account_id 3 existing and having an empty inbox, and so on. And with the older tests, their output would be a bunch of print statements (e.g. insert message; print count of messages), and one would have to compare the printed output with the expected results (which weren't documented anywhere).
I am converting them to PerlUnit (which is a clone of JUnit) so that we can automatically and easily run as many tests as possible before each release. This is an incredibly productive task, as I don't even need to write new unit tests (and think about testing strategy), I'm just converting the lines to a format enabling them to be convenient to run!
So far 3.6k lines in 86 test functions in 33 test classes :)
......................................................................................
Time: 48 wallclock secs ( 8.65 usr 0.56 sys + 0.02 cusr 0.25 csys = 9.48 CPU)
OK (86 tests)
Transferring some hex. Sometimes gets replaced by string "INF". Why?
This was never going to work out. Data transfer interface. Our side in Perl and their side in PHP. Both scripting languages (bad) and not even the same scripting language (incompatible badness).
Over the data transfer interface, we are transferring users, including a code to enable them to unsubscribe from an email newsletter. The first 7 characters of the code identify the user (digits), and the rest of the code is a hex string containing some security information.
All works great. But some users can't use the code? It turns out on the destination system they have "INF" in the field instead of the code.
It turns out that some of these users have e.g. 1234567 to identify the user, and e.g. 123e1234567 as their hex code. That makes the security code "1234567123e1234567". And that "looks like" a floating point number to Perl. But quite a big one. Almost as big as Infinity in fact, so might as well call it that.
I don't think the flexibility we "won" through every data instance having its own type, based on what its data "looks like", compensates for the anger of a segment of our users not being able to unsubscribe from their newsletter, or for the extra expense to the company of the time to debug this problem (which was by then an urgent problem, as it was only discovered after the system went live, because it only affected 0.6% of our users).
P.S. my solution was to put a space in front of the code, which is taken off by the receiving system, so the data always "looks like" a string. But I wouldn't like to guarantee that what "looks like" a string won't change with the next version of the Perl SOAP client libraries we are using.
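The systems involved were Perl and PHP; as a hedged illustration in Java, the sketch below shows why the concatenated code "looks like" a number. Digits, a single `e` and more digits match floating-point syntax, and the exponent is far too large for a double, so parsing yields Infinity. `looksNumeric` is a hypothetical stand-in for the kind of type-guessing a loosely typed serializer might do. The last lines show the leading-space trick from the P.S. defeating it.

```java
import java.util.regex.Pattern;

public class LooksLikeANumber {
    // Rough "is this a number?" heuristic: optional sign, digits, optional fraction and exponent.
    private static final Pattern NUMERIC = Pattern.compile("[+-]?\\d+(\\.\\d+)?([eE][+-]?\\d+)?");

    public static boolean looksNumeric(String s) {
        return NUMERIC.matcher(s).matches();
    }

    public static void main(String[] args) {
        String userId = "1234567";
        String hex = "123e1234567"; // legitimate hex that happens to contain exactly one 'e'
        String code = userId + hex;

        System.out.println(looksNumeric(code));              // true
        System.out.println(Double.parseDouble(code));        // Infinity
        System.out.println(looksNumeric(" " + code));        // false: leading space keeps it a string
        System.out.println(looksNumeric(userId + "abc123")); // false
    }
}
```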
http://em.adrom.info/action/click/11177/wok4k840
http://em.adrom.info/action/click/11177/8cg8sogg
http://em.adrom.info/action/click/11177/ooww0okw
no e, but they don't work (redirect to http://www.uboot.com/cgi-bin/unsubscribe-newsletter.fcgi?u=n&n=)
Class names repeating information stated in the package name
Classes in modern programming languages can be arranged in hierarchies, e.g. a Perl class might be called "Uboot::Message::Mail" or a Java class "com.uboot.message.Mail".
In some programming languages (e.g. Perl) one always refers to the class by its full name (such as "Uboot::Message::Mail") and never by its leaf name (e.g. "Mail"). For example:
my $mail = Uboot::Message::Mail->new();
print "it's a mail" if ($mail->isa("Uboot::Message::Mail"));
In other languages (e.g. Java) one almost always refers to classes by their leaf name, such as:
import com.uboot.message.Mail;
class MyClass {
    void check(Object mail) { if (mail instanceof Mail) System.out.println("it's a mail"); }
}
For languages such as Perl, which require using the class's full path at all times, it's not necessary to repeat in the leaf name information that has already been specified in the path. For example, a class to model an entry in a Uboot address book might be in a directory called "Uboot/ABook", in which case the entry class can be called "Uboot::ABook::Entry".
But in Java, you don't want to have a class called "Entry" because, as soon as the "import" statement scrolls out of sight, you won't know whether your instance, helpfully statically typed to be an "Entry", is an address book entry, a guestbook entry, a blog entry, or any other conceivable type of entry. In that case the class needs to be called something like "com.uboot.abook.ABookEntry".
Class names like "Uboot::ABook::ABookEntry" or "Uboot::Monitoring::MonitoringResult" are (only in languages such as Perl) needlessly redundant and long.
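In Java, one way to get a short leaf name without the ambiguity is a static nested class, so the outer class name travels with the leaf name at every use site. A minimal sketch: `ABook` and `GuestBook` are hypothetical, and they are nested inside one file here only to keep the example self-contained (in a real project each would be a top-level class in its own package).

```java
class NestedNames {
    // Hypothetical address book; code refers to its entry type as ABook.Entry.
    static class ABook {
        static class Entry {
            final String nickname;
            Entry(String nickname) { this.nickname = nickname; }
        }
    }

    // An unrelated kind of entry; the outer class name keeps the two apart.
    static class GuestBook {
        static class Entry {
            final String text;
            Entry(String text) { this.text = text; }
        }
    }

    public static void main(String[] args) {
        ABook.Entry contact = new ABook.Entry("balazs");
        GuestBook.Entry message = new GuestBook.Entry("Nice photos!");
        System.out.println(contact.nickname + ": " + message.text);
    }
}
```

This only works well if the nested class is small enough not to clutter the outer class.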
and use it as Abook.Entry. Best of both worlds...if Entry is trivial enough not to mess up Abook's code too much.
take care
-m-
perl / switch statement: Cool Limitation
Look at the documentation for the Perl switch statement. Look down the bottom at the "limitations" section. Look at the last limitation.
vi
Here I am, programming using "vi" and, as usual, it's annoying me. Why am I using it?
It's just occurred to me, I remember from my childhood, my father would come home from work and complain about "vi".
I wonder if my children will use "vi"?
cheers,
balazs
Scripting languages are only for advanced programmers
Why do people believe scripting languages are suitable for beginners to programming? This may be the way they were designed, but they end up having the opposite effect.
Essentially I think scripting languages are languages for experts, like when you can already play the piano and pick a really hard piece, to show off how good you are. In programming, if you can already write bug-free code in a normal language, why not try the same thing in a scripting language? If you succeed, it shows your intellectual dexterity and cleverness (although the result is the same, i.e. a working program; and if you fail, the result is a less-working program).
Look at the following: http://at2.php.net/manual/en
I understand that "warning" completely. Essentially this function returns zero to indicate that the string occurs at position zero, and false to indicate that the string does not occur in the text. But the two are equal when compared with PHP's == operator. For that reason there is a === operator, called "exactly equals", which also compares the types of the values. With ===, 0 === false is false.
I mean this is not simple stuff. Nor is it abstracting away from the details of programming.
How is this in any way better, or simpler, than:
- C, whose strstr returns a NULL pointer if the item is not found?
- Java, which gives a compile error if you try to compare false to 0, and also returns -1 (String.indexOf) if the item is not found?
- A hypothetical language which throws a SubstringNotFoundException, and gives a compile error if that exception is neither caught nor declared?
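For comparison, a Java sketch. `String.indexOf` is the real Java method and returns -1 when the substring is absent; `SubstringNotFoundException` is hypothetical (Java has no such class), defined here as a checked exception to imitate the imagined language from the last item.

```java
class SubstringSearch {
    // Hypothetical checked exception; Java's own String.indexOf returns -1 instead.
    static class SubstringNotFoundException extends Exception {
        SubstringNotFoundException(String needle) {
            super("substring not found: " + needle);
        }
    }

    // The compiler forces every caller to catch or declare the "not found" case.
    static int find(String haystack, String needle) throws SubstringNotFoundException {
        int pos = haystack.indexOf(needle);
        if (pos < 0) {
            throw new SubstringNotFoundException(needle);
        }
        return pos;
    }

    public static void main(String[] args) {
        System.out.println("abc".indexOf("a")); // 0: found at the very start
        System.out.println("abc".indexOf("z")); // -1: not found, cannot be confused with 0
        // boolean same = (0 == false); // does not compile: int and boolean are not comparable
        try {
            System.out.println(find("abc", "c")); // 2
            find("abc", "z");
        } catch (SubstringNotFoundException e) {
            System.out.println(e.getMessage()); // substring not found: z
        }
    }
}
```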
Where are the dots?
Where are the "..." on the "Advanced" button of "Display Properties"? This has annoyed me for ages. In fact I believe this was the case on Windows 98, 2000 and XP (but I might be wrong).
And of course on Windows you have to use this button each time you plug your laptop into a new monitor, as Windows picks a resolution for you seemingly at random (not even consistently). And once you've changed that to a higher resolution, it chooses the lowest possible refresh rate for you, like 60Hz, even though it knows (demonstrable via the "hide modes that this monitor cannot display" checkbox) that the monitor supports a higher one: so you have to change that via the "Advanced" button.

Ah, and in researching this post, it seems that Windows Vista has got it right! Amazing. A good reason to upgrade!
In many ways, apart from the "...", that panel doesn't seem to have changed that much in the years since Windows XP (or Windows 2000) came out.
p.s. why does the uboot blog comment thing not have an option to remember me?
GUI Programming: Always perform network requests asynchronously
Why does one feel so much more in control when using Firefox than when using Internet Explorer?
When you select a slow link in Internet Explorer, the whole program hangs for about 1-2 seconds. Firefox doesn't. Although 1-2 seconds is hardly a large % of ones life, it makes a big difference to the experience one has when using Firefox.
Recently I wrote something similar to an IM client (in Java). It sits in the Windows tray. You can log in, and open a window where you can do various things. The data is stored on a website (communication is over XML-RPC).
MSN Messenger has one tray-icon for the user being logged out, a different one for the user being logged in, and amazingly (I thought), a third one for during the time the program spends communicating with the servers to log the user on.
In my system, log on is just one single XML-RPC call, with all the necessary data returned in the response. This was a design goal, to never have more than one client-server request to represent a particular user action.
The back-end to this XML-RPC call is a simple perl script which uses a few objects to represent things such as Users. These objects are simple enough, they just make a few SELECTs against our super-fast database. So I thought, as any request to our back-end takes say max 0.2 seconds, I needn't make that asynchronous to the UI of the Java program. And I certainly don't need a separate icon to display during that time!
If I'd ever stated that decision out loud, I would have heard myself saying it, and realized what nonsense that is.
- While it may only take 0.2 seconds on the server, there's latency to consider, i.e. the time for the packets to flow from the client to the server and back again.
- One can't take into account how slow the user's network connection might be.
- There may well be more than one request from client to server, multiplying the latency. Just because there is one XML-RPC request doesn't mean there are no other requests going on underneath, for example DNS lookup of the hostname to connect to.
- If there is a queue of HTTP requests in Apache waiting for the FCGI process to answer the XML-RPC request, then the time the HTTP request spends waiting in that queue is also added to the duration of the call as perceived by the user.
- What if there is a server-problem, and all requests take 2 seconds? A design not tolerant of things going wrong is a bad design.
- Even 0.2 seconds is noticeable in a front-end.
- Programming asynchronously in Java is not difficult. So it need not be avoided.
Lesson - it may sound obvious - but it's still worth stating: In a GUI Program (Windows, Mac OS X, etc.), any user interaction over a network, must be performed asynchronously (i.e. in a thread or in a separate process).
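A minimal Java sketch of that rule, assuming Java 8's `CompletableFuture`. The `callServer` method is a hypothetical stand-in for the XML-RPC log-on call, and the three-state `Status` enum mirrors the MSN Messenger icons described above. The log-on call runs on a background executor; the result is handed back through whatever executor runs code on the UI thread (in Swing, `SwingUtilities::invokeLater` fits that role).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncLogon {
    // Three tray-icon states, as in MSN Messenger.
    enum Status { LOGGED_OUT, CONNECTING, LOGGED_IN }

    private final ExecutorService network = Executors.newSingleThreadExecutor();
    private volatile Status status = Status.LOGGED_OUT;

    Status status() { return status; }

    // Hypothetical stand-in for the single XML-RPC log-on call; pretend it takes 200 ms.
    static boolean callServer(String user) {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !user.isEmpty();
    }

    // Called on the UI thread and returns at once; the network call runs elsewhere.
    // uiThread hands the result back to the GUI, e.g. SwingUtilities::invokeLater in Swing.
    CompletableFuture<Void> logOn(String user, Executor uiThread) {
        status = Status.CONNECTING; // show the "connecting" icon immediately
        return CompletableFuture.supplyAsync(() -> callServer(user), network)
                .thenAcceptAsync(ok -> status = ok ? Status.LOGGED_IN : Status.LOGGED_OUT, uiThread);
    }

    void shutdown() { network.shutdown(); }
}
```

In a test without a GUI, `Runnable::run` can stand in for the UI executor, e.g. `client.logOn("balazs", Runnable::run).join()`. Whether it takes 0.2 seconds or 20, the UI thread is never blocked.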
Exceptions: use them
Exceptions have been around for a long time. There's no reason not to use them.
I never want to see code such as this again.
We all know what happens with such code:
- Nobody checks the return value
- Especially if half the code is written using exceptions, and half using return values, then definitely nobody will check the return code
- It breaks the linguistics of the language. A function called "getUser" should return a User, but a function called "deleteFile": why does it return "true" or "false"?
- Where are the log statements, stating why this function returned false? What if the function has multiple places to return false? Then (perhaps) the calling function can print the log "function deleteFile returned false" but that doesn't tell you which of the multiple places failed.
- What if you want to programmatically check the result of the function? If all the errors return the same value, there's no way you can respond to each error differently in code (some might be permanent errors, such as the file not existing; some might only be temporary errors, such as the network being currently unavailable).
- PHP5 has exceptions.
- Java has exceptions.
- C++ has exceptions.
- Perl has "eval" and "die" which are like exceptions.
- Javascript has exceptions.
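A Java sketch of the difference, using only the standard library: `java.io.File.delete()` follows the return-value style (it just returns false), while `java.nio.file.Files.delete()` throws, with `NoSuchFileException` for a missing file and other `IOException`s for other failures. The wrapper names `deleteFileReturningBoolean` and `deleteFile` are illustrative, and treating "missing file" as permanent and other I/O errors as possibly temporary is an assumption for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

class DeleteFileDemo {
    // Return-value style: the caller only learns "false", not why.
    static boolean deleteFileReturningBoolean(Path file) {
        return file.toFile().delete();
    }

    // Exception style: Files.delete throws NoSuchFileException if the file is missing,
    // or another IOException (with a message) for any other failure.
    static void deleteFile(Path file) throws IOException {
        Files.delete(file);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        deleteFile(tmp); // succeeds, nothing to check
        try {
            deleteFile(tmp); // the file is already gone
        } catch (NoSuchFileException e) {
            System.out.println("permanent error, do not retry: " + e.getFile());
        } catch (IOException e) {
            System.out.println("possibly temporary, could retry later: " + e);
        }
    }
}
```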
Colour is good
Traditionally one produces business documents, diagrams, and in fact almost everything else, in black and white. But colour allows one to express more.
- Headings stand out from the rest of the text much better when they're in colour.
- You can communicate much more in diagrams if you use e.g. blue text for an explanation, red text for method names, and so on.
- You can augment existing diagram notations, e.g. UML, with extra information using colour
- Colour printers are very affordable. Everyone and every office has one. I bought one for €50 recently.
- Nobody uses black and white monitors or laptops any more.
- All common electronic formats support colour: bitmap formats like GIF, as well as PDF, Word documents and emails.
Releasing working code
I spend a lot of my time getting annoyed by errors in other people's software (e.g. Windows): errors which, when you see them, make you wonder how on earth they could have been overlooked. But recently I released a piece of software which contained a major bug (it was only a small mistake, but the consequences were big).
So I set about thinking: which sequences of actions lead, in my experience, to software which works? A lot of these are obvious, yet I've often enough found myself not following them, due to time pressure. And the result is stuff which doesn't work.
(1) Unit test scripts: make them easy to run. For one product I work on a lot, there is a whole bunch of test scripts, testing all sorts of classes. In fact there are over 21k lines of unit tests! This is a good thing. But sometimes the person running them has to compare the value printed by the program with the expected value (i.e. has to know the expected value, which is not easy 2 years after the program was written). And not all classes are tested at all. But there are still a good few which do proper tests and print "ok" if the result is correct. This is good, but it's so much work to run them all. The solution is to chain them all together, as is easy to do with JUnit, and create one simple command or click to run them all. If it's simple and convenient and creates value, people will do it.
Also, having a framework into which to put tests - for example, having a convention that a class called "X" has a test class called "XTest", and that methods like "operationY" on "X" have a method "testOperationY" in "XTest" - encourages people not to be scared to write tests. (But forcing people to write tests, e.g. one test per method, is a waste of time. Not every method needs a test.)
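JUnit does this chaining for real; the sketch below only illustrates the idea without depending on JUnit. It assumes the "X"/"XTest" convention above: it finds every public no-argument method whose name starts with `test` and runs them all from one command, printing a summary. `Calculator` and `CalculatorTest` are hypothetical.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

class MiniRunner {
    // Hypothetical class under test ("X").
    static class Calculator {
        int add(int a, int b) { return a + b; }
    }

    // Its test class ("XTest"): one testOperationY method per operation worth testing.
    public static class CalculatorTest {
        public void testAdd() {
            if (new Calculator().add(2, 3) != 5) throw new AssertionError("2 + 3 should be 5");
        }
    }

    // One command for all tests: runs every public no-argument method named test*.
    static int runAll(Class<?>... testClasses) {
        int run = 0, failures = 0;
        for (Class<?> c : testClasses) {
            for (Method m : c.getMethods()) {
                if (!m.getName().startsWith("test") || m.getParameterCount() != 0) continue;
                run++;
                try {
                    m.invoke(c.getDeclaredConstructor().newInstance());
                } catch (InvocationTargetException e) {
                    failures++; // the test itself threw
                    System.out.println("FAIL " + c.getSimpleName() + "." + m.getName() + ": " + e.getCause());
                } catch (ReflectiveOperationException e) {
                    failures++; // the test could not even be started
                    System.out.println("COULD NOT RUN " + c.getSimpleName() + "." + m.getName() + ": " + e);
                }
            }
        }
        System.out.println(failures == 0 ? "OK (" + run + " tests)" : failures + " of " + run + " tests FAILED");
        return failures;
    }

    public static void main(String[] args) {
        System.exit(runAll(CalculatorTest.class) == 0 ? 0 : 1); // prints "OK (1 tests)"
    }
}
```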
(2) Know what the important features are. Most websites really have many, many features. It's impossible to test them all without restricting oneself to 6-month release cycles and 1-month test phases. But there are usually a bunch of features which would be show-stoppers if they didn't work. Can a new user register? Can they upload a photo (for a photo website)? Can they send an SMS (for an SMS website)? Write these show-stopping features down. Before the release, go through them and test them on the pre-production server. After the release, test them again on the live system. Writing them down helps one not to forget the ones one can't be bothered to test.
It doesn't matter if this list is long. Maybe there really are a ton of features which simply must not fail. Then you'd better have tested them all.
(3) For unimportant operations, ignore failure. Recently I wrote a program which writes a ZIP file. As a small extra feature, if a file in the ZIP hasn't changed since the last time the program ran, the timestamp of the file in the new ZIP file should be the same as in the old one. This isn't a very important feature, but it's there. Once, when it ran, there was a file I/O problem reading the old file, and the program aborted. But this isn't an important enough feature to abort execution: the program should have continued, and just given all files in the new ZIP a new timestamp.
Consider this when writing all code: if this goes wrong, does it matter? If not, when something goes wrong (any Throwable), log the exception and continue. You'll kick yourself if failure of one part doesn't matter, yet it brings down the whole program.
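A minimal Java sketch of this rule applied to the ZIP example, assuming the old archive is read with `java.util.zip.ZipFile`. Only the optional part is shown: deciding which files are unchanged is left out, and the class and method names are hypothetical. Any failure reading the old ZIP (any Throwable, as suggested above) is logged, and the caller simply falls back to new timestamps.

```java
import java.io.File;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class OptionalTimestamps {
    private static final Logger LOG = Logger.getLogger(OptionalTimestamps.class.getName());

    // Unimportant feature: remember each entry's timestamp from the previous ZIP.
    // If anything goes wrong, log it and return an empty map instead of aborting.
    static Map<String, Long> oldTimestamps(File previousZip) {
        Map<String, Long> times = new HashMap<>();
        try (ZipFile zip = new ZipFile(previousZip)) {
            for (ZipEntry entry : Collections.list(zip.entries())) {
                times.put(entry.getName(), entry.getTime());
            }
        } catch (Throwable t) {
            LOG.log(Level.WARNING, "Cannot read old timestamps from " + previousZip + "; using new ones", t);
            times.clear(); // do not trust a partially read map
        }
        return times;
    }

    // Timestamp for an unchanged file: the old one if known, otherwise "now".
    static long timestampFor(Map<String, Long> old, String name, long now) {
        return old.getOrDefault(name, now);
    }
}
```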
(4) Restart everything. You've only changed one small piece of code, so why incur the cost of restarting all the Apaches and all the robots? Well, software's strange, and any change, however localized, can break any functionality. Any programmer knows this to be true. If you don't restart everything, how will you know everything still works? How will you test it?
(5) Look at the log files after releasing. Even if, after a restart of the live servers, everything seems fine, what are the users seeing? They're testing different paths than you. If you log uncaught Exceptions, take a look at the log file before the release, and again after the restart after the release, and see if there are more errors. For example, SQL errors which weren't there beforehand. This could alert you to a problem you've overlooked.
(6) Static checks are good. Programming languages such as Smalltalk and LISP popularized the notion that it's cool to do everything, such as method lookup, at runtime. "It gives you more flexibility." While this is certainly true, there are a lot of errors which you'll then only find at runtime. (The same is true of SQL strings in program code: you will only know if you've misspelled a column name in the SQL when you run that particular piece of code.) This does not help you minimize errors. I appreciate that taking code online which hasn't even been run once is hardly a good idea, but I've seen it happen often enough.
Java and Hibernate are a good combination in this respect. If the Java program compiles then you know you've got all your variable names, function names, type-casts and Exception checking right. If the Hibernate program starts then you know the classes map to existing tables correctly. (But HQL, represented as strings within your program, is bad again, as you could make a spelling mistake, and it will only cause an exception when that particular code is executed.)
If one has to have SQL strings in the program, so that an error in one will only be detected once that code path is executed, maybe a prepare of each statement could be placed in a static constructor of the class (a static initializer block, in Java)? That way, at least when the class is loaded (most likely at the start of the program's execution), one will find out about the problem.
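A JDBC sketch of the idea: collect the SQL strings in one place and prepare each of them once at start-up, so a misspelt column shows up immediately rather than when the code path first runs. The table and column names are hypothetical. One caveat: not every JDBC driver sends the statement to the database on `prepareStatement` (some defer parsing until execution), so whether a prepare alone catches a typo depends on the driver. Calling `prepareAll` from a static initializer would follow the suggestion more literally, but needs a connection at class-load time.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.List;

class StatementChecks {
    // Every SQL string this class uses, collected in one place (hypothetical schema).
    static final String FIND_USER = "SELECT account_id, nickname FROM account WHERE account_id = ?";
    static final String UNSUBSCRIBE = "UPDATE account SET newsletter = 0 WHERE account_id = ?";
    static final List<String> ALL = Arrays.asList(FIND_USER, UNSUBSCRIBE);

    // Call once at program start, before serving any requests.
    static void prepareAll(Connection connection) throws SQLException {
        for (String sql : ALL) {
            try (PreparedStatement ps = connection.prepareStatement(sql)) {
                // Nothing to execute: preparing is the check.
            } catch (SQLException e) {
                throw new SQLException("Bad SQL detected at start-up: " + sql, e);
            }
        }
    }
}
```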
(7) Be aggressive about cleaning old code. The more code there is, the more complex a system is to understand. If one has a new chat system, why is code which communicates with the old chat system still there? What if that code relies on classes which you're about to change? What if it communicates with an old chat server and the results aren't displayed anywhere any more, and then that old chat server goes away? The motto "clean code that works" does not involve having 100k lines of old junk around, which no one understands, no one wants to take the time to learn (as it's no longer relevant), and will break randomly.
(8) Compile all of the program. It's obvious: before one releases a Java program, one does a "clean all" and then a compile, just to check that one hasn't changed a class and forgotten to recompile a client of it, which would result in a runtime error, e.g. a NoSuchMethodError. Why doesn't one do the same in scripting languages? Admittedly scripting-language compilers don't check as much, but they still check some things (e.g. syntactic correctness). One "unit test" should be to go through every program file (every library, every CGI, every PHP page) and do a compile check on it.
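A Java sketch of such a "compile everything" check, assuming `php` and `perl` are on the PATH. It walks a source tree, maps file extensions to `php -l` or `perl -c`, and counts the files whose check exits non-zero. The extension list is an assumption; note that `perl -c` also runs `BEGIN` blocks and `use` statements, so library paths may need to be configured (e.g. via `-I`).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CompileCheckAll {
    // Syntax-check command chosen by extension; null means "not a source file, skip it"
    // (so binary files such as fonts are never fed to the checker).
    static List<String> checkCommand(Path file) {
        String name = file.getFileName().toString();
        if (name.endsWith(".php")) return Arrays.asList("php", "-l", file.toString());
        if (name.endsWith(".pl") || name.endsWith(".pm") || name.endsWith(".cgi")) {
            return Arrays.asList("perl", "-c", file.toString());
        }
        return null;
    }

    // Runs the check on every source file under root; returns the number of files that failed.
    static int checkTree(Path root) throws IOException, InterruptedException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        int failures = 0;
        for (Path f : files) {
            List<String> cmd = checkCommand(f);
            if (cmd == null) continue;
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            if (p.waitFor() != 0) failures++;
        }
        return failures;
    }

    public static void main(String[] args) throws Exception {
        int failures = checkTree(Paths.get(args.length > 0 ? args[0] : "."));
        System.out.println(failures == 0 ? "all files compile" : failures + " file(s) failed");
        System.exit(failures == 0 ? 0 : 1);
    }
}
```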
(9) Release emails. It's easier to delete an email which one's not interested in, than to find out information from an email one didn't receive and doesn't know was ever sent. If a service like a website suddenly breaks, it's important to fix it as soon as possible, and that probably means contacting the person who caused it to break. Before (not after) a release, write an email to all concerned - operations engineers, software developers, support agents, managers - and let them know that a change is going live.
(10) Be contactable. There's nothing worse, for creating a perception of negativity, than when someone's made an error, and you can't contact them. Make sure mobiles are on loud. If you're not reading email for some reason, make sure you've told everyone in advance, and set up an auto-responder. Want to be contacted less? Make fewer errors.
(11) Monitoring. For each robot and front-end program, one needs to define which conditions are acceptable and which are not: e.g. which logs must be written by the correctly running program, and which logs must not be written. Monitor them. This takes quite a lot of effort: a) the monitoring software, b) defining the acceptable and unacceptable conditions, c) tuning the software to actually produce logs which are usefully monitorable. But it's necessary. If it's not done, errors will be written to the logs and nobody will see them.
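A minimal Java sketch of the "must be written / must not be written" rules, assuming the log is available as a list of lines; reading the files, scheduling the check and alerting are left out, and the patterns in any real configuration would be specific to each robot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class LogMonitor {
    // Returns the problems found; an empty list means the log looks like a healthy program.
    // required: patterns that must appear at least once (e.g. a robot's "batch finished" line).
    // forbidden: patterns that must never appear (e.g. "ERROR" or SQL exceptions).
    static List<String> check(List<String> logLines, List<Pattern> required, List<Pattern> forbidden) {
        List<String> problems = new ArrayList<>();
        for (Pattern p : required) {
            boolean seen = logLines.stream().anyMatch(line -> p.matcher(line).find());
            if (!seen) problems.add("missing required log: " + p.pattern());
        }
        for (String line : logLines) {
            for (Pattern p : forbidden) {
                if (p.matcher(line).find()) problems.add("forbidden log: " + line);
            }
        }
        return problems;
    }
}
```

A non-empty result is what should page someone; the hard part, as noted above, is getting the programs to write logs that such rules can recognise.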
Mozilla Thunderbird sucks
Really, Thunderbird is a terrible mail client. I'd been using Outlook for about 5 years when I first tried it, so I thought maybe the reason I didn't like it was simply because it was different, in which case I should continue to use it to get used to it. One year on I still hate it and recently it just ate half my mail. So I'm going back to Outlook.
While downloading a large message using POP over a slow connection recently, the download bar (slowly progressing from 0% to about 50% at the time of the crash) simply went away (without error). Clicking the "Get mail" button again did nothing (without error). Restarting the program showed the "Inbox" to be blank for a very long time, but it seemed to be doing something, and after about 1-2 minutes the list of messages appeared. But only the mails received between the time I started using Thunderbird and about mid 2006-10 were there. Mails from mid 2006-10 to now (mid 2007-04) are just gone. So that'll be the mailbox corrupted then. Imagine you relied on Thunderbird as the only storage place for all your mail. Well, thankfully I don't. And thankfully I won't even be using Thunderbird as one of the storage places for my mail in the future.
Here are the reasons I didn't like Thunderbird from the beginning.
- When you click "reply", the cursor inviting you to type a response to the quoted mail is at the bottom of the mail, not the top. It turns out there is a preferences option where you can change that, but it took me about 6 months to find it.
- The HTML mail composer sucks. You have the cursor blinking away somewhere, press a key expecting the character to be inserted where the cursor is, but no. The cursor suddenly moves somewhere different (e.g. a line down) and inserts the character there.
- If you send a rich text message, it asks you "do you want to send this mail as plain text (recommended), html, or both?". Text is rarely so long that the bandwidth required for a multipart/alternative would be a problem. And multipart/alternative is there so you, as the sender, don't have to know what formats the recipient can read. So this dialog box is just broken. Also: why is plain text recommended, do we want to be stuck in the 70s forever? Let's all go to the disco and send (recommended) plain text emails using Firefox.
- In Outlook, if you click "send" and you are offline, the message is stored locally temporarily. As soon as a connection is available, it is sent. With Thunderbird, however, the situation is more complex. At the time of sending, you have to select "send" (which yields an error if you are offline), or "send later" (which is available when you are online, even though you'd never want it). When you go online you have to select "send emails now", as opposed to that happening automatically. However, I thought I could make this all go away when I found the option "if you go online, Thunderbird can send offline emails immediately". I clicked that but it didn't work. It turned out "go online" referred to the Thunderbird menu option "go online". If, every time I connected to the internet, I had to go through each application and use its menu option "go online", well, that would be a bad situation. Probably why other applications don't work like that.
- Search results are unsorted. Search happens in the background (good) and adds mails to the search results window as it continues and finds them. If you click on a column heading in the results, e.g. "date", to sort the (initially unsorted) search results, then during search (as more emails are found) they are simply added to the bottom of the search results. So you have to click the column heading again, to do a sort including the newly found emails.
- The UI to do search is terrible. If you open the drop-down with the keyboard, allowing you to select "sender", "recipient" (i.e. which field must match in the search), use the cursor keys to select the field you want, then press tab to move to the text field (to type the value the field must match; this works in other applications), the drop-down list of fields closes, but the field you had selected is forgotten.
- Full-text search takes ages. No indexing. Why?
- If you are composing an email, and want to send it to someone whose address you've forgotten, you can go to another window, find a mail from them, right-click their address and say "add to address book". Go back to your compose window and try to use the address book: it doesn't contain the new entry. You have to close the compose window, open a new one, and copy/paste the entire body and all other recipients over; then the new window knows about the current address book.
- Emails you send using the HTML editor are in Times (not Helvetica/Arial as in Outlook), which makes ones emails look terrible, and also marks one out as a person using "strange" non-Outlook technology, to all ones recipients.
regarding your lost emails, that's probably an issue with your mailbox having gotten too large. see http://kb.mozillazine.org/Thunderbird_:_Tips_:_Compacting_Folders a problem in itself.
thunderbird 2 is about to be released, i'll give it a try. just one more time.
I did used to like mutt tho...
regarding point 1: where's that preference option? i only found "forward messages inline / as attachment". i NEED that option, it would solve my major headaches with thunderbird..
regarding point 3: you can turn that off (options/composition/general/send options)
i just installed thunderbird 2.0 RC1 and it looks better. there's a "send unsent messages when going online yes/no/ask me" option (don't know if it works).
Don't think that task tracking numbers are a replacement for documentation
Do you think this is an appropriate and sufficient documentation for this function?
#
# 3978
#
sub is_contact_in_abook_for_user {
    # (body omitted)
}
This number refers to the task number in a task/bug tracking system. The idea being: why write documentation, when that would simply duplicate what is already available?
There are a number of reasons why this sort of documentation is bad, but the main one is that a feature lives on for a long time, as does reusable code which one creates in order to implement that feature. But a task is just a task: once it's done, no one cares about it any more. So if one sees e.g. a class modelling a user exposing methods such as "fetch by user name", "fetch by id" and "fetch by telephone number", it isn't really helpful to know that the first two were implemented as part of an "implement payment" feature, and the last as part of a "mobile phone shop" feature. There are so many aspects of those features which have nothing to do with a function which can be reused time and time again.
The other reasons are:
- You cannot click "3978" and see the documentation. You have to open a browser window, search for the task, and so on. This is a lot of work, so many people will never do it, meaning they'll alter the code without understanding what it's doing. This may be their fault, as opposed to the fault of the process, but it's a reality nevertheless.
- What about the other documentation for the function? What are the return types? What are the parameter types? It's not documented here and I bet it's not documented as a "comment" to task #3978.
- Task trackers get changed over the life of the program. In this case, the original task tracker was maintained by another company with whom my customer no longer has a relationship. So I have no way whatsoever to find anything about task #3978 or any other. So all those comments littered throughout the program are literally useless.
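For contrast, here is the kind of documentation that lives with the code, sketched in Java with Javadoc. The parameters, the semantics and the in-memory map standing in for the database are all assumptions: the real function's behaviour is not documented anywhere, which is exactly the problem.

```java
import java.util.Map;
import java.util.Set;

class ABookQueries {
    /**
     * Checks whether one user appears in another user's address book.
     *
     * @param abooks    address books keyed by owner account id (hypothetical stand-in for the database)
     * @param ownerId   account id of the user whose address book is searched
     * @param contactId account id of the possible contact
     * @return true if ownerId's address book contains contactId; false otherwise,
     *         including when ownerId has no address book at all
     */
    static boolean isContactInAbookForUser(Map<Long, Set<Long>> abooks, long ownerId, long contactId) {
        Set<Long> abook = abooks.get(ownerId);
        return abook != null && abook.contains(contactId);
    }
}
```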
did i mention that i love trac?
Heathrow
I've just arrived at Heathrow airport, Terminal 1, from Hong Kong.
- The way to the terminal from the aircraft was not clear: then I saw some temporary photocopied sign stuck up with tape that showed the way.
- There is no wireless internet here, in contrast to all the other airports I use: Hong Kong, Dubai and Vienna.
- As I am sitting here, inside a terminal building, there is a pigeon walking around on the carpet.
- One guy has already complained to me, surprised that "there is no complimentary telephone service here, in contrast to Hong Kong" (although I'd never heard of that before!)
Hopefully T5 is going to change that: once it is complete, they say they are going to cycle through the rest of the terminals and rebuild (or recondition) them all.
Holiday in China (Yangshuo)
Christina and I went to China recently for a few days holiday. We had a great hotel room with a huge balcony. Here is a picture of us on some boating expedition!

Windows Update / Restart
This really is the worst feature in the world.
Windows XP downloads updates for you automatically, then installs them, then asserts your computer has to be restarted. You can click "restart later" but the assertion is simply repeated later. That's quite annoying, but I suppose one can get used to it.
But the most outrageous thing is that if you don't click "restart later" in time, it restarts your computer for you. If you take a short break from the computer - go for a coffee - and come back, you find your computer has no open windows. What about all those websites and documents and other things one was working on? Gone.
Windows really is so bad, one shouldn't use it.
UI Consistency from Microsoft
It's amazing how they can't even get the simplest things of usability right (in this case: usability through consistency).
During Windows Updates, an icon appears on the icon tray. If you click it, you get this window.

How do you get it back on the task tray?
- The minimize button simply does nothing. When you click on it, the button shows its "pressed" icon, as if it would do something, but when you release the mouse, the button returns to its normal state, and nothing has happened.
- The close button does not close the window in either sense, i.e. a) stopping the process described in the window, or b) making the window go away altogether, but instead does indeed minimize the window to the task tray.
Untitled
In an episode of MacGyver I watched recently, the special agent did some handshaking and smalltalk with someone concerning the goals and time parameters of the mission in hand, and then asked "who can I talk specifics to?".
Normally, in business, one expects that if there is a problem with ones contact partner, it is because they are too junior. "This person can't make the decision, I need to talk to someone more senior".
But that MacGyver line made me realize how often one actually needs to say "This person doesn't know what's going on; I need to talk to someone more junior."
"This is a major design decision that will decide whether the platform is going to succeed or fail. It's very IMPORTANT. Even though all the technical people with the requisite knowledge to make said decision are in the room, we must escalate it to a mid-to-high level manager with no technical experience or qualifications whatsoever. Because it's IMPORTANT and only high level managers can make IMPORTANT decisions."
...6 months later the whole thing is fubar and manager guy is like "Eh? I didn't design that piece of shit. I'm a manager."
Moral: I worked at Deutsche Bank for 4 years.
We don't need these users - let's move them to an "archive" table!
For one of the customers I currently work for, when we first designed the platform in Q1/2000, there was the "account" table, where we stored our users. There were always various pressures to move "inactive" users to a separate "archive" table. I was always against this.
In Q4/2005, during a period of my absence, it was decided to go ahead with it. A bunch of users were to be deleted, but "not quite", in case we needed their data again. Their data was to be moved from the "account" table to an "account_archive" table.
This was really the worst decision ever made. I said that before, and now I see the consequences. I want anyone who considers such an operation a good idea to understand the consequences, so I list them here.
- More and more, bosses and business people require we do operations on "all" users, which includes the "account_archive" table. This generally involves a "union" of both tables.
- Now I have to create a real-time data interface to a slave system, which also includes archived users. That means I have a "who has changed" table (the input queue for the process exporting changed users to the slave system). This table references account_ids, but I can't create an FK from this table to "account", as sometimes an "account_id" references "account" and sometimes "account_archive".
- There are classes which model a User, and this uses the "account" table as the underlying table. This enables me to build logic functions on the User class, and this has been done. However, at the time the class was built, there was only "account", so I can't use this class to model users who are stored in the "account_archive" table. (I'm not going to extend the User object to include the "account_archive" table, that will make this critical code too complex)
- Now I have to allow users to "unsubscribe" from a newsletter, and "archive" people can receive newsletters, if they elected to receive them while they were active. Again, I can't use the User objects to do that. So I have to just program in plain SQL in an fcgi (or create a second class MaybeArchivedUser to model a user which could be in either table, and then duplicate some instance methods - that's what I chose to do).
- It was suggested "maybe we archived the wrong users". But it's nearly impossible to re-create them as the schema is different, and some information has not been kept on in "account_archive". Their nicknames, which are unique amongst active users (but not amongst archive users) might have been reused in the meantime.
- We actually will never need these users again: we delete them
- We might or do need them in the future. In which case we set a special "status" in the "account" table. They can't log in. But we can build User objects. We can re-enable them if necessary. We can even let them log in to some mini-platform where they can do a few things such as delete themselves or request their reactivation.
- It's more probable that two accounts will be read from the same disk block, after a defragmentation has occurred (did any defragmentation run? I don't think so)
- If there are half the number of users, that's one fewer binary-index level. With 2M users that's 21 index levels; with 1M users it's 20. Hardly a big saving.
- Although backup (and recovery) no doubt became quicker
- This hardly makes up for the other disadvantages
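The index-level arithmetic above can be checked with a few lines of Python. This is a minimal sketch assuming a perfectly balanced tree; `index_levels` is a hypothetical helper. Real database indexes are B-trees with a much larger fanout, so they are shallower, but halving the number of rows still removes at most one level.

```python
def index_levels(n_rows: int, fanout: int = 2) -> int:
    """Return how many levels a perfectly balanced tree with the given
    fanout needs before it can address n_rows keys."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    levels, capacity = 0, 1
    # Each extra level multiplies the number of addressable keys by fanout.
    while capacity < n_rows:
        capacity *= fanout
        levels += 1
    return levels


for users in (2_000_000, 1_000_000):
    print(f"{users:>9,} users: {index_levels(users)} binary-index levels")
```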
Fast
From our Oracle test instance:
Elapsed: 00:01:17.90
That's 1¼ minutes to insert (and index) over 1¼ million rows.
And this is a very old test instance. I think the hardware was last updated 2-3 years ago.
That's pretty quick!
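To put the timing in perspective, here is a small Python sketch that turns the elapsed time into a row rate. It assumes the `Elapsed:` line is SQL*Plus's `HH:MM:SS.cc` timing format, and it uses 1,250,000 as the row count, which is a lower bound since the real number was "over 1¼ million".

```python
def elapsed_seconds(line: str) -> float:
    """Convert an 'Elapsed: HH:MM:SS.cc' line into seconds."""
    hh, mm, ss = line.removeprefix("Elapsed:").strip().split(":")
    return int(hh) * 3600 + int(mm) * 60 + float(ss)


seconds = elapsed_seconds("Elapsed: 00:01:17.90")
rows = 1_250_000  # lower bound: "over 1¼ million rows"
print(f"{seconds:.1f} s, at least {rows / seconds:,.0f} rows per second")
```

That works out to roughly 16,000 rows per second, including index maintenance.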
Mouse reboot
I have been using a trusty wireless mouse for about 3 months now. (I didn't want a wireless mouse, but here in Macau, I didn't know what was going on, so I walked into an expensive hardware store - the only hardware store I knew - and they only had wireless mice. Well I thought, it may be twice the price but even twice the price isn't expensive, and I need a mouse...)
It suddenly stopped working while I was using it.
- The light under the mouse was on, so the mouse thought it was working.
- The touch pad built into the laptop worked, so Windows was still working and accepting pointer-movement instructions.
- I took out the USB device, which communicates with the mouse, and put it back in. "Detecting new hardware" etc. But it didn't start working again.
- I plugged the USB device into a different port. Even more "Detecting new hardware" etc. But it still didn't start working again.
My mouse had crashed, and needed a reboot.
MySQL's "enum" datatype is a good thing
I've often had discussions with people about whether the "enum" type in MySQL is a good thing or not. Basically there are two ways to use your database:
- As an unstructured bunch of "stuff" to store whatever the software needs to persist. Such databases use lots of "blob" data with serialized objects (it's easy to program), tables with multiple functions ("object" table with "type" column), few constraints, and so on.
- As a representation of the data the program is trying to model. Such databases have meaningful column names, and different types of things are stored in different tables. Adding constraints is easy.
But for some reason I've always been the 2nd type. I like to look at a database and understand what data is being modelled. It creates a certain self-documentation which is often lacking in complicated software. Constraints can be added which act like functional assertions (functional in the sense that they involve no state: you say that this value must be between 1 and 10 and then it is that way. You don't have to program any "path" or state to check that).
That an item can be in exactly one of a distinct set of states is a fact of life in all types of domain modelling:
- What state is this invoice in? Is it "paid", "open", "paid+processed"?
- Is this item deleted or not? (Or marked temporarily suspended, pending administrator checking the content)
- Is this photo public, or does it belong to a user?
If you program databases like #2 above, like I do, then enumerations really do make the data model richer.
- The database maps to the domain easier (thereby documenting the domain, in case this is not done elsewhere)
- The database maps to a domain-consistent implementation in the programming language easier
- The database implicitly then has a constraint, as you can't set the column to be some value which makes no sense for the domain (and therefore the program)
It can be argued that this makes it "more difficult" to add a new state, as now you have to "change the database as well". But if the model changes, and the database reflects the model, then that's a good thing, not a bad thing. And it isn't even much effort: if you say that invoices can be in a new state, then there'll be a lot of programming work to support that change (UI, billing logic, robots, test scripts). The "alter table" statement is proportionally no work at all, and you'll also be thankful for every extra constraint the database or programming language can offer you (such a change feels very scary in a Perl program, but not so much in a Java program).
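As a minimal sketch of the idea, here is the invoice example in Python. SQLite (used only because it ships with Python) has no `enum` type, so a `CHECK ... IN (...)` constraint stands in for MySQL's `enum`, and a Python `Enum` mirrors the same set of states in the program. The table and state names are hypothetical. Note that MySQL only rejects an invalid `enum` value with an error in a strict SQL mode; otherwise it stores an empty string and issues a warning.

```python
import sqlite3
from enum import Enum


class InvoiceState(Enum):
    """The states an invoice can be in (hypothetical example)."""
    OPEN = "open"
    PAID = "paid"
    PAID_PROCESSED = "paid+processed"


def create_invoice_table(conn: sqlite3.Connection) -> None:
    # Build the allowed list from the Enum so program and schema agree.
    allowed = ", ".join(f"'{state.value}'" for state in InvoiceState)
    conn.execute(
        "CREATE TABLE invoice ("
        " invoice_id INTEGER PRIMARY KEY,"
        f" state TEXT NOT NULL CHECK (state IN ({allowed})))"
    )


def add_invoice(conn: sqlite3.Connection, state: InvoiceState) -> int:
    cur = conn.execute("INSERT INTO invoice (state) VALUES (?)", (state.value,))
    return cur.lastrowid


def invoice_state(conn: sqlite3.Connection, invoice_id: int) -> InvoiceState:
    (value,) = conn.execute(
        "SELECT state FROM invoice WHERE invoice_id = ?", (invoice_id,)
    ).fetchone()
    return InvoiceState(value)


conn = sqlite3.connect(":memory:")
create_invoice_table(conn)
invoice_id = add_invoice(conn, InvoiceState.PAID)
print(invoice_state(conn, invoice_id))  # InvoiceState.PAID
try:
    # A value that makes no sense for the domain is rejected by the database.
    conn.execute("INSERT INTO invoice (state) VALUES ('cancelled')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Adding a new state means adding one Enum member; the schema for a fresh table follows automatically, although an existing table would still need its constraint altered.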
Windows path length limit
It really seems that Windows does indeed have a path limit.
While checking some files into a Subversion repository, several things combined to make the paths very deep:
- The repository was D:\Adrian\my-respository
- Within the repository I had quite a deep directory structure, to access this particular project
- Within this particular project, the IDE I was using had a few levels of directories, to include "work/src" and so on
- The class path of the Java classes was quite deep, "com/company/project" etc
- Subversion itself puts a few levels of dirs in ".svn/text-base" and so on
The result was the following error:
Cannot create 'New Folder': path invalid
So it seems paths have a limit in Windows. The existing working path was 220 characters long, with 20 directories including the working directory and the hard disk's root directory.
This is all very annoying, as I can't really do anything about any of the above reasons why the path is so long.
http://blogs.msdn.com/bclteam/archive/2007/02/13/long-paths-in-net-part-1-of-3-kim-hamilton.aspx
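Since the limit can't be avoided, it can at least be seen coming. The sketch below is a hypothetical Python helper (not part of Subversion or Windows) that walks a working copy and reports paths longer than a chosen limit. It assumes the classic Win32 `MAX_PATH` of 260 characters, which includes the terminating null, and it is up to the caller to leave headroom for the extra directories Subversion and the IDE add.

```python
import os
from typing import Iterator, Tuple

# Classic Win32 MAX_PATH: 260 characters including the terminating null.
MAX_PATH = 260


def paths_longer_than(root: str, limit: int) -> Iterator[Tuple[int, str]]:
    """Yield (length, path) for every file or directory under root whose
    absolute path is longer than limit characters."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.abspath(os.path.join(dirpath, name))
            if len(full) > limit:
                yield len(full), full
```

For example, `sorted(paths_longer_than(r"D:\Adrian\my-respository", MAX_PATH - 1 - 40), reverse=True)` would list the deepest paths first, with a hypothetical 40 characters of headroom for ".svn/text-base" and similar.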
Beautiful Kettle
One really does appreciate things that work elegantly.
My girlfriend insisted we buy the following kettle, on account of its colour.

It was quite an expensive kettle but it did look so good, I thought well, OK.
But with its expense comes more than just its beauty.
- To open the spout and pour the water after it has boiled, there is an extra plastic lever (looking like an ear) which stays cold, meaning you don't burn your hands or need to use a dish cloth.
- The purple handle is designed in such a way that it too doesn't get hot.
- You can lift the lid of the kettle off without having to move the handle to any special angle (as was the case with the previous kettle)
- The water pours beautifully. No spillage whatsoever.
- Without water, it's light (ideal for the ladies for whom it was presumably designed).
- It's the right size. There's no point making a kettle holding only one cup of water, but most families don't need industrial-sized kettles which can boil 34 cups of water either. This kettle is physically small but can boil 3-4 cups of water.
- The spout is big enough that one can pour water into it without having to lift the lid off, if that's one's preferred method of filling the kettle (it is mine).
What does this error mean?
While moving a folder "old-cvs-data", with many subdirectories, to the Recycle Bin under Windows XP...

Maybe each file stored in the Recycle Bin has an "original path" attribute with a max length of 256 chars, which stores the original path like "Dir1\Dir2\Dir3\file.txt". Maybe if files are nested too deeply, that attribute cannot hold the value. But that's just a guess.
Maybe it really is time to get a Mac.
Concurrency control using Oracle's "Select for Update"
There are times when one needs to prevent a certain "critical section" of code from being executed by more than one process on the same object at the same time. For example, the requirements may state that a user cannot have two subscriptions of the same type active at the same time. So to enable a subscription, if one does "a) check if the user can have the subscription, b) enable the subscription", one needs to make sure that there aren't multiple processes doing that simultaneously for the same user.
Oracle (and also MySQL InnoDB and PostgreSQL) supports row-level write locks. This means that (in contrast to some other databases) if one database connection, in a transaction, writes to a row but has not committed yet, then other connections can freely read the row: they get the most recently committed version, i.e. the version before this transaction. Only if a second connection tries to write to the same row will it wait for the first connection to commit or roll back.
The "select for update" statement acts like a "write" above. All rows which a "select for update" returns to a first connection are locked against writes from any other connection, and any other connection which later attempts to write to (or "select for update") those rows will wait until the first transaction has committed or rolled back. Locking rows using "select for update" is thus an appropriate mechanism to implement locking.
As "select for update" can still only lock existing rows, one needs to decide which row(s) are appropriate to lock for a particular operation. Remember that the rows are only locked against other connections using "select for update" or writing to the rows: anyone can still read them. If one wishes to exclude the possibility of a user having two subscriptions of the same type simultaneously, one cannot lock the "zero rows" the user must have in the subscriptions table beforehand, as only existing rows can be locked.
It thus makes sense to lock the "user" row representing the user (or some other table containing exactly one row per user).
Therefore, for certain critical operations, such as adding a new subscription line for a user, I shall perform the following operations:
- Open transaction
- "Select account_id from account where account_id=? for update"
- Does the user already have a subscriptions row of this type currently active?
- If not, create the new row for the user
- Commit
By contrast, in a database without this kind of read consistency, even if only the row is locked (the best case), other reads (e.g. a user who just wants to view some data) would block, increasing the number of waiting processes from just those that have to wait for a critical section to all processes accessing the table.
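The steps above might look like this in Python. This is a sketch under stated assumptions: a DB-API 2.0 connection to Oracle (for example python-oracledb, which accepts named binds such as `:account_id`) with autocommit off, and hypothetical tables `account(account_id)` and `subscription(account_id, sub_type, active)`. The function name and schema are illustrative only.

```python
def add_subscription_if_absent(conn, account_id, sub_type) -> bool:
    """Create an active subscription of sub_type for account_id unless one
    already exists. Returns True if a row was inserted."""
    params = {"account_id": account_id, "sub_type": sub_type}
    cur = conn.cursor()
    try:
        # Lock the single account row. A second process running this for
        # the same account blocks here until we commit or roll back.
        cur.execute(
            "SELECT account_id FROM account "
            "WHERE account_id = :account_id FOR UPDATE",
            {"account_id": account_id},
        )
        if cur.fetchone() is None:
            raise LookupError(f"no account {account_id}")
        # Safe to check-then-insert: nobody else holds this account's lock.
        cur.execute(
            "SELECT COUNT(*) FROM subscription WHERE account_id = :account_id "
            "AND sub_type = :sub_type AND active = 1",
            params,
        )
        (existing,) = cur.fetchone()
        if existing == 0:
            cur.execute(
                "INSERT INTO subscription (account_id, sub_type, active) "
                "VALUES (:account_id, :sub_type, 1)",
                params,
            )
        conn.commit()  # releases the row lock
        return existing == 0
    except Exception:
        conn.rollback()  # also releases the row lock
        raise
    finally:
        cur.close()
```

Only processes working on the same account wait for each other; subscriptions for different accounts proceed in parallel.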
Copy/Paste between two rich text editors in web browsers
I don't know on what technology the gmail rich text editor is based, nor the uboot rich text editor for composing blog posts (although the latter I should know!) but despite having different appearances (fonts etc) and existing in different websites, one can copy/paste formatted data (e.g. lists, bold) from one to the other (at least using Firefox). The text takes on the appearance of the editor one copies it into.
I don't know how that works, but I think that's quite impressive!
Bottles more than 100% full
I don't know how they do it, but here in Macau, one can buy e.g. a 1.5L plastic bottle of a non-fizzy drink such as still water, and instead of having a small space of air at the top, when one opens it
- The line of the water is exactly at the top of the bottle, i.e. one couldn't fill it any more
- Before one gets a chance to observe #1, water has spilt out
Some computer jokes
A friend at uni who wanted me to fix her "cd rom" ... I said I'm most unlikely to be able to help you fix some laser- and radioactive-based device, but she insisted. It turned out she hadn't plugged her computer in. I said well why did you think only the cd-rom didn't work? She just said "i call the whole thing the cd-rom!" as if there was nothing wrong with that.
I have found people often say "what virus program should i use?". As opposed to anti-virus program.
A friend spilt apple juice over the keyboard of her notebook and rang up tech support, and they said "what type?" or something, presumably referring to the notebook, and she said "natural apple juice". I think she has a Ph.D. in Computer Science now, and had a Masters then.
All of these are in fact not jokes, but real stories!
Floats and Doubles
C has the types "float" and "double" (normally 4 and 8 bytes long). Java has them too (but their length and behaviour are exactly defined in Java). PHP also accepts "float" and "double" (though in PHP both names refer to the same type). Most people only ever program with doubles.
I always wondered where this concept of having two exactly-sized types came from. Databases allow you to specify precision and scale to suit your application.
I always assumed C introduced this, and every other language copied it (like many other features, such as for-loop syntax). But maybe it was Fortran?
http://www.php.net/manual/en/language.types.php#43671
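Python's own `float` is a C double, but the standard `struct` module can show the two sizes and what a 4-byte float loses. A small sketch; `as_float32` is a hypothetical helper name.

```python
import struct


def as_float32(x: float) -> float:
    """Round-trip a Python float (a C double) through a 4-byte IEEE 754 float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


print(struct.calcsize("<f"), struct.calcsize("<d"))  # 4 8
print(as_float32(0.1))         # 0.10000000149011612
print(as_float32(0.1) == 0.1)  # False: a 4-byte float keeps only about 7 significant digits
```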
Keep your options open
I once saw an episode of "cops" where the police officer kept on telling the offender, "your license has been suspended or revoked!" like that officer was not allowed to modify the standard phrase and inform the offender which of the two events had in fact taken place.
That's a bit like software telling you you have "1 file(s)".
I was forced to think of the above when I saw the following error message after trying to open a file in Word:

Store currency amounts as number of cents, pence, etc.
The way numbers are stored in C and thus in PHP, Perl etc., and also Java, is with a binary system. So if you have 2 it's "10" in binary, 5 is "101" and so on. And the same is true for fractions. So a half is "0.1" and a quarter is "0.01". That means that just like "one third" is not exactly representable in our decimal system (0.3333...), numbers like "one hundredth", which are easy to represent in decimal, cannot be represented exactly in binary.
Oracle's number type and MySQL's "numeric" type store data as decimal. This means that if you store "a third" to two decimal places, it gets stored as "0.33". And if you try to add 3 "one thirds" together you get 0.99, not 1.00.
So those Oracle/MySQL data types, using decimal, are good for representing money, as you can exactly store "one hundredth". And adding 100 "one hundredths" gives you 1.00 exactly. However that doesn't help much, as the native floating-point types of the programming languages in common use today are binary, and can't store "one hundredth" exactly.
This is more of a philosophical issue than a practical one. Even if one adds 100 "one hundredths" together and gets a result like 0.9999, printing that to two decimal places rounds it, and "1.00" is displayed, i.e. the right answer. So it's not really a practical problem.
However, the solution to the problem is easy to implement: just store a whole number of cents, pence, etc. So there's no reason to accept any inaccuracy when it comes to storing monetary amounts.
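A short Python sketch of both halves of the argument. It uses ten 0.10 amounts rather than a hundred 0.01 amounts because the binary rounding error is easy to reproduce that way; `add_repeatedly` and `format_cents` are hypothetical helper names.

```python
def add_repeatedly(amount, times):
    """Add `amount` to a running total `times` times."""
    total = 0 * amount  # 0.0 for floats, 0 for ints
    for _ in range(times):
        total += amount
    return total


def format_cents(cents: int) -> str:
    """Format a whole number of cents (or pence) as e.g. '12.34'."""
    sign = "-" if cents < 0 else ""
    whole, frac = divmod(abs(cents), 100)
    return f"{sign}{whole}.{frac:02d}"


as_float = add_repeatedly(0.10, 10)
print(as_float)           # 0.9999999999999999, not exactly 1
print(f"{as_float:.2f}")  # 1.00, rounding hides the error when printing
as_cents = add_repeatedly(10, 10)
print(format_cents(as_cents))  # 1.00, and exact: as_cents == 100
```

Only the display code needs to know about the decimal point; all arithmetic stays in exact integers.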
This was the way my Mother programmed, when she had to deal with pounds, shillings and pence in the old UK pre-decimalization monetary system: she simply stored the number of pence as an integer. If it was good enough for her, it's good enough for us now. After all, software development doesn't change that much over the generations.
"Smith Software Development (Macau) One Person Limited" exists
I went to a meeting today where my company was officially signed and documented into existence.
My congratulations, I wish you all the best!
Best regards from Vienna
Simon
VCS Commit emails are good!
I am currently working on a small project with one or two other people. We don't sit in the same office (or, right now, on the same continent or in the same timezone).
Every time something is committed into our version control system (Subversion), everyone on the team gets an email. This lists:
- The files which have been changed (or added or deleted)
- The commit log message, i.e. written by the programmer, explaining what the commit represents
- A "diff" of all the files which have been changed
I don't know how this would work in a larger team, but for our small team it works perfectly.
My father once told me a story about a boss of some company he had worked at, who read all the faxes which went out and came in to his company. He said that was the only real way to actually find out what was going on in the company. Reading commit logs is similar!
I know a lot of times people omit information because they assume you already know, or they just forget. When I look back at communication failures on projects I've worked on, a number of them wouldn't have occurred if a similar system had been in place.
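For anyone wanting the same setup, here's a minimal sketch of a Subversion post-commit hook in Python. Subversion runs the hook with the repository path and revision number as its first two arguments; the script gathers the author, log message, changed files and diff with `svnlook` and mails them. The sender and recipient addresses and the local SMTP server are placeholders, and the function names are hypothetical.

```python
import smtplib
import subprocess
from email.message import EmailMessage


def svnlook(subcommand: str, repos: str, rev: str) -> str:
    """Return the output of `svnlook <subcommand> -r <rev> <repos>`."""
    return subprocess.run(
        ["svnlook", subcommand, "-r", rev, repos],
        check=True, capture_output=True, text=True,
    ).stdout


def build_commit_email(repos, rev, sender, recipients, look=svnlook):
    """Build one email with the author, log message, changed files and diff."""
    author = look("author", repos, rev).strip()
    msg = EmailMessage()
    msg["Subject"] = f"r{rev} committed by {author}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(
        "Log message:\n" + look("log", repos, rev)
        + "\nChanged files:\n" + look("changed", repos, rev)
        + "\nDiff:\n" + look("diff", repos, rev)
    )
    return msg


def main(argv):
    # Subversion calls post-commit with REPOS-PATH and REVISION as arguments.
    repos, rev = argv[1], argv[2]
    msg = build_commit_email(
        repos, rev,
        sender="svn@example.com",         # placeholder
        recipients=["team@example.com"],  # placeholder
    )
    with smtplib.SMTP("localhost") as smtp:  # assumes a local mail server
        smtp.send_message(msg)
```

Installed as the repository's `hooks/post-commit` script, it would finish by calling `main(sys.argv)`.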
Making changes to a Script
I need to change a bunch of functions in a bunch of classes to take a "user" object as opposed to a "user_id" number.
I am using a scripting language. How am I going to do this? I am going to do it the best I can, then compile, but the compiler is not going to find any problem, as they are the same "type", i.e. they are both values.
Then I'm going to get lots of runtime errors. I'll correct them until they all go away. Then there will be no known runtime errors, which is better than known runtime errors. But that is both more difficult and much less satisfactory than a simple compile in a language like Java or C++, which would find all occurrences of such errors.
Thankfully I have lots of unit test scripts which will help in my unnecessary debugging process. A lot of them I just wrote to test simple things like "do getters work?". In a compiled language many of those tests would be unnecessary as the only errors that could occur, the compiler would pick up.
So scripting languages have caused me more effort:
- To do and test this change
- To write and run and maintain unnecessary unit test scripts
- To write this blog post (which wouldn't have been necessary had points 1+2 not been necessary).
Job advertisement
The following job advert appears today through Wednesday in the Macao Daily News. This is my first ever advertisement in a newspaper reading back to front!

Using UTF-8 and Unicode data with Perl MIME::Lite
MIME::Lite predates Perl 5.8, which supports Unicode and UTF-8. But it's easy to get MIME::Lite to work with Unicode bodies and subjects.
To attach a plain text part to a message, with a string which contains unicode characters, use:
use Encode qw(encode);

$msg->attach(
    Type => 'text/plain; charset=UTF-8',
    Data => encode("utf8", $utf8string),
);
To set the subject of a mail from a string containing unicode characters, use:
use Encode qw(encode);
use MIME::Base64;
use MIME::Lite;

my $msg = MIME::Lite->new(
    ...
    Subject => "=?UTF-8?B?" .
        encode_base64(encode("utf8", $subj), "") . "?=",
    ...
);
Note that the above methods also work even if the strings do not contain unicode characters, or do not have the UTF-8 bit set.
It would be better to change MIME::Lite such that subject and data strings are accepted and the above code happens inside MIME::Lite. I've filed a bug report.
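The Subject trick above is an RFC 2047 "encoded word". As a cross-check, the same string can be built in Python and decoded again with the standard library's `email.header.decode_header`. A sketch only: `encoded_subject` is a hypothetical helper, and RFC 2047 limits an encoded word to 75 characters, so a long subject should really be split into several encoded words, which neither this nor the Perl version does.

```python
import base64


def encoded_subject(subject: str) -> str:
    """Wrap subject in a single RFC 2047 'B' (base64) encoded word."""
    payload = base64.b64encode(subject.encode("utf-8")).decode("ascii")
    return f"=?UTF-8?B?{payload}?="


print(encoded_subject("Grüße aus Wien"))
```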
Starting a Software Development company
For those of you who don't know, I'm now living in Macau as well as Vienna.
I always wanted to start a software development company, but hiring people in Europe is expensive. And now I've found a country with a lower cost of living and salary expectations than Europe, and that not only permits international business but actually encourages it. This country is Macau (or Macao).
The starting of the company is underway. I offer good quality outsourced software development, at a good price. For large or small projects. So if you need to increase speed in your existing software development projects, or develop an entirely new system at a good price, then don't hesitate to contact me for a quote.
http://www.smith-software-development-macau.com/
CHAR vs VARCHAR (and VARCHAR2)
A friend just asked me:
I have a DB, informix actually but I think its unimportant. A column
is a char100. I have a string of text in a row in that col. the
string is 4 characters long.
When I select the char100 column I get a space padded string of 100
characters with my string at the front. Have you ever seen this?
Yep that's normal. A char(100) column is exactly that: 100 characters, no more, no less. So if you put too few characters in the field, rather than giving an error, the db pads the value with spaces.
The "very advanced" datatype varchar (or varchar2 in Oracle) remembers how many characters you put in the field. (I shudder to think what Oracle's varchar datatype must have been like. These days it's an alias for varchar2.)
There are basically no reasons to use a char. We used e.g. char(2) to store a user's country in uboot ("at", "de" etc) but in fact we even had some problems with that, so we decided to always use varchar thenceforth.
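The padding behaviour is easy to model. This Python sketch mimics a `char(n)` column on write and shows the usual fix on read; `store_char` is hypothetical, and what happens to a value that is too long varies between databases (an error, or silent truncation in some MySQL modes), so the sketch simply raises.

```python
def store_char(value: str, width: int) -> str:
    """Mimic a CHAR(width) column: the stored value is always exactly width chars."""
    if len(value) > width:
        raise ValueError(f"{value!r} does not fit in CHAR({width})")
    return value.ljust(width)  # pad with spaces on the right


stored = store_char("abcd", 100)
print(len(stored))            # 100
print(repr(stored.rstrip()))  # 'abcd', what a varchar would have kept
```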
In MySQL, if you define a table where it can pre-compute the byte-width of a row in advance, i.e. it's only composed of chars, ints, etc, no varchars, then if you delete rows and then re-insert rows, you never lose any space. As it can slot new rows exactly into the space taken up by old rows.
But if you use any varchars, then it can't. And in that case, MySQL reasons there is no advantage to having any chars: if a table has some chars and some varchars, it converts those chars to varchars.
Incidentally, my book on Oracle (which also serves as an advertisement for Oracle) says that in Informix, when you do an insert, the page the row gets written to is locked for the duration of the transaction, i.e. a page lock, not a row lock. So if you do another insert from another transaction, even though it's an independent row, it will probably be written to the same page, so it must wait for the first transaction. So there's a lot of waiting going on. Not being able to do two inserts on the same table from two different transactions simultaneously would seem exceedingly rubbish. So I'm curious whether it's really true or just Oracle marketing. Or maybe it was the case in Informix 1.0. But let's not forget that Oracle 1.0 probably didn't have varchar2.
Update: my friend writes: spoke to our Informix guy, he says that was true in Informix 7.10 from '97.
Database error messages
Database error messages in general are very bad. Why?
Oracle Version 8 had lots of messages such as "Invalid column name" where they meant:
- Column name not found in the table in question. The word "invalid" is the wrong word as it implies illegal characters or something like that.
- Which column? Which table? The parser surely knows this at the time it generates the error message. But it helpfully chooses not to inform the user.
Now I have the following problem with MySQL. I try to create an InnoDB table with a foreign key constraint and it says:
ERROR 1005 (HY000): Can't create table './myschema/mytable.frm' (errno: 150)
What it means is: the statement has an error in it. But what is the error? In this case, there was a foreign key constraint and the column didn't exist in the referenced table. But why couldn't it tell me this?
How MySQL reduces error messages in your program
Ah, MySQL (at least MyISAM) really isn't a real database!
Firstly, when doing an insert, I did some arithmetic. The numeric column was of a certain width. If the result of the arithmetic was larger than the maximum allowed value, my number was simply turned into that maximum allowed value, without warning or error. A large number silently becoming some other large number may sound good in the philosophy of "errors are bad, we want to minimize errors!", but it's literally never what you want. Oracle gives an error if a number is too big to be stored in a column, which is always what you want.
Secondly, due to the above arithmetic overflow errors, my insert statement was failing (as multiple values that should have been distinct, but were beyond the maximum, were then identical, all equal to the maximum). I kept on running it and it kept on failing. Then I looked at the table, and each time I'd done such an unsuccessful insert (a single statement to insert maybe 10k rows), some rows (but not all, due to the error) had been inserted. Having half a statement succeed is never what you want! Oracle sets an invisible savepoint before each statement and, if the statement fails, rolls the database back to that savepoint. That's always what you want!
> When MySQL is to store a value in a numeric column that is outside the permissible range of the column's data type, the result depends on the SQL mode in effect at the time. For example, if no restrictive modes are enabled, MySQL clips the value to the appropriate endpoint of the range and stores the resulting value instead. However, if the TRADITIONAL mode is enabled, MySQL rejects an out-of-range value with an error.
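Both behaviours asked for above, an error instead of clipping and statement-level rollback, can be shown with SQLite, swapped in here only because it ships with Python (it is neither MySQL nor Oracle). SQLite integers have no fixed width, so a `CHECK` constraint stands in for the column's range; by default SQLite aborts a failing statement and backs out only that statement's changes, much as described for Oracle. The table and helper names are hypothetical.

```python
import sqlite3


def make_table() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite integers have no fixed width, so a CHECK constraint plays the
    # role of the column's range: out-of-range values are errors, not clipped.
    conn.execute(
        "CREATE TABLE t (n INTEGER NOT NULL UNIQUE CHECK (n BETWEEN 0 AND 255))"
    )
    return conn


def run_statement(conn: sqlite3.Connection, sql: str):
    """Run one statement; return None on success, else the error message."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as e:
        return str(e)


conn = make_table()
run_statement(conn, "INSERT INTO t (n) VALUES (10)")
# One statement inserting three rows; the third is out of range.
err = run_statement(conn, "INSERT INTO t (n) VALUES (1), (2), (300)")
rows = conn.execute("SELECT n FROM t ORDER BY n").fetchall()
print(err)
print(rows)  # [(10,)]: rows 1 and 2 were backed out along with the failed statement
```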
The SUM(col) of zero rows is NULL
This just annoys me so much. The sum of an empty set of integers is zero, not undefined.
However, neither Oracle nor MySQL understands this. I can only assume this departure from common sense and mathematics is considered the "best practices" definition of the SQL SUM function.
mysql> desc email_box;
| box_size_bytes | int(10) |
mysql> select count(*) from email_box;
| 0 |
mysql> select sum(box_size_bytes) from email_box;
| NULL |
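Whatever one thinks of it, the usual workaround is `COALESCE`. A quick check with SQLite (used here because it ships with Python; it also returns NULL for `SUM` over no rows), using the `email_box` table from above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email_box (box_size_bytes INTEGER)")


def scalar(sql: str):
    """Return the single value produced by a one-row, one-column query."""
    return conn.execute(sql).fetchone()[0]


print(scalar("SELECT COUNT(*) FROM email_box"))                          # 0
print(scalar("SELECT SUM(box_size_bytes) FROM email_box"))               # None (SQL NULL)
print(scalar("SELECT COALESCE(SUM(box_size_bytes), 0) FROM email_box"))  # 0
```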
gmail errors using T-Mobile UMTS card
I started to get errors when using gmail. Gmail is normally very reliable.
I would enter the URL in Firefox, the white background with "Loading..." top-left would appear, then a little later "This appears to be taking longer than normal" would appear, and that was it.
It turned out, I only got the errors when using a T-Mobile UMTS card (in Austria). This has an option to "compress images" which is on by default. This means that JPEGs look nastier (but load faster) than normal. This was somehow corrupting gmail! Go to the URL "1.2.3.4" to change this. Even once you've changed it, it "forgets" you've changed it, so you've got to do it once a session. And the browser can cache the downgraded images so you have to clear the browser cache sometimes as well.
All seems well using Internet Explorer, and all seems well using the "basic HTML version".
bugs
well really a lot of things were far from optimal about the software. lots of bugs, but also a lot of integration troubles, i.e. one bit of software worked 95% and another worked 95% and together they worked 0%. today and yesterday i sat with smo and went through a whole bunch of software from a whole bunch of people and just hacked away until it worked. now the last 5 galleries etc. are on the start page, which is quite cool.
new galleries is not as cool as it could be, as they are "checked", i.e. when the customer care agents go home then there is no new checked content. but the newest blogs are interesting as they are not checked.
surprisingly people tend to use the new blogs much as the old galleries, i.e. just lots and lots of photos. surely the gallery is the more appropriate forum for such content. but hey, if the users like it, that's all i care about.
untitled
well the migration was good fun. i got in at 3pm and left at 8pm the next day. a few problems during the afternoon but the night was good.
i had about 10 different areas prepared, mainly SQL, plus a few things to do with the filesystem for which i had to write perl code. obviously SQL is better due to its declarative nature and it being faster.
amazing speed from the oracle. for the folder migration statement, i inserted 3M rows into the new table, and those rows were from a select which involved 4 tables, each of which had millions of rows. took 24 minutes... not bad eh.
that's the thing that's cool with uboot, i mean if you make a mistake you've lost hours of the migration time, so you've really got to do it right, and have everything planned in advance. i did indeed make one mistake which cost us about 4 hours of offline time. but i suppose 1 mistake is ok. maybe.
wow, yesyesyes mtbf of 29h should be okay :-) but reducing your out of office time to 0 doesn't sound like a good idea :-(
congratulation for the good migration job... (but no comment on the current state of the page...)
greetings
michi
ps hm...yep... visit possible - but when? lunch will not be possible but maybe on evening. - at the moment i'am working from 8-17h @ siemensstraße (the very other side of vienna) which days, and how long are u at the uboot office?