it’s alive!

Kongzi is working.

I have made a dozen or more small changes, mainly adding some great new features in the last month. But the important thing is, it’s working. I have 40 teaching hours a week now, and I use Kongzi in the classroom to teach English. The students LOVE it. And the schools LOVE the fact that I can give audiovisual teaching. Time to ask for a raise? Maybe ;-)

The downside to all this is that I don’t have time to really work on the program. I still haven’t finished the memory game or the new idea I had, the unscrambling game. But other than that the program is very mature now. As I continue to use it over the next months, it’s my hope that weekend here weekend there I can finish it off and then step 3: profit.

I know I’ve been rambling about releasing this program for years but perhaps I lacked confidence in the code. Now that I know it works, and works extremely well, I will have a selling point to work from.

Actually the way these things work there would probably be more money in selling it as a tool to teach English. But hey, either way, it works like a charm :) I hope I can find the time to finish it soon. Please leave supporting comments; maybe I just need a pep talk to hunker down and start coding again. Right now I just feel tired.

Chinese Character Frequency Lists

Hi. This is in response to a comment left on the Kongzi page.

Calimeron writes:

Hi, I’m using the Yahoo Widget “Mandarin Flashier Cards.” That uses a cedict dictionary with 2 frequencies (in sqlite format). Do you know anything about these frequencies? Or other frequencies available on the web? (I don’t like to use these frequencies if after learning 1000’s of words, I discover the frequencies are not right.) Tip for you: do something similar! If you have a program that starts automatically at the startup of the computer, and then automatically flashes the 5-10000 cards you’d like to study, it’s very difficult not to study!

Hi Calimeron! I have been interested in frequency lists for a long time. The conclusion I’ve come to is that no one frequency list is the be all and end all. The reason is that a frequency list only reflects whatever material it was compiled from. Let me give you an example. If I took, say, 200 daily newspapers, 100 novels and business books, and a broad selection of 100 magazines all from last year, threw in a few scripts from plays and transcribed news broadcasts, I’d have several million analyzed words. The Chinese Government put something out like that. It lists the frequency of simplified characters. That’s one list. But if I added a lot of Classical Chinese or religious books like the Dao De Jing, the frequency list would change.

Another example is the list put out by the Ministry of Education in Taiwan back in 1997. This is a very modern list with nearly 20 million analyzed words from a broad spectrum of books, magazines, newspapers, television, and even some classical material. This is the frequency order listed in Far East’s 3000 Chinese Character dictionary – a book I cannot recommend highly enough for its value as a learning tool.

Yet another example is the internet frequency order. Someone had compiled something like 5,000 websites from China and came up with 280 million words. It’s huge. Simplified characters. And another compiled in 1993 and 1994 from Chinese Newsgroups. All of these lists have vastly different orders for characters.

There are other lists. The point I am making is, what are you trying to learn? Obviously using one frequency list over another will target you towards that particular culture. Newspapers in Singapore may use shorthand characters for twenty and thirty which will not appear in other lists – although they may be included in a list of characters targeting Singaporean newspapers.

My advice is that if you’re just starting out and you’re confused about which frequency list to use, use them all! Study the first 100 or 200 characters in one list, then go study the new characters in another list. It will be very revealing to you to understand which characters are the same and which are different. For example, yi1 (one) and de5 (ownership particle) are going to be in the top ten no matter what, almost for sure. Other words may be closer or farther away. Also you may wish to know that some lists count bigrams such as “ta1 de” (his) separately from ta1 (he) and de (ownership). So you really have to understand what your frequency list is targeting before you can use it.

The way I solve this problem with Kongzi is by allowing the user to import whatever frequency list they like from the internet. It currently can import five or six different lists.
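To make the “which characters are the same and which are different” comparison concrete, here is a minimal sketch in Java. It is an illustration only, not Kongzi’s import code: the class and method names, the one-entry-per-line file format, and the sample orderings (written as pinyin with tone numbers) are all assumptions invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FrequencyListCompare {

    // Assumed format: one entry per line, most frequent first. Anything
    // after the first whitespace (a count, a note) is ignored.
    public static List<String> parseRankedList(String text) {
        List<String> ranked = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comment lines
            }
            ranked.add(trimmed.split("\\s+")[0]);
        }
        return ranked;
    }

    // The first n entries of a ranked list, in rank order.
    private static Set<String> topN(List<String> ranked, int n) {
        int end = Math.max(0, Math.min(n, ranked.size()));
        return new LinkedHashSet<>(ranked.subList(0, end));
    }

    // Entries that appear in the top n of both lists.
    public static Set<String> shared(List<String> a, List<String> b, int n) {
        Set<String> result = topN(a, n);
        result.retainAll(topN(b, n));
        return result;
    }

    // Entries in the top n of the first list but not in the top n of the second.
    public static Set<String> onlyInFirst(List<String> a, List<String> b, int n) {
        Set<String> result = topN(a, n);
        result.removeAll(topN(b, n));
        return result;
    }

    public static void main(String[] args) {
        // Invented sample orderings, NOT real frequency data.
        List<String> listA = parseRankedList("de5\nyi1\nshi4\nbu4\nle5");
        List<String> listB = parseRankedList(
                "# made-up list with counts\nde5 100\nshi4 90\nyi1 80\nren2 70\nwo3 60");
        System.out.println("Shared in top 3: " + shared(listA, listB, 3));
        System.out.println("Only in list A's top 5: " + onlyInFirst(listA, listB, 5));
    }
}
```

Studying the “only in first” set for each pair of lists is one way to see what a given list is really targeting.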

For anyone learning Chinese, I strongly recommend Far East’s 3,000 Character Dictionary. It is indispensable and the frequency order is very good for learning daily Chinese you will see and speak all over the world. I’ve never seen anything better.

You may also be interested in James E. Dew’s “6,000 Chinese Words”. It is another indispensable book which lists many things including CYY and BLI grading, and several different frequency lists.

For advanced students only, I recommend buying an entire series of very easy children’s books (think Aesop’s Fables, The Three Little Pigs) and indexing and counting each character in each book. That should take you a couple of months! Then make your own frequency list from that. You will come up with about 800 words that are significantly common and you will discover that learning as few as 200 of them will allow you to read 90% of the material in the books. Again I cover this technique in my new textbook, “Welcome to Chinese”.
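If you would rather let a computer do the counting, the idea can be sketched in a few lines of Java: count every character, rank by count, and walk down the ranking until the running total reaches the coverage you want. This is a rough sketch under stated assumptions, not the method from the textbook; the class and method names are made up, and the sample text is a plain ASCII stand-in for a real book.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CharacterCoverage {

    // Counts each letter in the text (by Unicode code point). For Chinese
    // text you might instead keep only code points whose
    // Character.UnicodeScript is HAN; isLetter keeps the sketch simple.
    public static Map<String, Integer> countCharacters(String text) {
        Map<String, Integer> counts = new HashMap<>();
        text.codePoints()
            .filter(cp -> Character.isLetter(cp))
            .forEach(cp -> counts.merge(new String(Character.toChars(cp)), 1, Integer::sum));
        return counts;
    }

    // Most frequent first; ties broken by the character itself so the
    // order is repeatable.
    public static List<Map.Entry<String, Integer>> rankByFrequency(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        ranked.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        return ranked;
    }

    // How many of the most frequent characters are needed before they
    // account for at least targetCoverage (0.0 to 1.0) of all occurrences.
    public static int charactersNeeded(Map<String, Integer> counts, double targetCoverage) {
        long total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        if (total == 0) {
            return 0; // nothing was counted
        }
        long covered = 0;
        int needed = 0;
        for (Map.Entry<String, Integer> entry : rankByFrequency(counts)) {
            covered += entry.getValue();
            needed++;
            if ((double) covered / total >= targetCoverage) {
                break;
            }
        }
        return needed;
    }

    public static void main(String[] args) {
        // Stand-in text; in practice this would be the typed-in books.
        Map<String, Integer> counts = countCharacters("aaaaaa bbb c.");
        System.out.println("Ranking: " + rankByFrequency(counts));
        System.out.println("Needed for 85% coverage: " + charactersNeeded(counts, 0.85));
    }
}
```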

For super advanced students. Oh? Here’s something to think about. What percentage of the most common 3000 characters are nouns, what percentage are verbs, and so forth? Choose the most common 20 or 30 nouns, the most common 20 or 30 verbs, and so forth, and you could probably set up the kernel of a MCI system in 100 words. This puts near-conversational fluency in the hands of first year college students. This is something akin to what I’m doing for Kongzi and WTC.

Good luck on your Chinese journey!

Using Styled Documents

A few beta testers reported back that on their netbooks’ small, high resolution screen, the HTML-limited size of the Chinese Characters was too small. So I redid it using styled documents.

Notice anything different?

kongzi-styled-docs

It’s pretty much the same, except now you can blow the Chinese character up as much as you like.

Perfect for Netbooks! This was definitely a step I needed to take in order to move towards Kongzi on portable devices.

One of the next big goals for Kongzi is adding a Speech module (i.e. press F1 to hear the Chinese). Associated with this will be a new “listen and type” style of quiz, “listen mode” for multiple choice, etc.

Development is proceeding extremely rapidly. I’m working on some UI issues now, making it better. Kongzi is becoming more and more robust. I am currently using it to teach myself, but entering characters is slow going: there are only 250 words in the dictionary now.

I have a really good feeling about Kongzi these days. I think it’s going to be a success!

Modifying Selection Properties of a JTree, JList or JTable

I like to write programs for those times that I come up with something elegant. What I really mean by that is when I learn something I didn’t really know before. Here’s today’s challenge:

You have a very long JTree, and a need to select several things in that JTree (This could be a JTable, JList, etc.). The idea here is that if you make a mistake and click on something you will erase all your previous selections. So, you want to change the default behavior of click-select.

The why is simple: it’s a royal pain to remember to hold down the control key. Plus being as paranoid as I am about data entry I would constantly scroll up and down the list before clicking accept – just to make sure everything I needed was properly selected.

I kept telling myself how nice it would be to just click to toggle the selected status instead of having to control-click. To have toggling as the default behavior.

There are a few ways to do this, but the best way I could come up with was to handle the mouse click itself. JTree, JList, and JTable all have methods for determining which item is being clicked based on the getX() and getY() methods from the mouse event. So I declared a mouse click handler for my JTree (list, whatever). This is what it looks like:


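import java.awt.event.MouseEvent;
import javax.swing.tree.TreePath;

private void specialClickSelectionHandler(MouseEvent evt)
{
    // Toggle only on a middle (BUTTON2) or right (BUTTON3) click.
    if (evt.getButton() == MouseEvent.BUTTON2 || evt.getButton() == MouseEvent.BUTTON3)
    {
        // tagview is the JTree; find the path nearest to the click.
        TreePath tmp = tagview.getClosestPathForLocation(evt.getX(), evt.getY());
        if (tmp == null)
        {
            return; // nothing is currently viewable in the tree
        }
        if (tagview.isPathSelected(tmp))
        {
            tagview.removeSelectionPath(tmp);
        }
        else
        {
            tagview.addSelectionPath(tmp);
        } // if it's currently selected
    } // if it's a mouse2 or mouse3 event.
}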

Now, when the user right clicks, it will toggle the selection.

Digging around I found some similar code for tables and lists on Sun’s Swing Archive, but keep in mind the code there uses isPopupTrigger(), and therefore needs to be modified for cross-platform use (since isPopupTrigger() works differently on different systems – see javadoc).
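For a JList the same idea might look like the sketch below. It is not the code from Sun’s archive, just a minimal version under my own assumptions: the class and method names are invented, and the toggle logic is split into a helper that works on the ListSelectionModel so it can be tried without showing a window. It checks the button number rather than isPopupTrigger(), for the cross-platform reason mentioned above.

```java
import java.awt.event.MouseAdapter;
import java.awt.event.MouseEvent;
import javax.swing.DefaultListSelectionModel;
import javax.swing.JList;
import javax.swing.ListSelectionModel;

public class ToggleSelectionSketch {

    // Flips the selected state of one row and leaves every other row alone.
    public static void toggleIndex(ListSelectionModel model, int index) {
        if (index < 0) {
            return; // locationToIndex returns -1 for an empty list
        }
        if (model.isSelectedIndex(index)) {
            model.removeSelectionInterval(index, index);
        } else {
            model.addSelectionInterval(index, index);
        }
    }

    // Makes middle and right clicks toggle the row closest to the click.
    public static void installToggleOnRightClick(JList<?> list) {
        list.addMouseListener(new MouseAdapter() {
            @Override
            public void mouseClicked(MouseEvent evt) {
                if (evt.getButton() == MouseEvent.BUTTON2
                        || evt.getButton() == MouseEvent.BUTTON3) {
                    int index = list.locationToIndex(evt.getPoint());
                    toggleIndex(list.getSelectionModel(), index);
                }
            }
        });
    }

    public static void main(String[] args) {
        // Exercise the toggle logic on a bare selection model (no GUI needed).
        ListSelectionModel model = new DefaultListSelectionModel();
        toggleIndex(model, 2);
        toggleIndex(model, 5);
        toggleIndex(model, 2); // second toggle on row 2 deselects it again
        System.out.println("Row 2 selected: " + model.isSelectedIndex(2));
        System.out.println("Row 5 selected: " + model.isSelectedIndex(5));
    }
}
```

A JTable version would follow the same pattern, using rowAtPoint() to find the row and the table’s own selection model.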

You could also use this technique to change the default action of the left mouse button, which might be a better idea. I don’t want any chance of mistakenly pressing left click and accidentally removing selections off-screen where I won’t see them disappear. Well, there’s one headache gone. Now I can relax a bit more while entering the thousands of Chinese words I need into my database :)

Kongzi can now import CC-CEDICT

Yes, Kongzi Beta-5 can now import data from CEDICT (or CC-CEDICT). It has no problem loading up each of the over 100,000 entries and making them editable, searchable, and quizzable. Yet, there is a problem. It turns out that there is so much non-Chinese information (such as Korean characters, just as an example), so many unsupported (ancient?) characters, and the definitions are so unwieldy – so many mistakes and missing pieces of information – that I don’t feel using CC-CEDICT is a good idea.

Looks like I’ll have to just keep adding characters one by one. It will take a while but beginners would only be interested in the first few hundred anyways, so I have my work cut out for me.

I suspect things would be different if there was any sort of organizational data in CC-CEDICT to help draw only on what would be useful, but due to several of the project’s stated rules (such as not creating a separate entry for parts of speech – a huge mistake in and of itself) and not including any frequency data or grading data (another huge mistake where learners are concerned) CC-CEDICT is actually quite useless for learners compared to better organized dictionaries or lists of words (such as the BLI or CYY lists, or the ones used to make Far East’s 3000 Learner’s Dictionary). In that respect you probably wouldn’t want to use CC-CEDICT. I’ll leave the import feature in, however, since you never know who may wish to use it for whatever purpose.

In other news:

kongzi-b5-mnmquiz

As you can see I’ve recovered the Mix ‘n Match quiz feature. The Multiple choice quiz feature was also recovered. Some cosmetic changes will come later, of course, but all the logic is there. Some of the logic is even more advanced than previously. You will notice that at least one of the answers (if not more than one) is highly similar to the one being presented. For example, we include the character for seven, and the phonetics for eight and nine, when the proper answer is three. And so forth. In Mix ‘n Match, we include no, when the correct answer is yes. And so forth. It makes for a more difficult (and therefore more interesting/useful) quiz. Note also that you can’t use similar terms to help you narrow down an answer. The logic will also include head fakes (see the SI prefixes above or the numbers included with yes/no below). You really have to know the answer, no guessing allowed!

You’ll also notice in the above that pinyin tone numbers and tone marks have been mixed. This is because I haven’t finished entering the shortcuts in. That’s fine, the program always recognizes tone numbers (and will always recognize both interchangeably once I finish the shortcuts).

kongzi-b5-mcquiz

The program is stable enough that I’ve started adding characters again (in order to have fun using it, and beta test it while doing so, of course!) In its current state it is very usable; however, there are a number of things I’d like to do before releasing it and there are still a few bugs and things I haven’t hammered out yet. I’m still annoyed that I lost the memory game quiz. That was fun and amazing. Although now the problem of word size has been solved: it will select only from words of the same size (these are organized by tags, which provides a convenient solution). When I do reimplement it, I will have it toss in characters with the same radical (or which differ only by a radical) to help the user learn to visualize and discern between characters. I have high hopes for that feature.
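The quiz behaviour described above (wrong answers that look a lot like the right one) can be sketched roughly as follows. This is not Kongzi’s actual logic, only one simple way to get a similar effect: prefer wrong answers that share a tag with the correct entry, then fill any remaining slots from other tags. The Entry class, the tag names and the method names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class DistractorSketch {

    // Hypothetical dictionary entry: an answer plus the tag it is filed under.
    public static final class Entry {
        final String answer;
        final String tag;

        public Entry(String answer, String tag) {
            this.answer = answer;
            this.tag = tag;
        }
    }

    // Builds a shuffled list of 'count' choices containing the correct answer.
    // Wrong answers with the same tag are used first, so similar-looking
    // choices crowd out the easy ones.
    public static List<String> pickChoices(Entry correct, List<Entry> pool, int count, Random rng) {
        List<String> similar = new ArrayList<>();
        List<String> others = new ArrayList<>();
        for (Entry e : pool) {
            if (e.answer.equals(correct.answer)) {
                continue; // never offer the correct answer twice
            }
            if (e.tag.equals(correct.tag)) {
                similar.add(e.answer);
            } else {
                others.add(e.answer);
            }
        }
        Collections.shuffle(similar, rng);
        Collections.shuffle(others, rng);

        List<String> choices = new ArrayList<>();
        choices.add(correct.answer);
        for (String s : similar) {
            if (choices.size() < count) choices.add(s);
        }
        for (String s : others) { // fill leftover slots from other tags
            if (choices.size() < count) choices.add(s);
        }
        Collections.shuffle(choices, rng);
        return choices;
    }

    public static void main(String[] args) {
        Entry three = new Entry("three", "numbers");
        List<Entry> pool = Arrays.asList(
                three,
                new Entry("seven", "numbers"),
                new Entry("eight", "numbers"),
                new Entry("nine", "numbers"),
                new Entry("yes", "yes/no"),
                new Entry("no", "yes/no"));
        System.out.println(pickChoices(three, pool, 4, new Random()));
    }
}
```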

I’d like to have at least 1,000 characters in the dictionary before I start public beta testing. One important organization is the grading scheme (A, B, ss, or Jia, Yi, Bing, Ding, etc.) used by some government testing programs, HSK grading, and that sort of thing. Which brings me to frequency representation. What a headache! I could easily include three different items for frequency and have them all be different. Instead of causing needless headaches, or lying to the user that only one number is accurate, I think that the best way to use ‘frequency data’ will be to allow the user to import frequency order lists from a file. That way if the user decides to switch from one order to another it would be an easy thing to do. This way I also save myself from having to write special code to determine which particular frequency order they were using. Brilliant!

Now, back to adding characters to the dictionary…

Populating Trees and Lists in Java

I’d like to comment on what went into the new Font Chooser dialog for Kongzi Beta-5. The problem can be posed as follows:

Create a dialog box that takes, as input, existing data.
Complication: Selecting a value on some lists will change the contents of the others.

Obviously this sort of design pattern would have important usages; configuration boxes, persistent user profiles, and so forth. Or, for example, a specialized Font Chooser – which is the example I’ll discuss. Take a look at this screenshot from Kongzi Beta-4:

Kongzi Beta-4 Japanese Fonts

I remember how fiendishly difficult this was to code the first time around, so the second time around I was prepared. The problem would be that I had to follow a special order: populate the language box, select the language, then use that event to populate the font box. But then I had to interrupt the normal order of things and populate the font box before I selected the font from pre-existing data. I couldn’t think of a way to do that without cutting and pasting code or creating a very convoluted logic.

But worrying about that before I wrote any code would drive me insane. So I plunged in head first, fleshing out the dialog box in Netbeans, then adding several methods to find and select values in the list, such as “SelectLanguage(String)”. That way whenever I wanted to, I could populate the dialog box with pre-existing data, in this case, the language and font data that had been selected previously. This is important, of course, to provide a smooth user experience.

Then the nasty null pointer errors started. Basically because of the logic/flow problem I described above. The logic for populating from pre-existing data, and the logic to modify itself based on user input, was too difficult to reconcile. When I tried to populate language, the size box wouldn’t be populated yet, so I couldn’t preview sample text for the new language. If I tried to populate the size box first, this would also trigger a preview event which would crash. The workaround in Kongzi Beta-4 was to include a preview button as shown in the above screenshot. Then I was technically excused from having to do anything. But just because I put a button on it doesn’t mean it wasn’t a giant kludge. For Beta-5 I wanted something smoother. I wanted to use the same logic at all times, the same entry points, and I didn’t want a preview button. I wanted selection events to populate everything.

The key to my final solution was in realizing that when I sent commands to the lists, such as fontList.setSelectedIndex(i), I was going to trigger events which would send me back into the auto-population and auto-selection methods. This is how it had to be, to avoid cutting and pasting code, making a terrible mess. Yes, I wanted to follow “accepted practices” of reusing the same code which auto-populated the lists to accept events and propagate their own data to other lists. Doesn’t everyone?

The first idea I had was checking for null pointers and then skipping over those lines of code. I rejected this almost immediately because of the difficulty it would present to doing any sort of error checking. I would in essence be assuming that any time I encountered a null pointer, I knew why it was there; and that was simply untrue. To write proper code, I would then have to write a second check to determine if the null pointer was there because there was a bug, or simply because the dialog wasn’t fully populated yet.

Or, as an aside, populated properly. I saw the potential for sloppy logic to start creeping in, caused by my abuse of checking for null pointers. In horror, I imagined several cases where entire blocks of code would never execute because they would always be called at a time when null pointers were there. I imagined terrifying cases where a population function would be called four, five, or more times, each time getting sent back with a null pointer, until the dialog was finally ready to accept the input. I recoiled from checking for null pointers, and resolved to find a better way.

After staring into the monitor for a while, I came up with the idea of using boolean flags to keep track of what procedure I was in. The solution worked like a charm. I created several class-wide variables such as “boolean populatingFontList”. Any time a method in the dialog was performing an operation on a list where the selection would change as an unintended consequence, I simply set “populatingFontList” or “populatingSelectLanguage” (or whatever) to true. Then, at the beginning of the affected event handler (and any other appropriate place) I would put in a clause which prevented anything bad from happening. Usually something like “if (populatingFontList) return;” at the beginning of a troublesome function.

populatingSelectLanguage = true;
selectLanguage.removeAllItems();
populatingSelectLanguage = false;

Then when the program fires off its little events as a result of you removing whatever item was selected, you won’t get an error because the font list isn’t populated yet. And so forth.
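Here is a slightly fuller sketch of the flag idea, written against a bare DefaultComboBoxModel so it can run without showing a dialog. It is a simplified stand-in, not the real Font Chooser: the class name, the preview bookkeeping and the sample font names are assumptions, and only the populatingFontList flag comes from the description above. The try/finally is my own addition so the flag is always cleared, even if populating throws.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.swing.DefaultComboBoxModel;
import javax.swing.event.ListDataEvent;
import javax.swing.event.ListDataListener;

public class GuardFlagSketch {
    private final DefaultComboBoxModel<String> fontModel = new DefaultComboBoxModel<>();
    private final Map<String, List<String>> fontsByLanguage;
    private boolean populatingFontList = false;
    private int previewCount = 0;   // how many times the preview really ran
    private String lastPreview = null;

    public GuardFlagSketch(Map<String, List<String>> fontsByLanguage) {
        this.fontsByLanguage = fontsByLanguage;
        fontModel.addListDataListener(new ListDataListener() {
            @Override public void intervalAdded(ListDataEvent e) { }
            @Override public void intervalRemoved(ListDataEvent e) { }
            @Override public void contentsChanged(ListDataEvent e) {
                updatePreview(); // the model reports selection changes here
            }
        });
    }

    // Same entry point for user clicks and for programmatic population.
    private void updatePreview() {
        if (populatingFontList) {
            return; // the list is only half built; ignore this event
        }
        lastPreview = (String) fontModel.getSelectedItem();
        previewCount++;
    }

    // Rebuilds the font list for a language and restores a saved choice.
    public void showFontsFor(String language, String savedFont) {
        populatingFontList = true;
        try {
            fontModel.removeAllElements();
            List<String> fonts = fontsByLanguage.getOrDefault(language, Collections.<String>emptyList());
            for (String font : fonts) {
                fontModel.addElement(font); // adding the first item fires a selection event
            }
            if (savedFont != null) {
                fontModel.setSelectedItem(savedFont);
            }
        } finally {
            populatingFontList = false;
        }
        updatePreview(); // exactly one preview, now that everything is in place
    }

    // Stands in for the user picking a font in the dialog.
    public void selectFont(String font) { fontModel.setSelectedItem(font); }

    public int getPreviewCount() { return previewCount; }

    public String getLastPreview() { return lastPreview; }

    public static void main(String[] args) {
        Map<String, List<String>> fonts = new HashMap<>();
        fonts.put("Japanese", Arrays.asList("Font A", "Font B")); // placeholder names
        GuardFlagSketch dialog = new GuardFlagSketch(fonts);
        dialog.showFontsFor("Japanese", "Font B");
        System.out.println("Previews run: " + dialog.getPreviewCount()
                + ", showing: " + dialog.getLastPreview());
    }
}
```

Without the flag, the preview would fire while the list was still being filled in, which is the situation that produced the null pointers.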

Reverse Engineering my own Brain

A few days later…

It is indeed a tribute to the amount of time I spent on flowing and commenting this code, that I have been able to step back up to the project so quickly. Sure there are lots of little nagging snags to overcome but I am following my previous cognitive pattern quite nicely. That is, in everything but one key area.

You see, Netbeans Form Designer does something really dumb. It creates code of the following ilk:

javax.swing.ActionMap actionMap = org.jdesktop.application.Application.getInstance(kongzi.Kongzi.class).getContext().getActionMap(ExportDialog.class, this);
exportButton.setAction(actionMap.get("exportSelected")); // NOI18N
exportButton.setName("exportButton"); // NOI18N

oohh… but you already know where this is going. This is stupid beyond all belief. I cannot possibly imagine what imperative possessed the minds of the people who made form designer to do this. This is simply never a default desire of the programmer. It’s overcomplexity to the extreme.

The problem is that when yguard obfuscates my code, it doesn’t know how to rename the “exportSelected” string.

Now here’s the thing. A year and a half ago, I actually solved this problem. I tripped over the solution one day while reading the netbeans help files. I actually figured how to make it set the event handlers without using reflection. It was literally a one in a million chance. But all I can remember about the incident was that I thought “No one would ever have been able to find this, it’s buried in the help files” and “thank god I figured this out, now I can use form designer AND obfuscate my code!”

Ok so now, nearly two years later, what the heck was I thinking?

I have been all over – and I mean it – all over the help files.

I have looked EVERYWHERE in configuration, navigator and inspector. Nothing relevant. I’ve turned OFF automatic resource management (a good thing I think) but that still didn’t solve the problem.

The best solution I can come up with is to define, say, a jButton by using Events -> actionPerformed instead of “Set Action”, which delves into the whole actionMap(x) thing.

The scary thing is, I don’t know if this is the solution I figured out last time, or not. Because I was very happy with the solution last time. I’m worried. I’m going to convert everything to use actionPerformed versus @action, but you know, it’s just scary, because the solution I had last time was so elegant, and now I don’t really remember what I did.

I wish I didn’t lose my backups.

I made lots of backups but they disappeared.

I feel sick and sad because I lost my backups.

It was only two weeks of work but it, apparently, was the best two weeks of my entire life.

Kongzi Resurrection

You may not know this, but after the last blog post (about Kongzi) back in January 2008, a series of unfortunate events pushed me farther and farther away from developing Kongzi.

I’d like to list them here to help ease my mind. And why, because I’ve decided to focus on finally finishing this promising project.

For one, after I had completely removed all the netbeans auto-gui code, I figured out how to turn off reflection in the gui generator. It all came crashing down. “Shit,” I thought, “I just wasted several days of work”. What was worse is that I would have to redo all the forms because although I had a recent backup I was spending so much time on the project every day that it felt inconceivable to me to go to an old backup. Version Control. Yeah I know, I tried that. It’s a pain in the ass. I don’t want to comment on that. I’ve tried many version control programs for unix, windows, whatever – none of them appealed to me.

Anyways.

I text edited the original files and put back the IDE tags I had removed. So I wasted about a week on that whole adventure. But I finished it and resumed working on the project. Then I did something really dumb, I tried to install a version control system. Version Control systems are great – for large, multiuser projects. For something like this it is a TOTAL waste of time. Backups would have been better. So I lost all my work because the version system wasn’t installed properly, or I typed the wrong flag, or whatever. Poof. You know, I had tried to use version control systems before, really. I am not a novice user by any means. But this was the last straw. I actually had a system up and going which backed everything up for me. But losing all my code.. For fuck sakes – and I do not swear in vain – if I am going to have to back it up anyways I will simply not waste my time with a VCS of any kind.

So you know what I did, I did it all over again.

This is late January now. The program was going exceptionally well. I had all the features I had before. Then I did something which taught me a lesson. I started working on the licensing aspect of the program and I got bogged down. I didn’t really want to work on the licensing and the licensing aspect was boring. Slowly the code started to break in places because it had to be slightly redesigned to work with the licensing code I had written. So I lost interest. This is February now and I posted about how I felt but I didn’t say why I had been losing interest.

A few months later I had a motorcycle accident and I couldn’t type or write for a week or two so that was a problem as well. So by this point I was also into guitar and videogames for a while – well I had always been into games like Hitman, Halflife, Max Payne and so on and that sort of occupied my time. That and work. So I had totally given up on the project.

Now, get this. For some reason I seem to have lost my backups of Kongzi.

I don’t mean all my backups. I mean the “recent” backups from February. I’d just as soon snip out all the licensing code and work on the main project a little more but I can’t. So here I am. What am I gonna do.

First I have to justify why I want to do this. Oddly enough although I am working more than I ever have, I see many ways in which I could use Kongzi at work to help me teach English. Especially now that you can get a mini laptop for under $300. What a deal! Or perhaps it could run on one of those phones with windows mobile or a Palm (if they even make those anymore) and so on. I have a lot of ideas. I always did.

I mean, they sell those little electronic dictionaries. I could use that. But with Kongzi I could tailor it to the needs of an English Teacher. I could sell it as a package – a mini laptop and software. I’d make a lot of money like that. Lol. Or something.

So here’s what I did.

1. I pulled up all my old backups. Dec 17th. Dec 30th. January 3rd.
“Aww crap,” I said to myself. I lost a lot of work. Over a week and a half of amazing stuff.

2. I identified the Dec 30th backup as the day before I gutted the GUI code.

3. I put the Jan 3rd backup and the Dec 30th backup into the latest netbeans (installing which was an adventure all its own, but there really is nothing better.. time has certainly shown who won the recent IDE wars)

4. I started bringing the Dec 30th backup “up to speed” with the Jan 3rd backup, minus all the form removal code. File by file. This will serve many purposes least of all refamiliarizing myself with the code.

When I am done, thank god, I also have those extremely convenient screenshots of what I was doing with Kongzi at the time I gave it up, so it should be trivial to redo the ten or so days of work I lost. And given more than half that time was spent on the irrelevant licensing code… I am actually pretty excited about this now.

All things considered I could have Kongzi NeoBeta ready to accept new code by early October.

By Halloween I should have more work done on Kongzi than ever. By Halloween I expect to start spending most of my time working on the dictionary. By the end of the year I expect I could be completely finished this with a real live distributable CD.

Hmm. Let’s reintegrate that code and look at the screenshots of Beta-4 first :(

Gibson SG Standard

Oh yeah.

It is everything I ever thought it would be…

03-if-you-want-blood-you

This Blog is Finished

I’ve been deciding for a while to move further away from internet-based resources for a number of reasons. Primarily because I don’t feel I need to learn anything new, and secondly because I have nothing left to say to people. I guess this is due to the fact that everything is on record and if anyone wanted they could go look everything up. So I’ve decided to move in another direction in real life and to do that I need to free up all that internet time. The Chi FAQ is officially dead as well, I just don’t have time to educate people who really don’t want to be educated in the first place. If you’re interested in picking it up let me know and I’ll transfer it over to you. Good luck and God bless.