April 20, 2010

Learning Japanese with Sentences


One of the major components of my Japanese study is sentences. I add sentences written in Japanese to my spaced repetition program, Anki. When I review a sentence card, my job is to read the sentence aloud (to confirm that I know the readings for each kanji) and to understand the sentence. [1] For simpler sentences which I know well, usually this is automatic and immediate. For new or complicated sentences, understanding sometimes demands translation into English. As I get more exposure to the language, my brain will get better at skipping that slow and lossy translation step. That's the plan anyway!

[Image: reviewing a sentence card where the answer and translation have both been revealed.] jNetHack is a great source of sentences for me because I know NetHack so well.

I always translate from Japanese into English. Never the other way around. For one, the comprehensible input hypothesis, which is my guiding light for language study, suggests that production is only a side-effect of language ability, which is developed only when input is received. The input hypothesis even goes so far as to suggest that premature production is harmful, because in doing so you are probably reinforcing incorrect usage. [2] Also, more pragmatically, because I am not a native Japanese speaker, or even close to fluent, it is difficult for me to judge whether two sentences have the same meaning, or even similar nuance. So I would have trouble grading myself. This problem doesn't occur when I am producing English, which I have a bit more experience with. For Japanese, I rely only upon correct sentences written by native speakers, instead of whatever nonsense I come up with. :)

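I also avoid studying soulless [3] vocabulary lists. Briefly, consider the difference between a lethal injection and a mortal injection. Because you've never heard of a mortal injection before, [4] it sounds wrong, even though lethal and mortal both mean deadly. A vocabulary list would present the two words as interchangeable. Only real sentences show you which one belongs next to injection.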

If I want to learn a new word, I force myself to find a complete sentence which uses it. So where am I getting these sentences? It turns out the answer is "from all over the place." But before I can talk about that, I want to explain a bit more about Anki.

Facts vs Cards

Anki is great for separating facts from cards. A card is the familiar flashcard: it has a front (the question) and a back (the answer). A fact, on the other hand, is a set of key/value pairs. You create a fact, then generate one or more cards from it. For example, if you are studying the countries of the world, you could have a fact for each country which includes Country Name, Capital, Languages, etc. Then you could generate many cards which test you on going from country to capital, country to language, capital to country, etc. Anki's pleasant design turns out to be a great timesaver since it frees you from repeating yourself: you type the country's name only once, but it can be used in many cards.

[Image: creating a new fact in Anki.] This is the fact creation screen, which you will see a lot.

[Image: generating cards from facts in Anki.] This is the card template screen, which you will rarely look at. [5]

We added only four facts, but from them we created twenty-eight cards, each of which tests a single piece of data. This way, your "language of Australia" card will be independently spaced from your "capital of Australia", which is good when you have trouble with one of them. If you already know "English" but "Canberra" just won't stick in your mind, that's okay. You won't have to review that the Australians use English nearly as often as you review their capital city.
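The fact/card split is easy to model. Below is a minimal Perl sketch, not Anki's actual code or data model: a fact is a hash of fields, a card template is a hypothetical [question field, answer field] pair, and every template is applied to every fact. The countries, fields and seven templates are made up for illustration; seven is simply what four facts and twenty-eight cards works out to (28 / 4 = 7).

```perl
use strict;
use warnings;

# Hypothetical facts: each one is just a set of key/value pairs.
my @facts = (
    { Country => 'Australia', Capital => 'Canberra', Language => 'English',
      Continent => 'Oceania', Currency => 'Australian dollar' },
    { Country => 'Japan', Capital => 'Tokyo', Language => 'Japanese',
      Continent => 'Asia', Currency => 'yen' },
    { Country => 'France', Capital => 'Paris', Language => 'French',
      Continent => 'Europe', Currency => 'euro' },
    { Country => 'Egypt', Capital => 'Cairo', Language => 'Arabic',
      Continent => 'Africa', Currency => 'Egyptian pound' },
);

# Hypothetical card templates: [question field, answer field].
my @templates = (
    [ Country => 'Capital' ],   [ Capital => 'Country' ],
    [ Country => 'Language' ],  [ Capital => 'Language' ],
    [ Country => 'Continent' ], [ Capital => 'Continent' ],
    [ Country => 'Currency' ],
);

# Apply every template to every fact. Each card is a separate record with
# its own (hypothetical) review interval, so cards made from the same fact
# can be scheduled independently of each other.
sub generate_cards {
    my ($facts, $templates) = @_;
    my @cards;
    for my $fact (@$facts) {
        for my $template (@$templates) {
            my ($q_field, $a_field) = @$template;
            push @cards, {
                front         => "$q_field: $fact->{$q_field} -> $a_field?",
                back          => $fact->{$a_field},
                interval_days => 1,
            };
        }
    }
    return @cards;
}

my @cards = generate_cards(\@facts, \@templates);
printf "%d facts x %d templates = %d cards\n",
    scalar @facts, scalar @templates, scalar @cards;
```

Because each card carries its own interval, a scheduler is free to push the "Country: Australia -> Language?" card far into the future while the "Canberra" card keeps coming back.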


One of the fields that my sentence facts have is 'Source'. Other than the sentence itself, Source is the only field that I require myself to fill in. Translation, readings, context, etc. are optional. Knowing where each sentence came from is useful because it lets me gauge the trustworthiness of each sentence. I always trust the correctness of sentences uttered in, say, film, whereas an offhand remark on Twitter could easily have a typo or unusually playful grammar. It's also useful to know whether I have accidentally incorporated feminine language into my lexicon.

My diligence in citing every sentence I learn also lets me geek out and analyze precisely how I am learning Japanese. I wrote a script to pull all the values for the Source field out of the database and categorize them according to rules like:

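# Source pattern => tags. Helpers such as speech, native and blog() are defined elsewhere in the script;
# defer { $1 } presumably postpones reading the capture until the rule has matched.
qr{The Matrix} => { speech, native, movie, source("The Matrix") },
qr{iPod USB/dock cable box} => {
    text, native,
    medium => 'misc',
},
qr{IRC PM with (\w+)} => {
    student, correspondence, author(defer { $1 }),
},
qr{mt\.endeworks\.jp} => { blog('lestrrat') },
qr{Conversation at Ebisuya} => { convo, native },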
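For anyone who wants to try the same kind of bookkeeping, here is a self-contained Perl sketch of the categorizing step. It is not the script above: plain hashrefs replace helpers like speech and native, the Source strings are made up, and reading the Source field out of Anki's database is left out. Rules are tried in order and the first match wins. A tag value that is a code ref is called only after its rule matches, so it can still read $1, which appears to be what defer { $1 } is for in the original rules.

```perl
use strict;
use warnings;

# Ordered rules: [ pattern, tags ]. A tag value may be a code ref; it is
# called after the pattern matches, so it can read capture groups like $1.
my @rules = (
    [ qr{The Matrix} =>
        { audience => 'native', modality => 'speech', medium => 'movie' } ],
    [ qr{IRC PM with (\w+)} =>
        { audience => 'student', medium => 'correspondence', author => sub { $1 } } ],
    [ qr{Conversation at Ebisuya} =>
        { audience => 'native', modality => 'speech', medium => 'convo' } ],
);

# Return a hashref of tags for the first rule whose pattern matches $source.
sub categorize {
    my ($source) = @_;
    for my $rule (@rules) {
        my ($pattern, $tags) = @$rule;
        next unless $source =~ $pattern;
        my %resolved;
        for my $key (keys %$tags) {
            my $value = $tags->{$key};
            # Capture variables are dynamically scoped, so a code ref called
            # here still sees this match's $1.
            $resolved{$key} = ref($value) eq 'CODE' ? $value->() : $value;
        }
        return \%resolved;
    }
    return { audience => 'unknown' };    # no rule matched
}

# Made-up Source values standing in for the real Anki field contents.
my @sources = ('The Matrix', 'IRC PM with alice',
               'Conversation at Ebisuya', 'IRC PM with bob');

my %by_audience;
$by_audience{ categorize($_)->{audience} }++ for @sources;
print "$_: $by_audience{$_}\n" for sort keys %by_audience;
```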

My Input

Another component of my study is immersion. I try to always have Japanese music in my environment. I can't escape the deluge of Japanese tweets from those I follow. There are also movies, books, IRC, and, when I'm lucky, even in-person conversation. I expose myself to as much Japanese-meant-for-native-speakers as possible. Every now and then I encounter a sentence I almost understand, except for a word or a reading. For example, in the sentence in the marked-up image at the top of this post, I already understood everything except the reading of 門 (gate), which is もん (mon). So for each so-called i+1 sentence [6], I look up the one piece I'm missing, then add the sentence to Anki. Repeat until fluent.
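The i+1 idea boils down to a small test: a sentence is worth mining when exactly one of its pieces is unfamiliar. Here is a minimal Perl sketch, assuming the sentence has already been split into items (words or readings) and that what you know is a simple set. The romanized tokens are hypothetical; splitting real Japanese text into words would need a morphological analyzer, since Japanese is written without spaces.

```perl
use strict;
use warnings;

# Hypothetical "i": the items you already know.
my %known = map { $_ => 1 } qw(kore wa jigoku no);

# Items of a pre-tokenized sentence that are not in the known set.
sub unknown_items {
    my ($known, @items) = @_;
    return grep { !$known->{$_} } @items;
}

# i+1: exactly one unknown item, so there is exactly one thing to look up.
sub is_i_plus_one {
    my ($known, @items) = @_;
    my @unknown = unknown_items($known, @items);
    return @unknown == 1;
}

my @sentence = qw(kore wa jigoku no mon);    # hypothetical tokenization
if (is_i_plus_one(\%known, @sentence)) {
    my ($missing) = unknown_items(\%known, @sentence);
    print "i+1 sentence; look up '$missing', then add it to Anki\n";
}
```

A sentence with nothing unknown teaches nothing new, and one with several unknowns is closer to gibberish, which is why the test is "exactly one".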

But, for sanity's sake, I temper all this immersion with Japanese-as-a-second-language materials. It would be frustrating to listen to and read complete gibberish all day, so I begrudgingly use a grammar guide. I also use various other second-language materials [7], as long as they meet my impossibly high standards. [8]

Turns out my sentences are almost exactly evenly split between these two types. Eventually the sentences intended for native speakers will dwarf those intended for students. I seem to be on the right track, as evidenced by this shiny chart [9] (the x-axis is of course time, but the unit is unimportant).

[Chart: the intended audience of added sentences, by week.]
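Behind a chart like this is a simple grouping step: bucket each added sentence by week, then compute the share of each audience per bucket. Here is a minimal sketch in core Perl, assuming each record carries the epoch time it was added plus an audience tag (as produced by rules like the ones above). The records are made up, and drawing the chart itself (the post used Chart::Clicker) is not shown.

```perl
use strict;
use warnings;

my $WEEK = 7 * 24 * 60 * 60;    # seconds in a week

# Made-up records: [ epoch seconds when the sentence was added, audience ].
my @sentences = (
    [ 0,           'student' ],
    [ 3_600,       'student' ],
    [ 2 * 86_400,  'native'  ],
    [ $WEEK + 10,  'native'  ],
    [ $WEEK + 20,  'native'  ],
    [ $WEEK + 30,  'student' ],
);

# Return { week_number => fraction of that week's sentences aimed at
# native speakers }, counting weeks from the earliest record.
sub weekly_native_share {
    my @records = @_;
    my ($start) = sort { $a <=> $b } map { $_->[0] } @records;
    my %count;    # week => { audience => count }
    for my $r (@records) {
        my $week = int(($r->[0] - $start) / $WEEK);
        $count{$week}{ $r->[1] }++;
    }
    my %share;
    for my $week (keys %count) {
        my $total = 0;
        $total += $_ for values %{ $count{$week} };
        $share{$week} = ($count{$week}{native} // 0) / $total;
    }
    return \%share;
}

my $share = weekly_native_share(@sentences);
printf "week %d: %.0f%% native\n", $_, 100 * $share->{$_}
    for sort { $a <=> $b } keys %$share;
```

The same counting over a read/heard tag, without the weekly bucketing, yields an overall split like the 94%/6% figure below.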

It's also interesting to look at the ratio of added sentences which I have read (94%) versus sentences I have heard (6%).

[Chart: sentences I have read versus those I have heard.]

There are a few reasons for this disparity. I am reluctant to add sentences from music. Though lyrics are usually grammatically correct, it's fair to say they probably use some iffy language in order to fit the structure of the song. Also there's the peculiarity of kanji and how I'm studying them. I do much better with words written in kanji than those … and usually not a lot of context is needed (as opposed to a sentence in the middle of a book). The games from which I've mined the most are Phantasy Star 4 and jNetHack (which I beat yesterday!). I'm not really into anime, but the anime sentences I do have are from Fist of the North Star.


It's also worth noting that knowing the entire English script of The Matrix has been helpful for watching and learning from the Japanese dub. Just sayin'.


I got this idea of learning from sentences using spaced repetition from the guys at Antimoon. They learned English to native levels while living in Poland.

In a year, I will rerun the script I used to generate these charts to see how they compare. In the meantime, I'd better get back to studying!


  1. I too often have to mark sentence cards incorrect for flubbing a reading even though I know what the sentence means. So I am sorely tempted to generate two separate cards for each sentence: one which tests readings and one which tests understanding. It would be easy to do, but I'm not entirely convinced that I should.
  2. Though, of course, it's really fun to converse with speakers of your second language. The whole point of language is to communicate after all. So production there is fine. You just should not do it during review.
  3. I've been using this word a lot lately. It's excellent.
  4. Except when listening to language blowhards like me explain this point.
  5. If you're familiar with Moose, then maybe a useful analogy is that the Add Facts screen is like use Moose, but the Card Template screen is like delving into Class::MOP.
  6. Here, i is everything I understood about Japanese before adding the Gehennom sentence. +1 represents the reading of 門 (gate) that I learned through SRSing this sentence. Now my i is a little bit higher, and I'm a little bit more prepared for the next sentence.
  7. Such as various Japanese-English dictionaries. I haven't started using a monolingual dictionary yet. That is a scary - but necessary - step to take. Very soon!
  8. I avoid learning sentences written or spoken by fellow learners - even those who are far better than me with the language. Sorry! Nothing personal, but my goal is to eventually confuse people into thinking my first language is Japanese. :)
  9. The charts in this post were generated with Chart::Clicker, written by the inimitable Cory Watson. Any anti-Tuftean errors are mine, not the module's.