Learning Vocabulary via "Chunks" of Kanji

This post covers the approach that I take to studying Japanese words and kanji. Although this approach may involve learning somewhat uncommon words that aren't particularly valuable in early immersion, I believe it's an incredibly quick way to build a strong intuition for the language.

The existence of kanji is likely the most troublesome aspect of learning Japanese. There are thousands of characters that need to be deeply embedded into your subconscious mind, each with a handful of different readings and meanings depending on the context in which they’re used.

Not easy.

To make matters worse, kanji inhibits (at least during the beginning stages) the most effective language learning strategy there is: reading. For other languages, if you see an unknown word enough times (in a context you understand), you’re likely to pick up the word naturally. With kanji, you might learn the meaning of a word through mass exposure, but in the absence of furigana (or dictionary lookups) you will never learn how the word is pronounced.

Even learning the meaning of a word through immersion can be difficult when it contains unknown kanji, because you’re likely to see the word as a mass of random scribbles.

In short, a lot of work has to be done to tackle kanji before you can even get to the starting line for really learning the language.

Kanji is one of the most important aspects of the language to grasp, and it also is one of the most limiting bottlenecks. For those reasons, it’s incredibly important to optimize the process of learning kanji, so you can more quickly acquire the language.

Over the past 10 months I’ve done a lot of experimenting with various approaches to kanji study. Below, I’ll share my experience with each approach and argue for why I believe the one I settled on is the best.

Traditional RTK

This is the only method listed that I haven’t personally tried, but I thought I’d briefly mention it due to its popularity. Remembering The Kanji is a book/method/course for memorizing the kanji by means of mnemonics. The idea is that you associate some English keyword with each kanji, and remember a mnemonic based on the keyword and components of the kanji.

The goal of RTK is to be able to recognize the meaning of a kanji when you see it in the wild (where the meaning is the keyword you memorized), and also to be able to write kanji based on the keyword that is in your head. The course covers the entirety of the Jouyou Kanji.

There are two main benefits to this approach.

  1. You can recognize almost every kanji you’ll see in your immersion.
  2. You can write kanji.

Of all of the methods mentioned in this article, this is the only one that gives you the ability to output (write kanji).

However, this method has several really big downsides that make it not-so-ideal.

  1. It takes a long time. Even the legendary Stevijs (who learned way faster than you ever will) reported that it took him 3 months to finish working through RTK.
  2. During this time you’re not learning Japanese. You’re only learning to recognize kanji as individual entities (as opposed to random scribbles). It’s incredibly demotivating to spend several months of effort without knowing a single bit of Japanese in the end.
  3. The keywords that you associate with the kanji are very often wrong or misleading. The result of this is that even once you know the “meaning” of all the Jouyou kanji, you’ll very quickly realize that you wasted time learning the wrong meaning.
  4. In order to properly learn the mnemonic stories, you have to learn a large number of “primitives”. Primitives are essentially kanji radicals, but their meanings are made up and only make sense in the context of the mnemonic stories.

Many people have done RTK and gone on to reach a very high level in the language, but I’ve never heard a single person say that they’d do it again if they could start over. In fact, everyone I’ve heard from who has done it has said it’s one of their biggest regrets in learning Japanese.

Of course, you can’t neglect that it teaches you to write kanji (which is a surprisingly difficult skill). One might argue that this is the correct approach to take if you want the most comprehensive understanding of Japanese as a whole.

However, considering that even Japanese people struggle with writing difficult kanji, it might not be the highest priority. Even if you really want to hand-write in Japanese, it would likely be a better use of time to use the Kanken Deck later on, after you’ve got a basic grasp of the language.

Recognition RTK

Recognition RTK (RRTK, formerly “Lazy Kanji”) is RTK without the writing portion. In other words, the focus is solely on recognition, not on output. Furthermore, the system is ported to Anki so that you can get the benefits of SRS to learn more quickly.

RRTK generally only covers the first 1,250 kanji (as opposed to the 2,136 Jouyou kanji). The idea is that once you get through 1,250 kanji, you’re pretty good at recognizing kanji as individual entities (rather than scribbles), so you can pick up future unknown kanji in immersion without explicitly studying them.

As you can see, by making these changes, RRTK attempts to drastically reduce the time spent doing RTK, so that you can move on to actual Japanese study. Of course, you learn less overall compared to RTK, but the argument is that there are diminishing returns in isolated kanji study, so this is enough.

However, all of the downsides to RTK still exist here.

  1. It still takes a long time to complete. Even if you do 25 cards per day (which is a pretty hefty workload), it would take 50 days to finish.
  2. That’s 50 days of work without learning any Japanese.
  3. The keywords are still wrong/misleading.
  4. You still have to learn non-meaningful primitives.

Despite these issues, RRTK remains one of the most popular methods for studying kanji. This is because it’s effective at doing exactly what it said it would do (and also because there aren’t really great resources for any other approach).

Personally, I did RRTK when I first started, but quit after learning around 850 kanji (this is extremely common, so don’t feel guilty if you try RRTK and quit halfway).

As a side note, I should mention the existence of RRTK450, which is an alternative to RRTK that’s made up of the most common 450 kanji. The goal of this deck is to get into learning actual Japanese as fast as possible. I personally wasn’t comfortable with kanji after knowing 450 of them, but if you’re willing to bear the pain for a while, this could be a decent approach.

Brute Force

After giving up on RRTK at ~850 kanji, I decided to completely delete the deck and just grind vocab + immersion for a while. My idea was that even without independently studying kanji, as long as you spend enough time and effort immersing in the language, it would all eventually stick.

This is something that I still believe to be true, and is potentially even a valid approach depending on your personality. The JP1K deck attempts to teach Japanese this way (although it wasn’t around when I tried it).

The issue is that this method is incredibly painful. It takes a really special kind of person to be able to spend large numbers of hours focused on things you don’t understand at all. Knowledge from RRTK (or similar isolated kanji study) is like a life vest keeping you afloat. You’re still lost at sea, but at least you’re not drowning. On the other hand, the brute force method of ignoring kanji study altogether is like throwing a baby right into the deep end and hoping things work out. It might work out, but they’ll be emotionally scarred for the rest of their life.

I should mention, there’s a consensus in the Japanese learning community that the people who learn the fastest are the people with the highest pain tolerance. These are the people who read 5 hours a day for a year despite understanding next to nothing. Thousands of dictionary lookups an hour, cross-referencing grammar dictionaries, etc…

While the brute force method is absolutely a practice in masochism, I’m not sure whether it would prove a steel-like will, or idiocy. If I had to pick, I would wager that it’s still a bit more efficient to acquire at least some kanji knowledge before trying to brute force things, but honestly I’m not sure anyone really knows for sure.

Kangxi Radicals

Kangxi radicals are the most popular system of radicals for Chinese characters. There are 214 of them, with which you can build just about any kanji. More importantly, unlike RTK primitives, each Kangxi radical has an accurate meaning (though, admittedly, some of the meanings have become outdated over time).

Kangxi radicals were the approach I decided to try after giving up on pure brute force. Remember, one of the most important aspects of isolated kanji study is that you gain the ability to quickly acquire new unknown kanji. Kangxi radicals are useful in this regard for two reasons:

  1. Because you can build essentially any kanji out of these radicals, learning them gives you the ability to notice the different “parts” of any kanji you see (as opposed to random scribbles).
  2. Because there are only 214 of them, you can learn them within a week.

When I used this approach, I worked through this Anki deck. However, since then this deck has also been released which contains around 50 extra radicals not covered by Kangxi.

I do believe there’s value in studying the Kangxi radicals, although not necessarily in the approach I took. Studying a small group of radicals can be seen as the minimum necessary kanji study before jumping straight into real Japanese. However, since I had previously done 2/3 of RRTK (even though I had deleted the deck a month or two prior), I already had background knowledge.

My point is: if doing 850 cards of RRTK isn’t enough to make me feel comfortable with Japanese, 214 radicals probably isn’t enough either. That said, being a beginner is rough, and I’m not sure that any amount of isolated kanji study will make it painless. With that in mind, Kangxi radicals might not be a bad approach.

Vocabulary Alongside Kanji

There’s another downside to the methods above that I haven’t mentioned yet: over time you forget the kanji meanings.

Even if you don’t delete your kanji deck, after the intervals get high enough you might often think “this looks familiar but I can’t remember it”, or even worse, “I swear I’ve never seen this kanji in my life”.

The reason for this, as far as I’m concerned, is context. One of the most important things when learning anything, is making connections between the various things you know (even if they might, at first, seem unrelated). However, when doing isolated kanji study you lack any context. That is, there’s nothing for you to make connections with, because you’re learning each kanji in a vacuum.

Of course, the use of mnemonics in (R)RTK is a tool to try to combat the lack of real meaningful context. As far as I can tell though, close to half of Japanese learners don’t even use mnemonics when doing RRTK, let alone make their own mnemonics (like you’re supposed to). Even if they do, it’s my opinion that crafting good mnemonics takes time, and that time would be better spent learning Japanese.

So, how do we get meaningful context for kanji?

Vocabulary.

In language, meaning comes from what you say and how you say it. Although we can say that kanji are inherently meaningful, they’re only meaningful because they represent vocabulary. In other words, vocabulary is the thing that gives meaning to kanji.

When you learn RTK keywords (or similar), you’re not learning the meaning of kanji, you’re only learning an approximation of the meaning of the vocabulary in which it’s used.

So, here’s the idea:

  1. “Learn” a kanji. This can be done with any jouyou kanji Anki deck, but personally, I believe Jo-Mako’s kanji deck to be by far the best. You don’t need to use mnemonics, just pick the most accurate keyword(s) and try to make it stick. In the case of Jo-Mako’s deck, the back of the card shows several example vocab words. I think it’s worth picking the keyword that most accurately represents those vocab words.
  2. Learn the most common words that use that kanji. An important note here: the words you learn should only contain kanji that you’ve previously “learned”. More on how to automate this in a later section.
  3. As you do this, you build an intuition for the kanji. At that point, you don’t need to consciously think of the keyword you chose (remember the keyword is only an approximation of actual meaning), because you’ll have a deep understanding of the kanji and how it’s used in various contexts. Each word you learn is another connection to help make it stick.
  4. After learning the kanji + several vocab that use it, move on to the next kanji. Do this one-by-one until you’ve finished the jouyou kanji.

When I came up with this idea I really thought it would be the be-all-end-all for my kanji/vocab study. However, after a few weeks I began to notice a problem with it.

This method still lacks context.

In the book Make It Stick: The Science of Successful Learning (which I recommend), there is an entire section on an idea called “interleaved practice”. The idea is that, rather than learning one thing at a time, you should learn several (potentially unrelated) things at a time. By mixing your reviews like this, your retention of the material drops slightly in the short term, but sticks much better in the long term.

For (R)RTK, you achieve interleaving by studying thousands of potentially unrelated kanji at the same time. However, it lacks the context of vocabulary by pushing that off until you’ve finished kanji study. By the time you get to vocabulary study, you start losing a lot of the gains you’ve made in the kanji study.

This method has the opposite issue. You get context for each kanji by learning associated vocabulary, but you miss out on the benefits of interleaving because you don’t move on to the next kanji until you’ve mastered the one you’re working on.

Chunks of Kanji

By learning what I call “chunks” of kanji, you get the benefit of both vocabulary-context and interleaving. The method is as follows:

  1. In your kanji deck, learn 100 kanji. This should only take 4-5 days. Again, there’s no need to use mnemonics. You can if you really feel like you need them, but with this small of a number of kanji I believe they’re unnecessary and just add time to your learning.
  2. Learn vocab associated with those 100 kanji (can also include kanji that you’ve previously learned).
  3. Rinse and repeat.

Here, each group of 100 kanji + vocab is a “chunk”. By mixing up 100 different kanji and vocabulary you really force your brain to work hard for the answers (interleaving), and you build the necessary connections (context) between kanji and vocabulary quickly enough that you don’t forget the meanings of the kanji (unlike RRTK, where you might not learn associated vocab for months, thereby forgetting the kanji entirely).
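To make the chunk idea a bit more concrete, here is a minimal Python sketch of how a word could be assigned to a chunk. It assumes you have your kanji in the order your kanji deck teaches them; the names `build_positions` and `chunk_of_word` are made up for illustration, and the kanji check is a rough simplification that only looks at the main CJK Unicode block.

```python
CHUNK_SIZE = 100  # chunk size used in this post


def is_kanji(ch):
    """Rough check: is the character in the main CJK Unified Ideographs block?"""
    return "\u4e00" <= ch <= "\u9fff"


def build_positions(kanji_order):
    """Map each kanji to its 0-based position in your kanji deck's order."""
    return {kanji: index for index, kanji in enumerate(kanji_order)}


def chunk_of_word(word, positions, chunk_size=CHUNK_SIZE):
    """Return the 1-based chunk in which `word` becomes learnable.

    A word unlocks in the chunk that holds its latest-learned kanji, so every
    kanji in the word has been seen by then. Returns 0 for words without
    kanji, and None if the word uses a kanji that is not in the deck order.
    """
    latest = -1
    for ch in word:
        if not is_kanji(ch):
            continue  # kana and other characters do not affect the chunk
        if ch not in positions:
            return None
        latest = max(latest, positions[ch])
    return 0 if latest < 0 else latest // chunk_size + 1


# Tiny made-up ordering with a chunk size of 3, just to show the behaviour.
positions = build_positions(list("日一国人年大十二本中"))
print(chunk_of_word("一人", positions, chunk_size=3))  # 2
print(chunk_of_word("日本", positions, chunk_size=3))  # 3
```

With a real deck order and the default chunk size, a word in chunk 2 would be one you unlock after learning kanji 101 through 200.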

This is the method that I’ve been using for several months now and I’ve found it to be an extremely effective way of learning both kanji and vocabulary.

One major benefit of this approach is that it allows you to start learning actual Japanese very quickly (as opposed to RRTK).

Another benefit is the speed at which you can acquire new vocab. For each chunk, there’s a really obvious process that occurs. For the first half of the chunk you are slowly building connections. You’re realizing what common readings go where, how they change depending on the position of the kanji in the word, etc. However, during the second half of the chunk, all of that becomes extremely intuitive, to the point where you can often guess the reading/meaning of a new word the first time you see it. Even if you can’t, once you see it you’ll think “oh that makes sense”.

With every chunk you complete, your intuition for kanji will get more and more firm, and you’ll learn faster and faster.

Setting Up Your Decks

Learning kanji via chunks is a system I’ve never heard anyone else mention before, and I think there’s a good reason for that: there simply aren’t resources available for it. Being tech savvy is useful in this situation because you’re likely going to have to create your own deck. I’ll show you a few ways of doing this, and I’ll even recommend a way of manually mining these words in case you’re not-so-tech-savvy.

Subs2SRS Decks (requires some programming knowledge)

This is the approach that I used, so obviously I think it’s a pretty good way of (somewhat quickly) creating a high quality deck. However, it’s a bit tricky so if you can’t code then move on to another method.

  1. Download a bunch of subs2srs decks. I think I probably downloaded like 30-40 decks when I did this. Preferably you would download decks for shows you’ve already seen, with priority being given to slice of life shows.
  2. Convert all of them to the same note type, then merge into 1 giant deck.
  3. Use MorphMan to populate the MorphMan_Unknowns field. This will give you a list of every word you don’t know in the sentence.
  4. Delete every card with the 0T tag.
  5. Export the deck to CSV.
  6. Write a Python script to accomplish the following:
    1. Create a dictionary of each jouyou kanji alongside the RTK ID or KLC ID (depending on what order you’re learning your kanji deck in)
    2. For each row, look at the MorphMan_Unknowns field. For each word in it, check the RTK/KLC ID of every kanji. Find the highest ID in the list.
    3. Assign each row a field that contains that ID (which will be the sort field in the card browser later). For the word that contains that highest ID, make a field called “word” (or similar) that contains that word.
    4. Export the modified data back into a CSV file.
    5. Just a side note, the script I wrote and linked is absolute garbage and was modified from a different script I wrote, which makes things even more incomprehensible. However, you can likely spend a bit of time modifying it to work with your own data to make things work.
  7. Import the CSV back into Anki. The audio and image fields will be messed up, so you’ll have to fix that. Using the “find and replace all” function in the card browser, it’s pretty quick and easy to remove the added characters from those fields and make them work again.
  8. Sort the deck by the id you assigned. Suspend every card in the deck. Because the ID assigned to the card is the same as the ID of the kanji you’re learning in your kanji deck, it’s easy to match the two decks.
    1. Learn 100 cards in your kanji deck.
    2. Unsuspend every card whose ID is within that range from your vocab deck.
    3. Learn those words.
    4. Repeat.
  9. With this setup the sentences aren’t guaranteed to be i+1. If you want, you can use MorphMan to show i+1 cards, or use word context cards (WCC). With my note type you can have the word highlighted in the sentence. If you decide to ignore i+1 and use WCC, you can change MorphMan to look at the word field instead of the sentence field, and turn modify off. That way your K-count will still increase by 1 per card.
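For anyone who wants a starting point for step 6, here is a rough sketch of what such a script might look like. This is not the script I used, just an illustration under a few assumptions: the export has a header row, the unknown words in `MorphMan_Unknowns` are separated by commas, and the output field names `SortID` and `Word` are ones I made up here. The `kanji_ids` dictionary stands in for your RTK or KLC ordering.

```python
import csv
import io


def is_kanji(ch):
    """Rough check: is the character in the main CJK Unified Ideographs block?"""
    return "\u4e00" <= ch <= "\u9fff"


def highest_id_word(unknowns, kanji_ids):
    """Return (highest_id, word) over all unknown words, or (None, "").

    Kana-only words and words containing a kanji that is missing from
    `kanji_ids` are skipped (a simplification for this sketch).
    """
    best_id, best_word = None, ""
    for word in unknowns:
        ids = [kanji_ids.get(ch) for ch in word if is_kanji(ch)]
        if not ids or None in ids:
            continue
        word_id = max(ids)
        if best_id is None or word_id > best_id:
            best_id, best_word = word_id, word
    return best_id, best_word


def annotate(in_file, out_file, kanji_ids, sep=","):
    """Copy the CSV, adding the hypothetical SortID and Word columns."""
    reader = csv.DictReader(in_file)
    fields = list(reader.fieldnames) + ["SortID", "Word"]
    writer = csv.DictWriter(out_file, fieldnames=fields)
    writer.writeheader()
    for row in reader:
        raw = row["MorphMan_Unknowns"]
        unknowns = [w.strip() for w in raw.split(sep) if w.strip()]
        sort_id, word = highest_id_word(unknowns, kanji_ids)
        row["SortID"] = "" if sort_id is None else sort_id
        row["Word"] = word
        writer.writerow(row)


# Made-up IDs and a one-row export, kept in memory for the example.
kanji_ids = {"日": 1, "本": 2, "語": 3}
source = io.StringIO('Expression,MorphMan_Unknowns\n日本語を話す,"日本,語"\n')
result = io.StringIO()
annotate(source, result, kanji_ids)
print(result.getvalue())
```

For real use you would open the exported file and an output file instead of the in-memory strings, and adjust the separator and column names to match whatever your export actually contains.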

Using The Known Kanji Manager Addon

Known Kanji Manager is an addon I used for a while before switching to the approach above. The addon does work, but it’s a bit finicky and can be awkward to set up.

I won’t give detailed instructions on how to use it (since I haven’t used it in so long that I forgot), but the general idea is that it will scan your kanji deck to see which kanji you already know, and then scan your vocab deck to see which cards use those kanji. From there it will suspend/unsuspend the relevant cards.
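The general idea can be sketched in a few lines of Python. To be clear, this is not the addon's actual code (I haven't looked at how it's implemented); it's only an illustration of the suspend/unsuspend decision, with a made-up function name and a rough kanji check.

```python
def is_kanji(ch):
    """Rough check: is the character in the main CJK Unified Ideographs block?"""
    return "\u4e00" <= ch <= "\u9fff"


def partition_by_known_kanji(words, known_kanji):
    """Split words into (unsuspend, suspend) lists.

    A word is unsuspended only if every kanji in it is in `known_kanji`.
    Words without any kanji are always unsuspended.
    """
    unsuspend, suspend = [], []
    for word in words:
        kanji_in_word = {ch for ch in word if is_kanji(ch)}
        if kanji_in_word <= known_kanji:  # subset test
            unsuspend.append(word)
        else:
            suspend.append(word)
    return unsuspend, suspend


known = set("日本人")
ready, waiting = partition_by_known_kanji(["日本", "日本語", "ひと", "人"], known)
print(ready)    # ['日本', 'ひと', '人']
print(waiting)  # ['日本語']
```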

I would still recommend the process of downloading a bunch of subs2srs decks and merging them into one giant deck for this.

Using KLC Vocab And The Japanese Example Sentences Addon

In most KKLC-based kanji decks, including Jo-Mako’s kanji deck, the note type includes the vocabulary from the KKLC book. This means that for every kanji, you have up to 4 vocab words available.

Something I tried in the past was to add 4 new card types to the note (1 for each vocab word), that would just show me the vocab word on the front and the definition on the back. This worked, but it suffers from the same issue that all simple word cards suffer from. They’re fast to review, but that’s pretty much their only strength.

We can improve that system by taking it one step further and using the Japanese Example Sentences addon to turn those into WCCs. You’ll have to add new fields to the note type (for 4 new sentences), and then run the addon several times to generate a sentence for each word.

You can also use AwesomeTTS to generate audio for these cards if you want to.

From there, suspend every card other than the kanji card type. Review 100 of them, then unsuspend the cards for the WCCs you just generated.

While this is a fairly easy option, it has 3 main downsides that you should be aware of:

  1. The KKLC vocab words, in my opinion, aren’t always the best words for building intuition of the kanji. It’s a frequent occurrence that an obscure word will be chosen over a common one. I’m not against learning uncommon words, but it’s foolish to learn them at the expense of common words.
  2. The Japanese Example Sentences addon will generate sentences that use kanji you haven’t seen before. It might generate sentences that are way too difficult for you to understand. The context of a sentence isn’t really useful if you can’t understand it.
  3. Since there’s no native audio, the best you’ll get is AwesomeTTS, which isn’t that good.

This is the tradeoff of generating a deck rather than hand-crafting one. Reasons like these are often why people praise self-mined cards as the best. Ultimately, however, it’s up to you to decide if the tradeoff is worth it.

Manually Mining

Sentence/word mining can be really difficult if you are trying to learn via chunks of kanji. If you are immersing in native material, it would likely take you all day (or longer) to find enough sentences that fit all of your criteria.

Luckily, there’s a solution to this, albeit a somewhat flawed one.

The KLC Graded Reading Sets are a series of books containing “30,500 mini reading exercises with parallel English text”. The important thing though, is that they strictly follow the KKLC kanji order.

That means, for every sentence in the book, you will only see kanji that you’ve learned before, and each new sentence will contain the specific kanji that you’re interested in learning.

I personally manually mined from the first 3 volumes of this series before I moved on to the first approach listed here. I used the Migaku Dictionary addon + AwesomeTTS in order to quickly make new cards from the book.

The books aren’t perfect. They have a tendency to use weird sentences, and rare vocab. That said, it wouldn’t be called “mining” if it was handed to you for free — you have to dig for it!