Text Parser for Making Vocab Lists?

I’m a former jpdb.io user, but I’m looking to have a bit more control to make my own lists for Anki. I would like to be able to preserve original sentence context for vocab words, like the Kindle does, but I primarily read paper books due to portability. I tried Anki Dojo, but the parsing seemed to be extremely inaccurate for the short excerpt I entered. Maybe it can be tweaked?

Does anyone know an easy-to-use parser that would fit the bill?

Creator of Anki Dojo here. I assume you have installed the full mecab addon and entered the excerpt into “passage”?

Now, it might not recognize names and places very well, but you can add known words to Anki Dojo and skip certain entries.

Do you mind showing us the excerpt?

As for other parsers, most of them use a lighter version of mecab which performs worse in general. You can try Sudachi or Ichiran, both of which do better than mecab in some areas and worse in others.

I compiled this table a while ago, and according to my investigations back then, mecab and Sudachi are still the most consistent across the board.

This was a few weeks back, so I don’t recall what it was exactly, but it was from むらさきのスカートの女. I can try again one of these days. (Edit: I installed the recommended mecab addon from the AnkiWeb page.)

While I have you here… 🙂 I just tried loading two CSVs from the excellent Immersion Reader into Anki Dojo, and noticed some strange behavior. The second CSV contained duplicate vocab from the previous load, but instead of skipping them, I got the “nothing to add” error. The only solution is to check “allow dupes”, but that obviously creates dupes. Can this be fixed?

With the no-duplicate setting, it won’t add duplicate vocabulary, so it will say “nothing to add” if all of them are duplicates. It’s a feature, not an error.

If some of them are duplicates, only some will be added. Note that in the current version of Anki Dojo there is no progress indicator when adding cards, so you might have added them already after the first click. This behavior is to be improved in the future.

Regarding duplicates, there are duplicate scopes, but they seem a bit too complicated to understand right now, so I’ll make allowing duplicates the default behavior in the next version.
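To make the intended behavior concrete, here is a minimal sketch of "skip the duplicates, add the rest". It is not Anki Dojo's actual code: `add_vocab`, `import_message`, and the plain-list deck are hypothetical names made up for illustration, and real duplicate checks in Anki also depend on scope (deck vs. collection) and note type, which this ignores.

```python
def add_vocab(deck, new_words, allow_dupes=False):
    """Append new_words to deck (a plain list standing in for an Anki deck).

    With allow_dupes=False, any word already in the deck (or repeated
    earlier in the same batch) is skipped instead of rejecting the batch.
    Returns (added, skipped).
    """
    seen = set(deck)
    added, skipped = [], []
    for word in new_words:
        if not allow_dupes and word in seen:
            skipped.append(word)  # skip only this entry, keep going
            continue
        deck.append(word)
        seen.add(word)
        added.append(word)
    return added, skipped


def import_message(added, skipped):
    """Report "nothing to add" only when every entry was a duplicate."""
    if not added:
        return "nothing to add"
    return f"added {len(added)}, skipped {len(skipped)} duplicate(s)"


deck = ["猫"]
added, skipped = add_vocab(deck, ["猫", "犬", "鳥"])
print(import_message(added, skipped))
```

Under these assumptions, a batch with one duplicate adds everything else, and loading the same batch a second time is the only case that reports "nothing to add".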

When I tried it (a few times), it rejected the whole list if there was a single dupe. I would like it to skip only the dupe items and load the rest of the list, which is not what I saw.

That is what it should do. Try checking to see if they are already in your deck if that message comes up.

I don’t have your exact Anki setup, so if you don’t mind, try setting up a new Anki profile with the Basic card template and let me know the exact steps to reproduce the problem.

So here’s what just happened: I added 28 words, one of which was a duplicate. About 5 or 6 cards were added, and then I got a “nothing to add” message. After deleting that one word, it loaded the remaining 27 fine. Something is not working correctly.

Sorry for the long wait.

I got back to fixing it this week. It should work with the latest version.