You mean you don’t have an SRS deck for memorising unicode codepoint to kanji mappings?
And instead of the default Anki multiple choice, you have to input the codepoint
I knew about unidecode, but it defaults to Chinese mappings.
On searching further, I’ve found miurahr/unihandecode on Codeberg.org: a transliteration library that converts Unicode characters/words into the ASCII alphabet and is aware of language preference priorities.
Otherwise, Japanese segmentation libraries might help with transliteration.
In full U+1234 U+5678 format
I’m not quite following. Can you give me an example?
The URL for 赤ずきん、旅の途中で死体と出会う 🐺 Home Thread 🔪 is simply https://forums.learnnatively.com/t/home-thread/7751 . Very non-specific.
The URL for きらきらひかる (Profoundly Weird Book Club) is https://forums.learnnatively.com/t/profoundly-weird-book-club/5064 . Wrong meaning.
I think that’s more of a Discourse limitation than anything else.
WaniKani Community uses percent-encoding, so the Unicode characters are preserved once the URL is decoded. However, the raw URL isn’t readable by default.
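To make the percent-encoding point concrete, here is a minimal sketch using Python’s standard library. The slug is just the first word of the thread title; nothing here reflects how Discourse itself is implemented.

```python
from urllib.parse import quote, unquote

# Example slug fragment (an assumption for illustration only).
title_slug = "赤ずきん"

# Percent-encode the UTF-8 bytes of the slug.
encoded = quote(title_slug)
print(encoded)  # %E8%B5%A4%E3%81%9A%E3%81%8D%E3%82%93

# Decoding recovers the original characters exactly.
decoded = unquote(encoded)
print(decoded == title_slug)  # True

# Each of these characters is 3 UTF-8 bytes, so 9 URL characters apiece.
print(len(title_slug), len(encoded))  # 4 36
```

So the characters survive the round trip, but the copied form is nine times longer for this kind of text.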
A possible solution I thought about is using Japanese segmentation.
$ sudachi -a
赤ずきん、旅の途中で死体と出会う wolf Home Thread hocho
赤 名詞,普通名詞,一般,*,*,* 赤 赤 アカ 0 [17550]
ずきん 副詞,*,*,*,*,* ずきん ずきん ズキン 0 []
、 補助記号,読点,*,*,*,* 、 、 、 0 []
旅の途中で 名詞,固有名詞,一般,*,*,* 旅の途中で 旅の途中で タビノトチュウデ 0 []
死体 名詞,普通名詞,一般,*,*,* 死体 死体 シタイ 0 [24863]
と 助詞,格助詞,*,*,*,* と と ト 0 []
出会う 動詞,一般,*,*,五段-ワア行,連体形-一般 出会う 出会う デアウ 0 []
空白,*,*,*,*,* キゴウ 0 []
wolf 名詞,固有名詞,人名,一般,*,* Wolf Wolf ウォルフ 0 [20699]
空白,*,*,*,*,* キゴウ 0 []
Home 名詞,普通名詞,一般,*,*,* ホーム home ホーム 0 [14167]
空白,*,*,*,*,* キゴウ 0 []
Thread 名詞,普通名詞,一般,*,*,* スレッド Thread スレッド 0 [412]
空白,*,*,*,*,* キゴウ 0 []
hocho 名詞,普通名詞,一般,*,*,* hocho hocho hocho -1 [] (OOV)
EOS
So aka-zukin-tabinotochuude-shitai-to-deau-wolf-home-thread-hocho
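A rough sketch of how that slug could be assembled from the readings, in plain Python. The token list is copied by hand from the sudachi output above (whitespace tokens left out), the kana table only covers this one title, and `romanize` / `slugify` are hypothetical helper names, not part of sudachi or Discourse.

```python
# (surface, reading) pairs copied by hand from the sudachi output above.
TOKENS = [
    ("赤", "アカ"), ("ずきん", "ズキン"), ("、", "、"),
    ("旅の途中で", "タビノトチュウデ"), ("死体", "シタイ"),
    ("と", "ト"), ("出会う", "デアウ"),
    ("wolf", "ウォルフ"), ("Home", "ホーム"),
    ("Thread", "スレッド"), ("hocho", "hocho"),
]

# Partial katakana table: only the kana needed for this title (assumption).
DIGRAPHS = {"チュ": "chu"}
KANA = {
    "ア": "a", "イ": "i", "ウ": "u", "カ": "ka", "キ": "ki",
    "シ": "shi", "ズ": "zu", "タ": "ta", "チ": "chi", "デ": "de",
    "ト": "to", "ノ": "no", "ビ": "bi", "ン": "n",
}

def romanize(reading):
    """Return a romaji string, or None if any character is outside the table."""
    out = []
    i = 0
    while i < len(reading):
        pair = reading[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
        elif reading[i] in KANA:
            out.append(KANA[reading[i]])
            i += 1
        else:
            return None
    return "".join(out)

def slugify(tokens):
    """Join romanized tokens with hyphens; ASCII words are kept as typed."""
    parts = []
    for surface, reading in tokens:
        if surface.isascii() and surface.isalnum():
            parts.append(surface.lower())
        else:
            romaji = romanize(reading)
            if romaji:  # drops punctuation such as 、
                parts.append(romaji)
    return "-".join(parts)

slug = slugify(TOKENS)
print(slug)
```

A real version would need a full kana table (long vowels, small っ, and so on) and a decision about which romanization scheme to follow.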
I wonder if they changed the code or if it’s a non-default setting.
The fact is that you can put anything between the /t/ and the topic ID, as it’s just an SEO thing:
https://forums.learnnatively.com/t/megumin-is-best-girl/5064
This will still work and load the proper preview:
Edit:
Seems like @seanblue went through the discourse URL shenanigans in the past:
So it could be a mix of a setting not enabled in Discourse and/or a web server configuration limitation.
You sure about that?
@polv @Megumin So I’ve switched the URL generation to include Japanese letters.
See 赤ずきん、旅の途中で死体と出会う 🐺 Home Thread 🔪
However, the shared URL now gets very long… which honestly makes me a bit hesitant to do this. Although it’s probably slightly better for SEO.
(ex link: %E8%B5%A4%E3%81%9A%E3%81%8D%E3%82%93%E3%80%81%E6%97%85%E3%81%AE%E9%80%94%E4%B8%AD%E3%81%A7%E6%AD%BB%E4%BD%93%E3%81%A8%E5%87%BA%E4%BC%9A%E3%81%86-home-thread/7751?u=brandon).
What do we think?
- Yes. Even though it makes sharing the url slightly harder. It’s pretty.
- No. While it looks better, I don’t like the long copied url.
- I don’t care!
I think that until browsers handle these things more gracefully, it’s better without, IMHO.
I would actually prefer all of these texts to just be replaced by “x”
That would save me from replacing them manually each time I want to put a link somewhere
Ok! I’ve decided to revert it back to ASCII only.
@nikoru hmm, interesting. Unfortunately, since there are other languages, I think it’s slightly different for Japanese. I guess we could do another poll
I usually do that with amazon links, but I even strip the ascii, just leave the dp/ASIN lol.
I’m saving a lot of bytes of information!
Yeah I always strip them to the shortest format possible. Good to know about url bit here being for SEO purposes
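For anyone who wants to automate the stripping, here is a small sketch. It assumes the `/t/<slug>/<id>` path shape seen in the links above, and uses the “x” placeholder suggested earlier in the thread; `topic_id` and `shorten` are made-up names for illustration.

```python
from urllib.parse import urlsplit

def topic_id(url):
    """Return the numeric ID from a /t/<slug>/<id> style path, or None."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) >= 3 and segments[0] == "t" and segments[2].isdigit():
        return int(segments[2])
    return None

def shorten(url):
    """Replace the slug with 'x' and drop any query string."""
    parts = urlsplit(url)
    tid = topic_id(url)
    if tid is None:
        return url  # not a topic link; leave untouched
    return f"{parts.scheme}://{parts.netloc}/t/x/{tid}"

print(shorten("https://forums.learnnatively.com/t/profoundly-weird-book-club/5064"))
# https://forums.learnnatively.com/t/x/5064
```

Since only the number identifies the topic, both the real slug and the megumin-is-best-girl one reduce to the same short link.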
Would it be impossible to implement @polv’s suggestion? If I understand it right, it would change the URL so that something like ワンピース book club reads like wanpi-su-book-club.
Chart-y kind of question: is there a way to create a page-tracking chart even when you don’t time your reads? I would like to track “I read X pages on Y day” over time until a book is completed (or, I guess, more pedantically, “I logged X pages, and here’s how far that moved the progress bar as a percentage”).
… reading that back I think I will probably need to scribble down a visualisation or something to make that ramble make any sense.
You would need to change or add that functionality in the Discourse source. It would probably make the URLs extremely long, and, being a romanization, it would probably hurt SEO.
Also it would need to distinguish between different languages.