Allow CSV data dump

Megumin · July 3, 2022, 1:39pm

Description of your request or bug report:

Allow CSV data dump

brandon · November 4, 2023, 1:00pm

Per a private DM, I’m considering doing this this week. It really should only take a day and I think you all could come up with some cool visualizations!

(It also makes onboarding easier… people feel better if they can back up their stats.)

Megumin · November 4, 2023, 4:08pm

jumping-yay

Biblio · November 5, 2023, 11:23am

OMG. This would be amazing. I could finally stop tracking my Japanese stuff on GR (Goodreads).

Megumin · November 5, 2023, 12:02pm

We could probably even make a fancy community-edited spreadsheet (xls) that reads from the CSV and generates charts and stats.

And then Brandon might like them and copy them into the site.

Biblio · November 5, 2023, 12:07pm

Don’t spoil his master plan.

brandon · November 13, 2023, 8:48am

Ok, I got the first data download available! It’s just your user books data, not any reading sessions. But let me know what you think of the process, UI & the fields I surface.

You can find it here: https://learnnatively.com/account-settings/?form=data_download.

Biblio · November 13, 2023, 8:53am

Not working for me. At first it didn’t do anything, and now it’s telling me “issues with our servers”.

Megumin · November 13, 2023, 8:54am

Nice, thanks!

Would it be too hard to include the reading sessions in it?

Or even a separate CSV, if it’s too complicated to merge them into the same table, and then throw the files into a .zip / .7z / .tar, whatever.

Maybe it broke after generating mine.

brandon · November 13, 2023, 8:57am

Sorry, I should’ve said that more explicitly. Yes, there will be more data that you can download… they’ll just be separate files. I’m imagining one general-data file and one sessions file for each of movies, TV shows & books, so 6 in total.

Oops! Well first bug to squash…
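Megumin’s zip/tar idea and the six-file plan fit together easily. A minimal sketch, assuming the server writes the CSVs into one directory under the hypothetical names `books.csv`, `books_sessions.csv`, `movies.csv`, `movies_sessions.csv`, `tv.csv` and `tv_sessions.csv` (not the site’s real filenames), that bundles whichever of them exist into a single `.tar.gz`:

```shell
# Hypothetical sketch: bundle the per-type export CSVs into one archive.
# The file names below are assumptions, not the site's real ones.
bundle_export() {
  export_dir=$1   # directory holding the generated CSVs
  archive=$2      # path of the .tar.gz to create
  set --          # reuse the positional parameters as the file list
  for kind in books movies tv; do
    for part in "" _sessions; do
      f="$kind$part.csv"
      if [ -f "$export_dir/$f" ]; then
        set -- "$@" "$f"
      fi
    done
  done
  if [ "$#" -eq 0 ]; then
    echo "no export CSVs found in $export_dir" >&2
    return 1
  fi
  # -C stores the files under their bare names, without the directory prefix.
  tar -czf "$archive" -C "$export_dir" "$@"
}
```

Usage would look like `bundle_export /tmp/export-123 /tmp/natively-export.tar.gz` (both paths hypothetical; absolute paths avoid any confusion about where `-C` applies). Only listing the expected names means stray files in the directory never end up in a user’s download.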

Megumin · November 13, 2023, 9:00am

Thanks! BTW, if you want to put the cherry on top, a small .txt document explaining each file would make it extra fancy.

Also, while I doubt it’s a big deal, I noticed that you could potentially guess other users’ file URLs if you know when they generated the file.
Just a minor security thing I thought I’d mention.

And also, my guess is the files get deleted after a while?

brandon · November 13, 2023, 9:03am

So there is a ‘key file’ that you can download… it’s next to the ‘User Book’ title. There’s a slight bug rn where it disappears, but if you reload the page it’ll be there.

Yeah, I thought about that… maybe I’ll add a random string at the end. Originally I had a totally random file name, but I didn’t want the downloaded filename to be random, as that’d be unfortunate. I tried using the HTML ‘download’ attribute, but Chrome wasn’t respecting it and apparently it’s very flaky… so I went with this approach.

Not at the moment, but yes, eventually I’d probably clean them up.

Megumin · November 13, 2023, 9:07am

I guess the random string would be enough. I haven’t worked with CloudFront, but if you don’t want to pollute the filename, you could create a signed URL.

But it’s probably too much of a headache, although maybe such a function would be useful somewhere else.
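For reference, the idea behind a signed URL is that the server signs the file path plus an expiry time with a secret, and only serves the file if the signature checks out and the link hasn’t expired. The sketch below shows that concept with a plain HMAC-SHA256 via `openssl`; it is *not* CloudFront’s signed-URL format (which uses RSA key pairs and a policy), and `sign_link`, `verify_link`, the `expires`/`sig` query parameters and the secret are all hypothetical.

```shell
# Hypothetical helpers illustrating HMAC-signed, expiring download links.
# The secret must stay on the server; anyone holding it can mint links.

# link_signature PATH EXPIRES SECRET: hex HMAC-SHA256 of "PATH|EXPIRES".
link_signature() {
  # awk takes the last field, which is the hex digest in openssl's output.
  printf '%s|%s' "$1" "$2" | openssl dgst -sha256 -hmac "$3" | awk '{print $NF}'
}

# sign_link PATH EXPIRES_EPOCH SECRET -> PATH?expires=...&sig=...
sign_link() {
  printf '%s?expires=%s&sig=%s\n' "$1" "$2" "$(link_signature "$1" "$2" "$3")"
}

# verify_link PATH EXPIRES_EPOCH SIG SECRET -> exit status 0 if valid
verify_link() {
  # Reject links whose expiry time has passed.
  [ "$(date +%s)" -le "$2" ] || return 1
  # Recompute the signature and compare (a real server should use a
  # constant-time comparison here).
  [ "$3" = "$(link_signature "$1" "$2" "$4")" ]
}
```

Changing the path or the expiry invalidates the signature, so the filename itself can stay readable while the link still can’t be guessed or extended.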

Jintor · November 13, 2023, 9:09am

YESSSSSSSSSS, will check this out when I’m on my not-work PC

brandon · November 13, 2023, 9:09am

Hah, yeah, that’s for truly private stuff, and I imagine it would be a real pain to figure out, as most Amazon things are for me.

I think it’d be pretty much impossible to guess if I put 6 random characters on it…
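Assuming the 6 characters are drawn uniformly from the 62 characters A-Z, a-z and 0-9, there are 62^6 ≈ 5.7 × 10^10 possible suffixes, on top of having to guess the timestamp, so that estimate holds up. A minimal sketch of the "readable name + random suffix" scheme, with hypothetical helper names `random_suffix` and `export_filename` (the site’s backend presumably isn’t shell; this just illustrates the idea):

```shell
# Hypothetical helpers illustrating "readable name + random suffix".
# random_suffix N: print N characters drawn uniformly from [A-Za-z0-9].
random_suffix() {
  # LC_ALL=C makes tr work byte-by-byte on the random stream; -d deletes
  # and -c complements the set, so only the 62 allowed bytes survive.
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "${1:-6}"
}

# export_filename BASE: BASE_<YYYY-MM-DD>_<6 random chars>.csv
export_filename() {
  printf '%s_%s_%s.csv\n' "$1" "$(date +%Y-%m-%d)" "$(random_suffix 6)"
}
```

`export_filename user_books` prints something shaped like `user_books_2023-11-13_aZ3k9Q.csv` (the suffix differs on every call), so the user still gets a recognizable file name.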

Megumin · November 13, 2023, 9:11am

Sorry, I had to do it

brandon · November 13, 2023, 9:29am

You should be good to go!

And I fixed the ‘key file’ disappearing bug.

I’ll try to get the reading session data up by tonight. Doing the TV data after that should hopefully be pretty straightforward.

Edit: I also added a random string to the file name.

Jintor · November 13, 2023, 9:30am

Beautiful! I was a bit worried because my OpenOffice-whatever couldn’t parse the Japanese properly, but I just stuffed it into Google Sheets and it works fine there. Time to start playing with the visualisers! Can’t wait for the session data!
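The OpenOffice trouble is most likely (a guess; the thread doesn’t confirm it) the spreadsheet picking the wrong character encoding for a UTF-8 file. Choosing "Unicode (UTF-8)" as the character set in the Text Import dialog may be enough; another common workaround, particularly for Excel, is to start the file with a UTF-8 byte order mark (BOM). A minimal sketch of the latter, with a hypothetical helper name `add_utf8_bom`:

```shell
# Hypothetical helper: copy IN to OUT, prefixing a UTF-8 byte order mark
# (bytes EF BB BF) so spreadsheet apps are more likely to detect UTF-8.
add_utf8_bom() {
  in=$1
  out=$2
  # Refuse to label the file as UTF-8 if it isn't valid UTF-8.
  if ! iconv -f UTF-8 -t UTF-8 "$in" > /dev/null 2>&1; then
    echo "$in is not valid UTF-8" >&2
    return 1
  fi
  # Don't add a second BOM if the file already starts with one.
  if [ "$(head -c 3 "$in" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
    cp "$in" "$out"
  else
    { printf '\357\273\277'; cat "$in"; } > "$out"
  fi
}
```

For example, `add_utf8_bom user_books.csv user_books_bom.csv` leaves the original untouched and writes the BOM-prefixed copy next to it.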

pm215 · November 13, 2023, 9:54am

Could we have an interface to this that is automatable, please? I would like to be able to have a cron job on my local machine that backs up the data once a week. I do this at the moment with booklog.jp, which works because they have a relatively easily scrapable interface that doesn’t require pressing any JavaScript buttons, so I can grab the CSV with a couple of wget invocations.

shell script I use for booklog

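```shell
#!/bin/sh
# Requires PASSWORD (booklog password) and OUTFILE (where to save the CSV)
# in the environment. Note: PASSWORD is sent as-is in the POST body, so
# characters such as & or % would need URL-encoding first.
: "${PASSWORD:?PASSWORD must be set}" "${OUTFILE:?OUTFILE must be set}"

# as of some time in 2019 their server started insisting on referer header
wget --save-cookies cookies.txt -O login.html --post-data='service=booklog&ref=&account=pm215&password='"$PASSWORD" --referer=https://booklog.jp/login https://booklog.jp/login

# now we need to load this web page, to fish out a specific link from it
wget --load-cookies cookies.txt -O export.html https://booklog.jp/export

# pull the download.booklog.jp link out of the page
DOWNLOADURL="$(sed -ne 's/.*\(https:\/\/download.booklog.jp[^"]*\).*/\1/p' export.html)"

# expect exactly one matching link
if [ "$(echo "$DOWNLOADURL" | grep -c https)" -ne 1 ] ; then
   echo "Failed to find download URL in export.html!"
   exit 1
fi

echo "Loading csv from $DOWNLOADURL"
wget --load-cookies cookies.txt -O "$OUTFILE" "$DOWNLOADURL"
```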
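Once a script like the one above works, the weekly backup itself is a single crontab entry. A sketch that installs one, assuming hypothetical locations for the script (`~/bin/booklog-backup.sh`, executable) and for a private env file (`~/.booklog-env`) so the password stays out of the crontab:

```shell
# Hypothetical paths; adjust to wherever the script and its secrets live.
# ~/.booklog-env is assumed to contain a line like: export PASSWORD='...'
# Schedule: 03:00 every Sunday. Cron treats a bare "%" as a newline,
# so the date format is written as \%F.
CRON_LINE='0 3 * * 0 . "$HOME/.booklog-env" && OUTFILE="$HOME/backups/booklog-$(date +\%F).csv" "$HOME/bin/booklog-backup.sh"'

# Append the entry to the current user's crontab, keeping existing entries
# ("crontab -l" fails quietly if there is no crontab yet).
( crontab -l 2>/dev/null; printf '%s\n' "$CRON_LINE" ) | crontab -
```

Running this twice adds a duplicate entry, and `~/backups` must already exist. Cron runs the command with `/bin/sh` from your home directory, so the script’s `cookies.txt`, `login.html` and `export.html` will land there unless it changes directory first.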

Megumin · November 13, 2023, 9:55am

Another item for the future API.