Allow CSV data dump

Description of your request or bug report:

Allow CSV data dump

Trello link:

Per a private DM, I’m considering doing this this week. It really should only take a day and I think you all could come up with some cool visualizations! :slight_smile:

(It also makes onboarding easier… people feel better if they can back up their stats)

jumping-yay


OMG. This would be amazing. :face_holding_back_tears: I could finally stop tracking my Japanese stuff on GR. :face_holding_back_tears:

We could probably even make a fancy community-edited xls that reads from the CSV and does fancy graphics and stats :upside_down_face:.

And then Brandon might like them and copy them into the site :rofl:

:shushing_face:

don’t spoil his master plan. :rofl:

Ok, I got the first data download available! It’s just your user books data, not any reading sessions. But let me know what you think of the process, UI & the fields I surface.

You can find it here: https://learnnatively.com/account-settings/?form=data_download.

Not working for me. :melting_face: At first it didn’t do anything, and now it’s telling me “issues with our servers”.

Nice, Thanks!

Would it be too hard to include the reading sessions on it?

Or even a separate CSV if it’s too complicated to merge it into the same table, then throw the documents in a .zip / .7z / .tar, whatever.

Maybe it broke after generating mine :sweat_smile:

Sorry I should’ve said more explicitly. Yes, there will be more data that you can download… they’ll just be separate files. I’m imagining one for general data for each type and one for sessions for each of movies, tv shows & books. Ultimately 6 in total.

Oops! Well first bug to squash…

Thanks! BTW, if you want to put the cherry on top, a small .txt document explaining each file would make it extra fancy.

Also, while I doubt it’s too big of a deal, I noticed that you could potentially guess the URL of other users’ files if you know when they generated them.
Just a minor security thing I thought I’d mention.

Also, my guess is that the files are deleted after a while?

So there is a ‘key file’ that you can download… it’s next to the ‘User Book’ title. There is a slight bug rn where it disappears however… but if you reload the page it’ll be there.

Yeah, I thought about that… maybe I’ll add a random string at the end. Originally I had a totally random file name, but I didn’t want the downloaded filename to be random, as that’d be unfortunate. I tried using the HTML ‘download’ attribute, but Chrome wasn’t abiding by it and apparently it’s very flaky… so I decided on this approach.

Not at the moment, but eventually I’d probably clean up, yes.

I guess the string would be enough. I haven’t worked with CloudFront, but if you don’t want to pollute the filename, you could create a signed URL.

But it’s probably too much of a headache, although maybe such a function would have a use somewhere else :sweat_smile:

YESSSSSSSSSS will check this out when i’m on my not-work pc

Hah, yeah, that’s for truly private stuff and I imagine it would be a real pain to figure out, as most Amazon things are for me :sweat_smile:

I think it’d be pretty much impossible to guess if I put 6 random characters on it…

Sorry, I had to do it :rofl:

You should be good to go!

And I fixed the ‘key file’ disappearing bug.

I will try to get reading session data up by tonight. Doing the tv data after that should be pretty straightforward hopefully. :slight_smile:

Edit: I also added a random string to the file name
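
For anyone curious what the random-suffix idea looks like, here is a minimal sketch. It is not the site's actual implementation: the helper names `random_suffix` and `export_filename` and the `user_books` base name are made up for illustration, and it uses 6 hex characters (about 16.7 million possibilities per file) rather than whatever alphabet the site really uses.

```shell
# Hypothetical helpers, not the site's real code.

random_suffix() {
    # Read 3 random bytes and print them as 6 lowercase hex characters.
    od -An -N3 -tx1 /dev/urandom | tr -d ' \n'
}

export_filename() {
    # $1 = readable base name, e.g. "user_books"
    # The readable part stays first so the downloaded file is still recognizable.
    printf '%s_%s.csv\n' "$1" "$(random_suffix)"
}

# Prints something like user_books_3fa91c.csv (the suffix differs on every call).
export_filename user_books
```

The readable prefix keeps the downloaded filename friendly, while the suffix means that knowing the generation time alone is no longer enough to guess the URL.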

Beautiful! I was a bit worried because my OpenOffice whatever couldn’t parse the Japanese properly, but I just stuffed it into google sheets and it works fine there. Time to start trying to play with the visualisers! Can’t wait for the session data!
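
If anyone else hits garbled Japanese in a desktop spreadsheet app, one thing that may be worth trying is prepending a UTF-8 byte order mark, since some applications use it as an encoding hint. This is only a sketch under two assumptions: that the export is UTF-8, and that the problem is encoding detection at all. The function name `add_utf8_bom` and the file names in the usage comment are hypothetical.

```shell
add_utf8_bom() {
    # $1 = input CSV (assumed to be UTF-8), $2 = output file
    # \357\273\277 is the UTF-8 byte order mark (bytes EF BB BF) written in octal,
    # which works in any POSIX printf.
    { printf '\357\273\277'; cat "$1"; } > "$2"
}

# Usage (file names are placeholders):
#   add_utf8_bom user_books.csv user_books_bom.csv
```

Choosing UTF-8 by hand in the import dialog, where the app offers one, should achieve the same thing without touching the file.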

Could we have an interface to this that is automatable, please? I would like to have a cron job on my local machine that backs up the data once a week. I do this at the moment with booklog.jp, which works because they have a relatively easily scrapable interface that doesn’t require pressing any JavaScript buttons, so I can grab the CSV with a couple of wget invocations.

Shell script I use for booklog:
# PASSWORD and OUTFILE must be set before this point.
# As of some time in 2019 their server started insisting on a Referer header.
wget --save-cookies cookies.txt -O login.html --post-data='service=booklog&ref=&account=pm215&password='"$PASSWORD" --referer=https://booklog.jp/login https://booklog.jp/login

# Now we need to load this web page, to fish out a specific link from it.
wget --load-cookies cookies.txt -O export.html https://booklog.jp/export

# Extract the download link; dots are escaped so they match literally.
DOWNLOADURL="$(sed -ne 's/.*\(https:\/\/download\.booklog\.jp[^"]*\).*/\1/p' export.html)"

# Bail out unless exactly one matching line was found.
if [ "$(echo "$DOWNLOADURL" | grep -c https)" -ne 1 ] ; then
   echo "Failed to find download URL in export.html!" >&2
   exit 1
fi

echo "Loading csv from $DOWNLOADURL"
wget --load-cookies cookies.txt -O "$OUTFILE" "$DOWNLOADURL"
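
For the weekly cron part, a rough sketch of how the output could be given a dated name and old copies pruned. The helper names `backup_name` and `prune_backups`, the `booklog-` prefix, and the crontab path are all made up for illustration; this is not part of the script above.

```shell
backup_name() {
    # $1 = backup directory, $2 = date stamp in YYYY-MM-DD form
    printf '%s/booklog-%s.csv\n' "$1" "$2"
}

prune_backups() {
    # $1 = backup directory, $2 = number of newest files to keep
    dir=$1
    keep=$2
    # The glob expands in sorted order, and ISO date stamps sort
    # chronologically, so the oldest backup comes first.
    set -- "$dir"/booklog-*.csv
    [ -e "$1" ] || return 0   # no backups yet
    while [ "$#" -gt "$keep" ]; do
        rm -f -- "$1"
        shift
    done
}

# Inside the backup script, something like:
#   OUTFILE="$(backup_name "$HOME/backups" "$(date +%Y-%m-%d)")"
#   ... run the wget steps ...
#   prune_backups "$HOME/backups" 8
#
# Example crontab entry (hypothetical path), Sundays at 03:00:
#   0 3 * * 0 $HOME/bin/booklog-backup.sh
```

Dated names mean a bad export never overwrites the last good one, and pruning keeps the directory from growing forever.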

Another item for the future API :sweat_smile: