Allow CSV data dump

Description of your request or bug report:

Allow CSV data dump

Trello link:

Per a private DM, I’m considering doing this this week. It really should only take a day and I think you all could come up with some cool visualizations! :slight_smile:

(It also makes onboarding easier… people feel better if they can back up their stats)

9 Likes

[image: jumping-yay]

2 Likes

OMG. This would be amazing. :face_holding_back_tears: I could finally stop tracking my Japanese stuff on GR. :face_holding_back_tears:

4 Likes

We could probably even do a fancy community-edited .xls that reads from the CSV and does fancy graphics and stats :upside_down_face:.

And then Brandon might like them and copy them into the site :rofl:

4 Likes

:shushing_face:

Don’t spoil his master plan. :rofl:

2 Likes

Ok, I got the first data download available! It’s just your user books data, not any reading sessions. But let me know what you think of the process, UI & the fields I surface.

You can find it here: https://learnnatively.com/account-settings/?form=data_download.
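
If you want a quick look at the fields from a terminal, something like this works (the filename is just what the browser saved for me, so adjust as needed):

head -n 1 user_books.csv | tr ',' '\n'   # list the header fields (assumes no commas inside them)
wc -l user_books.csv                     # rough row count, header included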

6 Likes

Not working for me. :melting_face: At first it didn’t do anything, and now it’s telling me “issues with our servers”.

1 Like

Nice, thanks!

Would it be too hard to include the reading sessions in it?

Or even a separate CSV, if it’s too complicated to merge into the same table; then throw the files into a .zip / .7z / .tar, whatever.

Maybe it broke after generating mine :sweat_smile:

2 Likes

Sorry, I should’ve been more explicit. Yes, there will be more data that you can download… they’ll just be separate files. I’m imagining a general data file and a sessions file for each of movies, TV shows & books: ultimately six in total, roughly as sketched below.
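
Something like this, though the exact file names are TBD:

books.csv         book_sessions.csv
tv_shows.csv      tv_sessions.csv
movies.csv        movie_sessions.csv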

Oops! Well first bug to squash…

2 Likes

Thanks! BTW, if you want to put the cherry on top, a small .txt document explaining each file would make it extra fancy.

Also, while I doubt it’s too big of a deal, I noticed that you could potentially guess other users’ download URLs if you know when they generated the file.
Just a minor security thing I thought I’d mention.

Also, the files are deleted after a while, I’m guessing?

2 Likes

So there is a ‘key file’ that you can download… it’s next to the ‘User Book’ title. There’s a slight bug right now where it disappears, however… if you reload the page it’ll be there.

Yeah, I thought about that… maybe I’ll add a random string at the end. Originally I had a totally random file name, but I didn’t want the downloaded filename to be random, as that’d be unfortunate. I tried using the HTML ‘download’ attribute to rename it, but Chrome wasn’t abiding by it… apparently it’s ignored for cross-origin URLs and generally flaky, so I settled on this approach.
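
For the random string I’m picturing something along these lines (just a sketch, not the actual code; names are made up):

# append a short random suffix so the URL can't be derived from the timestamp
SUFFIX="$(openssl rand -hex 3)"       # 6 hex chars: 16^6, about 16.8M possibilities
FILENAME="user_books_${SUFFIX}.csv"   # still a readable download name
echo "$FILENAME"                      # e.g. user_books_a3f19c.csv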

Not at the moment, but eventually I’d probably clean them up, yes.

2 Likes

I guess the random string would be enough. I haven’t worked with CloudFront, but if you don’t want to pollute the filename, you could create a signed URL:

But it’s probably too much of a headache, although maybe such a function would have a use somewhere else :sweat_smile:
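
For plain S3 (no CloudFront in the mix) the AWS CLI can already do this, something like this (the bucket and key are made up):

# sketch: a time-limited signed URL for a private object
aws s3 presign s3://natively-exports/user_books.csv --expires-in 3600   # valid for one hour

The object itself stays private and the filename stays clean; the secret lives in the query string instead of the name.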

1 Like

YESSSSSSSSSS will check this out when I’m on my not-work PC

3 Likes

Hah, yeah, that’s for truly private stuff, and I imagine it would be a real pain to figure out, as most Amazon things are for me :sweat_smile:

I think it’d be pretty much impossible to guess if I put 6 random characters on it…

2 Likes

[image: random_number]

Sorry, I had to do it :rofl:

6 Likes

You should be good to go!

And I fixed the ‘key file’ disappearing bug.

I’ll try to get reading session data up by tonight. Doing the TV data after that should hopefully be pretty straightforward. :slight_smile:

Edit: I also added a random string to the file name.

5 Likes

Beautiful! I was a bit worried because my OpenOffice whatever couldn’t parse the Japanese properly, but I just stuffed it into Google Sheets and it works fine there. Time to start playing with the visualisers! Can’t wait for the session data!
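
For anyone else whose spreadsheet app mangles the Japanese: assuming the export is plain UTF-8 and the app is just guessing the encoding wrong, prepending a byte-order mark usually fixes the auto-detection:

# prepend a UTF-8 BOM (EF BB BF, written as octal escapes for portability)
printf '\357\273\277' | cat - user_books.csv > user_books_bom.csv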

3 Likes

Could we have an automatable interface to this, please? I would like to have a cron job on my local machine that backs up the data once a week. I do this at the moment with booklog.jp, which works because they have a relatively easily scrapable interface that doesn’t require pressing any JavaScript buttons, so I can grab the CSV with a couple of wget invocations.

shell script I use for booklog
#!/bin/sh
# PASSWORD and OUTFILE are expected to be set by the caller (e.g. in the crontab entry)

# as of some time in 2019 their server started insisting on a referer header
wget --save-cookies cookies.txt -O login.html --post-data='service=booklog&ref=&account=pm215&password='"$PASSWORD" --referer=https://booklog.jp/login https://booklog.jp/login

# now we need to load this web page, to fish out a specific link from it
wget --load-cookies cookies.txt -O export.html https://booklog.jp/export

DOWNLOADURL="$(sed -ne 's/.*\(https:\/\/download.booklog.jp[^"]*\).*/\1/p' export.html)"

if [ "$(echo "$DOWNLOADURL" | grep -c https)" -ne 1 ] ; then
   echo "Failed to find download URL in export.html!"
   exit 1
fi

echo "Loading csv from $DOWNLOADURL"
wget --load-cookies cookies.txt -O "$OUTFILE" "$DOWNLOADURL"
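
and the weekly crontab entry that drives it looks something like this (paths and the script name here are illustrative; reading PASSWORD from a protected file would be safer than putting it in the crontab):

# run the backup at 03:00 every Sunday
0 3 * * 0  PASSWORD=mypassword OUTFILE=$HOME/backups/booklog.csv $HOME/bin/booklog-backup.sh
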
2 Likes

Another item for the future API :sweat_smile:

1 Like