So, first you need to download all the data that Facebook has stored for you (well, and what it’s willing to give you :)). You can do that by going to https://www.facebook.com/settings when you’re logged in, and you will see the following screen:
Here you can just click the "Download a copy of your Facebook data" link, and once they process your request you will get the download link by email.
The download package contains 3 folders (videos, photos, html) and an index.html file. To take a look at your posts, open the html folder, where you'll find a file named wall.htm. This is a simple HTML file in which the content of each post is placed inside <div class="comment"> elements.
In order to parse this with JavaScript (you’ll need to have Node.js installed) you can use the following snippet:
var fs = require('fs');

// Collect the given capturing group from every match of a global regex.
function getMatches(string, regex, index) {
    index || (index = 1); // default to the first capturing group
    var matches = [];
    var match;
    while ((match = regex.exec(string)) !== null) {
        matches.push(match[index]);
    }
    return matches;
}

fs.readFile('wall.htm', 'utf8', function (err, data) {
    if (err) {
        return console.log(err);
    }

    // [\s\S]*? lazily matches any character, including newlines
    var regex = /comment">([\s\S]*?)<\/div>/g;
    var matches = getMatches(data, regex, 1);

    var output = '';
    for (var i = 0; i < matches.length; i++) {
        output += matches[i] + '\n';
    }

    fs.writeFile('parsed.txt', output, 'utf8', function (err) {
        if (err) {
            return console.log(err);
        }
        console.log('done!');
    });
});
This reads the wall.htm file and writes just the text of the posts, one per line, to a parsed.txt file.
A couple of the challenges here were the multiline regex matching, which I solved with this link on StackOverflow, and getting access to the matched groups of a regex in JavaScript.
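To make those two challenges concrete, here is a minimal sketch run against a made-up input string. The `sample` markup is an assumption for illustration only, and the real wall.htm may look different. It shows that `.` stops at line breaks while `[\s\S]` does not, and that calling `exec` in a loop on a regex with the `g` flag is one way to collect a capturing group from every match.

```javascript
// Hypothetical sample input; the real wall.htm markup may differ.
var sample = '<div class="comment">first line\nsecond line</div>' +
             '<div class="comment">another post</div>';

// "." does not match newline characters by default,
// so a post that spans several lines is skipped.
var dotRegex = /comment">(.*?)<\/div>/g;

// [\s\S] matches any character, including newlines.
var anyCharRegex = /comment">([\s\S]*?)<\/div>/g;

// Collect one capturing group from every match of a global regex.
// With the "g" flag, each exec() call resumes from regex.lastIndex
// and returns null once there are no more matches.
function getMatches(string, regex, index) {
  index = index || 1; // default to the first capturing group
  var matches = [];
  var match;
  while ((match = regex.exec(string)) !== null) {
    matches.push(match[index]);
  }
  return matches;
}

var dotMatches = getMatches(sample, dotRegex, 1);
var anyCharMatches = getMatches(sample, anyCharRegex, 1);

console.log(dotMatches);     // only the single-line post
console.log(anyCharMatches); // both posts, including the multiline one
```

With this sample, the `.` version finds only "another post", while the `[\s\S]` version also returns the two-line post. Note that the `g` flag matters here: without it, `exec` would keep returning the first match and the loop would never end.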