So, first you need to download all the data that Facebook has stored for you (well, at least what it’s willing to give you :)). You can do that by going to https://www.facebook.com/settings while you’re logged in.
Here, just click "Download a copy of your Facebook data". Once Facebook has processed your request, you will get the download link by email.
The download package contains 3 folders (videos, photos, html) and an index.html file. To look at your posts, open the html folder, where you’ll find a file named wall.htm. This is a simple HTML file in which the text of each post sits inside a <div class="comment"> element.
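If you unzip the package somewhere other than the folder you run Node.js from, a small helper can build the path to wall.htm for you. This is a minimal sketch that only assumes the html/wall.htm layout described above; `findWallFile` and the `./facebook-data` folder name are hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: given the folder where the download package was
// extracted, return the path to html/wall.htm, or null if it is missing.
function findWallFile(packageRoot) {
  const wallPath = path.join(packageRoot, 'html', 'wall.htm');
  return fs.existsSync(wallPath) ? wallPath : null;
}

// Example call; './facebook-data' is an assumed folder name.
console.log(findWallFile('./facebook-data'));
```

The returned path can then be passed to `fs.readFile` in place of a hard-coded 'wall.htm'.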
To parse this file with JavaScript (you’ll need Node.js installed), you can use the following snippet:
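```javascript
var fs = require('fs');

// Collect the given capturing group from every match of `regex` in `string`.
// `regex` must have the global (g) flag; without it, exec() always starts
// from the beginning of the string and the loop never ends.
function getMatches(string, regex, index) {
  index = (index === undefined) ? 1 : index; // default to the first capturing group
  var matches = [];
  var match;
  while ((match = regex.exec(string)) !== null) {
    matches.push(match[index]);
  }
  return matches;
}

// Run this from the folder that contains wall.htm (or adjust the path).
fs.readFile('wall.htm', 'utf8', function (err, data) {
  if (err) {
    return console.log(err);
  }
  // [\s\S]*? matches any character, newlines included, as few times as
  // possible, so each match stops at the first closing </div>.
  var regex = /comment">([\s\S]*?)<\/div>/g;
  var matches = getMatches(data, regex, 1);
  var output = '';
  for (var i = 0; i < matches.length; i++) {
    output += matches[i] + '\n';
  }
  fs.writeFile('parsed.txt', output, 'utf8', function (err) {
    if (err) {
      return console.log(err);
    }
    console.log('done!');
  });
});
```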
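To see what the regex loop actually returns, here is the same `getMatches` helper run against a small, hand-written string. The sample HTML is a made-up stand-in for a fragment of wall.htm, not real export data. It also shows why the lazy `*?` matters.

```javascript
// Same helper as in the snippet above, repeated so this example runs on its own.
function getMatches(string, regex, index) {
  index = (index === undefined) ? 1 : index;
  const matches = [];
  let match;
  while ((match = regex.exec(string)) !== null) {
    matches.push(match[index]);
  }
  return matches;
}

// Hypothetical stand-in for part of wall.htm; the second post spans two lines.
const sample =
  '<div class="comment">First post</div>' +
  '<div class="comment">Second post\nspanning two lines</div>';

const posts = getMatches(sample, /comment">([\s\S]*?)<\/div>/g);
console.log(posts); // posts is ['First post', 'Second post\nspanning two lines']

// A greedy [\s\S]* runs to the LAST </div>, so both posts collapse into one match.
console.log(getMatches(sample, /comment">([\s\S]*)<\/div>/g).length); // 1
```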
This reads the wall.htm file and writes the contents of every post, each followed by a newline, to a plain-text file named parsed.txt.
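The regex captures everything between the opening tag and `</div>`, so if a post contains markup (for example `<br />` line breaks or entities such as `&amp;`), that markup ends up in parsed.txt as-is. A rough cleanup pass is sketched below under that assumption; `toPlainText` is a hypothetical name, and this regex approach is a heuristic, not a real HTML parser, that decodes only a handful of common entities.

```javascript
// Rough, regex-based cleanup for one captured post. Heuristic only:
// <br> tags become newlines, every other tag is dropped, and a few
// common entities are decoded.
function toPlainText(html) {
  return html
    .replace(/<br\s*\/?>/gi, '\n') // line breaks become newlines
    .replace(/<[^>]+>/g, '')       // drop every other tag
    .replace(/&quot;/g, '"')
    .replace(/&#0?39;/g, "'")
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&')        // decode &amp; last to avoid double-decoding
    .trim();
}

console.log(toPlainText('Tom &amp; Jerry<br />went <b>home</b>'));
// Tom & Jerry
// went home
```

In the original snippet, this would be applied as `output += toPlainText(matches[i]) + '\n';`.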
A couple of the challenges here were multiline regex matching, which I solved with the help of an answer on StackOverflow, and getting access to a regex’s matched groups in JavaScript.
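Both of those challenges have shorter answers in newer JavaScript: the `s` ("dotAll") flag makes `.` match newlines, and `String.prototype.matchAll` returns every match together with its groups. This is a sketch assuming Node.js 12 or later (where both are available); `extractPosts` is a hypothetical name and the sample string is made up.

```javascript
// Same extraction as getMatches(), using matchAll and the s (dotAll) flag,
// so [\s\S] is no longer needed to cross newlines.
function extractPosts(html) {
  const regex = /comment">(.*?)<\/div>/gs; // matchAll requires the g flag
  return Array.from(html.matchAll(regex), (m) => m[1]); // m[1] is the first group
}

const html =
  '<div class="comment">one</div>\n<div class="comment">two\nlines</div>';
console.log(extractPosts(html)); // two entries: 'one' and 'two\nlines'

// Without the s flag, `.` stops at the newline and the second post is missed.
console.log(Array.from(html.matchAll(/comment">(.*?)<\/div>/g), (m) => m[1])); // only 'one'
```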


