December 7, 2014 – Nikola Brežnjak blog

NodeJS

How to get all of your own Facebook posts and parse them with JavaScript

So, first you need to download all the data that Facebook has stored for you (well, and what it’s willing to give you :)). You can do that by going to https://www.facebook.com/settings when you’re logged in, and you will see the following screen:

Here you can just click on the Download a copy of your Facebook data and after they process your request you will get the download link on your email.

The download package looks contains 3 folders (videos, photos, html) and a index.html file. In order to take a look at your posts you have to open up the html folder where you’ll find the file named wall.htm. This is a simple HTML file whose contents is put inside the <div class=”comment”> elements.

In order to parse this with JavaScript (you’ll need to have Node.js installed) you can use the following snippet:

function getMatches(string, regex, index) {
    index || (index = 1); // default to the first capturing group
    var matches = [];
    var match;
    while (match = regex.exec(string)) {
        matches.push(match[index]);
    }
    return matches;
}

fs = require('fs');
fs.readFile('wall.htm', 'utf8', function(err, data){
	if (err){
	    return console.log(err);
	}

	var regex = /comment">([\s\S]*?)<\/div>/g
	var matches = getMatches(data, regex, 1);

	var output = '';
	for(var i=0; i<matches.length; i++){
		output += matches[i] + '\n';
	}

	fs.writeFile('parsed.txt', output, 'utf8', function(){
		console.log('done!');
	});
});

What this does is that it reads the wall.html file and outputs just simple text of the posts in a parsed.txt file.

Few of the challenges here we’re the multiline regex matching which I solved with this link on StackOverflow, and another one was getting access to matched groups of a regex in JavaScript.

How to get all of your own Facebook posts and parse them with JavaScript

Recent posts

Categories