These are a few simple steps I followed as an experiment. Before buying an expired domain we must choose one which used to have content and which used to receive links. If you don't know how to do that, you can always turn to an SEO friend or an online marketing agency to help you out.
Truth be told: there are a lot of domains out there and hundreds of thousands expire each day. You only need to be on the lookout and check that the expired domain you are about to buy has links and a level of authority which makes the effort worthwhile.
Once we've picked the domain (in my example, esalgopersonal.es), I checked it out with Open Site Explorer, where I found an interesting authority level and dozens of links.
The next step, once we know it has links, is to take them into account and order them by the authority of the domain from which we are linked to each page.
A key point: after checking the authority in Open Site Explorer (you can check the PageRank too, but since the domain expired it will return PR0), we make sure there is historical content, for example at web.archive.org.
In my case I picked the March 2010 snapshot, where the last content update is located. (The domain was then parked at SEDO and expired after a few months.)
Starting from here, once we have the CSV provided by Open Site Explorer, I write a few PHP lines to import everything I can for each entry.
Using the CSV's first column and NetBeans' find/replace in "regular expression" mode, we replace the list with an array statement:
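The same conversion can also be done directly in PHP; here is a minimal sketch, assuming the export is called links.csv and the linked URL sits in the first column (both are assumptions, adapt them to your export):

```php
<?php
// Sketch: build a PHP array from the first column of the Open Site
// Explorer CSV. The filename "links.csv" and the position of the URL
// column are assumptions, adjust them to your actual export.
function urls_from_csv($file) {
    $urls = array();
    if (($handle = fopen($file, 'r')) !== false) {
        fgetcsv($handle); // skip the header row
        while (($row = fgetcsv($handle)) !== false) {
            $urls[] = $row[0]; // first column: the linked URL
        }
        fclose($handle);
    }
    return $urls;
}

if (file_exists('links.csv')) {
    $pages = urls_from_csv('links.csv');
}
```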
Once we have the array with each of the pages, what we'll do is prepare the deep link and execute the request against archive.org, using the chosen timestamp:
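A minimal sketch of how such a deep link can be built and fetched; the timestamp corresponds to the March 2010 snapshot chosen earlier, and the example page entry is an assumption (in practice $pages comes from the CSV):

```php
<?php
// Sketch: build the Wayback Machine deep link for each URL using the
// chosen snapshot timestamp (March 2010 in this experiment).
function wayback_url($timestamp, $url) {
    return 'http://web.archive.org/web/' . $timestamp . '/' . $url;
}

$timestamp = '20100301000000'; // YYYYMMDDhhmmss of the chosen snapshot
$pages = array('http://www.esalgopersonal.es/'); // normally loaded from the CSV

foreach ($pages as $url) {
    $deeplink = wayback_url($timestamp, $url);
    // Fetch and store the historical HTML for later scraping:
    // $html = file_get_contents($deeplink);
    // file_put_contents('raw_' . md5($url) . '.html', $html);
}
```

The Wayback Machine redirects to the closest snapshot it has for that timestamp, so an exact match is not required.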
From this point on everything will be different, because it depends on the template the WordPress site uses, but I'll give you some tricks to ease the scraping of each entry. I haven't processed comments yet: for now I only recover and store them for future use.
Using the DOMDocument class we can run lookups over the DOM and access the elements, such as the H1 for the title and the single_post class for the entry's body.
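A sketch of that extraction, assuming the theme wraps the body in an element with the single_post class (the exact markup depends on the template, so the XPath expressions may need adjusting):

```php
<?php
// Sketch: extract the title (H1) and the body (element with class
// "single_post") from a recovered entry's HTML.
function extract_entry($html) {
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from sloppy old markup
    $xpath = new DOMXPath($doc);

    $title = '';
    $h1 = $xpath->query('//h1')->item(0);
    if ($h1 !== null) {
        $title = trim($h1->textContent);
    }

    $body = '';
    // Match "single_post" as a whole class token, not a substring.
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' single_post ')]";
    $node = $xpath->query($query)->item(0);
    if ($node !== null) {
        foreach ($node->childNodes as $child) {
            $body .= $doc->saveHTML($child);
        }
    }
    return array('title' => $title, 'body' => $body);
}
```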
Once the elements have been extracted, it is preferable to insert them into an intermediate staging table, then process the entries again and finally insert them into the corresponding WordPress tables.
Once all the data has been inserted into the "posts" staging table, we only need to pick each field and insert it into the wp_posts table: that will be enough for the entries to show up published on the site. Use an SQL statement like INSERT INTO wp_posts (..fields..) SELECT …fields… FROM posts.
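The staging-table step can be sketched end to end like this. SQLite is used here purely for illustration, and the two columns are a deliberate simplification: in the real setup both tables live in the WordPress MySQL database and wp_posts has many more required fields (dates, status, GUID and so on):

```php
<?php
// Sketch of the staging-table approach, using an in-memory SQLite
// database for illustration. The column set is deliberately minimal;
// the real wp_posts table has many more required fields.
$db = new PDO('sqlite::memory:');
$db->exec('CREATE TABLE posts (post_title TEXT, post_content TEXT)');
$db->exec('CREATE TABLE wp_posts (post_title TEXT, post_content TEXT)');

// Scraped entries land in the staging table first...
$stmt = $db->prepare('INSERT INTO posts (post_title, post_content) VALUES (?, ?)');
$stmt->execute(array('Recovered entry', '<p>Recovered body</p>'));

// ...and a single INSERT ... SELECT moves them into wp_posts.
$db->exec('INSERT INTO wp_posts (post_title, post_content)
           SELECT post_title, post_content FROM posts');
```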
After importing all the data into WordPress and carrying out the necessary tests, we buy the expired domain and assign it a hosting plan; this way we don't forget to buy the domain while we are busy programming the whole thing.
In my case, the WordPress site I assembled uses permalinks without a date, so I added 301 redirects in the .htaccess so as not to lose the existing links to entries which used the date-based format:
RedirectMatch 301 ^/index.php/([0-9]+)/([0-9]+)/(.*)$ http://www.esalgopersonal.es/$3
RedirectMatch 301 ^/([0-9]+)/([0-9]+)/(.*)$ http://www.esalgopersonal.es/$3
Even though this is just an experiment and there is a lot to improve, with a few simple steps we can recover all the "sensible" content of a site.
There is a lot of work pending to get a faithful copy of the site back online: one example is that I skipped the images. It is possible to look them up and recover them via a request, then store them in a shared space; I recommend using S3, offered by AWS.
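Recovering the images could follow the same pattern as the entries. A sketch, under the assumption that each image can also be requested through the Wayback Machine (the `im_` suffix, which asks for the raw archived file, and the chosen timestamp are assumptions; the actual upload to S3 would use the AWS SDK and is left out):

```php
<?php
// Sketch: collect image URLs from a recovered entry and rewrite them
// as Wayback Machine requests, so they can be downloaded and
// re-hosted (e.g. uploaded to S3 afterwards).
function collect_image_urls($html, $timestamp) {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $urls = array();
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            // "im_" asks the archive for the raw image file.
            $urls[] = 'http://web.archive.org/web/' . $timestamp . 'im_/' . $src;
        }
    }
    return $urls;
}
```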
Comments are another important part, and, although they are harder to salvage from each entry's body, it is possible to identify them and insert them almost completely into the matching tables.
If you liked this entry and/or want to contribute something more, use the comments or share it with your friends and help boost quality link building. Thanks.