How
In order to run this you'll need a machine that can handle basic package installation.
If you want to make it public you need an AWS EC2 instance, but it's more important you start the capture side of things first.
Capture
- There are multiple ways to capture the web
- read this zine
- Fork the github repo of my project
- Install Heritrix
- Gather a seed list
- Validate your WARCs
Replay
Run through the read me of the project to set up dependencies
Collect media of any mimetype that you consider to be valuable. Aim to preserve it in it’s original context. An example of this would be instragram content. Try to crawl it using Instaloader, but if your preservation tactic involves exporting JPGs and crawling this as a single asset, that is just as good.