The Raspberry Pi is the latest in ‘must have’ gadgetry so after receiving my £25 computer the size of a credit card, I set about finding out what it could do.
Followers of the blog will know that I posted a video tutorial on how to install the open source archival cataloguing software ICA AtoM in Windows, so I wondered if I could do the same thing under Linux on the Raspberry Pi. Linux is an open source operating system that is free to install and can be run on any desktop computer; indeed, I run it on my home PC alongside a standard Windows 7 installation. Unlike Windows, which utilises a graphical interface for installing software and other administrator level operations, Linux encourages you to use text based ‘commands’ to get things done.
Whereas the instructions for installing ICA AtoM under Windows were fairly straightforward, there are a few more things to configure and install in a Linux environment, so I’ve put together what I hope is a comprehensive walkthrough of installing it on a shiny new Raspberry Pi.
While it’s not the fastest thing ever, I can certainly see potential in a £25 server the size of a credit card appealing to small repositories looking for a cheap and easy way of testing and tinkering with an open source cataloguing solution.
NOTE – The instructions below are specific to Debian Squeeze, although I’ve noticed that there is a new recommended image (Raspbian Wheezy) available direct from the official Raspberry Pi website. I will try to replicate these instructions using the new image asap and update if necessary. In theory though, the instructions below should also work with Raspbian Wheezy.
- Boot up your Raspberry Pi and Login
- Install the webserver elements required to run ICA AtoM – apache, php and mysql. Do this by typing the following into the command prompt:
sudo apt-get install apache2 php5 mysql-server php5-mysql
- The packages will then begin downloading and installing.
- You will be prompted to install packages and asked to confirm installation by typing y
- Once mysql has been installed you will be prompted to set the ‘password for sql root user’. Make a note of the password you use as you will need it when setting up ICA AtoM.
- Wait patiently for everything to finish installing and return you to a blinking cursor in the command prompt.
NOTE – At the end of the installation, apache may throw up an error ‘bad group name www-data’. To fix this, create the group in the command line by typing
sudo addgroup www-data
- Reboot the Pi by typing
sudo reboot
- Once the Pi has rebooted and you have logged in again, navigate to the folder where apache looks for web files by typing
cd /var/www
- Once you are in the folder (the prompt changes to pi@raspberrypi:/var/www$), download the compressed archive for installing ICA AtoM by typing:
sudo wget http://pear.qubit-toolkit.org/get/icaatom-1.2.1.tgz
- Wait for the file to download and then unpack (unzip) the archive by typing:
sudo tar -zxpvf icaatom-1.2.1.tgz
- ICA AtoM will now install and a vast amount of text will scroll past you on the screen. Wait until you are returned to a blinking cursor.
- Once ICA AtoM is installed, change the permissions on the folder to enable changes to be made and files to be created. Do this by typing:
sudo chmod -R 777 icaatom-1.2.1
- Now you’re ready to create the sql database that will contain all the data entered into ICA AtoM. To do this, type:
sudo mysql -p
(enter the root password you created earlier when prompted)
- At the mysql> prompt, type
create database [database name];
where [database name] is your chosen database name
- You are now ready to move to another computer to finish setting up your ICA AtoM installation. Before doing this, you’ll need to find out the IP address (web address) of the Raspberry Pi. Do this by typing:
ifconfig
and making a note of the number after inet addr, which on a home network will typically start with either 192.168. or 10.
- Move to another computer, open the browser and navigate to:
[IP address of Pi]/icaatom-1.2.1
where [IP address of Pi] is the address you found using ifconfig in the previous step.
- You will now see the installer working through system checks. Once it has reached 100% click ‘Continue’.
- Enter the root password you set up when installing mysql and leave the rest of the fields as default.
- Click continue and wait.
NOTE – I waited around 30 minutes before deciding the installation had crashed and rebooted the Pi by pulling the power lead. Ideally if you have this problem, you should return to the Pi and type
sudo reboot
rather than pulling the power, but I had ‘low memory’ problems so couldn’t reboot cleanly.
- Wait for the Pi to boot up again and then return to the other computer and refresh the browser. This should move the page onto the next screen where you choose a name for your site and input username and password details.
- This completes the installation and you can then click on the link to ‘visit your new site’.
Admittedly, it isn’t the quickest installation of ICA AtoM ever, but it works! I’d welcome any additions to this walkthrough or suggestions on how to speed it up a bit, but I hope this outline is of use.
And there you have it…a fully fledged archival cataloguing system running on a £25 computer :-).
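For anyone who would rather not retype everything, the command line part of the walkthrough can be gathered into one script. This is only a rough sketch, assuming the Debian Squeeze package names and the download address quoted above are still valid. The function names, the ATOM_VERSION and DB_NAME variables and the database name icaatom are made up for illustration and are not part of ICA AtoM. The reboot and the www-data group fix are left out, and nothing runs unless the script is called with --run.

```shell
#!/bin/sh
# Sketch only: collects the commands from the walkthrough above.
# ATOM_VERSION and DB_NAME are illustrative variables, not ICA AtoM settings.
ATOM_VERSION="icaatom-1.2.1"
DB_NAME="icaatom"   # hypothetical database name; choose your own

install_stack() {
    # Web server, PHP and MySQL (you will be asked to set the MySQL root password)
    sudo apt-get install apache2 php5 mysql-server php5-mysql
}

fetch_atom() {
    # Download and unpack ICA AtoM into the folder apache serves
    cd /var/www || return 1
    sudo wget "http://pear.qubit-toolkit.org/get/${ATOM_VERSION}.tgz"
    sudo tar -zxpvf "${ATOM_VERSION}.tgz"
    sudo chmod -R 777 "${ATOM_VERSION}"
}

create_database() {
    # Prompts for the MySQL root password, then creates an empty database
    mysql -u root -p -e "CREATE DATABASE ${DB_NAME};"
}

atom_url() {
    # Print the address to open in the browser on the other computer
    printf 'http://%s/%s\n' "$1" "$ATOM_VERSION"
}

# Nothing runs unless the script is called with --run
if [ "${1:-}" = "--run" ]; then
    install_stack && fetch_atom && create_database
fi
```

Calling atom_url with the address reported by ifconfig prints the link to open in the browser on the second computer.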
On Tuesday 6 March, I went along to the Archives and Society series seminar given by Simon Wilson on the ‘Early Steps and Early Lessons’ in digital preservation at the University of Hull. I was interested to see how an organisation developed from having no knowledge of digital preservation as a discipline to having the confidence to encourage deposits of digital material from both internal and external sources. Simon was keen to stress that the talk was based on early steps and early lessons; there is still plenty to learn and develop, and the processes covered in the talk were very much a ‘work in progress’.
The first part of the talk addressed the first steps taken by staff at Hull, including reading around the (rather overwhelming!) amount of material on digital preservation and getting an idea of the challenges that digital material faces. These have been documented in great detail elsewhere, so I hesitate to repeat them here, but needless to say ‘obsolescence’ and ‘format’ featured prominently!
The next part of the talk introduced the AIMS project, of which the University of Hull was part, along with the Universities of Yale, Stanford and Virginia in the United States. The AIMS project was looking to develop an ‘Inter-institutional Model for Stewardship’ of digital collections, and the full white paper detailing the project’s findings can be found here. There’s no need to go into detail here, but the rest of the talk did highlight a few simple steps taken by the staff at the University of Hull that could be developed and used by other repositories.
- Do a collections survey. Work out what you’ve already received in digital format and consider any challenges concerning format and/or content. What do you need to be able to retain access to it in the long term? This can be as simple as a basic Excel spreadsheet.
- Build a forensic workstation. Basically, nab yourself an old computer and add bits to it to deal with a wide range of different formats. See the blog post from the University of Hull. It definitely pays to make friends in the IT department for this. Get them to hold back any old floppy or zip drives to replace broken ones.
- Utilise free software. There are increasing numbers of programs out there that can aid with the management of digital materials. The National Archives file profiling tool DROID is useful in identifying what file formats you are looking at, and Karen’s Directory Printer will provide a detailed list of files, formats, dates of creation and modification and attributes. This list will prove useful at all stages of processing, but particularly so when assessing the material prior to deposit and as part of the accessioning process.
- Work with others. Relationships are critical to the management of digital materials, from relationships with potential outside depositors to internal users, information professionals and IT staff. Utilise other professionals to discuss issues and bounce ideas around. You are not alone!
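The kind of file listing described above can also be produced from a Linux command line. The following is a minimal sketch and not how DROID or Karen’s Directory Printer work: list_files is a made-up function name, and it assumes GNU find (for the -printf option). It records each file’s path, size and last modified date; creation dates and file formats are not captured.

```shell
#!/bin/bash
# list_files DIR: print a tab-separated inventory of every file under DIR
# (path, size in bytes, last-modified date). Relies on GNU find's -printf.
list_files() {
    printf 'path\tbytes\tmodified\n'
    find "$1" -type f -printf '%p\t%s\t%TY-%Tm-%Td\n' | sort
}
```

Running list_files on a folder of deposited material and redirecting the output to a file such as inventory.tsv gives a listing that opens as a spreadsheet, which could double as the starting point for a collections survey.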
The key message that I took from the seminar was that, when faced with digital materials, do SOMETHING. Even if it’s something as small as a collections survey, anything is better than ignoring the problem in the hope that it will go away.
Digital materials require the Archivist to develop some new skills, but the traditional processes of accessioning, arrangement and description are still relevant to the management and preservation of digital materials. The main difference is that the process for managing digital material is more intellectual than physical – you do not have the files and folders in your hands, just representations of bits and bytes on a screen.
The seminar highlighted the need to tinker with available tools in order to gain confidence, something that I’m a great believer in. I’ve been installing and playing around with a number of these tools and hope to write up some of my experiences.
In the meantime, check out these links from the Hull History Centre to get you started;
Is there such a thing as a ‘typical Archivist’?
This question was posed by a friend on the Archives NRA Mailing List recently and prompted a series of fairly animated responses, mostly along the lines of ‘there’s no such thing as a typical archivist’. It was suggested that such a phrase may even be detrimental to the image of Archivists and suggest that the profession is more uniform than the reality. I’d certainly agree that it would prove problematic to define the ‘typical Archivist’ beyond saying that we list, make available to the public and preserve material of historical significance or otherwise deemed worthy of long term preservation.
I suspect however, that the way in which the Archivist goes about this role has changed significantly, not least in the methods used to make material available. The development of the computer (originally used to manage payroll in Lyons Tea Shops) and of software specifically for the cataloguing of archive material have fundamentally changed the way people create, disseminate and access information. Without an ability to embrace the fast-paced development of technology, there was a significant danger of repositories being left behind in terms of how they promote or provide access to their collections. Initiatives for putting catalogue records online such as The National Archives’ Access to Archives and AiM25 made significant early inroads into tackling the problem.
As someone who is confident in using computers and who has a programmer for a boyfriend, I am confident that there are more effective ways of doing things with a computer than those that are currently practised by most Archivists. A few pointers on how to use macros in Excel and how to program a query in Access could make the world of difference to your average Archivist. Why spend half a day manually searching through various spreadsheets for data when you can run a report from a query in 30 seconds? I appreciate that this may be specific to my work environment, and/or that there are other contributing factors that restrict the use of technology within an organisation, but you get the idea!
The original blog post requesting input from any technically minded Archivists who want to learn to program, posted by Alexandra Eveleigh, is here. Have a read and please do comment on why you would want to learn and what you think it could help with in your day to day work.
I’m certain that there are a whole host of different things that programming (and other general IT knowledge) can do to enhance the work of Archivists, and I hope that something comes from the invite to learn together with a group of like minded professionals.
So, does an Archivist need to retrain as a programmer? I don’t think so, although a little bit of knowledge could go a long way to improving the final output of Archivists in terms of web presence, online resources and use of material. Initiatives such as Codecademy will go some way to bridging the gap that exists at the moment, provided people have the time to devote to it. Where this is not possible, meeting with a group of like minded people in a relaxed setting or working through basic tutorials delivered online might well be the solution we are looking for.
To return to my initial point, I am not sure there is such a thing as ‘a typical Archivist’ but I do think that programming/computer science based skills need to become increasingly ‘typical’ in newly qualified Archivists if we are going to keep pace with emerging and changing technologies.
I for one hope to be blogging a bit more on this sort of thing and perhaps providing information on a few things that I’ve found useful, but in the meantime I’m getting my head down on Codecademy and reading ‘The Programming Historian’.
The title of this post comes from a talk given by Andrew Featherstone of the Museum of London at the DPC event ‘Digital Preservation: What I Wish I Knew Before I Started’. The meaning behind this statement was that, when considering digital preservation, it is very easy to get bogged down with solving the immediate problem, without taking the time to read around the subject (project reports/presentations/standards etc).
I’ve been mulling over what to do for my next blog post, and have even got several half finished posts drafted on my dashboard. I suspect this one might actually make it onto the blog though (perhaps even later today!) as the idea that we should all take time to ‘smell the digital flowers’ really struck a chord with me.
Earlier this week, I tweeted about needing a day off just to go through some of the various links to information on digital preservation that I have found while trawling the net. Today, I’ve finally found the time to sit down and go through a few things and have found some really useful bits and ideas to follow up on.
Firstly, check out the Digital Preservation Coalition website. They are a key training provider on all things digital preservation and one of the main collaborators who are driving development in this area. They run regular events for all levels of information professional and publish helpful Technology Watch Reports on the challenges of different types of digital material. I was unable to attend their latest event (‘What I Wish I Knew Before I Started’) but have been able to read the slides and ponder the issues raised.
As well as reading around the subject I’m also planning on attending more events based around the challenges of digital preservation. One of the underlying themes of the subject as a whole seems to be collaboration. Quite often when one reads reports of events, the delegates are initially concerned about whether they will understand the technicalities of digital preservation, but come out of the event feeling empowered once they understand that archivists already have most of the skills required to deal with digital material. I was certainly excited after attending my first event of this type ‘Getting Started in Digital Preservation’ (also run by the Digital Preservation Coalition) – in fact, I think that event is what started me thinking seriously about a possible future as a ‘digital archivist’.
There is an abundance of really good information out there for anyone interested in digital preservation, and an increasing number of events to attend and to meet other interested parties. The problem is not one that is going to go away, and it is not something that can wait for the ‘perfect’ solution to come along. Information professionals should be encouraged to experiment as much as possible and to communicate with other interested parties regularly.
I’m hoping to play around with a few more tools and to write something up here (although I can’t promise anything more coherent than a stream of consciousness!); in the meantime, check out these links and start ‘smelling the digital flowers’!
http://blogs.ukoln.ac.uk/jisc-bgdp/ <<JISC Beginners Guide to Digital Preservation
http://www.dpconline.org << Digital Preservation Coalition
http://www.ariadne.ac.uk/issue46/rusbridge/ << ‘Excuse me…some digital preservation fallacies’, by Chris Rusbridge
http://www.clir.org/pubs/archives/ensuring.pdf << ‘Ensuring the longevity of digital information’ by Jeff Rothenberg
ICA AtoM is an example of archival cataloguing software, used to provide readers with access to archival materials. Cataloguing software is widely used in the information and heritage sectors to provide access to collections, but ICA AtoM is something a little different in that it is open source, meaning, most importantly to organisations, it’s FREE.
The term ‘free’, when discussing open source software, refers to both the concept of ‘free as in free beer’ and ‘free as in free to edit, update and improve’. Any repository looking for a fully functional, ISAD(G) compliant system need look no further, provided you have a little confidence and willingness to experiment.
I think the key thing when learning anything new is to just jump right in and give things a go. In the spirit of encouraging people to try things out, I’ve put together a video stepping through the installation and configuration of ICA AtoM for people to have a look at. Notice the total time of the video, 9 and a half minutes. This is REAL TIME i.e. it takes less than 10 minutes to get ICA AtoM up and running on a PC. Granted, it’s a bit more complicated if you were installing it on a large scale, but the principle is the same.
I should note that the process is a little slower than it would otherwise be as I was running it on a virtual machine. The video is best viewed in full screen.
The basic steps for installation shown in this video are;
- Download and install WAMP (Windows Apache MySQL and PHP – the software that allows the computer to act as a webserver)
- Download the ICA-AtoM software from www.ica-atom.org
- Install ICA-AtoM by unzipping the file and copying the contents to the ‘C:/wamp/www’ folder (or the ‘www’ folder wherever WAMP was installed)
- Type ‘localhost’ in the browser to navigate to where ICA AtoM is installed
- Follow the prompts (using WAMP to create a new database when you notice the error message)
- Configure a name and login details for your site and Voila.
This video will step you through setting up the software on a standalone computer for tinkering purposes. The process will be different for a full blown installation on corporate servers (but not that different!).
Go on, have a go!
Paper and ink, with the right care and attention, pretty much lasts forever. Or for 100 odd years at least. The average lifespan of a CD is 10 years. You can do a lot in 10 years…I sprouted a good few inches, passed a load of exams and got a job, but in the grand scheme of things, and certainly to an archivist, 10 years is a drop in the ocean.
Digital information does not last as long as its paper equivalent, and this is why it is so important to be aware of the limitations from the moment digital material is created. While we are used to 100 year old (and more!) records surviving, we may well have become complacent when it comes to the letters (and indeed blog posts) that we are writing on our computer right now. This problem has been discussed at length by many individuals who are much more well versed than I, but I feel I can at least summarise the problem.
Digital information requires a third party (be it hardware like a monitor, or software like Microsoft Office) to enable it to be viewed by a human being. Eyes alone no longer cut it unfortunately. The rate of development of software and hardware means that, potentially within a few years of creation, files cannot be read by either the software or the computer that initially created them. The main options available to someone who encounters this problem are:
- Preserve the hardware that was used to create the document to allow for it to be viewed over time
- Run the out of date software on a modern computer using a method known as ‘emulation’
- Convert the file into a more modern file format for use with modern software (more commonly known as ‘migration’).
All of these present their own unique issues that would need separate blog posts to go into, but they go some way towards ensuring accessibility of digital material over time.
I suppose the main defence against digital data loss is simply an awareness and understanding of the issues. Digital items, unlike their paper equivalents, cannot just be placed in a BS5454 compliant storage facility and left to their own devices. Digital preservation is very much an active concern; ignoring items can lead to their irretrievable loss within years (sometimes less) of their creation.
There are a few simple things that can be done to help slow the state of obsolescence in digital material; these steps can just as easily be taken by Joe Bloggs on his home PC as by John Smith the archivist in a server room of a national organisation.
- Make copies…then make some more. Simple but true. Keep several copies of files on several different types of media, preferably in several different physical locations as well. That way, if one fails for whatever reason, you always have a backup. Burning something to disc and then forgetting about it does not constitute a robust preservation strategy.
- Check files regularly. Can you still access everything? Does it look the same as it did when you created it? Regular review of material (every 6 months or so) should allow for time to migrate to newer formats if problems arise. Of course, migration of material nearly always results in some form of data loss, but you should be able to preserve what are known as the significant properties of a file using this method (more detail on this is coming in another post).
- Label everything. And I don’t just mean with physical labels either, although they’re a good start. You need to find a simple solution to record metadata (data about data) about the items you are looking to preserve. This can be done via a physical label, handwritten paper document or a computer file (preferably a copy on each!) and should record as much detail as possible about the material in question e.g. number of files, types of files, programs required to view the files, sizes of files, a brief description etc. These records could be the only clue that any future user may have to unlocking the information on the media.
- Have a process in place. Take some time to think about digital items in your collection. How much material are you likely to be dealing with? How often does it need checking? Are you going to do anything to the files when they reach you to ensure they are recorded properly? Come up with a process for managing items that suits your needs, and make it part of a yearly/6 monthly routine.
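One way to make the ‘make copies’ and ‘check files regularly’ steps above a little less manual is to record a checksum for every file and compare against that record at each review. The sketch below is only an illustration under stated assumptions: make_manifest and check_manifest are made-up function names, and it relies on the GNU versions of find, sort, xargs and sha256sum found on most Linux systems. A matching checksum shows that the bits have not changed; it does not show that the file can still be opened, so it complements rather than replaces looking at the files.

```shell
#!/bin/bash
# make_manifest DIR MANIFEST: record a SHA-256 checksum for every file under DIR.
# Keep MANIFEST outside DIR so that it does not end up listing itself.
make_manifest() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$2"
}

# check_manifest DIR MANIFEST: succeed only if every listed file is still
# present and still matches its recorded checksum.
check_manifest() {
    ( cd "$1" && sha256sum --quiet -c - ) < "$2"
}
```

The same manifest can be checked against each backup copy, for example by running check_manifest against the folder on an external drive, which confirms that the copies still agree with one another.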
The above list is by no means exhaustive, and I stress, is purely a product of my reading and experience with computers, but these few simple things should go a long way to easing the pain of dealing with digital material.
This issue has been discussed at length by others – perhaps even a little too much, to the point where archivists are nervous of the issues and hesitant to learn skills that they may see as ‘overly technical’. The purpose of this brief post then, is to alleviate some of those fears by suggesting that a few simple steps can make all the difference in dealing with digital materials. These basic steps can form a solid building block for the more complicated elements and help to build confidence in the digital preservation solutions arrived at.
I’m a geek. I have been since I was 7, playing Buggy Boy incessantly on the family Atari ST 1040. Anything to do with computers, the internet and gadgetry in general and I’m all over it.
I’m also an Archivist (not an activist or even an anarchist…an Archivist), meaning that I spend my working day cataloguing historical documents, repacking books covered in red rot in acid free paper, answering historical enquiries relating to collections, and anything else that I can find to do with myself.
I’ve always been interested in history and the idea that documents from the distant past can be preserved for future generations and, after my work experience at 16 when I was taken to a strongroom and shown an Anglo-Saxon house deed (wow!), I began to realise this was something I could do for a living.
You would be forgiven for thinking that these two elements of my personality must be entirely independent, after all, how many Latin reading computer programmers do you come across? (Perhaps a little extreme but you get my point!) However, I am increasingly discovering that a lot of what I do in my spare time on a computer has come in very useful in my job as an archivist, and have developed a specific interest in digital preservation (ensuring that digital documents/video/sound files remain accessible for future generations).
The humble document, the basis of an archivist’s role, no longer means a piece of paper in a folder on a desktop. Indeed, even a ‘folder on a desktop’ has a different meaning in a digital context. The way we create material is changing, and the role of an archivist has to change with it. Job specifications increasingly require ‘good IT skills’, including the occasional ‘familiarity with XML/EAD/METS’ (plus any number of additional obscure acronyms), and I for one think it could and should be easier for archivists to access simple information on the management and preservation of digital records.
The aim of this blog then, is to chart my interest in digital preservation, and provide some notes, personal thoughts and links to interesting or otherwise relevant information.
Current ideas for future posts include open source software, computer science basics for archivists, simple, practical digital preservation solutions and the changing role of the archivist.
Thanks for reading – new posts should be landing soon!