Monday 28 January 2019

Automating Facts in EU Privacy and Security Legislation

Privacy law is very general because it can't possibly cover all situations, and the technology and social environments change all the time. Nevertheless, hard facts would help a lot for those of us just trying to make privacy work. Practical privacy decisions are very hard to automate.

It turns out there are facts in EU privacy and security law, and some of them can be automated.

Facts Embedded in EU Privacy Law

It isn't just the GDPR. Six pieces of new EU law are a kind of privacy extravaganza. They are a cluster of rules and incentives with some common themes for transforming how we live and do business in the 21st century.

There is rather a lot of new law:

GDPR - a mashup of human rights, commercial incentive and a lot of implied computer science
Network and Information Systems (NIS) Directive - for traditional utility suppliers, and modern apps
European Electronic Communications Code - for traditional telecoms operators, and modern apps
EU Cybersecurity Act - for improving infrastructure security in each country, and also modern apps
Regulation for the Free Flow of Non-Personal Data (the anti-GDPR; no geoblocking within the EU)
ePrivacy Regulation - still a draft law. Does for communication what the GDPR did for data

Added to these are the country-specific versions of these laws, and although they are all compatible there are significant differences, with some countries going a lot further, others having complementary traditions and so on. With the EU 27, plus the six other countries who have signed up to the Extravaganza including the UK, that adds thousands more pages.

Facts Embedded in The Laws

The one outstanding Extravaganza fact that everyone seems to know is "fines of 4%", and it isn't just the GDPR that has potentially huge fines for big companies. But there are other hard facts too, ones that help increase compliance and are good business.

There are some unifying themes of computer science facts, commercial facts and legal statements of fact. Human interpretation and advice is always needed when it comes to applying complicated law. But when software needs to make a decision about what is and is not permitted activity, it can really help to have facts available in real time.

Lack of facts seems to be a universal constant in the world of privacy, as I learned when doing GDPR consulting work with companies focussing on the applied realities. Few organisations can answer even really basic factual business questions such as "where is all the company data kept?" and "what backups exist?" because they aren't used to asking them. That is why progressing to questions like "what personal data is kept?" or "do you have any idea if your systems are secure?" gives highly approximate answers. My approach was to reduce the problem size, that is, to stop doing the really bad things immediately and then plan to gradually improve the rest. But even when there is a good privacy environment in place with great policies and training, value judgements are still made on a daily basis. Hard facts are relatively rare because that is the nature of privacy.

There are at least three kinds of facts in the Privacy Extravaganza:
  1. Facts about IP addresses, not only users' IP addresses but also server IPs. The Extravaganza works by computers taking actions, and all computers/devices involved have IP addresses.
  2. Facts about business definitions. Some of these a company will know, and must commit to, and some of these can be looked up.
  3. Facts about relationships between entities, because the relationships are specified in law.

If we discard the ambiguous cases, if we do not attempt to answer questions where shades of human judgement are needed, then those facts that remain are likely to be robust enough to be consulted in real time as a piece of software deals with high-speed data transfers - even transfers between people or computers about which little else can be known for sure. In science terms, this is falsifiable knowledge, because we can prove it to be incorrect rather than shrugging our shoulders. These facts are valuable.

Facts About IP Addresses

This gets a bit complicated and messy, so first of all here are the categories of IP address I have found implied in the law and computer science:

  1. Public server IP addresses. These have no privacy issues (they aren't people) and various facts can be established about them. For example, if I am about to send an email or initiate an internet voice call with someone, my computer can know some hard facts about the IP at the other end, or at least be positive that no facts are available.
  2. Private server IP addresses. These would be, for example, within the same company, which in a lot of cases is a Europe-wide or worldwide private network. It might be harder to get facts in this case, because of misplaced assumptions of trust. But otherwise, as per public server IP addresses.
  3. Public and private personal IP addresses. Here we need to be careful, because there are privacy issues. But still, within the law, we can still make some statements of fact based on publicly available information.
  4. IP addresses belonging to special categories of organisation, such as defined in the NIS Directive and Communications Code. 
  5. IP addresses claimed by particular organisations, usually large ones, who are easily identifiable
  6. Server or client IP addresses which advertise technical information proving they do not meet the security requirements of the Extravaganza. This probably means we are unable to complete communication with that IP address - and that is a new category of error implied by the security sections in at least three of the Extravaganza laws. ERROR - INSECURE CONNECTION is going to be one of the most annoying sights on the internet until the quality of services improves. We can often have the facts in advance that this is the case, and take some action such as finding an alternative or letting the user down gently.

Once we have some knowledge of the category of IP address we can start to ask some questions. We cannot guarantee to answer these questions, but we can guarantee that if we do provide an answer, the answer will be a good factual one.

Here are a few of many such possible questions:
  • Is this IP address in Extravaganza-covered address space? That is, EU27+6 countries, plus depending on what communication is being attempted, various other countries. The algorithms are in the law and there are public databases to help provide factual information. This is legal as much as computer science, with a bit of geography too.
  • Is this IP address listening as a server (e.g. for email, internet voice, etc.)?
  • Is this IP server obviously insecure, by Extravaganza definitions?
  • Is this IPv6 address not using privacy features when it should be?
  • Is this IP address coming from/hosted by an EU company?
  • Does this IP address host services which obviously fail the ePrivacy Regulation?
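Some of these checks are straightforwardly mechanical. As a minimal sketch, assuming a hypothetical prefix-to-country table (a real system would build one from the public RIR delegation files or a maintained geolocation database), Python's standard library can already state some hard facts about an address:

```python
import ipaddress

# Hypothetical prefix-to-country table; illustrative values only. A real
# system would derive this from RIR delegation data or a geolocation database.
EU_PREFIXES = {
    "2.0.0.0/12": "FR",
    "5.144.0.0/14": "DE",
}

EXTRAVAGANZA_COUNTRIES = {"FR", "DE", "AT", "ES"}  # abbreviated list

def classify(addr: str) -> dict:
    """Return the hard facts we can state about an IP address,
    or explicitly record that no fact is available (None)."""
    ip = ipaddress.ip_address(addr)
    facts = {
        "is_private": ip.is_private,   # RFC 1918 / RFC 4193 ranges
        "is_global": ip.is_global,
        "country": None,
        "covered": None,
    }
    if ip.is_global:
        for prefix, country in EU_PREFIXES.items():
            if ip in ipaddress.ip_network(prefix):
                facts["country"] = country
                facts["covered"] = country in EXTRAVAGANZA_COUNTRIES
                break
    return facts
```

The point is the shape of the answer: either a verifiable fact, or an explicit "no fact available" - never a guess.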

Facts About Business Definitions

The online world meets the legal world in the Extravaganza laws, among many other places. That implies that a domain name or IP address needs to relate back to a legal entity.

Generally a human is needed for that... for example, CNIL is fining Google tens of millions of euros for privacy breaches, but what is the actual company being fined? Will that have anything technically to do with the Google company operating in Austria or Spain? That is why, though it is tempting to try to relate domain names to businesses and IP addresses, it quickly becomes too complicated and error-prone and requires manual intervention. But the Extravaganza comes with some databases, implied and already taking shape, which do change this somewhat:

  • We will soon be able to know at least some of the business names and at least some of the domain names of companies required to register under country-specific NISD and Communications Code provisions.
  • We can already in some countries look up the companies and related information registered for privacy reporting purposes.
  • Large companies usually host their own DNS in a transparent way, and we can relate that to registered business information.
  • Public disclosures of privacy and security breaches, for certain companies, will be available. 
That said, Extravaganza definitions of what country a company is deemed to be operating in, and where an IP address can be said to reside and more of that nature, are all very flexible and quite the opposite of hard facts.  

Facts About Legal Relationships

  • GDPR Controllers and Processors have mandated contracts. In computer science terms, those contracts imply sharing passwords or other secret keys.
  • Certain types of large Extravaganza-covered company have mandatory registration and reporting relationships with Country-level security certification organisations.
  • ISPs and Hosting companies have different mandatory registration and reporting relationships with country-level security organisations
  • Every company covered by the GDPR has a mandatory registration and reporting relationship
  • Some of the companies listed above have implied relationships with each other directly, rather than through any third party or government organisation, in the event of a security incident
  • Many different kinds of organisation have some public communications obligations under Extravaganza terms.

The Future of Privacy and Security Facts

A picture is emerging of a kind of first-pass approach to privacy legislation, where the facts we know can be applied automatically. This can establish a very useful baseline for everything else.

Monday 13 August 2018

A Design Challenge for Horologists

I have a strong interest in non-electronic computers as an educational tool. It remains as useful today as eleven years ago, when I published a letter to the editor in the January 2007 issue of the British Horological Institute's journal. A horologist is someone who makes mechanical clocks and watches, and horologists definitely don't believe in electronics.

Everyone can benefit from the principles, but lawyers, politicians and anyone who cares about privacy and law enforcement really needs to understand this. The law defines what can and can't be done with a computer, and to some extent it even defines what a computer is.

I think one of the best ways to picture the essence of a computer is to have one in your hand without any electronics.

As some background, on my desk I have this little beauty:

It can store numbers and do long division and multiplication - turn the handles, push the buttons, ding! But despite being an essential tool of business for a century, it's only a calculator, not a computer. I picked it up in a flea market in Helsinki :-)

My hand-operated calculator:
  • would not meet most legal definitions for being a computer. It can't store a list of instructions, and its programming (long division implemented in cogs and levers is still a program!) cannot be changed.
  • would, when in use, meet the legal definition for processing personal data under the GDPR. I could use it to add up your expense records for the month, and the amount of the last expense and the total of all expenses would remain stored and readable in the machine when I finished. (I seriously doubt this calculator would be used to cause GDPR breaches! But it is important to understand the principle of these definitions.)
In order to be a computer, it has to have a decision system capable of reading a programme and executing it, or what we would call a CPU in an electronic computer.

The Babbage Analytical Engine was a fully mechanical computer designed in 1837. The design has been demonstrated many times since to be workable, and there were even programs written for it by Ada Lovelace. She was the first to realise the Analytical Engine was much more than a calculator.

The 1937 Z1 computer, built in a small home in Berlin, was a fully functional mechanical computer, using electrical motors to turn and move the components:

The Zuse Z1 computer
It was soon destroyed in bombing raids, but the Z1 was the first recognisably modern computer.

In a talk I do about "What is a Computer?" I usually play this clip from the US Navy showing their mechanical fire control computers. This is (a) fascinating and (b) a reminder that typically computing advances are first used to do a better job of killing people.

... and all that explains why I wrote in the January 2007 issue of the British Horological Institute's journal! I got several replies from horologists (real actual clockmakers!) but didn't achieve my goal. I have made some progress though. What I really want to see exist is an actual working clockwork computer that performs useful tasks we can recognise from today's world of computing. It's clearly feasible.

And here's my letter:

A Design Challenge for Horologists

January 22, 2007
Dan Shearer

Until this month I hadn't even heard of horology. I'm a computer scientist, occupied with what people do with electronic technology and software, and what they do to people. Over the years I'd seen clocks in museums, marvelled at the old navigators, and once I read an article on apprentice horologists in Geneva. But after meeting some lawyers recently I realised I had to learn about watchmaking.

Here is the challenge:

I need to design a fully clockwork computer. The computer must be a work of horology, not merely mechanical engineering. It must function recognisably like a familiar electronic computer, accepting commands from a keyboard to run programs and display results on a screen.

This article explains my motivations. As I did the research, I realised that with probably just two advances in horology such a design could become reality. I wrote a second article discussing in more detail the practical implementation issues involved.

A Computer? Why?

Like everyone else, I'm affected by laws involving computers. Laws tell me what I'm allowed to do with a computer, and if I become a victim of computer crime I need help from the law. But the more lawyers I met the more I realised I won't get the help I need if the people in the legal system can't even recognise a computer when they see one. More broadly, we live in an age where computers surround us, often invisibly – and computers process data, data that can clear me or convict me, save my life or endanger it. It is a trifle worrying that the individuals who can care for me or accuse me, educate, defend or prosecute me are likely to overlook the computer data involved because they're thinking “oh, a kind of beige box with a keyboard and screen”. How are they to realise that the laws governing the computers in their lives affect them hundreds of times a day?

So I started looking for an unforgettable illustration. Something to show a computer is a thing that does computing. It doesn't even need electronics, let alone a beige box. That's what led me to clockwork. There is something homely and understandable about machinery that goes 'tick-tock', in contrast to the seeming magic of electronics. I want people to think about the notion of computing rather than a computer.

My new UK passport contains a computer too, programmed (as shown by The Guardian) to give all its information to anyone that asks, without a password. If the chiefs of the Home Office understood that the new passport was as much a computer as their own laptops, might they have given their computer experts better instructions?

Horological View of a Computer

A computer is any device which can:

  • obey instructions (e.g. add 48 each time the instruction occurs)
  • store a list of instructions (e.g. add 48 this time, then 36 next time, etc.)
  • receive and remember information (e.g. when someone turns a winder)
  • decide which instructions to do next, and when to accept information

Except perhaps for the last point, the list (and the numbers) should be familiar to horologists. It describes a stored program computer, something computer science calls a Von Neumann architecture. We'll look at the components of a Von Neumann-type machine, and how they might be viewed in terms of mechanical devices. One of the most striking things is that horology already comes close to a lot of the functionality.
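For readers from the computing side, the four capabilities above can be sketched as a toy stored-program machine in a few lines of Python. The instruction names and the single working register are my own illustrative choices, not any real machine's:

```python
def run(program, memory, inputs):
    """A toy Von Neumann machine: the program is a stored list of
    instructions which the control unit reads and executes in turn."""
    acc = 0                         # a single working register
    pc = 0                          # program counter: which instruction is next
    inputs = iter(inputs)
    while pc < len(program):
        op, arg = program[pc]
        if op == "ADD":             # obey an instruction
            acc += arg
        elif op == "READ":          # receive and remember information
            acc = next(inputs)
        elif op == "STORE":         # write the result into memory
            memory[arg] = acc
        elif op == "JUMP_IF_ZERO":  # decide which instruction comes next
            if acc == 0:
                pc = arg
                continue
        pc += 1
    return memory

# "add 48 this time, then 36 next time", as a stored list of instructions:
mem = run([("READ", None), ("ADD", 48), ("ADD", 36), ("STORE", 0)],
          memory=[0], inputs=[10])
# mem[0] is now 10 + 48 + 36 = 94
```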

Input – A device that receives information, maybe from a human. Example: someone typing on a keyboard taken from a manual typewriter. The information might be in response to a question (“How old are you?”).

Output – Makes information available directly to humans by displaying it somehow. Closest to a traditional computer would be interactive screen output via a split-flap board, like most railway stations used to have (remember the flick-whirr when it was updated?). Typewriter output on paper would be another option.

Memory – For storing information so it can be accessed later. The basic unit of information in computing is usually an “on” or an “off”. So if you want to store the word “Clock” it gets translated into a series of ones and zeros, which are then stored by on/off switches. Horologists know all about programmable switches, which mean “if the switch is set then take one action, if it is not set do something else”. The extra twist is to have a way of detecting whether the switch is “on” or “off”. The ability to detect switch setting is called “reading memory”. Once you can do that it is a matter of having a lot of these readable switches to give the computer a reasonable amount of memory. With these two issues solved, the ones and zeros corresponding to the word “Clock” can be written to memory by setting and unsetting a series of switches, and later read back.
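As a sketch of the idea, here is the word “Clock” being written into and read back from a bank of simulated on/off switches, with Python standing in for the cogs and levers:

```python
def write_word(switches, text):
    """Set a row of on/off switches to the bit pattern of some text."""
    bits = []
    for ch in text.encode("ascii"):
        for i in range(7, -1, -1):        # 8 switches per character
            bits.append(bool(ch >> i & 1))
    switches[:len(bits)] = bits
    return switches

def read_word(switches, length):
    """Detect each switch setting ("reading memory") and rebuild the text."""
    chars = []
    for c in range(length):
        value = 0
        for i in range(8):
            value = (value << 1) | switches[c * 8 + i]
        chars.append(value)
    return bytes(chars).decode("ascii")

bank = [False] * 64                       # 64 switches = 8 characters of memory
write_word(bank, "Clock")
assert read_word(bank, 5) == "Clock"      # the ones and zeros survive the round trip
```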

Arithmetic and Logic Unit – For doing operations with numbers. Older readers will remember mechanical adding (or calculating) machines that were manufactured in quantity up until the late 1970s, a centuries-old idea. Besides adding, multiplying etc. there are one or two other operations, but none of these should be technically difficult to design from a horological point of view.

Control Unit – Executes lists of instructions, or programs. Probably the only component that doesn't have anything in common with horology (as far as I know so far!), this unit directs the flow of events. For example looking up a number in memory and telling the Arithmetic and Logic Unit to add 48 to that number, then store the result somewhere else in Memory, or maybe Output it. The Control Unit is the real brains of the show, and is in charge of executing programs.

The Missing Magic

Having these components of a notional computer is all very well in theory, but they aren't quite enough for a useful computer. Computer science has come up with some ways of tying them together, one of which is straight out of horology.

Bus – An information channel between the foregoing components. Implementing this in clockwork will require some ingenuity. In a silicon computer the Bus is like a copper wire linking the memory, control unit and so on, allowing electricity to travel between them. With horology we need to get information (such as the word “Escapement”) from the Memory to the Output, or from the Control Unit to the Arithmetic and Logic Unit. One example (though I don't suggest it is necessarily feasible!) might be an oscillating central bar containing whiskers that can be pushed in and out to indicate different values. The whiskers are adjusted by levers immediately next to the levers used to read the values, and each oscillation moves the whiskers from the setting levers to the reading levers. I'm not covering implementation challenges in this article, but it's worth reflecting that Bus speed is a vital issue for how practical this computer will be.

Clock Signal – a single master beat that is used to synchronise all other activity in a computer. If we're fetching information from Memory using the Bus, or performing a calculation, the Clock Signal is the only way of making sure we're not tripping over ourselves by using the wrong number, or the right number twice, etc. Increasing the speed of the clock signal – assuming all the other components can keep up – is one way of speeding the entire computer up.

Storage – Like Memory, but lasts longer and is usually bigger. A mechanical equivalent of a filing system. You put information in and can get it back out when you want it. A storage system can be punched cards, or pianola-like punched paper rolls, or small plastic cards with very fine ridges and dips after the style of a music box's data. There have been storage systems in use since the early days of the industrial revolution, and I'll be surprised if there isn't at least one horological tradition of using them somehow!

The Other Reasons Why

A clockwork computer may actually be useful for reasons other than educating Her Honour.

Physical Longevity. We have a good idea what happens to clockwork after a few hundred years, but there are real question marks surrounding all forms of silicon computers. Nobody really knows what happens to transistors as the centuries roll by, and if you need a computer for a simple task such as controlling the doors in a long-term nuclear waste storage facility, perhaps clockwork might be better. Watchmaking techniques and materials can produce such tiny and reliable systems that they may be worth considering for these tasks.

Physical Robustness. There are a few physical environments where intense radiation makes electronic computing inherently unreliable. For very simple tasks, might clockwork computing be useful?

Micromechanics. A lot of research is being put into machines made of components that are truly tiny. Scientists are creating gear wheels that are a comfortable size for an ant to pick up, and have been experimenting with tiny geartrains, levers and so on since the 1980s. This is a very practical field of research and there are results in production now. One of the interesting things about micromachines is that they can often be mass produced using photolithographic techniques. A practical design for a clockwork computer might be applicable at this scale of engineering. I am cautious because friction is more important in microengineering, not less, but perhaps some of the other physical effects, such as inertia at high oscillation rates, may compensate.

Conceptual Longevity. A generation of silicon-based computing equipment lasts maybe two years before becoming obsolete. When communicating with far-distant generations, it might be wisest to provide the design for a conceptual clockwork computer and the programs that can run on it, rather than anything electronic. Nobody has ever built a Babbage Analytical Engine (see the next article for more about Charles Babbage and his mechanical computer from two centuries ago) but there is a computer simulation capable of running programs written by Babbage and his students. A communication consisting of a series of computer programs accompanied by schematics of a physical computer that will certainly run them is an extremely clear communication. Any technically sophisticated recipient would merely implement an emulation of the computer rather than the actual clockwork, but they will have no difficulty understanding the design because it uses simple mechanical principles.


Why horology? I could have approached roboticists, who spend their lives at the mechanical end of computing. But I think a roboticist has rather too much silicon thinking already, and besides they like to use hydraulics and other very clunky techniques. I can imagine a computer without electronics that is as incomprehensible in its design as any silicon computer! Using the techniques of robotics seems as far from horology as Babbage's mechanical engineers. I want that 'tick-tock'.

I'm also intrigued that, from my reading so far, very little seems to have changed in horological principles in the last 120 years or so. Techniques have improved, and tolerances, and modern materials and tools are a help. But there hasn't really been a need for a fundamental advance in horology. The history of technology shows that where there is a clear need, sooner or later innovation meets that need. Might a clockwork computer be a way of advancing horology fundamentals for the first time in more than a century?

In the next article I'll consider some of the design issues. I'm looking for horological expertise to help draw up a basic design. In fact, I'm even looking for someone who knows how to make a design for a watch, because I certainly don't! If you are interested, do please contact me.

Saturday 11 August 2018

Radio Waves to Random Number Generator

Random numbers are needed for good cryptography, and good cryptography matters for fundamental human rights reasons. Without it, nothing can be kept private. That is why the EU has built its privacy legislation on human rights. And that is why the random number service at random.org is important, because it suggests (but does not show) how to do this right in a mathematical sense.

And so, in the department of "ancient things found in the attic", here is a clipping from the Adelaide Advertiser in Australia. In 1986 I hadn't the slightest idea how important random numbers were, but they seemed fun at the time. Back then, I just wanted to do better than what a basic IBM PC would produce if you asked it to run a pseudo-random number generator.

Unfortunately no, a random number generator based on mashing together multiple radio stations won't work. Radio waves aren't truly random no matter how many we mash together, and there are mathematical ways to show that. It is an important problem to solve, which takes more maths than I currently have.
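One of the simplest such mathematical checks is a frequency (monobit) test, sketched below. The "biased" stream is a stand-in for a structured source like a radio broadcast, and this is only a simplified illustration of the idea behind randomness test suites such as NIST SP 800-22, not a full test:

```python
import random

def monobit_excess(bits):
    """Frequency (monobit) test: how many standard deviations the
    count of ones strays from half. Large values mean the source is
    almost certainly not uniformly random."""
    n = len(bits)
    ones = sum(bits)
    return abs(ones - n / 2) / ((n / 4) ** 0.5)  # sqrt(n/4) = std dev of Binomial(n, 1/2)

rng = random.Random(42)                  # fixed seed so the sketch is reproducible
fair = [rng.randint(0, 1) for _ in range(10_000)]
# A "structured" source, like bits derived from a broadcast signal:
biased = [1 if rng.random() < 0.6 else 0 for _ in range(10_000)]

# fair stays within a few standard deviations; biased fails spectacularly
```

A real source would also need to pass runs tests, compression tests and more - passing the monobit test alone proves nothing, but failing it is conclusive.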

Dr Mads Haahr of Dublin has all the right mathematics to assess what is a good source of randomness, and he too looked to the air for his solution, but he chose to use static. "The first version of the random number generator was based on a $10 radio receiver from Radio Shack."

Dr Haahr founded random.org to produce high-quality random numbers for "holding drawings, lotteries and sweepstakes, to drive online games, for scientific applications and for art and music." The theory behind his work is important for all random numbers. Since the topic of random numbers immediately brings up security, I need to point out that random.org is a single point of failure, and since the source code is not published it is not easily possible to verify Dr Haahr's claims of randomness (it could, for all we know, be a clever fake that slightly weights the random numbers this way or that, to the long-term benefit of whoever did the weighting).

However - the radio waves did get me $500 at the time without actually doing a thing except writing a letter, and a confusing conversation with a journalist who found the concept very strange indeed...

Thursday 9 August 2018

The Problem of Sharing Secrets under GDPR Article 28 Mandatory Contracts

Automating the GDPR - The Article 28 Conundrum

This article shows how the GDPR sets up a conflict of trust between companies in particular circumstances, a conflict which can only be resolved by automating a cryptographic audit trail.

Under the EU's GDPR law virtually every company is a Controller, and virtually all Controllers use at least one Processor. When a Processor is engaged, the GDPR requires that a contract with very specific contents is signed. The GDPR requires that Controllers and Processors cooperate in order to deliver data protection, and this cooperation needs to be very carefully managed to maintain the security and other guarantees that the GDPR also requires.

In other words, the GDPR simultaneously requires strong cooperation and strong security between companies who can't be expected to have common goals or to trust each other. This is difficult to resolve.

About Controllers and Processors

If you are familiar with the European Union's GDPR, and the roles of the Controller and Processor, then you will be aware of the need for a formal agreement (usually a contract) between a Controller and every Processor it uses.

Effectively, every company is at least a Controller of personal data, certainly if there are employees or customers. Most companies use at least one Processor for data under their control, from IT support companies, to online storage providers, to companies that consult and outsource in many ways. The contract between a Processor and a Controller is very carefully specified in legal terms in the GDPR, but the technology implications are not mentioned. This is all in GDPR Article 28.

About Sharing Access to Data

Not sharing data, but access to the data - for example, does an employee of the Controller log on to the computer system in the Processor? And if so, how? This is the kind of scenario that sends shivers down the spine of security professionals, yet here it is in the GDPR.

Controllers and Processors could be almost any company using almost any system, so sharing the access to the personal data across organisations just wouldn't work. Personal data is stored in a different way in every organisation - at a minimum, in different choices from among the 200-odd major databases on the market, besides all the non-database systems in use, and the policies and habits unique to every company.

But the same is not true for the secret keys used to get access to this personal data. No matter how diverse the storage mechanism for the personal data, the secret keys are going to be one of a few types. Most often passwords, but also multifactor authentication codes, or cryptographic key files, or one of a small list of other means of authentication.

Article 28 says that these passwords or other keys need to be available for sharing between Controllers and Processors at any time. And yet no company is happy handing out passwords to their internal systems to unknown people, and anyway this could easily become a breach of the GDPR and the forthcoming ePrivacy legislation.

Where Computer Science Comes In

When a Controller engages a Processor, there are many circumstances under the GDPR when these secret keys need to be shared between these parties, parties who should not trust each other. Therefore, regardless of what may happen with the personal data, the handling of the keys to the personal data is of crucial importance. The law requires that you give some degree of access, perhaps a lot of access, to a company whom you have never met and have no reason to trust. Computer science has given us many ways to think about interacting with people we do not trust, so this is a problem that can be solved.

Article 28 strongly implies that a particular kind of cryptographically guaranteed auditing process is needed for the keys required to access data, when taken with the post-trilogue texts for upcoming laws including the ePrivacy Regulation, the EU Cybersecurity Act and the European Communications Code. The Cybersecurity Act and the EU NIS Directive are urgently pressing standards in these areas, as are the EU-level security and privacy bodies ENISA and the EU Data Protection Board. With all this human rights-based legal pressure, what is needed is a computer science view of how to implement what the law calls for. Article 28(3)(c) says "takes all measures required pursuant to Article 32" so Article 32 is a part of the mandatory contract, and Article 32 is about security, which also implies computer science.
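As an illustration of what such a cryptographically guaranteed audit process could look like, here is a sketch of a hash-chained log of key-access events. The class and field names are my own invention, and this is one possible design, not anything the law specifically prescribes:

```python
import hashlib
import json

class KeyAccessLog:
    """A tamper-evident audit trail: each entry's digest covers the
    previous entry, so neither party can quietly rewrite history.
    A sketch of one way to support Article 28/32 auditability."""

    def __init__(self):
        self.entries = []
        self.last_hash = b"\x00" * 32          # genesis value

    def record(self, actor, key_id, action):
        entry = {
            "actor": actor,                    # e.g. "processor-staff-17"
            "key_id": key_id,                  # which secret was touched
            "action": action,                  # "read", "rotate", ...
            "prev": self.last_hash.hex(),      # link to the previous entry
        }
        digest = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()).digest()
        self.entries.append((entry, digest))
        self.last_hash = digest
        return digest

    def verify(self):
        """Recompute the chain; any edited entry breaks every later link."""
        prev = b"\x00" * 32
        for entry, digest in self.entries:
            if entry["prev"] != prev.hex():
                return False
            if hashlib.sha256(json.dumps(entry, sort_keys=True)
                              .encode()).digest() != digest:
                return False
            prev = digest
        return True
```

Because each digest covers the one before it, a Controller and a Processor who do not trust each other can still agree on the history of who touched which key, and detect any later rewriting.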

To discover exactly what kind of cryptographic solution will work, we need to look at the information flows the GDPR requires.

GDPR Article 28 Information Flow

A close reading of the mandatory instruments (normally contracts, but not necessarily) in GDPR Article 28 shows that the required flow of information between Controllers and Processors is entirely one way, from the Processor to the Controller. The Processor has to make numerous undertakings and promises to the Controller, stated in a legally binding manner.

In addition there is a great deal of mandated potential communication from the Processor to the Controller: in various circumstances there will be communication from the Processor if the Controller so wishes. At any time the Controller can demand that the Processor produce information proving that processing is compliant, or require the Processor to assist the Controller in certain activities. The Controller is bound by the GDPR to be able to prove at all times that processing is compliant, whether or not a Processor has been engaged.

Relationship of the Parties to Article 28 Contracts

Basic security practice is that the parties to such information flows should not trust each other; they are independent entities who in many cases will have no other dealings. In addition, each party is under the very strict legal requirements of the GDPR, the (imminent) ePrivacy Regulation and the (imminent) EU Electronic Communications Code.

Article 28(1) says "the controller shall use only processors providing sufficient guarantees". According to the computer science referred to in this article, it is possible to define a minimum value of "sufficient guarantee" under the GDPR, but even without that analysis, the Controller must seek some guarantees from the Processor, and they need to be not merely good guarantees but sufficient to back up the rest of Article 28.

This means that parties to Article 28 contracts are required to meet a particular standard, but also that the parties should not trust each other to meet this standard or any other good behaviour.

Article 28 is All About Processors

Article 28 is all about the Processor being bound to the Controller, with the Controller saying and doing nothing outside what is already said in the rest of the GDPR text. The only reference to a Controller in Article 28 is that the contract must "set out the obligations and rights of the controller" (Art 28(3)) which appears to mean effectively stating "Yes I acknowledge I am a Controller and I am acting according to the GDPR".

There are just two references in the entire GDPR requiring the Controller taking action with respect to using a Processor. The first is ensuring that there is a contract in place that complies with the GDPR. The second is in Article 32(4), which says "the controller and processor shall take steps to ensure that any natural person acting under the authority of the controller or the processor who has access to personal data does not process them except on instructions from the controller".

Technical Comments

Article 32 emphasises the phrase "state of the art", an English expression that has caused much confusion. The phrase is only ambiguous within the confines of English, and since the GDPR is authoritative in multiple languages we can easily compare the German and French texts and see that they agree with one of the English meanings. "State of the art" therefore means "the art as it is practised today in general", as practised by peers and as defined by standards bodies and the like. It does not mean the best, most efficient or most advanced technology in existence. It does not mean the most recent. This article considers technologies mostly developed decades ago and very widely recommended and deployed today, which definitely are "state of the art".

Technical Analysis About Audit Records. A log file (Unix) or an EventLog (Windows) is not a quality audit record. It has often been held to be a sufficient audit record in courts worldwide, but in that context the question is one of balance of probabilities, taking into account other log entries created on other systems at the same time - basically a detective hunt by an expert witness. That sort of thing is an audit process, but not a good one, and it typically involves only one logging party. The GDPR Article 28 contract requires that there be at least two parties to the audit trail whose actions will be logged, which has not been the case in any law previously. The new EU security and privacy laws use the words "appropriate", "should" and "state of the art" so much that I think it is non-controversial that the audit standard required is much higher. There needs to be a cryptographically guaranteed, non-repudiable audit trail for activities where none of the actors involved (including auditors) need to trust each other, and no special expertise or context is required to interpret the record.
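The kind of audit trail being argued for can be sketched in a few lines of Python. This is only an illustration under my own assumptions (names like AuditLog are mine, not drawn from any statute or standard): each record commits to the previous record's hash, so altering any historical entry invalidates every later hash.

```python
import hashlib
import json
import time

GENESIS = "0" * 64                 # stand-in hash for the empty chain head

class AuditLog:
    def __init__(self):
        self.entries = []

    def append(self, actor, action):
        """Add a record that commits to the previous record's hash."""
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        record = {"actor": actor, "action": action,
                  "time": time.time(), "prev": prev}
        payload = json.dumps(record, sort_keys=True).encode()
        record["hash"] = hashlib.sha256(payload).hexdigest()
        self.entries.append(record)

    def verify(self):
        """Recompute the whole chain; False means tampering happened."""
        prev = GENESIS
        for record in self.entries:
            body = {k: v for k, v in record.items() if k != "hash"}
            payload = json.dumps(body, sort_keys=True).encode()
            if body["prev"] != prev or \
               hashlib.sha256(payload).hexdigest() != record["hash"]:
                return False
            prev = record["hash"]
        return True

log = AuditLog()
log.append("processor:alice", "key-accessed")
log.append("controller:bob", "audit-read")
assert log.verify()

log.entries[0]["actor"] = "processor:mallory"   # falsify history
assert not log.verify()                         # instantly detected
```

A hash chain on its own gives tamper evidence, not non-repudiation: for parties who do not trust each other, each record would additionally be signed by its author, and the chain head periodically exchanged between Controller and Processor so that neither side can silently rewrite it.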

Technical Analysis About Keys. A key of some sort is always required to get access to personal data, be it a password, a passphrase, a physical door pinpad code, two-factor authentication or whatever else guards access to the systems with personal data on them. The Article 28 mandated contract specifies that under many circumstances a Controller and a Processor release keys to each other, and therefore to natural persons in the employ of each other. By auditing the use of the keys, we are auditing access to the personal data. In order to remain in compliance with Article 32, we can change passwords/keys at any time and reset the list of authorised persons, thereby also resetting the audit trail. A cryptographically secured audit facility can detect the first time that someone accesses a key.
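Detecting that first access is then a simple scan over an ordered trail of access events. A minimal Python sketch, with illustrative field names of my own:

```python
# Detect the first access of each (person, key) pair in an ordered
# audit trail. The event field names here are illustrative assumptions.

def first_accesses(events):
    """events: ordered list of (person, key_id) access records.
    Returns {(person, key_id): index of first access}."""
    seen = {}
    for index, (person, key_id) in enumerate(events):
        seen.setdefault((person, key_id), index)   # keep earliest only
    return seen

events = [("alice@processor", "db-password"),
          ("bob@controller", "db-password"),
          ("alice@processor", "db-password")]      # repeat, not a first use

firsts = first_accesses(events)
assert firsts[("alice@processor", "db-password")] == 0
assert firsts[("bob@controller", "db-password")] == 1
```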

Technical Analysis About the ePrivacy Regulation. I have tracked down the different versions presented for Trilogue, which has now finished. ePrivacy following Trilogue appears to include EU Parliament LIBE Committee amendments from October 2017, including Article 26(a): “In order to safeguard the security and integrity of networks and services, the use of end-to-end encryption should be promoted and, where necessary, be mandatory. Member States should not impose... backdoors". If we are to have an audit facility for keys to personal data then it should be end-to-end. Like all end-to-end solutions it will upset government spy agencies or any other party that might want to falsify the record through government-imposed backdoors, because such backdoors cannot work according to mathematics.

Technical Analysis About the EU Code of Communications. The Code is broader than ePrivacy (which, it can be argued, is limited by its lex specialis relationship to the GDPR). The Code says: "In order to safeguard security of networks and services, and without prejudice to the Member States' powers to ensure the protection of their essential security interests and public security, and to permit the investigation, detection and prosecution of criminal offences, the use of encryption, for example end-to-end where appropriate, should be promoted and, where necessary, encryption should be mandatory in accordance with the principles of security and privacy by default and design." We know from Snowden and others that the "without prejudice" phrase is just being polite, because there is no technical means of implementing "no backdoors end-to-end crypto" that does not also make government spy agencies upset.

Minimum Audit Records Required by Article 28

Detail of Required Audit Records, with their basis in law:

  1. Audit records listing all natural persons who have access to keys to the personal data, and the changes to that list over time:
    • Article 28(2) "shall not engage another processor", so everyone can see whether or not an unexpected person was authorised for access to keys
    • Article 32(4) "any natural person acting under the authority of the controller or the processor who has access to personal data", so we need an audit log of who *can* have access to keys
    • Article 32(4) "any natural person acting under the authority of the controller or the processor ...  does not process them except on instructions", so we need an audit log of who actually *did* access the keys at least once.

  2. Audit records for who has accessed the audit records above:

    • Article 28(3) "obligations and rights of the controller", which shows the controller is watching the processor

These audit records can be implemented technically regardless of what IT systems the Controller and the Processor have, because they are only about dealing with the keys. Whoever has the keys has the personal data, and the keys themselves are protected by the GDPR in any case. These audit records are about storing passwords (or other access means).
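As an illustration of how small these records are, here is a sketch in Python; the class and field names are my own assumptions, not terms from the GDPR:

```python
from dataclasses import dataclass, field
import time

# Record type 1 (Art 28(2), Art 32(4)): who may access which key,
# and every change to that list over time.
@dataclass
class AuthorisationChange:
    person: str
    key_id: str
    granted: bool                  # True = access granted, False = revoked
    timestamp: float = field(default_factory=time.time)

# Record type 2 (Art 28(3)): who has read the audit records themselves.
@dataclass
class AuditRead:
    reader: str
    record_range: tuple            # (first, last) record indices read
    timestamp: float = field(default_factory=time.time)

history = [
    AuthorisationChange("alice@processor", "db-password", granted=True),
    AuthorisationChange("alice@processor", "db-password", granted=False),
    AuditRead("bob@controller", (0, 2)),
]

# Replaying the history in order yields the currently authorised list.
current = set()
for entry in history:
    if isinstance(entry, AuthorisationChange):
        pair = (entry.person, entry.key_id)
        current.add(pair) if entry.granted else current.discard(pair)

assert current == set()            # alice's access was revoked
```

Note that the current authorised list is never stored directly; it is always derived by replaying the change history, which is what makes the record auditable.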

Computer Science doesn't seem to allow any way of meeting Article 28 "sufficient guarantee" without a zero-trust encrypted audit model, which these types of audit records enable.


Conclusion 1:
the above minimum audit records are required to fulfill an Article 28 contract between Processor and Controller.
Conclusion 2:
if implemented, these records rise to an Article 28(1) "sufficient guarantee" of a Processor being acceptable, and therefore of the contract being acceptable.
Conclusion 3:
there does not seem to be any alternative way of achieving a "sufficient guarantee".
Conclusion 4:
the GDPR requires cryptographic audit facilities to exist, and therefore there is a market for companies to provide these facilities.

Saturday 30 November 2013

Do Microsoft Patents Cover Your Code? A Useful Process....

Do Microsoft Patents Apply?

A process for checking the patent safety of interoperable network protocols, including but not exclusively in the case of Open Source Software implementations.

Version 1.5
30th November 2013

Copyright Dan Shearer, Carlo Piana

Introduction: The Problem to Solve

Problem Specification: You have a network protocol you have implemented in code, and it is
interoperable with some protocols that Microsoft implement. Now you would like to know if
Microsoft is making any patent claims that read on your code, or conversely, if Microsoft expressly
renounces any possible claims they may have over the code.

Context: encumbering network protocols with patents has not historically been a winning
commercial strategy. This document specifically covers the case of Microsoft protocols, where the
issue is about Microsoft potentially using patents to protect its monopoly. There have been some
cases that bear on Microsoft's behaviour with respect to patents on some specific network protocols,
but Microsoft's behaviour is not constrained by any court in a general manner.

Scope: This paper only addresses implementation of protocols that Microsoft has documented. Most protocols of commercial significance or general interest that Microsoft implement are documented. The issue is no longer how these protocols work so much as what control Microsoft attempts to have over their commercial exploitation. The principal means of control Microsoft has is patents.

Step 0: Get the Facts

Territories. What territories are you interested in? A patent can only be enforced in a territory
where it has been granted, so if you don't care about a territory then it doesn't matter whether a
patent has been granted or not. For practical reasons, any commercially relevant enquiry must
assume that the EU and US are territories that matter, with others added as required.

Protocols. What is the list of protocols that you have implemented, using standard language as
much as possible? Potential points of confusion and things to remember:

• Your opinion as to whether you have implemented a particular Microsoft patent-encumbered protocol may not match that of Microsoft. What is covered is both the mechanisms in use when mediated by a network and the total effect of those mechanisms within the scope of the protocol.

• The mere fact of incompatibility with a Microsoft protocol is not proof of non-infringement (nor vice-versa.) What does the patent say? How does the Microsoft implementation behave?

• It is usually possible to implement a protocol without infringing on a patent claim, by closely examining the text of the claim and considering what the workarounds are.

• Often in code it is technically convenient to implement two distinctly different but similar Microsoft protocols by sharing code or functions. This implementation convenience does not affect the material fact that more than one protocol has been implemented.

• Microsoft is careful to state that they reserve the right to assert patents over a non-Microsoft protocol implemented by Microsoft, such as a formally defined IETF Standards Track protocol. It may not be the most likely action, but Microsoft do not make any promise not to.

Step 1: Open Specification Promise and Open Exception

This step is about understanding Microsoft's statements about protocols, and seeing what applies to
you. Be aware that Microsoft is using the word “Open” in a deliberately generic manner that has
little bearing on how it is used in other contexts. See for the Open
Specification Promise (OSP), which includes a list of “Covered Specifications.”

All protocol specifications listed in the OSP are covered by a definitive non-assertion of patent
rights by Microsoft. You need to compare every protocol on your list from Step 0 and see if there
is a match. You will often need to check the text of the protocol description to be sure. If all the
protocols you implemented are covered by the OSP you can stop now.

Unfortunately, of the thousands of Microsoft protocols and formats in use on modern networks only
a fraction are covered by OSP and chances are high that yours will not appear. So nearly everyone
will need to go on to step 2.

The Open Exception comes from the Interoperability Principles, where Microsoft identify a list of
Open Protocols against which they will assert only those patents that appear on a particular list.

From the Interoperability Principles at
ault.aspx :
“Microsoft will make available a list of the specific Microsoft patents and patent
applications that cover each protocol. We will make this list available once for each release
of a high-volume product that includes Open Protocols. Microsoft will not assert patents on
any Open Protocol unless those patents appear on that list.

Emphasis added on the last sentence of this quote, because that is the Open Exception. We know
from various places, including some cited below, that the definition of Open Protocol is the total of
those protocols listed at . The general
comment Microsoft has about protocols is:

"Microsoft publishes technical specifications for protocols in the Windows client operating system (including the .NET Framework), Windows Server, Microsoft Office, SharePoint products and technologies, Microsoft Exchange Server, and Microsoft SQL Server that are used to communicate with other Microsoft software products. These specifications include Windows client operating system and Windows Server protocols covered by the MCPP and WSPP licensing programs created in accordance with the U.S. Consent Decree and the 2004 European Commission Decision."

This does not offer protection for all of these “Open Protocols”; it says that they “include” protocols
covered by the WSPP program, but there are many others besides.

Also seemingly relevant is clause 1, “Open Protocols” of Interoperability Principle I:

“Microsoft commits that all the protocols in its high-volume products that are used by any other Microsoft product will be made openly available to the developer community in a non-discriminatory fashion. These Open Protocols may include protocols that implement industry standards.”

This does not say that Microsoft will not assert its patent claims; this is a version of the old RAND
argument, which is known to be against interoperability and certainly against Open Source. So
Principle I Clause 1 does not address the topic of this paper.

For Open Source Software there is clause 4 of Principle I, which states:

"Open Source Compatibility. Microsoft will covenant not to sue open source developers for development and non-commercial distribution of implementations of these Open Protocols."
While this appears to be a genuine covenant, it is very limited and does not address the topic of this
paper. Microsoft is clearly targeting the common case where a group of open source developers
create code which is then exploited by a company (which may employ some of those same
developers.) Microsoft is merely promising not to sue individual developers for implementing code
(such as network protocols) against which Microsoft asserts patent claims. However, it is true,
individual open source developers should not expect to get individually sued for their work on
patent grounds.

Step 2: Microsoft Patent Mapping Tool

See .
This covers all of the protocols that Microsoft currently thinks may be relevant to patent discussions
under particular Microsoft licensing programmes (including the Open Specification Promise
covered in Step 1) although more protocols may be added at any time and there may be errors.

Again, the total number of protocols here is only a fraction of all protocols in use, and does not
cover all the protocols that Microsoft implements. Therefore it is very likely that the protocol you
have implemented is not listed here.

Nevertheless it is worth a try!

Step 2 Background: How to Understand the Tool

Every way of using this tool gives results in the form of Patents, Patent Applications, and Programs:

The Patent entry gives a patent number, and especially a territory, such as “US”, or “Japan”. The territory is very important. The most common territory is the US.

Patent Applications are disclosed potential patents, which may or may not be granted by the territory listed. This is useful to determine where Microsoft's patent claims may be going next on a particular protocol.

The word “Programs” is confusing, and refers to a class of application rather than a licensing programme from Microsoft, although there is some overlap between the two. Examples of Programs are Exchange Protocols, PC Productivity Applications Protocols, and Windows Server Protocols.
The algorithm to use when interpreting territories is:

• No patent is listed (ie the tool says “None”). This means there will not be any patent in some
other territory not covered by this tool. “None” is the best possible result.

• Patent listed in both the EU and the US: this is one that Microsoft really cares about, and
probably has taken out in other territories. Keep looking, consult other sources than just
Microsoft's tool or you may get a nasty surprise in a territory that matters (Japan is typically
important for example, and Australia.)

• Patent is only listed in the US: that's relatively good news. There is a fair chance it will not
exist elsewhere, with these two exceptions:

• if there is an application listed in the US then watch carefully, because once it is
granted, others may well follow in other territories, and

    • a very recent patent in the US – watch carefully, because there is a priority period (under
the Paris Convention and the PCT process) in which Microsoft can still apply in the EU.
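The territory-interpretation algorithm above can be expressed as a small decision function. This is only a sketch of the logic as described; the status strings returned are my own shorthand, not wording from Microsoft's tool:

```python
# Territory interpretation as a decision function. The inputs say
# whether a patent is listed in each territory; the returned strings
# are this sketch's own shorthand.

def interpret(us_listed, eu_listed, us_application=False, recent_us=False):
    if not us_listed and not eu_listed:
        # "None" is the best possible result.
        return "none: no patent expected in any territory"
    if us_listed and eu_listed:
        # Microsoft really cares; check other territories too.
        return "both: consult sources beyond Microsoft's tool (e.g. Japan, Australia)"
    if us_listed:
        if us_application or recent_us:
            # May yet spread to other territories.
            return "us-only: watch carefully"
        return "us-only: relatively good news, fair chance it exists nowhere else"
    # The original algorithm does not cover an EU-only listing.
    return "eu-only: not covered above, verify directly"

assert interpret(False, False).startswith("none")
assert interpret(True, True).startswith("both")
assert interpret(True, False, recent_us=True) == "us-only: watch carefully"
```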

Step 2a: Classes of Protocol Application

You may have implemented one or more protocols that fit within a particular class of application
recognised by this tool, listed under the section called “Programs”. Click on this and you'll see
classes of protocol grouped together according to the application type and/or the programme
Microsoft wishes to offer them with.

For example, if you are a database developer, then you may choose “Interop – SQL Server
Protocols”. That gives 49 results, and you can see that it is relatively good news:
Microsoft state that they have no patents that read on nearly all these protocols, a very small number
have specific patents listed, and a larger but still fairly small number say “Available on Request”.
If you have identified a specific patent that Microsoft claims under a particular program that covers
a protocol you have implemented, you can now go to step 3.

Step 2b: Individual Patent Name

If you have not implemented a protocol from a class that Microsoft have listed, then you must scroll
through the list, reading the title of the protocol. You may find a protocol you have implemented,
because you recognise the name. More often, you must make a guess from the title and then you
need to perform some research to see if it corresponds to a protocol you are interested in.

Be aware: there are likely to be cases where the title given by Microsoft in this tool is not something
you recognise, and since Microsoft do not link to any descriptive text you are guessing. This is
another reason why this tool is not definitive.

If you do your research and decide that, based on a review of protocol names and the protocol text
that you believe corresponds to the protocol name, none of the listed protocols correspond to the
work you have done, then you have exhausted what the Microsoft patent tool can do for you. Now
you can stop.

Now you have a list of patents you can go to step 3.

Step 3: First Pass Examining the Patent Text

From one of the previous steps you have a list of patents that appear to be relevant to your work
implementing protocols. Microsoft has patents which it claims cover protocols you think are the
same as, or similar to, protocols you have implemented. Your list consists only of patents that Microsoft
are claiming read on that protocol, and which they say they will enforce, noting the various
exceptions given in steps 1 and 2. You have considered territorial implications, following the logic
in Step 2 Background: How to Understand the Tool.

Congratulations, you know now exactly where Microsoft claims their patents apply to your work.
From this point on you are in a standard patent evaluation scenario to see if Microsoft's claims are
relevant, or if the patent is even valid, or perhaps to re-implement so that the question does not even
arise.
Get a software patent lawyer!

A developer is not a software patent lawyer. A commercial lawyer is not a software patent lawyer,
and neither is a patent lawyer.

Your task is to compare the technical descriptions in the patent with the technical description in
your code, and with the technical description in the Microsoft protocol specification. Your task is to
compare three things which might express the same concept in very different ways, or may only
seem to.

Good luck, you'll need it.

Sunday 8 June 2008

From Chemical Rockets to Open Source

How a young Australian discovered Open Source, a career and that code is a frontier for human rights battles.


It isn't often I come face to face with myself after a twenty-something year break, but I did yesterday.

As a first year university student at the South Australian Institute of Technology, I wandered into an Adelaide company called Australian Launch Vehicles (ALV), a company I noticed when driving around doing landscape gardening oddjobs. “Launch Vehicles” sounded very cool, so in I went. ALV was founded by a pair of entrepreneurial rocket scientists. Despite decades of rocketry history in South Australia originally thanks to British military ambitions for nuclear weapons, there was no local space industry. Establishing a new Australian spaceflight capability in 1987 was very ambitious.

The founders kindly spent time talking to me, and explained that one of their biggest problems was that ground control software would be hideously expensive. Software? Now I was hooked. That comment had unintended consequences.

Thanks to my parents' foresight and provision I had already been using FidoNet modem-based bulletin board (BBS) networks through my school years. FidoNet was freely-distributed software and I had always been fascinated that you could actually talk to the developers. The logo was a dog carrying a diskette, the 1980s version of an emoji.


At the South Australian Institute of Technology I discovered the pre-Internet forum technology called Usenet, which was larger than the Internet at the time and still exists today. I found it amazing, all those people doing what we now see as normal Internet activities. I wrote a crude search engine that would crawl for my keywords overnight and send me relevant articles.

I kept noticing the contrast between the software development model used to create Usenet and how software was written in the commercial world, where people working in isolation would sell you floppy disks. But here in 1988 Usenet had downloadable source code, patches and fixes emailed out so you could keep up to date, and new versions on a daily basis. Just like cloud computing today.

I was so enthused by what collaborative software development could do for spaceflight that I posted this Usenet message worldwide and emailed the same message to the electronic postmasters of every organisation I could find - basically an early piece of spam. It is a strange feeling re-reading my words as a twenty year-old! I was deluged with hundreds of responses, many from seasoned computing and/or aerospace professionals. I spent weeks corresponding with people all over the world. Best of all, the Institute Computer Centre gave me the rare privileges of disk space and Internet access on my account on the VMS computer cluster. It wasn't their job but I am forever indebted to VMS supremo Rollo Ross for letting me loose.

After a while I decided it really might be possible to write and test rocket launch control software. The director of Research for the Institute and the head of the Computer Centre came with me and talked to the rocket scientists. One of them in particular, Peter Winch, suggested an angle I could tackle. So then I went around the Institute, being completely unused to how academics work and the way they say things, put together an alternative project and posted a followup, this time with my rare privilege of being able to write to the Usenet forum. My project never had much of a chance, because the main act was Australian Launch Vehicles, and after a period of glorious trying they went out of business.

The whole experience started me off on something new. I had felt the power of a technical discussion where highly competent people treated me as an equal, over a global network. I found and wrote tools that let me analyse what people were saying anywhere on Usenet, and work out
who was likely to have similar interests to me. And I learned that global development of source-available software had been going on for decades.

I was particularly interested to see what could be done with collections of this free software, and what it was like to work on internet mailing lists writing it. So I set myself to learn everything I could. Eventually, years later, this kind of software became respectable and got a name: Open Source Software.

And ALV? All the people have moved on of course, but the internet hasn't entirely forgotten. Peter Winch is listed at a space conference in 1990 speaking alongside Buzz Aldrin. After a little investigation I was able to ring him up at an industrial plant... "So, remember when you were a rocket scientist in Adelaide...". We had a great old chat :-)

And now as our rights to a private life and even thoughts are under assault from the ever-more connected digital age, Open Source software with its provable security is one of the few things that can help us. I'm all in.

Thursday 2 December 1999

How to Replace Windows NT with Linux

When Linux was a Struggling Challenger

Written in 1999, this is as far as I know the first comprehensive document in English about migrating away from Microsoft servers in corporate environments. The intention was to move immediately from "this is a possibility" to "here are some practical ways of doing it", which is why I used the word "replace". Much of the IT-consuming world at the time never even considered the possibility, and while Windows NT was immature in many ways, Linux too had many significant weak spots. Microsoft was extremely sensitive to documents like this, and I found out years later that this article was the subject of specific disinformation in key Microsoft accounts. This article (written before I joined Linuxcare but contributed to them) was very much about using facts to wage a marketing war.

Viewed from later in the 21st century a good deal of it seems quaint and some of it is wrong, but IT systems at the time just were not as sophisticated.

The document was written in Asciidoc, now known as Asciidoctor, and this is the plain-text output. Linuxcare added some graphic design and it was on their site for years, long after the company imploded in the dot-com bubble.

How to Replace Windows NT with Linux

Dan Shearer, Linuxcare

Version 0.3
December 1999

Most IT managers already understand the "why" of Linux and Open Source,
and many are considering adopting Linux.  Microsoft Windows NT network
administrators are now facing a forced migration to Windows 2000.  For
many, however, a migration to Linux makes more sense.  Careful planning
is needed in order to manage such a migration responsibly. How
costly will such a migration be?  How difficult?  How time consuming?  

This paper is about the "how" of Linux, concentrating on the
challenges involved with migrating large and heterogeneous network
environments. (If you need more on the "why" aspect, consult the
papers and case studies at .)  Replacing Windows NT
is not always quick and easy (although it can be), but the return on a
sound Linux investment is always worth the effort.

A methodology is presented here which will help you plan a migration
that causes minimal disruption while providing maximum functionality.
Microsoft has attempted to complicate these tasks by closing once-open
technologies ("embrace and extend" is what they call it).  All they have
succeeded in doing, however, is providing network managers an even
stronger incentive to adopt Linux.  In this paper you will find pointers
to tools that allow truly open standards to be gradually deployed in a
mixed Linux/NT environment, making it a simple step, when the time is
right, to eliminate Windows NT altogether.
1. Linux adoption is nothing to be afraid of

Those who have been doing Microsoft or PC networking for a few years
have probably experienced many previous migrations.  Perhaps you have
migrated your systems from Digital Pathworks or 3Com 3+Share to
IBM/Microsoft LAN Manager, then later from LAN Manager to Windows NT
3.1 (if you were brave) or to Windows NT 3.51 (if you were not).
Later, you possibly migrated to NT 4, and from there to every service
pack.  Most NT 4.0 service packs were, in effect, major system
upgrades frequently resulting in unforeseen difficulties and requiring
careful testing and planning.  If you started from a Netware or Banyan
base and moved to NT, you had equally large headaches.  Let's not even
talk about Apricot's idea of networking. If you run Windows NT today
then you are facing the spectre of an expensive and forced migration
to Windows 2000.

Migrating to Linux is a task of equal scale.  The need to train
support staff, to test the new solution, to preserve data from
previous systems, to transfer user accounts and check access
permissions--all of these are the same.  On the other hand, migrating
to Linux is easier in many ways because reliable support is available.
With Linux, "reliable support" means not only being able to get the
help you need to solve your current problems, it also means that you
are empowered to prevent such problems from happening again in the
future.

Perhaps the most attractive thing about a migration to free and open
source software is that the skills you pay to develop are actually a
very solid investment.  Every operating system supplier claims this, but
think of it this way -- what is all that expensive Windows NT training
worth now that Windows 2000 is here? And was it you or Microsoft who
decided when those skills would become obsolete?  Linux skills remain
applicable for as long as you choose to have software around, and there
is rarely any need to upgrade more than a few components at any one
time.

Windows 2000 forces you to a new directory scheme, a complete
new suite of mail, Internet, and other servers, and also demands
enormous hardware resources. What degree of pain will Windows 3000
impose?  In comparison, Linux offers a very attractive migration path.

2. How to migrate

If you are reading this document, you probably already know why you
should migrate to a Linux-based system.  It's the "how" of doing such a
migration that can often be overwhelming.  Here are some quick tips for
keeping the scope of the task to a manageable scale:

- Don't migrate everything at once.  Frequently, the best way to handle
a migration is to phase NT out of the server area first, then to later
concentrate on the workstations.  There are, of course, many other ways
to divide the task into more palatable pieces.  Some people pick classes
of server applications (such as web, database, file/print) and address
each of these in turn.  Others choose to have a policy of maintaining
dual environments on the desktop.

- Avoid application development.  It is always tempting to fix obviously
bad programs during a migration.  It is far better, however, to have
multiple stages in a migration, between which you can address application
issues.  The key here is to avoid trying to do everything at once.

- Linux does more, so use its capabilities.  Doing a cautious and
well-planned migration doesn't mean that you have to lose functionality.
Linux can do things that are impossible with NT and other systems, and
can also save you both time and money.

- Use fewer, more open, protocols.  The larger the number of protocols
you use in your networks, the larger the network management overhead.
While "open" can be difficult to define precisely, if every part of a
protocol is documented and free implementations are available, then it
is almost certainly open.  If a protocol is described in one of the
Internet RFC documents, that's another good indication.

3. A migration methodology

There are four steps you can follow to simplify a migration away from
Windows NT.  The first three of these steps show you how your data is
currently being accessed, and also how this data can be accessed
differently.  The final step provides a Venn diagram illustrating
possible deployment options.

The four steps are:

1.  List your most important data stores, including those administered
by users and those administered by network managers.

2.  List the various client programs that are currently used to access
these data stores.

3.  List the protocols and APIs the client software uses when accessing
these stores.

4.  Prepare a "protocol intersection" diagram.

These steps are protocol and API driven, and will allow you to map a
variety of migration paths from the "ideal path" to those which are
restricted by various constraints (such as having to be able to run a
particular Windows application).  Once you know all the possible routes
you can take, you will then be better able to select those which are most
appropriate for you and your organization.

3.1. Identify data stores

3.1.1. User-maintained data stores

Chances are that your users keep lots of data sources up-to-date.  Some
of these sources may be located on a workstation or a server, and some
of these are likely to be "unofficial".  If these data sources stop
working, of course, you will be in trouble.  Some examples of these
sources are:

- Email, often one of the most important business resources in a
company.  Email archives contain huge amounts of information, and users
have probably put a lot of effort into using features of their mail
clients, such as personal address books and mail filters.  In what
formats are these data sources stored?  What mail servers and protocols
are in use?  What authentication methods are they using?

- Calendaring and scheduling.  Mail services are often bundled with
collaborative scheduling systems (such as in a Microsoft Exchange
environment), and these can be among the most challenging systems to
migrate due to the lack of standards for these features.

- File resources.  Users often have huge amounts of data stored as
collections of Microsoft Office documents on an NT or Netware server.
Information stored in this fashion is often vital, but can be difficult to
search and migrate.

You might consider re-engineering large filestores like these (but not
as part of your migration).  Look at the structure of the documents.  If
extensive use has been made of Microsoft Office, WordPerfect, or other
such templates, then it is quite likely that the same functionality
can be delivered more reliably and cheaply using a Web forms interface
to a database.  In some cases, this can eliminate the need to have an
office suite on the client systems, particularly those used by telephone
sales or customer service staff.

- Databases.  On the server side, these include packages such as Oracle
and Microsoft SQL Server, and on the client side, packages such as
Microsoft Access, xBase, and more.  The goal is to maintain the same
interface for the users who keep these databases up-to-date, which is
often a service that keeps the company running.  A long-term strategy
may be to move the interface to the Web, but in many cases the
short-term answer is to retain the Windows client interface while
re-engineering the protocol/API used to access the database itself.

- Web servers.  There are three kinds of web data to consider:

-- Raw content.  Web site content maintainers need to know that their
current content editing programs will still be usable after the migration
is complete.  This usually means that Windows programs such as FrontPage
and PageMill must continue to work.  What information is stored in these
formats?  How is this information accessed?

-- Dynamic content.  Your Web developers also need to know whether their
NT-specific scripts and applications will change.  Often the answer is
"no", or "not much".  NT users of PHP and Perl should be almost
completely insulated from changes.  Sometimes, when complicated
functionality is required (perhaps because business logic has been
embedded in Microsoft ActiveX objects or other proprietary technologies)
the same functionality can be emulated using standard open technologies.
You will probably be able to split the functionality up and replicate
the majority of it on Linux Web servers.  The remaining functions can
stay on NT systems until you have time to replace them with open
equivalents.

-- Dynamic content from other sources.  Dynamic Web sites often pull
their data from many sources, often in Microsoft-specific ways.  List
the data sources being used and the methods being used to access them.

3.1.2. Data stores maintained by network managers

The following are examples of data that might be maintained by your
network or system administrators, including user and machine
information.  You will likely have more and different data sources than
those presented here, particularly if you support many other operating
systems on your network.

- User database.  This would include the name and full details for each
network user, and their associated security properties.  Windows
NT servers store this information in one SAM database per domain.

- Groups and permissions.  This information is also stored in the SAM
database, but is often replicated in supplementary databases because
SAMs have a restricted set of fields relating to groups.

- Computer and network database.  Every computer has certain physical
and network properties which need to be maintained in a central data
store of some sort.  Windows NT servers don't tend to store this
information at all except through unreliable NetBIOS names and
per-machine SID numbers.  Good Windows NT network administrators usually
build custom databases in which they can more reliably store this
machine-specific information, including IP addresses, physical
locations, and other related data.
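
One lightweight way to keep such a machine database is a plain CSV file
maintained by script.  The following Python sketch uses illustrative
field names and sample records, not a schema from any real product:

```python
# Minimal sketch of the kind of machine database NT administrators
# often end up building by hand.  Field names and records here are
# illustrative examples only.

import csv
import io

FIELDS = ["hostname", "ip_address", "mac_address", "location", "os"]

inventory = [
    {"hostname": "fs1", "ip_address": "192.168.1.10",
     "mac_address": "00:50:56:ab:cd:01", "location": "server room",
     "os": "NT4"},
    {"hostname": "ws-sales-03", "ip_address": "192.168.1.53",
     "mac_address": "00:50:56:ab:cd:02", "location": "2nd floor",
     "os": "Win98"},
]

# Dump to CSV so the data survives any future migration in a neutral
# format that every platform can read.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(inventory)
print(buf.getvalue())
```

Because the store is plain text, it can be queried, diffed, and backed
up with ordinary Unix tools during and after the migration.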

- Backup archives.  These will be maintained in some NT-specific format,
frequently devised by a third-party software vendor.  The native Microsoft
backup facilities aren't very useful, so this third-party software is
often necessary.

- Server logs.  Windows NT access logs are unwieldy, and are rarely
authoritative in a multi-domain environment.  If you want to migrate
this functionality, you will be quickly and pleasantly surprised by the
log-management tools that come with Linux.

3.2. List current client software

While there is a huge range of client software available for Microsoft
Windows workstations, there are actually fewer than 10 suppliers
providing the majority of the applications used in large networks.  Bundling
arrangements with a few top-tier suppliers such as Microsoft, SAP,
Lotus, and Oracle mean that solving client migration problems with
these vendors' systems usually solves the majority of other client
problems as well.

The lack of drop-in replacements for some client software (especially
Microsoft clients) is not usually a problem. The protocols these
clients use can be catered to by Linux servers, so a multi-stage
migration interspersed with some client re-engineering usually
provides a sufficient solution.  In any case, few sites migrate client
workstations to Linux immediately, preferring to defer training and
other human resource issues.

The easiest way to start planning client system migration is to
construct a table, such as the following, which addresses the specific
requirements of your organization.  

Microsoft Windows Client Software

Product       Purpose       Can Use Linux Functional    Linux
                            Servers?      Replacements  Version?
............. ............. ............. ............. .............
MS Outlook    Individual    Yes           Many,         None
Express       and Shared                  including
(several      Email,                      Lotus and
concurrent    Scheduling                  Netscape
versions
exist, with
differing
feature sets)

Netscape      Individual    Yes           Any           Yes
Messenger     and Shared                  Internet-
              Email                       compliant
                                          Mail Client

MS FrontPage  Publishing    Yes           Very many     None
              web pages,
              image maps
              and CGIs

MS Internet   Viewing Web   Yes           Many          Not yet, but
Explorer      Pages                                     runs on other
                                                        Unix systems

MS Office     Edit          Yes           Several good  No
              structured                  existing and
              documents and               more
              spreadsheets                announced

Web-based     Organisation- Yes           Not an issue  Yes, any
Customer      wide CRM                    since it is   Linux web
Relationship  tasks                       web-based     browser with
Management                                              Javascript,
package                                                 such as
                                                        Netscape

In-house MS   Maintaining   Yes, by       Many,         None
Access        vital         several means including
Database      database kept               Oracle, web
              on a file                   front-ends
              server                      and xBase

In-house      Maintaining   Yes           As above;     Announced
Oracle Client vital                       must rewrite
Program       database kept               client
              on Oracle                   program

Remote        Displaying    No            X Window      No
Windows       screens from                remote
Application   Windows NT                  application
Display       Terminal                    display
              Server
              Edition

3.3. List protocols and APIs used by client software

The following is an example of a list you might create when recording
the protocols relevant to the Microsoft client software used within your
system.  Most networks will use most of the protocols shown below, but
there may be a few used on your network that aren't included here.  When
you're not sure what protocols are being used by a particular system,
you should use a network sniffer to identify them rather than relying on
the product brochures.
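
As a rough first pass before digging through a full sniffer trace,
destination port numbers already narrow things down considerably.  The
port-to-protocol mapping below covers common well-known ports; the
`classify` helper is an illustrative sketch, not a substitute for
inspecting the payloads:

```python
# Rough first-pass classification of sniffed traffic by well-known
# destination port.  A real sniffer (e.g. tcpdump) gives you the port
# numbers; this mapping is only a starting point.

WELL_KNOWN_PORTS = {
    25:   "SMTP",
    80:   "HTTP",
    110:  "POP3",
    137:  "NetBIOS Name Service",
    139:  "SMB over NetBIOS",
    143:  "IMAP",
    389:  "LDAP",
    445:  "SMB (direct hosting)",
    515:  "lpr",
    1433: "TDS (MS SQL Server)",
}

def classify(port):
    """Return a protocol guess for a TCP/UDP destination port."""
    return WELL_KNOWN_PORTS.get(port, "unknown - inspect the payload")

# Example: ports observed between a Windows client and an NT server.
observed = [139, 445, 1433, 6000]
for port in observed:
    print(port, classify(port))
```

Anything that comes back "unknown" is exactly the traffic worth a
closer look with the sniffer.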

The interesting thing about this list is that nearly all non-standard
Microsoft technology is based either on something that already exists,
or on something that is documented at a lower level.  Microsoft's
"embrace and extend" policy is meant to eliminate competition, but it
has also enabled and motivated teams of programmers to unscramble the
Microsoft protocol extensions at roughly the same rate that Microsoft
devises them.  

What this means is that while you should make every effort to move
networks entirely to open and standardised protocols that are not
controlled by Microsoft, there are some excellent bridging solutions
available which implement Microsoft's proprietary protocols under Linux.

Not all of the protocols Microsoft uses are proprietary, of course.  In
many instances, the non-standard protocols are simply preferred by
Microsoft clients when talking to Microsoft servers.  These systems can
often be easily reconfigured to use standard open protocols when
necessary.  Outlook Express is a classic example of this, in which IMAP
is supported quite extensively, but the client is unable to connect to
both an IMAP server and a native Exchange server at the same time, even
if the Exchange server is running the IMAP service.
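
One quick way to confirm that a server (Exchange or otherwise) really
offers IMAP is to look at its CAPABILITY response.  The sketch below
parses such a response offline; the sample response string is
illustrative:

```python
# Check an IMAP server's advertised capabilities before pointing
# clients such as Outlook Express at it.  The response format is the
# standard CAPABILITY response from RFC 2060 (IMAP4rev1); the sample
# string below is an illustrative example.

def parse_capabilities(response):
    """Split a raw CAPABILITY response into a set of capability tokens."""
    # A typical untagged response looks like:
    #   * CAPABILITY IMAP4rev1 LITERAL+ AUTH=LOGIN
    tokens = response.strip().split()
    return {t.upper() for t in tokens if t not in ("*", "CAPABILITY")}

caps = parse_capabilities("* CAPABILITY IMAP4rev1 LITERAL+ AUTH=LOGIN")
print("IMAP4REV1" in caps)  # the baseline every modern client needs
```

Against a live server, the same parsing applies to the response
returned by the standard `capability` command after connecting to
port 143.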

In the following table, "MSRPC" means Microsoft's preferred method of
communicating control data in NT networks: DCE/RPC over Named Pipes
over SMB over NetBIOS.  All of these acronyms are explained in the
glossary, although the details are irrelevant for practical purposes.

Similarly, "MSRDP" means Microsoft's equally complicated way of sending
screen images over a network, such as from Microsoft Windows NT
Terminal Server Edition. This protocol is a proprietary variant of
T.SHARE (ITU T.128), over the Multipoint Communications Service (MCS),
over the ISO Data Transport Protocol, tunneled over TCP.

Protocols Preferred by Microsoft Products

Purpose                Preferred Protocol/API  Documented?
...................... ....................... ...................... 
MS Outlook Express     MAPI streamed over      Encrypted in an undoc
clients to talk to MS  MSRPC                   way
Exchange Server

FrontPage clients to   FrontPage Server        Undocumented
talk to MS Internet    Extensions
Information Server

MS Internet Explorer   Extensions to HTTP and  Undoc
clients to talk to MS  HTML
web servers

MS Access clients to   ODBC streamed over      Extended ODBC, TDS
communicate with MS    Tabular Data Stream     undoc
SQL Server             (TDS)

MS clients to talk to  Control requests via    Undoc requests & undoc
NT Servers for         MSRPC                   encrypt
anything related to
the SAM,
authentication or
administering NT

MS File/Print clients  SMB (NT clients use     Partly doc
to transfer files to   MSRPC)
any MS File/Print
server

MS clients to locate   NetBIOS Name Server     Partly doc
MS server and clients

Transport for previous NetBEUI always          Doc, but a dying MS
three protocols        preferred when present  protocol. A free version has
                       and possible            been released for Linux by
                                               Procon, but it is too early
                                               to predict what will happen
                                               with it

MS clients to link     WINS                    Mostly doc
NetBIOS names and
Internet names and
addresses
MS clients to access   MSRDP                   Built on existing
remote Windows screens                         standards with
                                               proprietary extensions

Protocol Equivalents and Implementations

Protocol          Free             Open Alternative  Comments
................. ................ ................. ................
MAPI streamed     No               IMAP mail access  The Cyrus mail
over DCE/RPC for                   protocol and      and related
mail stores                        related standards products suite
                                                     are an excellent
                                                     and scalable
                                                     replacement for
                                                     Exchange

MAPI streamed     No               ACAP Calendar     If you want to
over DCE/RPC for                   access protocol   keep Outlook
calendaring                        and related       Express
                                   standards         Calendaring and
                                                     Scheduling you
                                                     can use HP
                                                     Openmail as the
                                                     server

FrontPage Server  Yes, by Mrs      WebDAV            FPSE is only
Extensions        Brisby                             needed with
                                                     Microsoft
                                                     FrontPage

MS Extensions to  No               Yes               Important bits
HTTP and HTML                                        implemented in
                                                     browsers on
                                                     Linux and
                                                     Windows from
                                                     Netscape, Opera
                                                     and others.
                                                     Users won't miss
                                                     much

ODBC streamed     Yes              Seems to be a     Better to use
over TDS                           general lack of   ODBC over a
                                   standards         truly open
                                                     transport

NT Control        Yes, in Samba    SNMP, which has   Undoc requests &
requests over                      numerous Linux    undoc encryption
DCE/RPC                            implementations.  - a truly
                                   Also a large      horrible
                                   range of web      protocol
                                   control tools

MS File/Print     Yes, in Samba                      Partly doc. A
clients to        (server) and                       solved problem
transfer files to smbfs/smbclient
any MS File/Print (clients)
server

MS clients to     NetBIOS Name     Internet standard Only partly doc,
locate MS server  Server in Samba  Resource Location but
and clients                        Protocol          well-implemented
                                                     in Samba anyway

Transport for     NetBEUI always   Doc, but a        As of March 2000
previous three    preferred when   dying MS          there is a free
protocols         present and      protocol. Even    Linux
                  possible         Microsoft         implementation
                                   doesn't           from Procon
                                   recommend it.

MS clients to     Samba WINS       Use DNS instead!  Mostly doc
link NetBIOS      server
names and
Internet names
and addresses

3.4. Draw a Protocol Intersection Diagram

Using the tables that you have drawn up in the previous steps, you
should be able to list the following (see the Protocols and Software
Reference for more information):

1.  The set of protocols/APIs that can be used to make the existing
client software talk to servers (whether currently in use or not).

2.  The set of protocols that free server software can use to serve the
existing data stores.

3.  The set of protocols free client software can use to access
information from the data stores.

This can be represented in a Venn diagram:

[insert venn1 graphic here]
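
The sets behind the diagram can be computed directly.  In this Python
sketch the protocol names are illustrative placeholders for the lists
you drew up in steps 1-3:

```python
# Sketch of the "protocol intersection" step using Python sets.
# The protocol lists below are illustrative examples only; build
# your own from the tables produced in steps 1-3.

# 1. Protocols the existing client software can speak.
client_protocols = {"SMB", "IMAP", "POP", "SMTP", "HTTP",
                    "MAPI/MSRPC", "lpr"}

# 2. Protocols free server software can use to serve the data stores.
free_server_protocols = {"SMB", "NFS", "IMAP", "POP", "SMTP", "HTTP",
                         "lpr", "LDAP"}

# 3. Protocols free client software can use to access the stores.
free_client_protocols = {"SMB", "NFS", "IMAP", "POP", "SMTP", "HTTP",
                         "lpr"}

# Protocols usable during the transition: existing clients talking
# to free servers.
bridge = client_protocols & free_server_protocols

# Protocols usable in the final, fully migrated network.
target = free_server_protocols & free_client_protocols

print(sorted(bridge))
print(sorted(target))
```

Anything in the client list but outside the bridge set (here,
MAPI/MSRPC) marks a data store that needs re-engineering or a bridging
product before its clients can be served from Linux.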

4. Do it!

Once you understand where your data is and how it can be accessed, you
will be able to draw up a feasible multi-stage migration plan.  Such
plans are always highly site-specific, but if you follow the tips
given earlier in this paper you will be able to design a staged
migration based upon more open and standardised protocols.

After this point, however, the migration is up to you and will depend
heavily on your knowledge of the network.  Which parts of your
infrastructure can be most easily migrated?  It may be the file servers
or perhaps the Oracle databases.  Are there some performance bottlenecks
that Linux can solve for you?  If so, perhaps these are the first areas
you should address.

5. Appendix - Protocol and Software Reference

Many of the software packages in this reference run on most kinds of
Unix, as well as on Linux, without modification.  Where you see "Unix"
in the following table, you should therefore include "Linux" as well.

The acronyms in this section are explained in the Glossary. 

5.1. File Serving

Protocol                           Software
.................................. .................................. 
SMB suite                          Windows 95, 98, Samba on Unix and
                                   others, print servers, Netapp
                                   filers et al

SMB+NT extensions                  Windows NT, Windows 2000, Samba on
                                   Unix and others

IPX/SPX suite                      Novell Netware, mars_nwe under
                                   Linux

NFS                                nfsd - standard with any Unix

AFS                                Andrew Filesystem - free
                                   distributed filesystem for Unix

Coda                               free distributed filesystem with
                                   mobile synchronisation

FTP                                Servers available for any
                                   Internet-capable operating system

Appleshare                         Apple file server from Apple,
                                   netatalk for any Unix

The Microsoft model of networking encourages use of file sharing
rather than application sharing.  That is to say that every workstation
has a complete copy of an application binary stored locally while data
is stored on servers.  This is the most common use for NT servers.
Microsoft Windows Workstations are often also used similarly with
Novell Netware.  If this describes your situation, then you would do well
to think about accessing the same data via the Web.

Linux is able to serve files over all of these protocols.  If required,
Linux can serve files over all of them simultaneously.  Configured
properly, Samba running on Linux is able to perform as an SMB server at
least as well as Windows NT.  On large installations (i.e. on hardware
more powerful than anything Windows NT can run on), Samba happily
handles hundreds of thousands of simultaneous SMB clients.

Few of these protocols are suitable for general Internet use due to
timing and resource location issues.  Currently, there is no
widely-adopted file access protocol which is simultaneously secure, able
to operate between physically distant machines, and easy to integrate
into modern authentication architectures.

5.1.1. Migration Comments

In some cases, a simple redesign of your application structure may allow
you to dispense with file sharing -- for example, by making data
accessible via the Web rather than through proprietary Microsoft Office
files.  Regardless, duplicating Windows NT shared file resources on
Linux is trivial.  The challenge lies in getting the authentication
systems right, as discussed below.  The PAM authentication system allows
a very flexible migration strategy to be adopted, independent of whether
the authentication database is an NT domain, an NIS domain, an LDAP
server, or a custom SQL database.
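
As an illustration, a PAM stack might first try the NT domain and fall
back to local Unix passwords.  Module names and options vary between
distributions and pam_smb versions, so treat this fragment as a sketch
only:

```
# Illustrative /etc/pam.d fragment: authenticate logins against an
# existing NT domain controller via pam_smb, falling back to local
# Unix passwords.  Module names and paths vary by distribution.
auth    sufficient   pam_smb_auth.so
auth    required     pam_unix.so
account required     pam_unix.so
```

Because the policy lives in one PAM configuration file per service,
the authentication back-end can be switched (NT domain, NIS, LDAP,
SQL) without touching the applications themselves.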

5.2. Client-side Filesystems

Protocol                           Software
.................................. .................................. 
NFS                                Standard with Linux (mount -t nfs)

SMB                                Standard with Linux (mount -t
                                   smbfs)

IPX                                Standard (mount -t ncpfs) and
                                   enhanced client from Caldera

Coda                               Standard with Linux (mount -t
                                   coda)

Apple                              Free add-on to Linux

SMB                                Comes with Windows 95, 98, NT,
                                   2000

IPX                                Comes with Windows 95, 98, NT,
                                   2000, not as functional as SMB

NFS                                Third-party addons, but no really
                                   good ones. Ignore them

Coda                               Free addon, but not widely known
                                   or tested

While Microsoft has failed to dominate the LAN server market, it has
also successfully avoided including any protocol other than SMB on its
client operating systems.  Microsoft has accomplished this by keeping
the development information required to write a successful client
filesystem a proprietary secret, available only if you purchase a
software development kit under non-disclosure terms.  Samba, however,
has made it unnecessary to reverse-engineer any of the programming
interfaces involved, because Samba allows almost everything to be done
on the server side.

By locking out serious client-side filesystem competition, Microsoft
has forced Windows users to forego the advantages of modern
filesystems.  Fast, secure, and intelligent distributed filesystems
exist, but Windows users cannot expect to be able to use these any time
soon.

5.2.1. Migration Comments

It is common to keep existing Microsoft Windows clients unchanged during
the first stages of a migration.  It is also common to keep using these
clients with traditional file stores, even though it might be better to
use the Web instead (see comments under "File Serving").  If this is the
case in your migration strategy, you should be using SMB.  Samba, the
free SMB implementation, is extremely capable and robust, and has a
large and dedicated development team.  Microsoft Windows clients are
also better integrated with SMB (and therefore Samba) than they are with
Novell IPX (or mars_nwe).  While NFS can be made to work with Windows
clients, it is a very poor and insecure system, and isn't really worth
the effort required to implement it.

Using Samba, it is possible to pass some of the benefits of modern
networked filesystems on to the Windows clients.  Pay careful attention
to locking issues, however, when using Samba as a gateway in this
fashion.  Read-only access does not present any locking issues (such as
sharing CD-ROMs, or sharing a network filesystem via a web server), but
in any read-write situation there is potential for serious locking
problems to arise.
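
For example, a read-only Samba share sidesteps locking entirely.  This
illustrative smb.conf fragment uses standard Samba options; the path
and share name are examples to adjust for your site:

```
# Illustrative smb.conf share exporting a CD-ROM read-only.  With
# "read only = yes" no client can write, so the locking concerns
# described above do not arise.
[cdrom]
   comment = Shared CD-ROM (read-only)
   path = /mnt/cdrom
   read only = yes
   guest ok = yes
```

Read-write gateway shares need the same structure plus careful
attention to Samba's oplock and share-mode settings.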

5.3. Printing Services

Protocol                           Software
.................................. .................................. 
lpr                                Any Unix, Windows NT, Novell, many
                                   others

SMB                                Samba on Unix and others, Windows
                                   NT

IPX                                mars_nwe on Linux, Netware,
                                   Windows NT

SMB                                Samba on Unix and others, Windows
                                   95 and Windows 98

lpr                                Any Unix, Apple, Windows NT

IPX                                Netware clients

The only major platforms that cannot use the Unix lpr printing protocol
natively are 16-bit Windows 95 and Windows 98. Third-party addon
software is available for these operating systems.

5.3.1. Migration Comments

A common solution is to move to using lpr throughout an organisation
except where 16-bit clients are concerned.  These 16-bit clients can be
served from Samba.  If each client has to be reconfigured for other
reasons anyway, however, then an lpr solution should be used on 16-bit
Windows systems as well in order to reduce the number of protocols being
used.

When dealing with Windows NT clients, it is just as easy to connect to
printers via lpr as via SMB.  This being the case, you may choose lpr in
preference to SMB to avoid an extra layer of complication in your
network.  It is sometimes better, however, to send all Windows client
printing through Samba so you are able to later make changes that affect
only the Windows printer users.  It is more difficult to isolate the
Windows users if they are all using lpr directly.
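
A sketch of the Samba side of such a setup: this smb.conf fragment
routes Windows client printing into the Unix lpr spool using standard
Samba options (the paths shown are illustrative):

```
# Illustrative smb.conf fragment routing all Windows client printing
# through Samba to the Unix lpr system.  Printer names come from
# /etc/printcap; adjust paths for your site.
[global]
   printing = bsd
   printcap name = /etc/printcap
   load printers = yes

[printers]
   path = /var/spool/samba
   printable = yes
   guest ok = yes
```

With this in place, a later change to the underlying print system
affects Windows users only through Samba, leaving direct lpr clients
untouched.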

5.4. Email Services

In the following, "All major client software" means Netscape Messenger,
Microsoft Outlook Express, mutt, pine, Lotus Notes, cc:Mail, Pegasus,
Qualcomm Eudora and others of similar sophistication.

Protocol                           Software
.................................. .................................. 
SMTP                               Any mail transport on Unix and
                                   most on Windows NT and other
                                   operating systems. SMTP is more
                                   flexibly implemented on Unix than
                                   any other platform

RFC822 & MIME                      These mail encoding and formatting
                                   standards are supported by any
                                   Internet-compliant mail transport
                                   and reading software

IMAP                               Cyrus imapd (free), uWashington
                                   imapd (free), Microsoft Exchange,
                                   many others

POP                                An ancient but still widely-used
                                   protocol. Useful in organisations
                                   without a well-planned email
                                   strategy, where mail folders tend
                                   to be stored on local hard discs
                                   (probably not backed up either!)

MAPI (over MSRPC)                  Microsoft Exchange, HP Openmail

HTTP                               Many mail store servers have a web
                                   interface. Microsoft Exchange has
                                   one, as has Lotus Notes and
                                   others. On Unix a component
                                   approach is preferred, and there
                                   are many web interfaces to IMAP
                                   servers available

LDAP                               Mailing lists, accounts and mail
                                   permissions ought to be stored in
                                   an LDAP database. Exim, qmail,
                                   Sendmail and others on Unix,
                                   Netscape Mail Server on all
                                   platforms, and others

IMAP                               All major client software

MAPI                               Most Windows clients, including
                                   Microsoft, Lotus, Pegasus and
                                   Netscape. No Unix clients because
                                   it is a Windows-only API

SMTP                               Just about all clients on all
                                   platforms. SMTP is the only
                                   Internet-standard way of
                                   submitting email. There are secure
                                   versions of it. Microsoft Outlook
                                   clients can do SMTP but prefer the
                                   strange MSRPC format where
                                   possible

HTML                               All major client software can
                                   handle messages encoded in HTML,
                                   however plain text is always the
                                   best option for message body text.
                                   If you want a structured document
                                   format enclose it as an attachment
                                   or put it on the web and email the
                                   URL

RTF                                This Microsoft word-processing
                                   format is supported natively by
                                   Microsoft Outlook, and by external
                                   viewers in other mailers. It is a
                                   very bad idea to have this enabled
                                   in any context. Disable it.

RFC822 & MIME                      All major client software.
                                   However there are many MIME RFCs
                                   to do with internationalisation,
                                   security, large files and more.
                                   Microsoft does not try to provide
                                   a complete implementation, which
                                   causes problems for some Asian and
                                   European languages and for anyone
                                   who wants secure email.

LDAP                               Star Office mail, Netscape
                                   Communicator, Pegasus. This is not
                                   the counterpart of LDAP in a mail
                                   server; LDAP on a client should be
                                   used for things like address
                                   books.

5.4.1. Migration Comments

Any large deployment of mail servers has to be customised to fit the
site.  Commercial software always seems to make this level of
customisation difficult or impossible, and as a result, free software
tends to be much better for the server side of things.  On the other
hand, commercial software currently fares better on the client side.
Some commercial clients, such as Mulberry, are outstanding for their
standards compliance.  There are, of course, some equally good free
client alternatives.

The most scalable and flexible Linux-based IMAP mail store solution is
the free Cyrus mail server.  There are many choices available for the
mail transport component, including Sendmail, Exim, Qmail, and others.
With software like this, along with the SASL authentication mechanism,
the ACAP client configuration protocol, and LDAP, it is possible to
build an extremely powerful enterprise system using only free software
components.  The client software can still be Windows or Macintosh
Eudora, Outlook Express, Netscape Communicator, or any of dozens of
other available client systems.

Moving away from Microsoft Exchange is trivial from an email
point of view because Microsoft Outlook Express clients are also capable
of using the IMAP protocol.  You can experiment with this by switching
your Exchange server to IMAP-only and changing the configuration of your
Outlook Express clients.  Once this works, you can implement a
Linux-based IMAP server without your users ever noticing the change.
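Before repointing clients at the new server, it is worth confirming that it advertises the IMAP capabilities they rely on. The following sketch checks a CAPABILITY response against a required set; the capability string and the required set are illustrative examples, and a live check would fetch the real list with the standard library's imaplib (`imaplib.IMAP4(host).capabilities`).

```python
# Check that an IMAP server's CAPABILITY response covers what the
# migrating clients will need. The strings below are hypothetical
# examples of what a server such as Cyrus might advertise.

REQUIRED = {"IMAP4REV1", "AUTH=LOGIN"}   # illustrative client requirements

def missing_capabilities(capability_line, required=REQUIRED):
    """Return required capabilities absent from a CAPABILITY response."""
    advertised = {token.upper() for token in capability_line.split()}
    return sorted(required - advertised)

caps = "IMAP4 IMAP4rev1 ACL QUOTA NAMESPACE AUTH=LOGIN"
print(missing_capabilities(caps))          # -> []
print(missing_capabilities("IMAP4rev1"))   # -> ['AUTH=LOGIN']
```

If the second call reported missing capabilities on a real server, that would be the thing to fix before any client is switched over.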

If you use Lotus Domino or Netscape Mail Server, there have been recent
announcements regarding the availability of this software for Linux
platforms.  The simplest route for this part of your migration may be
simply to transfer your existing software license to a Linux version of
the same software when the product is made available.

The calendaring and scheduling functions of Exchange, Domino, and
Netscape Calendar Server are dealt with in the next section.

One of the tricky things about migrating IMAP servers is moving mail
and setting permissions for thousands of mailboxes at a time. One of
the best things to do is use the Perldap library. Sample code has been
posted to Cyrus forums for doing this, including with web interfaces.
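Whatever tool drives the bulk migration, the repetitive part is generating the per-mailbox administration commands. A minimal sketch: emit cyradm-style `createmailbox`/`setaclmailbox` commands (these are real cyradm commands) for a batch of users, using the conventional Cyrus `user.<name>` namespace and the usual all-rights ACL string; the user list is made up.

```python
# Generate cyradm-style commands to create mailboxes and grant each
# owner full rights. "lrswipcda" is the conventional Cyrus rights
# string; the user names here are hypothetical.

def cyrus_commands(users):
    cmds = []
    for user in users:
        mailbox = "user.%s" % user        # Cyrus personal-namespace naming
        cmds.append("createmailbox %s" % mailbox)
        cmds.append("setaclmailbox %s %s lrswipcda" % (mailbox, user))
    return cmds

for line in cyrus_commands(["alice", "bob"]):
    print(line)
```

The generated script can then be fed to cyradm (or the same loop rewritten against Perldap, as the forum postings do) once it has been reviewed.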

5.5. Calendaring and Scheduling

Calendaring is a strange area.  Most products support most of the
standard protocols, but interoperability between clients and servers
from different vendors is still very poor.  No calendar access protocol
yet exists, which is mostly because of the inertia behind the
commercial calendaring systems and their proprietary protocols
(Microsoft and Lotus are both major players in the IETF standards
committee).  Internet standards in this area have only recently been
finalised, and at this time only free software implements calendaring
that is in compliance with the few standards and standards drafts that
currently exist.

Cybling Systems has a project that attempts to untangle these issues.

Protocol/format       Server software
...............       ...............

MAPI over a transport MS Exchange, HP Openmail. Not a published standard

Other proprietary     Calendar servers with Star Office, Netscape Suite Spot, 
                      Corporate Time and others

Web-based access      All major

iCalendar, vCalendar  All major, except Microsoft

vCard                 All major, except Microsoft

SMTP                  All major

ICAP                  Anything using the MCAL (Modular Calendar Access
                      Library); PHP and GTK+ applications exist. This
                      is not an Internet standard and the draft has
                      expired

CAL                   The official direction of the ISO and IETF bodies for 
                      calendaring standards. No product anywhere implements 
                      this Internet draft


Protocol/format       Client software
...............       ...............

MAPI over a transport MS Schedule+, MS Outlook Express

Other proprietary     All clients, due to lack of calendar access standard

vCard                 Most major

SMTP                  Some minor

LDAP                  Netscape, StarOffice, other minor

One vendor's paper summarises its view of Internet calendaring
standards (provided the vendor is not one of the two who have millions
of existing proprietary clients and the ability to stall the standards
process!)

The best that any calendar software implementor can do at the moment
is implement the following protocols: iCalendar, vCalendar, vCard,
SMTP (for e-mail notification), LDAP (for details of all users, groups
and items that can be scheduled) and X.500 (in very large corporate
environments). This will change as soon as the ICAP Calendar Access
Protocol or its equivalent becomes an Internet standard.
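The iCalendar side of that recommendation is straightforward to implement. A minimal sketch of the kind of VEVENT a standards-following implementation should emit, following the RFC 2445 text format; the UID, summary, and timestamps are hypothetical values.

```python
# Emit a minimal iCalendar object containing one VEVENT, the
# interchange format the text recommends implementing. All field
# values below are made-up examples.

def make_vevent(uid, summary, start, end):
    """start/end are pre-formatted UTC timestamps like 20000315T090000Z."""
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//Example//Migration Sketch//EN",
        "BEGIN:VEVENT",
        "UID:%s" % uid,
        "DTSTART:%s" % start,
        "DTEND:%s" % end,
        "SUMMARY:%s" % summary,
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # iCalendar lines are CRLF-terminated
    return "\r\n".join(lines) + "\r\n"

ics = make_vevent("", "Planning meeting",
                  "20000315T090000Z", "20000315T100000Z")
print(ics)
```

An object like this can be mailed as a MIME part (which is how SMTP-based scheduling notification works) or handed to any client that imports iCalendar or vCalendar data.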

5.5.1. Migration Comments

If at all possible, you should use a Web-based calendar client with a
server that supports as many Internet standards as possible.  If you
must use Microsoft Outlook Express, then HP Openmail is the only
non-Microsoft option available.  The calendar servers from the Star
Office and Netscape Suite Spot server suites can provide good interim
solutions in many situations.  The Corporate Time calendaring product is
an example of a calendaring system that uses all the available
standards.  There are other examples, but for the moment the area is
fraught with difficulty.

5.6. Web Servers

The Apache Web server is free software that is currently used on over
55% of Web sites on the Internet, with Microsoft IIS being used on 24%.
Reliable data is hard to find for Intranet deployments, but it seems
likely that Microsoft is being used on a higher percentage of Intranet
sites.

Web publishing is best done using the standard WebDAV protocol, but
the widely-used Microsoft FrontPage packages use the undocumented
Front Page Server Extensions protocol.  Both of these are implemented
on Linux.
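At its simplest, WebDAV publishing is an HTTP PUT of the document to its URL. This sketch builds the raw request a DAV client would send; the host, path, and body are hypothetical, and a live client could instead use Python's standard http.client with method "PUT" rather than hand-building the message.

```python
# Build the raw HTTP/1.1 PUT request that publishes a page to a
# WebDAV-enabled server. Host, path, and body are made-up examples.

def build_put_request(host, path, body):
    headers = [
        "PUT %s HTTP/1.1" % path,
        "Host: %s" % host,
        "Content-Type: text/html",
        "Content-Length: %d" % len(body),
        "",   # blank line separates headers from the entity body
        "",
    ]
    return "\r\n".join(headers) + body

req = build_put_request("www.example.com", "/docs/index.html",
                        "<html><body>hello</body></html>")
print(req)
```

The rest of WebDAV (PROPFIND, locking, collections) layers further methods on the same HTTP machinery, which is why any authoring tool that speaks HTTP is most of the way there.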

Protocol/API            Software
............            ........

HTTP v 1.1              All major

ISAPI prog. interface   All major

ASP scripting           IIS, Apache

PHP scripting           Apache, IIS, others

Data interfaces eg ODBC Apache, IIS, others

Front Page Extensions   Apache (via an add-on module)

WebDAV (open FPSE)      Apache, Netscape

5.6.1. Migration Comments

Microsoft is not dominant in the Web server market, so there are not
nearly as many difficulties in migrating to a non-NT system.
Administrators should find Apache easier to configure and run for large
and mission-critical sites.

One of the big issues involved when migrating away from Microsoft IIS
servers is the use of Active Server Pages (ASP).  If the language used
for ASP is Perl rather than Visual Basic then there should be minimal
difficulties.

A migration to Linux usually includes replacing IIS with Apache.  You
can start this aspect of the migration by running Apache on a Windows NT
server if there are OS-specific integration issues that require more
time to solve.

There are other free Web server solutions available for Linux,
including Roxen, which has particular strengths in electronic commerce
and a couple of other specific areas.  Zeus is a commercial Web server
available for Linux which is quickly increasing its market share (see
the surveys available from Netcraft).

5.7. Database Servers

The Linux database server market is booming.  Microsoft is currently the
only major vendor who has not produced a closed-source Linux version of
their database offering.  PostgreSQL and MySQL are the leading free
software contenders.

Most database servers are accessible via the ODBC API which packages SQL
calls.  Differences arise as to how ODBC calls are transported, which is
where ODBC "bridges" come in to the picture.  ODBC bridges obviate the
need for common protocols, albeit in a rather clumsy fashion.

There isn't much to discuss in the way of protocols, except that Sybase
and Microsoft SQLServer use a partially-undocumented Sybase protocol
called TDS when communicating ODBC queries.  Microsoft has extended this
protocol in even more undocumented ways, but a free implementation does
exist (see  This is important only because
Microsoft Access uses TDS by default when communicating with SQLServer.

5.7.1. Migration Comments

If you can eliminate TDS from your network, you will reduce the overall
complexity of your database system.

If you have an NT data source that you want to be able to access from
Linux, the ODBC Socket Server will allow you
to do this.

Note that it is important to get the Primary Key Definition right when
making ODBC calls to non-Microsoft databases from Microsoft Access.
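The reason the primary key matters is that without a declared key the client (Access, in this case) has no reliable way to address an individual row for update. A sketch using Python's standard sqlite3 module as a stand-in for an ODBC data source; the table and data are hypothetical.

```python
import sqlite3

# With a declared primary key, every row has a unique handle and the
# database rejects ambiguous duplicates; without one, a client cannot
# reliably target a single row for UPDATE. Table and data are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO staff VALUES (1, 'alice')")
try:
    conn.execute("INSERT INTO staff VALUES (1, 'bob')")  # duplicate key
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
print(duplicate_allowed)  # -> False: the key keeps every row addressable
```

Getting the key definition right on the server side is therefore the first thing to check when Access updates against a non-Microsoft database misbehave.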

5.8. Firewalls, Gateways, DNS and other Basic Internet Services

This is one area where Microsoft has made relatively little headway in
corrupting Internet standards.  Microsoft has produced variants of DHCP,
PPP, and numerous other "glue" protocols, but remains a minor player
in the network management layer.  As such, Microsoft is unable to
influence the market at the expense of open Internet standards.

If you are running any of these services on a Windows NT machine, then
you are putting yourself at risk.  Windows NT simply is not able to
provide any verifiable degree of security when operating as a firewall,
because Microsoft refuses to allow peer review of its code.  For the
same reason, even if the Microsoft DNS server were not already famous
for being unreliable, enough security holes have been identified in the
mature free open-source DNS server implementation to warn anyone away
from relying on a very young, closed-source implementation.

5.9. Things Not Covered in this Paper

o    Windows source code migration. If you are fortunate enough to have
     the Windows source code to applications that you wish to run on
     Linux then there is a great deal that can be done to make this 
     as simple as possible without requiring a code rewrite. This will 
     be the subject of another Linuxcare paper!

o    Authentication systems, Linux PAM and mixed authentication
     environments. With a combination of PAM on Linux and Unix systems and 
     LDAP as the master authentication database it is possible to authenticate
     against every likely protocol. Samba can authenticate Windows clients
     using PAM to talk to LDAP, RADIUS dialup authentication servers can do
     the same, as can any other service which runs on Linux. There is also 
     an LDAP schema which supplies all required NIS+ information so that 
     LDAP becomes a true distributed directory service. This is a whole 
     paper on its own!

o    The Service Location Protocol (RFC2608). This is for locating 
     services of any kind on an Intranet, with defined mappings to 
     LDAP and other standard repositories.

o    Database application migration to free Linux databases. Recent work
     by the PostgreSQL team means that PostgreSQL can now deliver much of
     the functionality of large commercial databases such as Oracle.

o    Email address book formats and access mechanisms, especially 
     relating to ACAP and LDAP.

o    Extent of the Calendaring and Scheduling protocol mess, and recent
     positive signs.

6. Glossary of Terms and Acronyms

ACAP
- Application Configuration Access Protocol, a protocol being developed
by the IETF.  ACAP supports IMAP4-related services.

AFS
- Andrew File System, an old but innovative distributed filesystem.
Modern replacements exist, such as Intermezzo by Linuxcare employee
Phil Schwan.

Apache
- An Open Source Web server developed by the Apache Group, a large
group of open source developers from many companies including
Linuxcare (Martin Poole and Rasmus Lerdorf.)  According to recent
surveys, it is estimated that Apache is used on approximately 58% of
servers on the Web.

API
- Application Program(ming) Interface, a set of routines, protocols, and
tools for developing software applications.

ASP
- Active Server Pages, a Microsoft specification for creating
dynamically-generated Web pages that utilizes Microsoft Active X
components, usually via Microsoft VBScript or Perl.

CAP
- Calendar Access Protocol,
Internet Draft draft-ietf-calsch-cap.

CGI
- Common Gateway Interface, a specification for transferring data
between a Web server and a CGI program.  A CGI program is any program
designed to accept and return data that conforms to the CGI
specification.  CGI programs can be written in any number of programming
languages, including C, Perl, or Java.

Coda
- A free distributed filesystem intended to solve the problem of disconnected
filesystems (eg wandering laptops.) Replaced by Intermezzo.

Corporate Time
- Example of commercial corporate scheduling packages that tries to be
as standards-compliant as possible.

CRM
- Customer Relationship Management. A buzz-word for software that manages a
database of all information to do with potential, existing and past customers.

Cyrus mail server
- An extremely robust and scaleable free email storage server. Tends
to cooperate with the leading implementation of new standards
including SASL, ACAP and Sieve.

DCE
- Distributed Computing Environment, a suite of technology services
developed by The Open Group ( for creating
distributed applications that run on different platforms.

DHCP
- Dynamic Host Configuration Protocol, a protocol for assigning dynamic
IP addresses to devices on a network.  For more information see RFC1531.

DNS
- Domain Name Service, an Internet service that translates domain names
into IP addresses.

Eudora
- A popular commercial, closed source email client developed by
Qualcomm, Inc.

Exim
- One of the leading free mail transport programs.

FPSE
- Front Page Server Extensions, an undocumented method invented by
Microsoft for having web publishing software write to a web server.
Completely replaced by the Internet DAV standard.

FTP
- File Transfer Protocol, a standard internet protocol used for
sending files.  For more information, see RFC959
( FTP is still the only
Internet-wide file-specific transfer protocol, after more than 20
years.

GTK+
- Gimp ToolKit, a small and efficient widget set for building graphical
user interfaces.

HP OpenMail
- Hewlett Packard's answer to Microsoft Exchange. By simply replacing
the file MAPI.DLL on the client workstations OpenMail can be used as
a server for Microsoft Outlook Express clients including calendaring
and scheduling. The replacement MAPI.DLL does not communicate with
the OpenMail server using MSRPC.

HTML
- Hypertext Markup Language, the main language used to create documents
on the Web.

HTTP
- Hypertext Transfer Protocol, the underlying protocol used on the Web,
defining how messages are formatted and transmitted, and how servers and
browsers should respond to various commands.  For more information see
RFC2616 (

iCAL
- Internet calendar formal public identifier.

iCalendar
- see iCAL

ICAP
- Internet Calendar Access Protocol.

IETF
- Internet Engineering Task Force.

IIS
- Internet Information Server, Microsoft's closed-source Web server
that runs on Windows NT. According to the latest figures from
Netcraft, IIS' market share is dropping each month.

IMAP
- Internet Message Access Protocol, a protocol used for retrieving email
messages.  For more information see RFC2060.

imapd
- Generic name for a daemon, or server process, used to handle IMAP
connections.

Intermezzo
- A distributed file system with a focus on high availability. The
principal developer is Phil Schwan, from Linuxcare.

IPX
- Internetwork Packet Exchange, an undocumented and closed-source
networking protocol used by Novell Netware operating systems.

ISAPI
- Internet Server Application Program Interface, an API developed by
Microsoft for its IIS Web server.  Some other Web servers support it
as well.

ISO
- International Organization for Standardization, an organization
composed of national standards bodies from over 75 countries.  ISO
standards typically take years longer to develop than Internet
standards. The ISO standards for computer protocols were completely
superseded by Internet standards.

ITU
- International Telecommunication Union, an intergovernmental
organization through which public and private organizations develop
telecommunications systems.  The ITU is a United Nations agency
responsible for adopting international treaties, regulations, and
standards governing telecommunications.

ITU T.128
- T.128 is the International Telecommunication Union's recommendation
regarding Multipoint Application Sharing.  There are no open source
implementations, and closed-source implementations do not have a good
record for interoperability. Use the X Window System instead!

LAN
- Local Area Network, a computer network that spans a relatively small
physical area.

LDAP
- Lightweight Directory Access Protocol, a set of protocols devised for
accessing information directories.  LDAP is based on the standards
contained within the X.500 standard, but is significantly simpler.
LDAP supports TCP/IP, which is necessary for any type of Internet
access.  For more information, see RFC2251, RFC2252, RFC2253, and RFC2589.

LPR
- The Unix Line PRinter protocol. Ubiquitous protocol for transferring
print jobs around a network.

MAPI
- Messaging Application Programming Interface, a system that enables
Microsoft Windows' email applications to communicate for distributing
mail. This API is only relevant to Windows machines.

mars_nwe
- Open source clone of the most functional parts of Novell Netware,
usually run on Linux.

MCAL
- Modular Calendar Access Library.

MIME
- Multipurpose Internet Mail Extensions, a specification for formatting
non-ASCII messages so they can be sent over the Internet.  Many email
clients support MIME, enabling them to send and receive graphics, audio,
video, and other different file types. There are many, many MIME-related
RFCs.

MSRPC
- Microsoft's preferred method of communicating control data in NT
networks: DCE/RPC over Named Pipes over SMB over NetBIOS. The only
open implementation of this is by Luke Leighton of Linuxcare, whose
work can be seen in Samba and is explained in his book "Samba and
Windows NT Domain Internals" available from MacMillan Technical
Publishing.

Mulberry
- Mulberry is a closed-source email client for Microsoft Windows or
Apple Macintosh platforms with a Linux version in beta as of January
2000.  Mulberry is remarkable for its excellent implementation of
Internet standards, including new ones such as ACAP. In contrast,
applications such as Microsoft Outlook Express and Netscape
Communicator frequently implement standards poorly, making more work
for administrators and in some cases penalising the end-user.

MySQL
- MySQL is a multi-user, multi-threaded SQL database server.  MySQL is a
client/server implementation that consists of a server daemon "mysqld"
and many different client programs and libraries.  MySQL and PostgreSQL
are between them the most popular open source databases; MySQL is the
lighter-weight of the two.

NetBEUI
- NetBIOS Enhanced User Interface, an enhanced version of the NetBIOS
protocol used by network operating systems such as LAN Manager, LAN
Server, Windows for Workgroups, Windows 95/98, and Windows
NT. Documentation is now available but most regard it as a dead
protocol. However it is the best SMB transport protocol for the
millions of DOS machines still in use, and free closed-source NetBEUI
stacks for DOS are available for download from IBM and Microsoft.
A free Linux version ready for use with Samba was made available in
March 2000, as this paper was being completed.

NetBIOS
- Network Basic Input Output System, an application programming
interface that augments the DOS BIOS by adding special functions for
local area networks.  NetBIOS over TCP/IP is defined in RFC1001 and RFC1002.
This is a very poor protocol, implemented in several open source products
including Samba ( and derivatives.

Netcraft
- Netcraft is an internet consultancy based in Bath, England.  The
majority of its work is closely related to the development of internet
services.  Netcraft is most famous for its website which is devoted to
surveying Internet technologies.

NFS
- Network File System, an open system designed by Sun that allows all
network users to access shared files stored on different platforms.  NFS
provides access to shared files through the Virtual File System that
runs via TCP/IP. NFS is demonstrably a poor choice for running on
Windows-based PCs, due to the bad design of Windows.

nfsd
- Generic name for a daemon, or server process, used to handle Network
File System connections. Think of it as the Samba equivalent for the
NFS protocol.

NIS
- Network Information Service, a Unix directory system for distributing
system configuration data such as user and host names between computers
on a network. Can be linked to an LDAP database transparently to the
client systems.

ODBC
- Open DataBase Connectivity, a database access method developed by
Microsoft and widely implemented. ODBC is an API, not a protocol.

PAM
- Pluggable Authentication Modules, a general infrastructure for
module-based authentication.  For more information, see the Linux-PAM
pages.

Pegasus
- A very popular closed-source email client for Windows and Macintosh
platforms, available free of charge from New Zealand-based Pegasus
Computing.

Perl
- Practical Extraction and Report Language, a programming language
originally developed by Larry Wall, now maintained by an extensive team
of Open Source developers.  Perl is one of the most popular languages
for writing CGI scripts.

perldap library
- PerLDAP, or Perl-LDAP, is a combination of an interface to the C SDK
API and a set of object oriented Perl classes.

PHP
- PHP Hypertext Preprocessor, a web scripting language that is an
alternative to Microsoft's Active Server Pages (ASP).  PHP runs on
Linux, Windows, and many other platforms. The principal author is
Rasmus Lerdorf of Linuxcare.

POP
- Post Office Protocol, a protocol used to retrieve email from a mail
server.  Most email clients support this protocol.  For more information
see RFC1939 (

PostgreSQL
- PostgreSQL is an object-relational database management system
supporting almost all SQL constructs.  See also MySQL.

PPP
- Point-to-Point Protocol, a method for connecting a computer to the
Internet.  For more information see RFC1661.

Qmail
- Like Exim, Qmail is an open source replacement for sendmail, written
by Dan Bernstein.

RFC
- Request For Comments.

RFC822
- Standard for ARPA Internet Text Messages (Aug 13, 1982). This defines the
basic format of Internet email messages; for example, it says that every
message should have a Subject: and Date: header.

RPC
- Remote Procedure Calls, a protocol that allows for a program on one
computer to execute a program on a server.  Using RPC, a system
developer does not need to develop specific procedures for the
server--the client program sends a message to the server, and the server
returns the results of the executed program.  For more information, see
RFC1831 (

Roxen
- Roxen is a line of Internet server products, the core of which is
the Roxen Challenger Web server.  Roxen is free software distributed
under the GNU General Public License and is distributed with a
robust IMAP module.

RTF
- Rich Text Format, a Microsoft-devised method for formatting documents.
The specifications are available but very complex. Fine details of
documents (such as table alignment) are often confused in translations.
Use XML instead wherever possible.

SAM
- The Windows NT Security Account Manager. A database of undocumented
format which stores usernames, passwords and other information equivalent
to a NIS or LDAP database in the free world. A SAM access tool has been
produced by the Samba team which extracts usernames and passwords from
the SAM for the purposes of migrating away from NT to Samba.

Samba
- Samba is an open source software suite that provides file and print
services to SMB (otherwise known as CIFS) clients. The principal
author is Andrew Tridgell of Linuxcare who is now assisted by a
multinational team of open source developers. Samba is the only SMB
server apart from Windows NT that has large market share.  Samba is
freely available under the GNU General Public License.

SAP
- The US branch of SAP AG, the second-largest software company in the
world, based in Germany. Their closed-source Enterprise Resource
Planning package is very popular, and runs on Linux.

SASL authentication
- Simple Authentication and Security Layer (RFC2222), a framework for
adding authentication support to connection-based protocols such as
IMAP and SMTP.

Sendmail
- Sendmail is an open source Mail Transfer Agent distributed under
the Sendmail License.  Sendmail is an ancient program responsible for
delivering perhaps 70% of all email on the Internet. Modern
replacements include Exim and Qmail (q.v.)

SID
- Windows NT Security IDentifier.

SMB
- Server Message Block, a message format used by DOS and Windows
operating systems to share files, directories, and services.  A number of
products exist that allow non-Microsoft systems to use SMB.  Samba is
such a system, enabling Unix and Linux systems to communicate with
Windows machines and other clients to share directories and files. The
SMB protocol is undocumented and has many bad design features. It is
effectively monopolised by Microsoft, although there is a public
CIFS group.

SMTP
- Simple Mail Transfer Protocol, the Internet protocol used for sending email
messages between servers.  SMTP is generally used to send mail from a
client to a server. This is the most important protocol on the Internet.

SNMP
- Simple Network Management Protocol, a set of protocols used for
managing complex networks.  SNMP works by sending "protocol data units"
(PDUs) to different parts of the network where SNMP-compliant "agents"
store data about themselves in "Management Information Bases" (MIBs).

SPX
- Sequenced Packet Exchange, an undocumented transport layer protocol
used in Novell Netware networks.  SPX sits on top of the IPX layer and
provides connection-oriented services between two nodes on the
network. Like IPX and SMB (q.v.) this protocol should be avoided
wherever possible; however, open source implementations do exist.

SQL
- Structured Query Language, a standardized query language for
requesting information from a database.

Star Office
- Star Office is a suite of office applications, freely available through Sun
Microsystems.  All support for Star Office is free, and handled by
Linuxcare.

Sybase
- One of the dominant software companies in the area of database
management systems and client/server programming environments.  Microsoft
SQL Server is based on Sybase, which is why Sybase and SQLServer both use
the undocumented TDS protocol.

TCP
- Transmission Control Protocol, one of the main protocols used in
TCP/IP networks.  TCP enables two hosts to establish a connection and
exchange streams of data, guaranteeing the delivery of the packets in
the correct order.

TCP/IP
- Transmission Control Protocol/Internet Protocol, a suite of
communications protocols used to enable communication between
computers.  TCP/IP is the de facto standard for transmitting data over
networks.

TDS
- Tabular DataStream, a protocol used by Sybase and Microsoft for
client to database server communications.  A free implementation of TDS
is being developed (

URL
- Uniform Resource Locator, the global address of resources available
via the Web.

WebDAV
- WebDAV is a protocol that defines the HTTP extensions necessary to
enable distributed web authoring tools to be broadly interoperable while
supporting users' needs.  In this respect, DAV is completing the
original vision of the Web as a writable, collaborative medium.

WINS
- Windows Internet Name Service, a name resolution system that
determines the IP address that is associated with a particular network
computer.  WINS is a non-open alternative to DNS.

X.500
- An ISO and ITU standard that defines how global directories should be
structured.  X.500 directories are hierarchical with different levels
for each category of information.

Zeus
- Zeus is a scalable Web server produced by Zeus Technologies.