Software engineering: 2010

Tuesday, January 19, 2010

Software engineering

History

Main article: History of software engineering

When the first modern digital computers appeared in the early 1940s, the instructions to make them operate were wired into the machine. Practitioners quickly realized that this design was not flexible and came up with the "stored program architecture" or von Neumann architecture. Thus the first division between "hardware" and "software" began with abstraction being used to deal with the complexity of computing.

Programming languages started to appear in the 1950s and this was also another major step in abstraction. Major languages such as Fortran, ALGOL, and Cobol were released in the late 1950s to deal with scientific, algorithmic, and business problems respectively. E.W. Dijkstra wrote his seminal paper, "Go To Statement Considered Harmful", in 1968 and David Parnas introduced the key concept of modularity and information hiding in 1972 to help programmers deal with the ever increasing complexity of software systems. A software system for managing the hardware called an operating system was also introduced, most notably by Unix in 1969. In 1967, the Simula language introduced the object-oriented programming paradigm.

These advances in software were met with more advances in computer hardware. In the mid 1970s, the microcomputer was introduced, making it economical for hobbyists to obtain a computer and write software for it. This in turn lead to the now famous Personal Computer or PC and Microsoft Windows. The Software Development Life Cycle or SDLC was also starting to appear as a consensus for centralized construction of software in the mid 1980s. The late 1970s and early 1980s saw the introduction of several new Simula-inspired object-oriented programming languages, including C++, Smalltalk, and Objective C.

Open-source software started to appear in the early 90s in the form of Linux and other software introducing the "bazaar" or decentralized style of constructing software. Then the Internet and World Wide Web hit in the mid 90s changing the engineering of software once again. Distributed Systems gained sway as a way to design systems and the Java programming language was introduced as another step in abstraction having its own virtual machine. Programmers collaborated and wrote the Agile Manifesto that favored more light weight processes to create cheaper and more timely software.

The current definition of software engineering is still being debated by practitioners today as they struggle to come up with ways to produce software that is "cheaper, bigger, quicker".

Profession

Main article: Software engineer

While some areas, such as Alberta, Ontario, and Quebec, Canada, license software engineers, most places in the world have no laws regarding the profession of software engineers. Yet there are some guides from the IEEE Computer Society and the ACM, the two main professional organizations of software engineering. The IEEE's Guide to the Software Engineering Body of Knowledge - 2004 Version or SWEBOK defines the field and gives a coverage of the knowledge practicing software engineers should have. There is also an IEEE "Software Engineering Code of Ethics".In addition, there is a Software and Systems Engineering Vocabulary (SEVOCAB),published on-line by the IEEE Computer Society.

In the UK, the British Computer Society licenses software engineers and members of the society can also become Chartered Engineers (CEng), while in Canada, software engineers can hold the Professional Engineer (P.Eng)designation and/or the Information Systems Professional (I.S.P.) designation; however, there is no legal requirement to have these qualifications.

Employment

In 2004, the U. S. Bureau of Labor Statistics counted 760,840 software engineers holding jobs in the U.S.; in the same time period there were some 1.4 million practitioners employed in the U.S. in all other engineering disciplines combined. Due to its relative newness as a field of study, formal education in software engineering is often taught as part of a computer science curriculum, and as a result most software engineers hold computer science degrees.

Most software engineers work as employees or contractors. Software engineers work with businesses, government agencies (civilian or military), and non-profit organizations. Some software engineers work for themselves as freelancers. Some organizations have specialists to perform each of the tasks in the software development process. Other organizations require software engineers to do many or all of them. In large projects, people may specialize in only one role. In small projects, people may fill several or all roles at the same time. Specializations include: in industry (analysts, architects, developers, testers, technical support, managers) and in academia (educators, researchers).

There is considerable debate over the future employment prospects for software engineers and other IT professionals. For example, an online futures market called the "ITJOBS Future of IT Jobs in America" attempts to answer whether there will be more IT jobs, including software engineers, in 2012 than there were in 2002.

Certification

Professional certification of software engineers is a contentious issue. Some see it as a tool to improve professional practice; "The only purpose of licensing software engineers is to protect the public".

The ACM had a professional certification program in the early 1980s,^{[citation needed]} which was discontinued due to lack of interest. The ACM examined the possibility of professional certification of software engineers in the late 1990s, but eventually decided that such certification was inappropriate for the professional industrial practice of software engineering. As of 2006^[update], the IEEE had certified over 575 software professionals. In the U.K. the British Computer Society has developed a legally recognized professional certification called Chartered IT Professional (CITP), available to fully qualified Members (MBCS). In Canada the Canadian Information Processing Society has developed a legally recognized professional certification called Information Systems Professional (ISP)^[20]. The Software Engineering Institute offers certification on specific topic such as Security, Process improvement and Software architecture^[21].

Most certification programs in the IT industry are oriented toward specific technologies, and are managed by the vendors of these technologies. These certification programs are tailored to the institutions that would employ people who use these technologies.

Impact of globalization

Many students in the developed world have avoided degrees related to software engineering because of the fear of offshore outsourcing (importing software products or services from other countries) and of being displaced by foreign visa workers. Although government statistics do not currently show a threat to software engineering itself; a related career, computer programming does appear to have been affected. Often one is expected to start out as a computer programmer before being promoted to software engineer. Thus, the career path to software engineering may be rough, especially during recessions.

Some career counselors suggest a student also focus on "people skills" and business skills rather than purely technical skills because such "soft skills" are allegedly more difficult to offshore.It is the quasi-management aspects of software engineering that appear to be what has kept it from being impacted by globalization.

Education

A knowledge of programming is the main pre-requisite to becoming a software engineer, but it is not sufficient. Many software engineers have degrees in Computer Science due to the lack of software engineering programs in higher education. However, this has started to change with the introduction of new software engineering degrees, especially in post-graduate education. A standard international curriculum for undergraduate software engineering degrees was defined by the CCSE.

Steve McConnell opines that because most universities teach computer science rather than software engineering, there is a shortage of true software engineers In 2004 the IEEE Computer Society produced the SWEBOK, which has become an ISO standard describing the body of knowledge covered by a software engineer^{[citation needed]}.

The European Commission within the Erasmus Mundus Programme offers a European master degree called European Master on Software Engineering for students from Europe and also outside Europe. This is a joint program (double degree) involving 4 universities in Europe.

Sub-disciplines

Software engineering can be divided into ten subdisciplines. They are:

Software requirements: The elicitation, analysis, specification, and validation of requirements for software.
Software design: The design of software is usually done with Computer-Aided Software Engineering (CASE) tools and use standards for the format, such as the Unified Modeling Language (UML).
Software development: The construction of software through the use of programming languages.
Software testing
Software maintenance: Software systems often have problems and need enhancements for a long time after they are first completed. This subfield deals with those problems.
Software configuration management: Since software systems are very complex, their configuration (such as versioning and source control) have to be managed in a standardized and structured method.
Software engineering management: The management of software systems borrows heavily from project management, but there are nuances encountered in software not seen in other management disciplines.
Software development process: The process of building software is hotly debated among practitioners with the main paradigms being agile or waterfall.
Software engineering tools, see Computer Aided Software Engineering
Software quality

Related disciplines

Software engineering is related to the disciplines of computer science, project management, and systems engineering.^[30]^[31]

Computer science

Software engineering is considered a subfield of computer science by many academics. Many of the foundations of software engineering come from computer science.

Project management

The building of a software system is usually considered a project and the management of it borrows many principles from the field of Project management.

Systems engineering

Systems engineers have been dealing with the complexity of large systems for many decades and their knowledge is applied to many software engineering problems.

Hardware

Main article: Personal computer hardware

The term hardware covers all of those parts of a computer that are tangible objects. Circuits, displays, power supplies, cables, keyboards, printers and mice are all hardware.

History of computing hardware
First Generation (Mechanical/Electromechanical)	Calculators	Antikythera mechanism, Difference engine, Norden bombsight
First Generation (Mechanical/Electromechanical)	Programmable Devices	Jacquard loom, Analytical engine, Harvard Mark I, Z3
Second Generation (Vacuum Tubes)	Calculators	Atanasoff–Berry Computer, IBM 604, UNIVAC 60, UNIVAC 120
Second Generation (Vacuum Tubes)	Programmable Devices	Colossus, ENIAC, Manchester Small-Scale Experimental Machine, EDSAC, Manchester Mark 1, Ferranti Pegasus, Ferranti Mercury, CSIRAC, EDVAC, UNIVAC I, IBM 701, IBM 702, IBM 650, Z22
Third Generation (Discrete transistors and SSI, MSI, LSI Integrated circuits)	Mainframes	IBM 7090, IBM 7080, IBM System/360, BUNCH
	Minicomputer	PDP-8, PDP-11, IBM System/32, IBM System/36
Fourth Generation (VLSI integrated circuits)	Minicomputer	VAX, IBM System i
	4-bit microcomputer	Intel 4004, Intel 4040
	8-bit microcomputer	Intel 8008, Intel 8080, Motorola 6800, Motorola 6809, MOS Technology 6502, Zilog Z80
	16-bit microcomputer	Intel 8088, Zilog Z8000, WDC 65816/65802
	32-bit microcomputer	Intel 80386, Pentium, Motorola 68000, ARM architecture
	64-bit microcomputer^[34]	Alpha, MIPS, PA-RISC, PowerPC, SPARC, x86-64
	Embedded computer	Intel 8048, Intel 8051
	Personal computer	Desktop computer, Home computer, Laptop computer, Personal digital assistant (PDA), Portable computer, Tablet PC, Wearable computer
Theoretical/experimental	Quantum computer, Chemical computer, DNA computing, Optical computer, Spintronics based computer

Other Hardware Topics
Peripheral device (Input/output)	Input	Mouse, Keyboard, Joystick, Image scanner, Webcam, Graphics tablet, Microphone
	Output	Monitor, Printer, Loudspeaker
	Both	Floppy disk drive, Hard disk drive, Optical disc drive, Teleprinter
Computer busses	Short range	RS-232, SCSI, PCI, USB
Computer busses	Long range (Computer networking)	Ethernet, ATM, FDDI

Software

Main article: Computer software

Software refers to parts of the computer which do not have a material form, such as programs, data, protocols, etc. When software is stored in hardware that cannot easily be modified (such as BIOS ROM in an IBM PC compatible), it is sometimes called "firmware" to indicate that it falls into an uncertain area somewhere between hardware and software.

Computer software
Operating system	Unix and BSD	UNIX System V, IBM AIX, HP-UX, Solaris (SunOS), IRIX, List of BSD operating systems
	GNU/Linux	List of Linux distributions, Comparison of Linux distributions
	Microsoft Windows	Windows 95, Windows 98, Windows NT, Windows 2000, Windows XP, Windows Vista, Windows 7, Windows CE
	DOS	86-DOS (QDOS), PC-DOS, MS-DOS, FreeDOS
	Mac OS	Mac OS classic, Mac OS X
	Embedded and real-time	List of embedded operating systems
	Experimental	Amoeba, Oberon/Bluebottle, Plan 9 from Bell Labs
Library	Multimedia	DirectX, OpenGL, OpenAL
Library	Programming library	C standard library, Standard Template Library
Data	Protocol	TCP/IP, Kermit, FTP, HTTP, SMTP
Data	File format	HTML, XML, JPEG, MPEG, PNG
User interface	Graphical user interface (WIMP)	Microsoft Windows, GNOME, KDE, QNX Photon, CDE, GEM
User interface	Text-based user interface	Command-line interface, Text user interface
Application	Office suite	Word processing, Desktop publishing, Presentation program, Database management system, Scheduling & Time management, Spreadsheet, Accounting software
	Internet Access	Browser, E-mail client, Web server, Mail transfer agent, Instant messaging
	Design and manufacturing	Computer-aided design, Computer-aided manufacturing, Plant management, Robotic manufacturing, Supply chain management
	Graphics	Raster graphics editor, Vector graphics editor, 3D modeler, Animation editor, 3D computer graphics, Video editing, Image processing
	Audio	Digital audio editor, Audio playback, Mixing, Audio synthesis, Computer music
	Software engineering	Compiler, Assembler, Interpreter, Debugger, Text editor, Integrated development environment, Software performance analysis, Revision control, Software configuration management
	Educational	Edutainment, Educational game, Serious game, Flight simulator
	Games	Strategy, Arcade, Puzzle, Simulation, First-person shooter, Platform, Massively multiplayer, Interactive fiction
	Misc	Artificial intelligence, Antivirus software, Malware scanner, Installer/Package management systems, File manager

Programming languages

Programming languages provide various ways of specifying programs for computers to run. Unlike natural languages, programming languages are designed to permit no ambiguity and to be concise. They are purely written languages and are often difficult to read aloud. They are generally either translated into machine code by a compiler or an assembler before being run, or translated directly at run time by an interpreter. Sometimes programs are executed by a hybrid method of the two techniques. There are thousands of different programming languages—some intended to be general purpose, others useful only for highly specialized applications.

Programming languages
Lists of programming languages	Timeline of programming languages, List of programming languages by category, Generational list of programming languages, List of programming languages, Non-English-based programming languages
Commonly used Assembly languages	ARM, MIPS, x86
Commonly used high-level programming languages	Ada, BASIC, C, C++, C#, COBOL, Fortran, Java, Lisp, Pascal, Object Pascal
Commonly used Scripting languages	Bourne script, JavaScript, Python, Ruby, PHP, Perl

Professions and organizations

As the use of computers has spread throughout society, there are an increasing number of careers involving computers.

Computer-related professions
Hardware-related	Electrical engineering, Electronic engineering, Computer engineering, Telecommunications engineering, Optical engineering, Nanoengineering
Software-related	Computer science, Desktop publishing, Human–computer interaction, Information technology, Computational science, Software engineering, Video game industry, Web design

The need for computers to work well together and to be able to exchange information has spawned the need for many standards organizations, clubs and societies of both a formal and informal nature.

Organizations
Standards groups	ANSI, IEC, IEEE, IETF, ISO, W3C
Professional Societies	ACM, ACM Special Interest Groups, IET, IFIP, BCS
Free/Open source software groups	Free Software Foundation, Mozilla Foundation, Apache Software Foundation

Input/output (I/O)

Main article: Input/output

Hard disk drives are common I/O devices used with computers.

I/O is the means by which a computer exchanges information with the outside world. Devices that provide input or output to the computer are called peripherals. On a typical personal computer, peripherals include input devices like the keyboard and mouse, and output devices such as the display and printer. Hard disk drives, floppy disk drives and optical disc drives serve as both input and output devices. Computer networking is another form of I/O.

Often, I/O devices are complex computers in their own right with their own CPU and memory. A graphics processing unit might contain fifty or more tiny computers that perform the calculations necessary to display 3D graphics^{[citation needed]}. Modern desktop computers contain many smaller computers that assist the main CPU in performing I/O.

Multitasking

Main article: Computer multitasking

While a computer may be viewed as running one gigantic program stored in its main memory, in some systems it is necessary to give the appearance of running several programs simultaneously. This is achieved by multitasking i.e. having the computer switch rapidly between running each program in turn.

One means by which this is done is with a special signal called an interrupt which can periodically cause the computer to stop executing instructions where it was and do something else instead. By remembering where it was executing prior to the interrupt, the computer can return to that task later. If several programs are running "at the same time", then the interrupt generator might be causing several hundred interrupts per second, causing a program switch each time. Since modern computers typically execute instructions several orders of magnitude faster than human perception, it may appear that many programs are running at the same time even though only one is ever executing in any given instant. This method of multitasking is sometimes termed "time-sharing" since each program is allocated a "slice" of time in turn.

Before the era of cheap computers, the principle use for multitasking was to allow many people to share the same computer.

Seemingly, multitasking would cause a computer that is switching between several programs to run more slowly — in direct proportion to the number of programs it is running. However, most programs spend much of their time waiting for slow input/output devices to complete their tasks. If a program is waiting for the user to click on the mouse or press a key on the keyboard, then it will not take a "time slice" until the event it is waiting for has occurred. This frees up time for other programs to execute so that many programs may be run at the same time without unacceptable speed loss.

Multiprocessing

Main article: Multiprocessing

Cray designed many supercomputers that used multiprocessing heavily.

Some computers are designed to distribute their work across several CPUs in a multiprocessing configuration, a technique once employed only in large and powerful machines such as supercomputers, mainframe computers and servers. Multiprocessor and multi-core (multiple CPUs on a single integrated circuit) personal and laptop computers are now widely available, and are being increasingly used in lower-end markets as a result.

Supercomputers in particular often have highly unique architectures that differ significantly from the basic stored-program architecture and from general purpose computers.^[31] They often feature thousands of CPUs, customized high-speed interconnects, and specialized computing hardware. Such designs tend to be useful only for specialized tasks due to the large scale of program organization required to successfully utilize most of the available resources at once. Supercomputers usually see usage in large-scale simulation, graphics rendering, and cryptography applications, as well as with other so-called "embarrassingly parallel" tasks.

Networking and the Internet

Main articles: Computer networking and Internet

Visualization of a portion of the routes on the Internet.

Computers have been used to coordinate information between multiple locations since the 1950s. The U.S. military's SAGE system was the first large-scale example of such a system, which led to a number of special-purpose commercial systems like Sabre.

In the 1970s, computer engineers at research institutions throughout the United States began to link their computers together using telecommunications technology. This effort was funded by ARPA (now DARPA), and the computer network that it produced was called the ARPANET.The technologies that made the Arpanet possible spread and evolved.

In time, the network spread beyond academic and military institutions and became known as the Internet. The emergence of networking involved a redefinition of the nature and boundaries of the computer. Computer operating systems and applications were modified to include the ability to define and access the resources of other computers on the network, such as peripheral devices, stored information, and the like, as extensions of the resources of an individual computer. Initially these facilities were available primarily to people working in high-tech environments, but in the 1990s the spread of applications like e-mail and the World Wide Web, combined with the development of cheap, fast networking technologies like Ethernet and ADSL saw computer networking become almost ubiquitous. In fact, the number of computers that are networked is growing phenomenally. A very large proportion of personal computers regularly connect to the Internet to communicate and receive information. "Wireless" networking, often utilizing mobile phone networks, has meant networking is becoming increasingly ubiquitous even in mobile computing environments.

Control unit

Main articles: CPU design and Control unit

Diagram showing how a particular MIPS architecture instruction would be decoded by the control system.

The control unit (often called a control system or central controller) manages the computer's various components; it reads and interprets (decodes) the program instructions, transforming them into a series of control signals which activate other parts of the computer.^[22] Control systems in advanced computers may change the order of some instructions so as to improve performance.

A key component common to all CPUs is the program counter, a special memory cell (a register) that keeps track of which location in memory the next instruction is to be read from.^[23]

The control system's function is as follows—note that this is a simplified description, and some of these steps may be performed concurrently or in a different order depending on the type of CPU:

Read the code for the next instruction from the cell indicated by the program counter.
Decode the numerical code for the instruction into a set of commands or signals for each of the other systems.
Increment the program counter so it points to the next instruction.
Read whatever data the instruction requires from cells in memory (or perhaps from an input device). The location of this required data is typically stored within the instruction code.
Provide the necessary data to an ALU or register.
If the instruction requires an ALU or specialized hardware to complete, instruct the hardware to perform the requested operation.
Write the result from the ALU back to a memory location or to a register or perhaps an output device.
Jump back to step (1).

Since the program counter is (conceptually) just another set of memory cells, it can be changed by calculations done in the ALU. Adding 100 to the program counter would cause the next instruction to be read from a place 100 locations further down the program. Instructions that modify the program counter are often known as "jumps" and allow for loops (instructions that are repeated by the computer) and often conditional instruction execution (both examples of control flow).

It is noticeable that the sequence of operations that the control unit goes through to process an instruction is in itself like a short computer program—and indeed, in some more complex CPU designs, there is another yet smaller computer called a microsequencer that runs a microcode program that causes all of these events to happen.

Arithmetic/logic unit (ALU)

Main article: Arithmetic logic unit

The ALU is capable of performing two classes of operations: arithmetic and logic.

The set of arithmetic operations that a particular ALU supports may be limited to adding and subtracting or might include multiplying or dividing, trigonometry functions (sine, cosine, etc) and square roots. Some can only operate on whole numbers (integers) whilst others use floating point to represent real numbers—albeit with limited precision. However, any computer that is capable of performing just the simplest operations can be programmed to break down the more complex operations into simple steps that it can perform. Therefore, any computer can be programmed to perform any arithmetic operation—although it will take more time to do so if its ALU does not directly support the operation. An ALU may also compare numbers and return boolean truth values (true or false) depending on whether one is equal to, greater than or less than the other ("is 64 greater than 65?").

Logic operations involve Boolean logic: AND, OR, XOR and NOT. These can be useful both for creating complicated conditional statements and processing boolean logic.

Superscalar computers may contain multiple ALUs so that they can process several instructions at the same time.Graphics processors and computers with SIMD and MIMD features often provide ALUs that can perform arithmetic on vectors and matrices.

Memory

Main article: Computer data storage

Magnetic core memory was the computer memory of choice throughout the 1960s, until it was replaced by semiconductor memory.

A computer's memory can be viewed as a list of cells into which numbers can be placed or read. Each cell has a numbered "address" and can store a single number. The computer can be instructed to "put the number 123 into the cell numbered 1357" or to "add the number that is in cell 1357 to the number that is in cell 2468 and put the answer into cell 1595". The information stored in memory may represent practically anything. Letters, numbers, even computer instructions can be placed into memory with equal ease. Since the CPU does not differentiate between different types of information, it is the software's responsibility to give significance to what the memory sees as nothing but a series of numbers.

In almost all modern computers, each memory cell is set up to store binary numbers in groups of eight bits (called a byte). Each byte is able to represent 256 different numbers (2^8 = 256); either from 0 to 255 or -128 to +127. To store larger numbers, several consecutive bytes may be used (typically, two, four or eight). When negative numbers are required, they are usually stored in two's complement notation. Other arrangements are possible, but are usually not seen outside of specialized applications or historical contexts. A computer can store any kind of information in memory if it can be represented numerically. Modern computers have billions or even trillions of bytes of memory.

The CPU contains a special set of memory cells called registers that can be read and written to much more rapidly than the main memory area. There are typically between two and one hundred registers depending on the type of CPU. Registers are used for the most frequently needed data items to avoid having to access main memory every time data is needed. As data is constantly being worked on, reducing the need to access main memory (which is often slow compared to the ALU and control units) greatly increases the computer's speed.

Computer main memory comes in two principal varieties: random-access memory or RAM and read-only memory or ROM. RAM can be read and written to anytime the CPU commands it, but ROM is pre-loaded with data and software that never changes, so the CPU can only read from it. ROM is typically used to store the computer's initial start-up instructions. In general, the contents of RAM are erased when the power to the computer is turned off, but ROM retains its data indefinitely. In a PC, the ROM contains a specialized program called the BIOS that orchestrates loading the computer's operating system from the hard disk drive into RAM whenever the computer is turned on or reset. In embedded computers, which frequently do not have disk drives, all of the required software may be stored in ROM. Software stored in ROM is often called firmware, because it is notionally more like hardware than software. Flash memory blurs the distinction between ROM and RAM, as it retains its data when turned off but is also rewritable. It is typically much slower than conventional ROM and RAM however, so its use is restricted to applications where high speed is unnecessary.

In more sophisticated computers there may be one or more RAM cache memories which are slower than registers but faster than main memory. Generally computers with this sort of cache are designed to move frequently needed data into the cache automatically, often without the need for any intervention on the programmer's part.

Programs

A 1970s punched card containing one line from a FORTRAN program. The card reads: "Z(1) = Y + W(1)" and is labelled "PROJ039" for identification purposes.

In practical terms, a computer program may run from just a few instructions to many millions of instructions, as in a program for a word processor or a web browser. A typical modern computer can execute billions of instructions per second (gigahertz or GHz) and rarely make a mistake over many years of operation. Large computer programs consisting of several million instructions may take teams of programmers years to write, and due to the complexity of the task almost certainly contain errors.

Errors in computer programs are called "bugs". Bugs may be benign and not affect the usefulness of the program, or have only subtle effects. But in some cases they may cause the program to "hang"—become unresponsive to input such as mouse clicks or keystrokes, or to completely fail or "crash". Otherwise benign bugs may sometimes may be harnessed for malicious intent by an unscrupulous user writing an "exploit"—code designed to take advantage of a bug and disrupt a program's proper execution. Bugs are usually not the fault of the computer. Since computers merely execute the instructions they are given, bugs are nearly always the result of programmer error or an oversight made in the program's design.

In most computers, individual instructions are stored as machine code with each instruction being given a unique number (its operation code or opcode for short). The command to add two numbers together would have one opcode, the command to multiply them would have a different opcode and so on. The simplest computers are able to perform any of a handful of different instructions; the more complex computers have several hundred to choose from—each with a unique numerical code. Since the computer's memory is able to store numbers, it can also store the instruction codes. This leads to the important fact that entire programs (which are just lists of instructions) can be represented as lists of numbers and can themselves be manipulated inside the computer just as if they were numeric data. The fundamental concept of storing programs in the computer's memory alongside the data they operate on is the crux of the von Neumann, or stored program, architecture. In some cases, a computer might store some or all of its program in memory that is kept separate from the data it operates on. This is called the Harvard architecture after the Harvard Mark I computer. Modern von Neumann computers display some traits of the Harvard architecture in their designs, such as in CPU caches.

While it is possible to write computer programs as long lists of numbers (machine language) and this technique was used with many early computers, it is extremely tedious to do so in practice, especially for complicated programs. Instead, each basic instruction can be given a short name that is indicative of its function and easy to remember—a mnemonic such as ADD, SUB, MULT or JUMP. These mnemonics are collectively known as a computer's assembly language. Converting programs written in assembly language into something the computer can actually understand (machine language) is usually done by a computer program called an assembler. Machine languages and the assembly languages that represent them (collectively termed low-level programming languages) tend to be unique to a particular type of computer. For instance, an ARM architecture computer (such as may be found in a PDA or a hand-held videogame) cannot understand the machine language of an Intel Pentium or the AMD Athlon 64 computer that might be in a PC.

Though considerably easier than in machine language, writing long programs in assembly language is often difficult and error prone. Therefore, most complicated programs are written in more abstract high-level programming languages that are able to express the needs of the programmer more conveniently (and thereby help reduce programmer error). High level languages are usually "compiled" into machine language (or sometimes into assembly language and then into machine language) using another computer program called a compiler. Since high level languages are more abstract than assembly language, it is possible to use different compilers to translate the same high level language program into the machine language of many different types of computer. This is part of the means by which software like video games may be made available for different computer architectures such as personal computers and various video game consoles.

The task of developing large software systems presents a significant intellectual challenge. Producing software with an acceptably high reliability within a predictable schedule and budget has historically been difficult; the academic and professional discipline of software engineering concentrates specifically on this challenge.

Example

A traffic light showing red

Suppose a computer is being employed to operate a traffic light at an intersection between two streets. The computer has the following three basic instructions.

ON(Streetname, Color) Turns the light on Streetname with a specified Color on.
OFF(Streetname, Color) Turns the light on Streetname with a specified Color off.
WAIT(Seconds) Waits a specifed number of seconds.
START Starts the program
REPEAT Tells the computer to repeat a specified part of the program in a loop.

Comments are marked with a // on the left margin. Comments in a computer program do not affect the operation of the program. They are not evaluated by the computer. Assume the streetnames are Broadway and Main.

START

//Let Broadway traffic go
OFF(Broadway, Red)
ON(Broadway, Green)
WAIT(60 seconds)

//Stop Broadway traffic
OFF(Broadway, Green)
ON(Broadway, Yellow)
WAIT(3 seconds)
OFF(Broadway, Yellow)
ON(Broadway, Red)

//Let Main traffic go
OFF(Main, Red)
ON(Main, Green)
WAIT(60 seconds)

//Stop Main traffic
OFF(Main, Green)
ON(Main, Yellow)
WAIT(3 seconds)
OFF(Main, Yellow)
ON(Main, Red)

//Tell computer to continuously repeat the program.
REPEAT ALL

With this set of instructions, the computer would cycle the light continually through red, green, yellow and back to red again on both streets.

However, suppose there is a simple on/off switch connected to the computer that is intended to be used to make the light flash red while some maintenance operation is being performed. The program might then instruct the computer to:

START

IF Switch == OFF then: //Normal traffic signal operation
{
//Let Broadway traffic go
OFF(Broadway, Red)
ON(Broadway, Green)
WAIT(60 seconds)

//Stop Broadway traffic
OFF(Broadway, Green)
ON(Broadway, Yellow)
WAIT(3 seconds)
OFF(Broadway, Yellow)
ON(Broadway, Red)

//Let Main traffic go
OFF(Main, Red)
ON(Main, Green)
WAIT(60 seconds)

//Stop Main traffic
OFF(Main, Green)
ON(Main, Yellow)
WAIT(3 seconds)
OFF(Main, Yellow)
ON(Main, Red)

//Tell the computer to repeat this section continuously.
REPEAT THIS SECTION
}

IF Switch == ON THEN: //Maintenance Mode
{
//Turn the red lights on and wait 1 second.
ON(Broadway, Red)
ON(Main, Red)
WAIT(1 second)

//Turn the red lights off and wait 1 second.
OFF(Broadway, Red)
OFF(Main, Red)
WAIT(1 second)

//Tell the computer to repeat the statements in this section.
REPEAT THIS SECTION
}

In this manner, the traffic signal will run a flash-red program when the switch is on, and will run the normal program when the switch is off. Both of these program examples show the basic layout of a computer program in a simple, familiar context of a traffic signal. Any experienced programmer can spot many software bugs in the program, for instance, not making sure that the green light is off when the switch is set to flash red. However, to remove all possible bugs would make this program much longer and more complicated, and would be confusing to nontechnical readers: the aim of this example is a simple demonstration of how computer instructions are laid out.

Function

Main articles: Central processing unit and Microprocessor

A general purpose computer has four main components: the arithmetic logic unit (ALU), the control unit, the memory, and the input and output devices (collectively termed I/O). These parts are interconnected by busses, often made of groups of wires.

Inside each of these parts are thousands to trillions of small electrical circuits which can be turned off or on by means of an electronic switch. Each circuit represents a bit (binary digit) of information so that when the circuit is on it represents a "1", and when off it represents a "0" (in positive logic representation). The circuits are arranged in logic gates so that one or more of the circuits may control the state of one or more of the other circuits.

The control unit, ALU, registers, and basic I/O (and often other hardware closely linked with these) are collectively known as a central processing unit (CPU). Early CPUs were composed of many separate components but since the mid-1970s CPUs have typically been constructed on a single integrated circuit called a microprocessor.

A computer is a machine

The Columbia Supercomputer, located at the NASA Ames Research Center.

A computer is a machine that manipulates data according to a set of instructions.

Although mechanical examples of computers have existed through much of recorded human history, the first electronic computers were developed in the mid-20th century (1940–1945). These were the size of a large room, consuming as much power as several hundred modern personal computers (PCs). Modern computers based on integrated circuits are millions to billions of times more capable than the early machines, and occupy a fraction of the space. Simple computers are small enough to fit into a wristwatch, and can be powered by a watch battery. Personal computers in their various forms are icons of the Information Age and are what most people think of as "computers". The embedded computers found in many devices from MP3 players to fighter aircraft and from toys to industrial robots are however the most numerous.

The ability to store and execute lists of instructions called programs makes computers extremely versatile, distinguishing them from calculators. The Church–Turing thesis is a mathematical statement of this versatility: any computer with a certain minimum capability is, in principle, capable of performing the same tasks that any other computer can perform. Therefore computers ranging from a mobile phone to a supercomputer are all able to perform the same computational tasks, given enough time and storage capacity.

History of computing

Main article: History of computing hardware

The Jacquard loom, on display at the Museum of Science and Industry in Manchester, England, was one of the first programmable devices.

The first use of the word "computer" was recorded in 1613, referring to a person who carried out calculations, or computations, and the word continued to be used in that sense until the middle of the 20th century. From the end of the 19th century onwards though, the word began to take on its more familiar meaning, describing a machine that carries out computations.

The history of the modern computer begins with two separate technologies—automated calculation and programmability—but no single device can be identified as the earliest computer, partly because of the inconsistent application of that term. Examples of early mechanical calculating devices include the abacus, the slide rule and arguably the astrolabe and the Antikythera mechanism (which dates from about 150–100 BC). Hero of Alexandria (c. 10–70 AD) built a mechanical theater which performed a play lasting 10 minutes and was operated by a complex system of ropes and drums that might be considered to be a means of deciding which parts of the mechanism performed which actions and when. This is the essence of programmability.

The "castle clock", an astronomical clock invented by Al-Jazari in 1206, is considered to be the earliest programmable analog computer. It displayed the zodiac, the solar and lunar orbits, a crescent moon-shaped pointer travelling across a gateway causing automatic doors to open every hour and five robotic musicians who played music when struck by levers operated by a camshaft attached to a water wheel. The length of day and night could be re-programmed to compensate for the changing lengths of day and night throughout the year.

The Renaissance saw a re-invigoration of European mathematics and engineering. Wilhelm Schickard's 1623 device was the first of a number of mechanical calculators constructed by European engineers, but none fit the modern definition of a computer, because they could not be programmed.

In 1801, Joseph Marie Jacquard made an improvement to the textile loom by introducing a series of punched paper cards as a template which allowed his loom to weave intricate patterns automatically. The resulting Jacquard loom was an important step in the development of computers because the use of punched cards to define woven patterns can be viewed as an early, albeit limited, form of programmability.

It was the fusion of automatic calculation with programmability that produced the first recognizable computers. In 1837, Charles Babbage was the first to conceptualize and design a fully programmable mechanical computer, his analytical engine. Limited finances and Babbage's inability to resist tinkering with the design meant that the device was never completed.

In the late 1880s, Herman Hollerith invented the recording of data on a machine readable medium. Prior uses of machine readable media, above, had been for control, not data. "After some initial trials with paper tape, he settled on punched cards ..." To process these punched cards he invented the tabulator, and the keypunch machines. These three inventions were the foundation of the modern information processing industry. Large-scale automated data processing of punched cards was performed for the 1890 United States Census by Hollerith's company, which later became the core of IBM. By the end of the 19th century a number of technologies that would later prove useful in the realization of practical computers had begun to appear: the punched card, Boolean algebra, the vacuum tube (thermionic valve) and the teleprinter.

During the first half of the 20th century, many scientific computing needs were met by increasingly sophisticated analog computers, which used a direct mechanical or electrical model of the problem as a basis for computation. However, these were not programmable and generally lacked the versatility and accuracy of modern digital computers.

Alan Turing is widely regarded to be the father of modern computer science. In 1936 Turing provided an influential formalisation of the concept of the algorithm and computation with the Turing machine. Of his role in the modern computer, Time magazine in naming Turing one of the 100 most influential people of the 20th century, states: "The fact remains that everyone who taps at a keyboard, opening a spreadsheet or a word-processing program, is working on an incarnation of a Turing machine".

The inventor of the program-controlled computer was Konrad Zuse, who built the first working computer in 1941 and later in 1955 the first computer based on magnetic storage

George Stibitz is internationally recognized as a father of the modern digital computer. While working at Bell Labs in November 1937, Stibitz invented and built a relay-based calculator he dubbed the "Model K" (for "kitchen table", on which he had assembled it), which was the first to use binary circuits to perform an arithmetic operation. Later models added greater sophistication including complex arithmetic and programmability.

Defining characteristics of some early digital computers of the 1940s (In the history of computing hardware)
Name	First operational	Numeral system	Computing mechanism	Programming	Turing complete
Zuse Z3 (Germany)	May 1941	Binary	Electro-mechanical	Program-controlled by punched film stock (but no conditional branch)	Yes (1998)
Atanasoff–Berry Computer (US)	1942	Binary	Electronic	Not programmable—single purpose	No
Colossus Mark 1 (UK)	February 1944	Binary	Electronic	Program-controlled by patch cables and switches	No
Harvard Mark I – IBM ASCC (US)	May 1944	Decimal	Electro-mechanical	Program-controlled by 24-channel punched paper tape (but no conditional branch)	No
Colossus Mark 2 (UK)	June 1944	Binary	Electronic	Program-controlled by patch cables and switches	No
ENIAC (US)	July 1946	Decimal	Electronic	Program-controlled by patch cables and switches	Yes
Manchester Small-Scale Experimental Machine (UK)	June 1948	Binary	Electronic	Stored-program in Williams cathode ray tube memory	Yes
Modified ENIAC (US)	September 1948	Decimal	Electronic	Program-controlled by patch cables and switches plus a primitive read-only stored programming mechanism using the Function Tables as program ROM	Yes
EDSAC (UK)	May 1949	Binary	Electronic	Stored-program in mercury delay line memory	Yes
Manchester Mark 1 (UK)	October 1949	Binary	Electronic	Stored-program in Williams cathode ray tube memory and magnetic drum memory	Yes
CSIRAC (Australia)	November 1949	Binary	Electronic	Stored-program in mercury delay line memory	Yes

A succession of steadily more powerful and flexible computing devices were constructed in the 1930s and 1940s, gradually adding the key features that are seen in modern computers. The use of digital electronics (largely invented by Claude Shannon in 1937) and more flexible programmability were vitally important steps, but defining one point along this road as "the first digital electronic computer" is difficult.^{Shannon 1940} Notable achievements include:

EDSAC was one of the first computers to implement the stored program (von Neumann) architecture.

Die of an Intel 80486DX2 microprocessor (actual size: 12×6.75 mm) in its packaging.

Konrad Zuse's electromechanical "Z machines". The Z3 (1941) was the first working machine featuring binary arithmetic, including floating point arithmetic and a measure of programmability. In 1998 the Z3 was proved to be Turing complete, therefore being the world's first operational computer.
The non-programmable Atanasoff–Berry Computer (1941) which used vacuum tube based computation, binary numbers, and regenerative capacitor memory. The use of regenerative memory allowed it to be much more compact then its peers (being approximately the size of a large desk or workbench), since intermediate results could be stored and then fed back into the same set of computation elements.
The secret British Colossus computers (1943), which had limited programmability but demonstrated that a device using thousands of tubes could be reasonably reliable and electronically reprogrammable. It was used for breaking German wartime codes.
The Harvard Mark I (1944), a large-scale electromechanical computer with limited programmability.
The U.S. Army's Ballistic Research Laboratory ENIAC (1946), which used decimal arithmetic and is sometimes called the first general purpose electronic computer (since Konrad Zuse's Z3 of 1941 used electromagnets instead of electronics). Initially, however, ENIAC had an inflexible architecture which essentially required rewiring to change its programming.

Several developers of ENIAC, recognizing its flaws, came up with a far more flexible and elegant design, which came to be known as the "stored program architecture" or von Neumann architecture. This design was first formally described by John von Neumann in the paper First Draft of a Report on the EDVAC, distributed in 1945. A number of projects to develop computers based on the stored-program architecture commenced around this time, the first of these being completed in Great Britain. The first to be demonstrated working was the Manchester Small-Scale Experimental Machine (SSEM or "Baby"), while the EDSAC, completed a year after SSEM, was the first practical implementation of the stored program design. Shortly thereafter, the machine originally described by von Neumann's paper—EDVAC—was completed but did not see full-time use for an additional two years.

Nearly all modern computers implement some form of the stored-program architecture, making it the single trait by which the word "computer" is now defined. While the technologies used in computers have changed dramatically since the first electronic, general-purpose computers of the 1940s, most still use the von Neumann architecture.

Computers using vacuum tubes as their electronic elements were in use throughout the 1950s, but by the 1960s had been largely replaced by transistor-based machines, which were smaller, faster, cheaper to produce, required less power, and were more reliable. The first transistorised computer was demonstrated at the University of Manchester in 1953. In the 1970s, integrated circuit technology and the subsequent creation of microprocessors, such as the Intel 4004, further decreased size and cost and further increased speed and reliability of computers. By the late 1970s, many products such as video recorders contained dedicated computers called microcontrollers, and they started to appear as a replacement to mechanical controls in domestic appliances such as washing machines. The 1980s witnessed home computers and the now ubiquitous personal computer. With the evolution of the Internet, personal computers are becoming as common as the television and the telephone in the household.

Modern smartphones are fully-programmable computers in their own right, and as of 2009 may well be the most common form of such computers in existence.

Stored program architecture

Main articles: Computer program and Computer programming

The defining feature of modern computers which distinguishes them from all other machines is that they can be programmed. That is to say that a list of instructions (the program) can be given to the computer and it will store them and carry them out at some time in the future.

In most cases, computer instructions are simple: add one number to another, move some data from one location to another, send a message to some external device, etc. These instructions are read from the computer's memory and are generally carried out (executed) in the order they were given. However, there are usually specialized instructions to tell the computer to jump ahead or backwards to some other place in the program and to carry on executing from there. These are called "jump" instructions (or branches). Furthermore, jump instructions may be made to happen conditionally so that different sequences of instructions may be used depending on the result of some previous calculation or some external event. Many computers directly support subroutines by providing a type of jump that "remembers" the location it jumped from and another instruction to return to the instruction following that jump instruction.

Program execution might be likened to reading a book. While a person will normally read each word and line in sequence, they may at times jump back to an earlier place in the text or skip sections that are not of interest. Similarly, a computer may sometimes go back and repeat the instructions in some section of the program over and over again until some internal condition is met. This is called the flow of control within the program and it is what allows the computer to perform tasks repeatedly without human intervention.

Comparatively, a person using a pocket calculator can perform a basic arithmetic operation such as adding two numbers with just a few button presses. But to add together all of the numbers from 1 to 1,000 would take thousands of button presses and a lot of time—with a near certainty of making a mistake. On the other hand, a computer may be programmed to do this with just a few simple instructions. For example:

mov #0,sum ; set sum to 0
mov #1,num ; set num to 1
loop: add num,sum ; add num to sum
add #1,num ; add 1 to num
cmp num,#1000 ; compare num to 1000
ble loop ; if num <= 1000, go back to 'loop'
halt ; end of program. stop running

Once told to run this program, the computer will perform the repetitive addition task without further human intervention. It will almost never make a mistake and a modern PC can complete the task in about a millionth of a second.

However, computers cannot "think" for themselves in the sense that they only solve problems in exactly the way they are programmed to. An intelligent human faced with the above addition task might soon realize that instead of actually adding up all the numbers one can simply use the equation

$1+2+3+...+n = {{n(n+1)} \over 2}$

and arrive at the correct answer (500,500) with little work. In other words, a computer programmed to add up the numbers one by one as in the example above would do exactly that without regard to efficiency or alternative solutions.

Theory of computation

The theory of computation or computer theory is the branch of computer science and mathematics that deals with whether and how efficiently problems can be solved on a model of computation, using an algorithm. The field is divided into two major branches: computability theory and complexity theory, but both branches deal with formal models of computation.

In order to perform a rigorous study of computation, computer scientists work with a mathematical abstraction of computers called a model of computation. There are several models in use, but the most commonly examined is the Turing machine. A Turing machine can be thought of as a desktop PC with a potentially infinite memory capacity, though it can only access this memory in small discrete chunks. Computer scientists study the Turing machine because it is simple to formulate, can be analyzed and used to prove results, and because it represents what many consider the most powerful possible "reasonable" model of computation. It might seem that the potentially infinite memory capacity is an unrealizable attribute, but any decidable problem solved by a Turing machine will always require only a finite amount of memory. So in principle, any problem that can be solved (decided) by a Turing machine can be solved by a computer that has a bounded amount of memory.

History

Main article: History of theory of computation

The theory of computation can be considered as the creation of models of all kinds in the field of computer science. Therefore mathematics and logic are used. In the last century it became an independent academic discipline and was separated from mathematics.

Some pioneers of the theory of computation were Alonzo Church, Alan Turing, Stephen Kleene, John von Neumann, Claude Shannon, and Noam Chomsky.

Computability theory

Main article: Computability theory

Computability theory deals primarily with the question of the extent to which a problem is solvable on a computer. The statement that the halting problem cannot be solved by a Turing machine is one of the most important results in computability theory, as it is an example of a concrete problem that is both easy to formulate and impossible to solve using a Turing machine. Much of computability theory builds on the halting problem result.

Another important step in computability theory was Rice's theorem, which states that for all non-trivial properties of partial functions, it is undecidable whether a Turing machine computes a partial function with that property.

Computability theory is closely related to the branch of mathematical logic called recursion theory, which removes the restriction of studying only models of computation which are reducible to the Turing model. Many mathematicians and computational theorists who study recursion theory will refer to it as computability theory.

Complexity theory

Main article: Computational complexity theory

Complexity theory considers not only whether a problem can be solved at all on a computer, but also how efficiently the problem can be solved. Two major aspects are considered: time complexity and space complexity, which are respectively how many steps does it take to perform a computation, and how much memory is required to perform that computation.

In order to analyze how much time and space a given algorithm requires, computer scientists express the time or space required to solve the problem as a function of the size of the input problem. For example, finding a particular number in a long list of numbers becomes harder as the list of numbers grows larger. If we say there are $n$ numbers in the list, then if the list is not sorted or indexed in any way we may have to look at every number in order to find the number we're seeking. We thus say that in order to solve this problem, the computer needs to perform a number of steps that grows linearly in the size of the problem.

To simplify this problem, computer scientists have adopted Big O notation, which allows functions to be compared in a way that ensures that particular aspects of a machine's construction do not need to be considered, but rather only the asymptotic behavior as problems become large. So in our previous example we might say that the problem requires $O (n)$ steps to solve.

Perhaps the most important open problem in all of computer science is the question of whether a certain broad class of problems denoted NP can be solved efficiently. This is discussed further at Complexity classes P and NP.

Other formal definitions of computation

Aside from a Turing machine, other equivalent (See: Church–Turing thesis) models of computation are in use.

Lambda calculus: A computation consists of an initial lambda expression (or two if you want to separate the function and its input) plus a finite sequence of lambda terms, each deduced from the preceding term by one application of Beta reduction.
Combinatory logic: is a concept which has many similarities to $λ$ -calculus, but also important differences exist (e.g. fixed point combinator Y has normal form in combinatory logic but not in $λ$ -calculus). Combinatory logic was developed with great ambitions: understanding the nature of paradoxes, making foundations of mathematics more economic (conceptually), eliminating the notion of variables (thus clarifying their role in mathematics).

mu-recursive functions: a computation consists of a mu-recursive function, i.e. its defining sequence, any input value(s) and a sequence of recursive functions appearing in the defining sequence with inputs and outputs. Thus, if in the defining sequence of a recursive function $f (x)$ the functions $g (x)$ and $h (x, y)$ appear, then terms of the form 'g(5)=7' or 'h(3,2)=10' might appear. Each entry in this sequence needs to be an application of a basic function or follow from the entries above by using composition, primitive recursion or mu recursion. For instance if $f (x) = h (x, g (x))$ , then for 'f(5)=3' to appear, terms like 'g(5)=6' and 'h(5,6)=3' must occur above. The computation terminates only if the final term gives the value of the recursive function applied to the inputs.

Markov algorithm: a string rewriting system that uses grammar-like rules to operate on strings of symbols.

Register machine: is a theoretically interesting idealization of a computer. There are several variants. In most of them, each register can hold a natural number (of unlimited size), and the instructions are simple (and few in number), e.g. only decrementation (combined with conditional jump) and incrementation exist (and halting). The lack of the infinite (or dynamically growing) external store (seen at Turing machines) can be understood by replacing its role with Gödel numbering techniques: the fact that each register holds a natural number allows the possibility of representing a complicated thing (e.g. a sequence, or a matrix etc.) by an appropriate huge natural number — unambiguity of both representation and interpretation can be established by number theoretical foundations of these techniques.

P′′: Like Turing machines, P′′ uses an infinite tape of symbols (without random access), and a rather minimalistic set of instructions. But these instructions are very different, thus, unlike Turing machines, P′′ does not need to maintain a distinct state, because all “memory-like” functionality can be provided only by the tape. Instead of rewriting the current symbol, it can perform a modular arithmetic incrementation on it. P′′ has also a pair of instructions for a cycle, inspecting the blank symbol. Despite its minimalistic nature, it has become the parental formal language of an implemented and (for entertainment) used programming language called Brainfuck.

In addition to the general computational models, some simpler computational models are useful for special, restricted applications. Regular expressions, for example, specify string patterns in many contexts, from office productivity software to programming languages. Another formalism mathematically equivalent to regular expressions, Finite automata are used in circuit design and in some kinds of problem-solving. Context-free grammars specify programming language syntax. Non-deterministic pushdown automata are another formalism equivalent to context-free grammars. Primitive recursive functions are a defined subclass of the recursive functions.

Different models of computation have the ability to do different tasks. One way to measure the power of a computational model is to study the class of formal languages that the model can generate; in such a way to the Chomsky hierarchy of languages is obtained.

Theoretical foundation of computing

A wider view of what constitutes a theory of computation could include topics such as formal semantics and mathematical logic and would include the theoretical aspects of topics such as parallel computation, neural networks and quantum computing.

Tuesday, January 19, 2010

History

Profession

Employment

Certification

Impact of globalization

Education

Sub-disciplines

Related disciplines

Computer science

Project management

Systems engineering

Software

Programming languages

Professions and organizations

Multitasking

Multiprocessing

Networking and the Internet

Arithmetic/logic unit (ALU)

Memory

Example

Function

History of computing

Stored program architecture

History

Computability theory

Complexity theory

Other formal definitions of computation

Theoretical foundation of computing

Further reading

Followers

Blog Archive

About Me