Your Search Is Over When You Reach Here...: December 2011

Friday, 2 December 2011

Software Engineering (SE)

Software Engineering (SE) is the application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software, and the study of these approaches; that is, the application of engineering to software.It is the application of Engineering to software because it integrates significant mathematics, computer science and practices whose origins are in Engineering. It is also defined as a systematic approach to the analysis, design, assessment, implementation, test, maintenance and reengineering of software, that is, the application of engineering to software. The term software engineering first appeared in the 1968 NATO Software Engineering Conference, and was meant to provoke thought regarding the perceived "software crisis" at the time.

Software development, a much used and more generic term, does not necessarily subsume the engineering paradigm. Although it is questionable what impact it has had on actual software development over the last more than 40 years, the field's future looks bright according to Money Magazine and Salary.com, which rated "software engineer" as the best job in the United States in 2006

Visual Basic (VB)

Visual Basic (VB) is the third-generation event-driven programming language and integrated development environment (IDE) from Microsoft for its COM programming model. Visual Basic is designed to be relatively easy to learn and use.[1][2]

Visual Basic was derived from BASIC and enables the rapid application development (RAD) of graphical user interface (GUI) applications, access to databases using Data Access Objects, Remote Data Objects, or ActiveX Data Objects, and creation of ActiveX controls and objects. Scripting languages such as VBA and VBScript are syntactically similar to Visual Basic, but perform differently.[3]

A programmer can put together an application using the components provided with Visual Basic itself. Programs written in Visual Basic can also use the Windows API, but doing so requires external function declarations.

The final release was version 6 in 1998. Microsoft's extended support ended in March 2008 and the designated successor was Visual Basic .NET (now known simply as Visual Basic).

Characteristics Of Visual Basic (VB)

Visual Basic has the following traits which differ from C-derived languages:

Multiple assignment available in C language is not possible. A = B = C does not imply that the values of A, B and C are equal. The boolean result of "Is B = C?" is stored in A. The result stored in A would therefore be either false or true.
Boolean constant True has numeric value −1.[4] This is because the Boolean data type is stored as a 16-bit signed integer. In this construct −1 evaluates to 16 binary 1s (the Boolean value True), and 0 as 16 0s (the Boolean value False). This is apparent when performing a Not operation on a 16 bit signed integer value 0 which will return the integer value −1, in other words True = Not False. This inherent functionality becomes especially useful when performing logical operations on the individual bits of an integer such as And, Or, Xor and Not.[5] This definition of True is also consistent with BASIC since the early 1970s Microsoft BASIC implementation and is also related to the characteristics of CPU instructions at the time.
Logical and bitwise operators are unified. This is unlike some C-derived languages (such as Perl), which have separate logical and bitwise operators. This again is a traditional feature of BASIC.
Variable array base. Arrays are declared by specifying the upper and lower bounds in a way similar to Pascal and Fortran. It is also possible to use the Option Base statement to set the default lower bound. Use of the Option Base statement can lead to confusion when reading Visual Basic code and is best avoided by always explicitly specifying the lower bound of the array. This lower bound is not limited to 0 or 1, because it can also be set by declaration. In this way, both the lower and upper bounds are programmable. In more subscript-limited languages, the lower bound of the array is not variable. This uncommon trait does exist in Visual Basic .NET but not in VBScript.

OPTION BASE was introduced by ANSI, with the standard for ANSI Minimal BASIC in the late 1970s.

Relatively strong integration with the Windows operating system and the Component Object Model. The native types for strings and arrays are the dedicated COM types, BSTR and SAFEARRAY.
Banker's rounding as the default behavior when converting real numbers to integers with the Round function.[6] ? Round(2.5, 0) gives 2, ? Round(3.5, 0) gives 4.
Integers are automatically promoted to reals in expressions involving the normal division operator (/) so that division of one integer by another produces the intuitively correct result. There is a specific integer divide operator (\) which does truncate.
By default, if a variable has not been declared or if no type declaration character is specified, the variable is of type Variant. However this can be changed with Deftype statements such as DefInt, DefBool, DefVar, DefObj, DefStr. There are 12 Deftype statements in total offered by Visual Basic 6.0. The default type may be overridden for a specific declaration by using a special suffix character on the variable name (# for Double, ! for Single, & for Long, % for Integer, $ for String, and @ for Currency) or using the key phrase As (type). VB can also be set in a mode that only explicitly declared variables can be used with the command Option Explicit.

Language features Visual Basic (VB)

Database storage

Main article: Computer data storage

Database storage is the container of the physical materialization of a database. It comprises the Internal (physical) level in the database architecture. It also contains all the information needed (e.g., metadata, "data about the data", and internal data structures) to reconstruct the Conceptual level and External level from the Internal level when needed. It is not part of the DBMS but rather manipulated by the DBMS (by its Storage engine; see above) to manage the database that resides in it. Though typically accessed by a DBMS through the underlying Operating system (and often utilizing the operating systems' File systems as intermediates for storage layout), storage properties and configuration setting are extremely important for the efficient operation of the DBMS, and thus are closely maintained by database administrators. A DBMS, while in operation, always has its database residing in several types of storage (e.g., memory and external storage). The database data and the additional needed information, possibly in very large amounts, are coded into bits. Data typically reside in the storage in structures that look completely different from the way the data look in the conceptual and external levels, but in ways that attempt to optimize (the best possible) these levels' reconstruction when needed by users and programs, as well as for computing additional types of needed information from the data (e.g., when querying the database).

In principle the database storage can be viewed as a linear address space, where every bit of data has its unique address in this address space. Practically only a very small percentage of addresses is kept as initial reference points (which also requires storage), and most of the database data is accessed by indirection using displacement calculations (distance in bits from the reference points) and data structures which define access paths (using pointers) to all needed data in effective manner, optimized for the needed data access operations.

Database security

Main article: Database security

Database security deals with all various aspects of protecting the database content, its owners, and its users. It ranges from protection from intentional unauthorized database uses to unintentional database accesses by unauthorized entities (e.g., a person or a computer program).

The following are major areas of database security (among many others).

Access control
Main article: Access control

Database access control deals with controlling who (a person or a certain computer program) is allowed to access what information in the database. The information may comprise specific database objects (e.g., record types, specific records, data structures), certain computations over certain objects (e.g., query types, or specific queries), or utilizing specific access paths to the former (e.g., using specific indexes or other data structures to access information).

Database access controls are set by special authorized (by the database owner) personnel that uses dedicated protected security DBMS interfaces.

Data security
Main articles: Data security and Encryption

The definition of data security varies and may overlap with other database security aspects. Broadly it deals with protecting specific chunks of data, both physically (i.e., from corruption, or destruction, or removal; e.g., see Physical security), or the interpretation of them, or parts of them to meaningful information (e.g., by looking at the strings of bits that they comprise, concluding specific valid credit-card numbers; e.g., see Data encryption).

Database audit
Main article: Database audit

Database audit primarily involves monitoring that no security breach, in all aspects, has happened. If security breach is discovered then all possible corrective actions are taken.

Database design
Main article: Database design

Database design is done before building it to meet needs of end-users within a given application/information-system that the database is intended to support. The database design defines the needed data and data structures that such a database comprises. A design is typically carried out according to the common three architectural levels of a database (see Database architecture above). First, the conceptual level is designed, which defines the over-all picture/view of the database, and reflects all the real-world elements (entities) the database intends to model, as well as the relationships among them. On top of it the external level, various views of the database, are designed according to (possibly completely different) needs of specific end-user types. More external views can be added later. External views requirements may modify the design of the conceptual level (i.e., add/remove entities and relationships), but usually a well designed conceptual level for an application well supports most of the needed external views. The conceptual view also determines the internal level (which primarily deals with data layout in storage) to a great extent. External views requirement may add supporting storage structures, like materialized views and indexes, for enhanced performance. Typically the internal layer is optimized for top performance, in an average way that takes into account performance requirements (possibly conflicting) of different external views according to their relative importance. While the conceptual and external levels design can usually be done independently of any DBMS (DBMS-independent design software packages exist, possibly with interfaces to some specific popular DBMSs), the internal level design highly relies on the capabilities and internal data structure of the specific DBMS utilized (see the Implementation section below).

A common way to carry out conceptual level design is to use the Entity-relationship model (ERM) (both the basic one, and with possible enhancement that it has gone over), since it provides a straightforward, intuitive perception of an application's elements and semantics. An alternative approach, which preceded the ERM, is using the Relational model and dependencies (mathematical relationships) among data to normalize the database, i.e., to define the ("optimal") relations (data record or tupple types) in the database. Though a large body of research exists for this method it is more complex, less intuitive, and not more effective than the ERM method. Thus normalization is less utilized in practice than the ERM method.

The ERM may be less subtle than normalization in several aspects, but it captures the main needed dependencies which are induced by keys/identifiers of entities and relationships. Also the ERM inherently includes the important inclusion dependencies (i.e., an entity instance that does not exist (has not been explicitly inserted) cannot appear in a relationship with other entities) which usually have been ignored in normalization.[13] In addition the ERM allows entity type generalization (the Is-a relationship) and implied property (attribute) inheritance (similarly to the that found in the Object model).

Another aspect of database design is its security. It involves both defining access control to database objects (e.g., Entities, Views) as well as defining security levels and methods for the data itself (See Database security above).

Database languages

Main articles: Data definition language, Data manipulation language, and Query language

Database languages are dedicated programming languages, tailored and utilized to

define a database (i.e., its specific data types and the relationships among them),
manipulate its content (e.g., insert new data occurrences, and update or delete existing ones), and
query it (i.e., request information: compute and retrieve any information based on its data).

Database languages are data-model-specific, i.e., each language assumes and is based on a certain structure of the data (which typically differs among different data models). They typically have commands to instruct execution of the desired operations in the database. Each such command is equivalent to a complex expression (program) in a regular programming language, and thus programming in dedicated (database) languages simplifies the task of handling databases considerably. An expressions in a database language is automatically transformed (by a compiler or interpreter, as regular programming languages) to a proper computer program that runs while accessing the database and providing the needed results. The following are notable examples:

SQL for the Relational model
Main article: SQL

A major Relational model language supported by all the relational DBMSs and a standard.

SQL was one of the first commercial languages for the relational model. Despite not adhering to the relational model as described by Codd, it has become the most widely used database language.[10][11] Though often described as, and to a great extent is a declarative language, SQL also includes procedural elements. SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standards (ISO) in 1987. Since then the standard has been enhanced several times with added features. However, issues of SQL code portability between major RDBMS products still exist due to lack of full compliance with, or different interpretations of the standard. Among the reasons mentioned are the large size, and incomplete specification of the standard, as well as vendor lock-in.

OQL for the Object model
Main article: OQL

An Object model language standard (by the Object Data Management Group) that has influenced the design of some of the newer query languages like JDOQL and EJB QL, though they cannot be considered as different flavors of OQL.

XQuery for the XML model
Main articles: XQuery and XML

XQuery is an XML based database language (also named XQL).

Database architecture

Database architecture (to be distinguished from DBMS architecture; see below) may be viewed, to some extent, as an extension of Data modeling. It is used to conveniently answer requirements of different end-users from a same database, as well as for other benefits. For example, a financial department of a company needs the payment details of all employees as part of the company's expenses, but not other many details about employees, that are the interest of the human resources department. Thus different departments need different views of the company's database, that both include the employees' payments, possibly in a different level of detail (and presented in different visual forms). To meet such requirement effectively database architecture consists of three levels: external, conceptual and internal. Clearly separating the three levels was a major feature of the relational database model implementations that dominate 21st century databases.[12]

The external level defines how each end-user type understands the organization of its respective relevant data in the database, i.e., the different needed end-user views. A single database can have any number of views at the external level.
The conceptual level unifies the various external views into a coherent whole, global view.[12] It provides the common-denominator of all the external views. It comprises all the end-user needed generic data, i.e., all the data from which any view may be derived/computed. It is provided in the simplest possible way of such generic data, and comprises the back-bone of the database. It is out of the scope of the various database end-users, and serves database application developers and defined by database administrators that build the database.
The Internal level (or Physical level) is as a matter of fact part of the database implementation inside a DBMS (see Implementation section below). It is concerned with cost, performance, scalability and other operational matters. It deals with storage layout of the conceptual level, provides supporting storage-structures like indexes, to enhance performance, and occasionally stores data of individual views (materialized views), computed from generic data, if performance justification exists for such redundancy. It balances all the external views' performance requirements, possibly conflicting, in attempt to optimize the overall database usage by all its end-uses according to the database goals and priorities.

All the three levels are maintained and updated according to changing needs by database administrators who often also participate in the database design.

The above three-level database architecture also relates to and being motivated by the concept of data independence which has been described for long time as a desired database property and was one of the major initial driving forces of the Relational model. In the context of the above architecture it means that changes made at a certain level do not affect definitions and software developed with higher level interfaces, and are being incorporated at the higher level automatically. For example, changes in the internal level do not affect application programs written using conceptual level interfaces, which saves substantial change work that would be needed otherwise.

In summary, the conceptual is a level of indirection between internal and external. On one hand it provides a common view of the database, independent of different external view structures, and on the other hand it is uncomplicated by details of how the data is stored or managed (internal level). In principle every level, and even every external view, can be presented by a different data model. In practice usually a given DBMS uses the same data model for both the external and the conceptual levels (e.g., relational model). The internal level, which is hidden inside the DBMS and depends on its implementation (see Implementation section below), requires a different level of detail and uses its own data structure types, typically different in nature from the structures of the external and conceptual levels which are exposed to DBMS users (e.g., the data models above): While the external and conceptual levels are focused on and serve DBMS users, the concern of the internal level is effective implementation details.

Database type examples

The following are examples of various database types. Some of them are not main-stream types, but most of them have received special attention (e.g., in research) due to end-user requirements. Some exist as specialized DBMS products, and some have their functionality types incorporated in existing general-purpose DBMSs.
Main article: Active database

An active database is a database that includes an event-driven architecture which can respond to conditions both inside and outside the database. Possible uses include security monitoring, alerting, statistics gathering and authorization.

Most modern relational databases include active database features in the form of database trigger.

Main article: Cloud database

A Cloud database is a database that relies on cloud technology. Both the database and most of its DBMS reside remotely, "in the cloud," while its applications are both developed by programmers and later maintained and utilized by (application's) end-users through a Web browser and Open APIs. More and more such database products are emerging, both of new vendors and by virtually all established database vendors.

Main article: Data warehouse

Data warehouses archive data from operational databases and often from external sources such as market research firms. Often operational data undergoes transformation on its way into the warehouse, getting summarized, anonymized, reclassified, etc. The warehouse becomes the central source of data for use by managers and other end-users who may not have access to operational data. For example, sales data might be aggregated to weekly totals and converted from internal product codes to use UPCs so that it can be compared with ACNielsen data. Some basic and essential components of data warehousing include retrieving, analyzing, and mining data, transforming,loading and managing data so as to make it available for further use.

Operations in a data warehouse are typically concerned with bulk data manipulation, and as such, it is unusual and inefficient to target individual rows for update, insert or delete. Bulk native loaders for input data and bulk SQL passes for aggregation are the norm.

Main article: Distributed database

The definition of a distributed database is broad, and may be utilized in different meanings. In general it typically refers to a modular DBMS architecture that allows distinct DBMS instances to cooperate as a single DBMS over processes, computers, and sites, while managing a single database distributed itself over multiple computers, and different sites.

Examples are databases of local work-groups and departments at regional offices, branch offices, manufacturing plants and other work sites. These databases can include both segments shared by multiple sites, and segments specific to one site and used only locally in that site.

Main article: Document-oriented database

General-purpose DBMS

A DBMS has evolved into a complex software system and its development typically requires thousands of person-years of development effort. Some general-purpose DBMSs, like Oracle, Microsoft SQL server, and IBM DB2, have been in on-going development and enhancement for thirty years or more. General-purpose DBMSs aim to satisfy as many applications as possible, which typically makes them even more complex than special-purpose databases. However, the fact that they can be used "off the shelf", as well as their amortized cost over many applications and instances, makes them an attractive alternative (Vs. one-time development) whenever they meet an application's requirements.

Though attractive in many cases, a general-purpose DBMS is not always the optimal solution: When certain applications are pervasive with many operating instances, each with many users, a general-purpose DBMS may introduce unnecessary overhead and too large "footprint" (too large amount of unnecessary, unutilized software code). Such applications usually justify dedicated development. Typical examples are email systems, though they need to possess certain DBMS properties: email systems are built in a way that optimizes email messages handling and managing, and do not need significant portions of a general-purpose DBMS functionality.

Database research

Database research has been an active and diverse area, with many specializations, carried out since the early days of dealing with the database concept in the 1960s. It has strong ties with database technology and DBMS products. Database research has taken place at research and development groups of companies (e.g., notably at IBM Research, who contributed technologies and ideas virtually to any DBMS existing today), research institutes, and Academia. Research has been done both through Theory and Prototypes. The interaction between research and database related product development has been very productive to the database area, and many related key concepts and technologies emerged from it. Notable are the Relational and the Entity-relationship models, the Atomic transaction concept and related Concurrency control techniques, Query languages and Query optimization methods, RAID, and more. Research has provided deep insight to virtually all aspects of databases, though not always has been pragmatic, effective (and cannot and should not always be: research is exploratory in nature, and not always leads to accepted or useful ideas). Ultimately market forces and real needs determine the selection of problem solutions and related technologies, also among those proposed by research. However, occasionally, not the best and most elegant solution wins (e.g., SQL). Along their history DBMSs and respective databases, to a great extent, have been the outcome of such research, while real product requirements and challenges triggered database research directions and sub-areas.

The database research area has several notable dedicated academic journals (e.g., ACM Transactions on Database Systems-TODS, Data and Knowledge Engineering-DKE, and more) and annual conferences (e.g., ACM SIGMOD, ACM PODS, VLDB, IEEE ICDE, and more), as well as an active and quite heterogeneous (subject-wise) research community all over the world

Evolution of database and DBMS technology

The introduction of the term database coincided with the availability of direct-access storage (disks and drums) from the mid-1960s onwards. The term represented a contrast with the tape-based systems of the past, allowing shared interactive use rather than daily batch processing.

In the earliest database systems, efficiency was perhaps the primary concern, but it was already recognized that there were other important objectives. One of the key aims was to make the data independent of the logic of application programs, so that the same data could be made available to different applications.

The first generation of database systems were navigational,[2] applications typically accessed data by following pointers from one record to another. The two main data models at this time were the hierarchical model, epitomized by IBM's IMS system, and the Codasyl model (Network model), implemented in a number of products such as IDMS.

The Relational model, first proposed in 1970 by Edgar F. Codd, departed from this tradition by insisting that applications should search for data by content, rather than by following links. This was considered necessary to allow the content of the database to evolve without constant rewriting of applications. Relational systems placed heavy demands on processing resources, and it was not until the mid 1980s that computing hardware became powerful enough to allow them to be widely deployed. By the early 1990s, however, relational systems were dominant for all large-scale data processing applications, and they remain dominant today (2011) except in niche areas. The dominant database language is the standard SQL for the Relational model, which has influenced database languages also for other data models.

Because the relational model emphasizes search rather than navigation, it does not make relationships between different entities explicit in the form of pointers, but represents them rather using primary keys and foreign keys. While this is a good basis for a query language, it is less well suited as a modeling language. For this reason a different model, the Entity-relationship model which emerged shortly later (1976), gained popularity for database design.

In the period since the 1970s database technology has kept pace with the increasing resources becoming available from the computing platform: notably the rapid increase in the capacity and speed (and reduction in price) of disk storage, and the increasing capacity of main memory. This has enabled ever larger databases and higher throughputs to be achieved.

The rigidity of the relational model, in which all data is held in tables with a fixed structure of rows and columns, has increasingly been seen as a limitation when handling information that is richer or more varied in structure than the traditional 'ledger-book' data of corporate information systems: for example, document databases, engineering databases, multimedia databases, or databases used in the molecular sciences. Various attempts have been made to address this problem, many of them gathering under banners such as post-relational or NoSQL. Two developments of note are the Object database and the XML database. The vendors of relational databases have fought off competition from these newer models by extending the capabilities of their own products to support a wider variety of data types.

The database concept

The database concept has evolved since the 1960s to ease increasing difficulties in designing, building, and maintaining complex information systems (typically with many concurrent end-users, and with a diverse large amount of data). It has evolved together with database management systems which enable the effective handling of databases. Though the terms database and DBMS define different entities, they are inseparable: a database's properties are determined by its supporting DBMS and vice-versa. The Oxford English dictionary cites[citation needed] a 1962 technical report as the first to use the term "data-base." With the progress in technology in the areas of processors, computer memory, computer storage. and computer networks, the sizes, capabilities, and performance of databases and their respective DBMSs have grown in orders of magnitudes. For decades it has been unlikely that a complex information system can be built effectively without a proper database supported by a DBMS. The utilization of databases is now spread to such a wide degree that virtually every technology and product relies on databases and DBMSs for its development and commercialization, or even may have such embedded in it. Also, organizations and companies, from small to large, heavily depend on databases for their operations.

No widely accepted exact definition exists for DBMS. However, a system needs to provide considerable functionality to qualify as a DBMS. Accordingly its supported data collection needs to meet respective usability requirements (broadly defined by the requirements below) to qualify as a database. Thus, a database and its supporting DBMS are defined here by a set of general requirements listed below. Virtually all existing mature DBMS products meet these requirements to a great extent, while less mature either meet them or converge to meet them.

Database

A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality (for example, the availability of rooms in hotels), in a way that supports processes requiring this information (for example, finding a hotel with vacancies). This definition is very general, and is independent of the technology used.

The term "database" may be narrowed to specify particular aspects of organized collection of data and may refer to the logical database, to physical database as data content in computer data storage or to many other database sub-definitions.

The term database is correctly applied to the data and their supporting data structures, and not to the database management system (referred to by the acronym DBMS). The database data collection with DBMS is called a database system.

The term database system implies that the data is managed to some level of quality (measured in terms of accuracy, availability, usability, and resilience) and this in turn often implies the use of a general-purpose database management system (DBMS).[1] A general-purpose DBMS is typically a complex software system that meets many usage requirements, and the databases that it maintains are often large and complex. The utilization of databases is now spread to such a wide degree that virtually every technology and product relies on databases and DBMSs for its development and commercialization, or even may have such embedded in it. Also, organizations and companies, from small to large, heavily depend on databases for their operations.

Well known DBMSs include Oracle, IBM DB2, Microsoft SQL Server, MS Access, PostgreSQL, MySQL, Lotus Domino, and SQLite. A database is not generally portable across different DBMS, but different DBMSs can inter-operate to some degree by using standards like SQL and ODBC to support together a single application. A DBMS also needs to provide effective run-time execution to properly support (e.g., in terms of performance, availability, and security) as many end-users as needed.

The design, construction, and maintenance of a complex database requires specialist skills: the staff performing these functions are referred to as database application programmers and database administrators. Their tasks are supported by tools provided either as part of the DBMS or as stand-alone software products. These tools include specialized database languages including data definition languages (DDL), data manipulation languages (DML), and query languages. These can be seen as special-purpose programming languages, tailored specifically to manipulate databases; sometimes they are provided as extensions of existing programming languages, with added database commands. Database languages are generally specific to one data model, and in many cases they are specific to one DBMS type. The most widely supported database language is SQL, which has been developed for the relational data model and combines the roles of both DDL, DML, and a query language.

A way to classify databases involves the type of their contents, for example: bibliographic, document-text, statistical, or multimedia objects. Another way is by their application area, for example: accounting, music compositions, movies, banking, manufacturing, or insurance.