4

Dave Bourgeois and David T. Bourgeois

Learning Objectives

Upon successful completion of this chapter, you will be able to:

  • describe the differences between data, information, and knowledge;
  • define the term database and identify the steps to creating one;
  • describe the role of a database management system;
  • describe the characteristics of a data warehouse; and
  • define data mining and describe its role in an organization.

Please note, there is an updated edition of this book available at https://opentextbook.site. If you are not required to use this edition for a course, you may want to check it out.

Introduction

You have already been introduced to the first two components of information systems: hardware and software. However, those two components by themselves do not make a computer useful. Imagine if you turned on a computer, started the word processor, but could not save a document. Imagine if you opened a music player but there was no music to play. Imagine opening a web browser but there were no web pages. Without data, hardware and software are not very useful! Data is the third component of an information system.

Data, Information, and Knowledge

Data are the raw bits and pieces of information with no context. If I told you, “15, 23, 14, 85,” you would not have learned anything. But I would have given you data.

Data can be quantitative or qualitative. Quantitative data is numeric, the result of a measurement, count, or some other mathematical calculation. Qualitative data is descriptive. “Ruby Red,” the color of a 2013 Ford Focus, is an example of qualitative data. A number can be qualitative too: if I tell you my favorite number is 5, that is qualitative data because it is descriptive, not the result of a measurement or mathematical calculation.

By itself, data is not that useful. To be useful, it needs to be given context. Returning to the example above, if I told you that “15, 23, 14, and 85” are the numbers of students that had registered for upcoming classes, that would be information. By adding the context – that the numbers represent the count of students registering for specific classes – I have converted data into information.

Once we have put our data into context, aggregated and analyzed it, we can use it to make decisions for our organization. We can say that this consumption of information produces knowledge. This knowledge can be used to make decisions, set policies, and even spark innovation.

The final step up the information ladder is the step from knowledge (knowing a lot about a topic) to wisdom. We can say that someone has wisdom when they can combine their knowledge and experience to produce a deeper understanding of a topic. It often takes many years to develop wisdom on a particular topic, and requires patience.

Examples of Data

Almost all software programs require data to do anything useful. For example, if you are editing a document in a word processor such as Microsoft Word, the document you are working on is the data. The word-processing software can manipulate the data: create a new document, duplicate a document, or modify a document. Some other examples of data are: an MP3 music file, a picture file, a spreadsheet, a web page, and an e-book. In some cases, such as with an e-book, you may only have the ability to read the data.

Databases

The goal of many information systems is to transform data into information in order to generate knowledge that can be used for decision making. In order to do this, the system must be able to take data, put the data into context, and provide tools for aggregation and analysis. A database is designed for just such a purpose.

A database is an organized collection of related information. It is an organized collection, because in a database, all data is described and associated with other data. All information in a database should be related as well; separate databases should be created to manage unrelated information. For example, a database that contains information about students should not also hold information about company stock prices. Databases are not always digital – a filing cabinet, for instance, might be considered a form of database. For the purposes of this text, we will only consider digital databases.

Relational Databases

Databases can be organized in many different ways, and thus take many forms. The most popular form of database today is the relational database. Popular examples of relational databases are Microsoft Access, MySQL, and Oracle. A relational database is one in which data is organized into one or more tables. Each table has a set of fields, which define the nature of the data stored in the table. A record is one instance of a set of fields in a table. To visualize this, think of the records as the rows of the table and the fields as the columns of the table. In the example below, we have a table of student information, with each row representing a student and each column representing one piece of information about the student.

Rows and columns in a table

In a relational database, all the tables are related by one or more fields, so that it is possible to connect all the tables in the database through the field(s) they have in common. For each table, one of the fields is identified as a primary key. This key is the unique identifier for each record in the table. To help you understand these terms further, let’s walk through the process of designing a database.
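
As a rough illustration of records, fields, and primary keys, the following sketch uses Python's standard-library sqlite3 module. The table layout, field names, and sample values are assumptions made for this example only; they are not taken from the chapter's figures.

```python
import sqlite3

# In-memory database; the table layout and sample values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Students (
        student_id INTEGER PRIMARY KEY,  -- primary key: unique per record
        first_name TEXT,
        last_name  TEXT,
        email      TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?)",
    [
        (1, "Ana", "Lee", "ana@example.edu"),
        (2, "Sam", "Smith", "sam@example.edu"),
    ],
)

cursor = conn.execute("SELECT * FROM Students ORDER BY student_id")
fields = [column[0] for column in cursor.description]  # fields are the columns
records = cursor.fetchall()                            # records are the rows

# A second record that reuses an existing primary key value is rejected.
try:
    conn.execute(
        "INSERT INTO Students VALUES (1, 'Kim', 'Park', 'kim@example.edu')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(fields)
print(records)
print(duplicate_rejected)
```

Each tuple in `records` is one record, and each name in `fields` is one field; the rejected insert shows the database enforcing the uniqueness of the primary key.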

Designing a Database

Suppose a university wants to create an information system to track participation in student clubs. After interviewing several people, the design team learns that the goal of implementing the system is to give better insight into how the university funds clubs. This will be accomplished by tracking how many members each club has and how active the clubs are. From this, the team decides that the system must keep track of the clubs, their members, and their events. Using this information, the design team determines that the following tables need to be created:

  • Clubs: this will track the club name, the club president, and a short description of the club.
  • Students: student name, e-mail, and year of birth.
  • Memberships: this table will correlate students with clubs, allowing us to have any given student join multiple clubs.
  • Events: this table will track when the clubs meet and how many students showed up.

Now that the design team has determined which tables to create, they need to define the specific information that each table will hold. This requires identifying the fields that will be in each table. For example, Club Name would be one of the fields in the Clubs table. First Name and Last Name would be fields in the Students table. Finally, since this will be a relational database, every table should have a field in common with at least one other table (in other words: they should have a relationship with each other).

In order to properly create this relationship, a primary key must be selected for each table. This key is a unique identifier for each record in the table. For example, in the Students table, it might be possible to use students’ last name as a way to uniquely identify them. However, it is more than likely that some students will share a last name (like Smith or Lee), so a different field should be selected. A student’s e-mail address might be a good choice for a primary key, since e-mail addresses are unique. However, a primary key cannot change, so this would mean that if students changed their e-mail address we would have to remove them from the database and then re-insert them – not an attractive proposition. Our solution is to create a value for each student (a user ID) that will act as a primary key. We will also do this for each of the student clubs. This solution is quite common and is the reason you have so many user IDs!
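
The reasoning about key choice can be sketched in a few lines. The example below is a minimal illustration using Python's sqlite3 module; the column names and the AUTOINCREMENT-generated ID are assumptions for this sketch, not the design team's actual implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Students (
        student_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- generated user ID
        last_name  TEXT NOT NULL,
        email      TEXT NOT NULL UNIQUE
    )
    """
)

# Two students share a last name, so last name cannot identify a record.
first = conn.execute(
    "INSERT INTO Students (last_name, email) VALUES (?, ?)",
    ("Lee", "a.lee@example.edu"),
).lastrowid
second = conn.execute(
    "INSERT INTO Students (last_name, email) VALUES (?, ?)",
    ("Lee", "b.lee@example.edu"),
).lastrowid

# Changing the e-mail is a plain update; the primary key stays the same,
# so nothing has to be removed and re-inserted.
conn.execute(
    "UPDATE Students SET email = ? WHERE student_id = ?",
    ("a.lee@newmail.example", first),
)
row = conn.execute(
    "SELECT student_id, email FROM Students WHERE student_id = ?", (first,)
).fetchone()
print(first, second, row)
```

Because other tables would refer to `student_id` rather than to the e-mail address, the update touches only one field in one record.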

You can see the final database design in the figure below:

Student Clubs database diagram

With this design, not only do we have a way to organize all of the information we need to meet the requirements, but we have also successfully related all the tables together. Here’s what the database tables might look like with some sample data. Note that the Memberships table has the sole purpose of allowing us to relate multiple students to multiple clubs.

Student Clubs tables with sample data
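
Since the diagram itself is not reproduced here, the following is only one possible rendering of the four tables, written with Python's sqlite3 module. All column names, the foreign-key declarations, and the sample rows are assumptions chosen to match the prose description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces these only when enabled
conn.executescript(
    """
    CREATE TABLE Students (
        student_id INTEGER PRIMARY KEY,
        first_name TEXT, last_name TEXT, email TEXT, birth_year INTEGER
    );
    CREATE TABLE Clubs (
        club_id INTEGER PRIMARY KEY,
        club_name TEXT, description TEXT,
        president_id INTEGER REFERENCES Students(student_id)
    );
    CREATE TABLE Memberships (
        student_id INTEGER REFERENCES Students(student_id),
        club_id    INTEGER REFERENCES Clubs(club_id),
        PRIMARY KEY (student_id, club_id)
    );
    CREATE TABLE Events (
        event_id INTEGER PRIMARY KEY,
        club_id  INTEGER REFERENCES Clubs(club_id),
        event_date TEXT, attendance INTEGER
    );
    INSERT INTO Students VALUES (1, 'Ana', 'Lee', 'ana@example.edu', 1992);
    INSERT INTO Clubs VALUES (10, 'Chess Club', 'Weekly games', 1);
    INSERT INTO Memberships VALUES (1, 10);
    """
)

# A membership that points at a club that does not exist is refused.
try:
    conn.execute("INSERT INTO Memberships VALUES (1, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The shared ID fields let us connect the tables back together.
members = conn.execute(
    """
    SELECT s.first_name, c.club_name
    FROM Memberships AS m
    JOIN Students AS s ON s.student_id = m.student_id
    JOIN Clubs    AS c ON c.club_id = m.club_id
    """
).fetchall()
print(orphan_rejected, members)
```

The Memberships table holds nothing but the two IDs, which is what allows many students to be related to many clubs.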

Normalization

When designing a database, one important concept to understand is normalization. In simple terms, to normalize a database means to design it in a way that: 1) reduces duplication of data between tables and 2) gives the table as much flexibility as possible.

In the Student Clubs database design, the design team worked to achieve these objectives. For example, to track memberships, a simple solution might have been to create a Members field in the Clubs table and then just list the names of all of the members there. However, this design would mean that if a student joined two clubs, then his or her information would have to be entered a second time. Instead, the designers solved this problem by using two tables: Students and Memberships.

In this design, when a student joins their first club, we first must add the student to the Students table, where their first name, last name, e-mail address, and birth year are entered. This addition to the Students table will generate a student ID. Now we will add a new entry to denote that the student is a member of a specific club. This is accomplished by adding a record with the student ID and the club ID in the Memberships table. If this student joins a second club, we do not have to duplicate the entry of the student’s name, e-mail, and birth year; instead, we only need to make another entry in the Memberships table of the second club’s ID and the student’s ID.
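
The two steps described above can be sketched as follows. The helper names `join_first_club` and `join_another_club`, the column names, and the sample values are all hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Students (student_id INTEGER PRIMARY KEY, first_name TEXT,
                           last_name TEXT, email TEXT, birth_year INTEGER);
    CREATE TABLE Clubs (club_id INTEGER PRIMARY KEY, club_name TEXT);
    CREATE TABLE Memberships (student_id INTEGER, club_id INTEGER);
    INSERT INTO Clubs VALUES (1, 'Chess Club'), (2, 'Hiking Club');
    """
)


def join_first_club(first, last, email, birth_year, club_id):
    """Add the student once, then record the membership; returns the new ID."""
    student_id = conn.execute(
        "INSERT INTO Students (first_name, last_name, email, birth_year) "
        "VALUES (?, ?, ?, ?)",
        (first, last, email, birth_year),
    ).lastrowid
    conn.execute("INSERT INTO Memberships VALUES (?, ?)", (student_id, club_id))
    return student_id


def join_another_club(student_id, club_id):
    """Only a Memberships row is needed; no student details are re-entered."""
    conn.execute("INSERT INTO Memberships VALUES (?, ?)", (student_id, club_id))


sid = join_first_club("Ana", "Lee", "ana@example.edu", 1992, 1)
join_another_club(sid, 2)

student_rows = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
membership_rows = conn.execute("SELECT COUNT(*) FROM Memberships").fetchone()[0]
print(student_rows, membership_rows)
```

After the second club is joined there is still a single Students record, with two Memberships records pointing at it.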

The design of the Student Clubs database also makes it simple to change the design without major modifications to the existing structure. For example, if the design team were asked to add functionality to the system to track faculty advisors to the clubs, we could easily accomplish this by adding a Faculty Advisors table (similar to the Students table) and then adding a new field to the Clubs table to hold the Faculty Advisor ID.
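
One way such a change might look in practice is sketched below with Python's sqlite3 module. The table name `FacultyAdvisors`, the column `advisor_id`, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Existing structure (reduced to the Clubs table for brevity).
    CREATE TABLE Clubs (club_id INTEGER PRIMARY KEY, club_name TEXT);
    INSERT INTO Clubs VALUES (1, 'Chess Club');

    -- New table, similar in shape to Students.
    CREATE TABLE FacultyAdvisors (
        advisor_id INTEGER PRIMARY KEY,
        first_name TEXT, last_name TEXT, email TEXT
    );
    -- New field on the existing Clubs table; existing rows get NULL.
    ALTER TABLE Clubs
        ADD COLUMN advisor_id INTEGER REFERENCES FacultyAdvisors(advisor_id);

    INSERT INTO FacultyAdvisors VALUES (1, 'Pat', 'Morgan', 'pat@example.edu');
    UPDATE Clubs SET advisor_id = 1 WHERE club_id = 1;
    """
)

row = conn.execute(
    """
    SELECT c.club_name, f.last_name
    FROM Clubs AS c
    JOIN FacultyAdvisors AS f ON f.advisor_id = c.advisor_id
    """
).fetchone()
print(row)
```

The existing club record is kept as-is; only one table and one field are added.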

Data Types

When defining the fields in a database table, we must give each field a data type. For example, the field Birth Year is a year, so it will be a number, while First Name will be text. Most modern databases allow for several different data types to be stored. Some of the more common data types are listed here:

  • Text: for storing non-numeric data that is brief, generally under 256 characters. The database designer can identify the maximum length of the text.
  • Number: for storing numbers. There are usually a few different number types that can be selected, depending on how large the largest number will be.
  • Yes/No: a special form of the number data type that is (usually) one byte long, with a 0 for “No” or “False” and a 1 for “Yes” or “True”.
  • Date/Time: a special form of the number data type that can be interpreted as a date or a time.
  • Currency: a special form of the number data type that formats all values with a currency indicator and two decimal places.
  • Paragraph Text: this data type allows for text longer than 256 characters.
  • Object: this data type allows for the storage of data that cannot be entered via keyboard, such as an image or a music file.

There are two important reasons that we must properly define the data type of a field. First, a data type tells the database what functions can be performed with the data. For example, if we wish to perform mathematical functions with one of the fields, we must be sure to tell the database that the field is a number data type. So if we have, say, a field storing birth year, we can subtract the number stored in that field from the current year to get age.
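
The birth-year subtraction might look like this in practice. This is a small sketch using Python's sqlite3 module; the `ages` helper, the column names, and the sample record are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# birth_year is declared as a number type so arithmetic on it is meaningful.
conn.execute("CREATE TABLE Students (first_name TEXT, birth_year INTEGER)")
conn.execute("INSERT INTO Students VALUES ('Ana', 1992)")


def ages(current_year):
    """Subtract the numeric birth_year field from the given year."""
    return conn.execute(
        "SELECT first_name, ? - birth_year FROM Students", (current_year,)
    ).fetchall()


print(ages(2014))
```

Passing the current year in as a parameter keeps the result reproducible; a real application would more likely take it from the system clock.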

The second important reason to define data type is so that the proper amount of storage space is allocated for our data. For example, if the First Name field is defined as a text(50) data type, this means fifty characters are allocated for each first name we want to store. However, even if the first name is only five characters long, fifty characters (bytes) will be allocated. While this may not seem like a big deal, if our table ends up holding 50,000 names, we are allocating 50 * 50,000 = 2,500,000 bytes for storage of these values. It may be prudent to reduce the size of the field so we do not waste storage space.
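
The arithmetic above is easy to reproduce for other field sizes. The function below is a hypothetical helper that assumes, as the text does, one byte per character and a fixed allocation per record.

```python
def allocated_bytes(field_length, record_count):
    """Bytes reserved for a fixed-length text field across all records,
    assuming one byte per character."""
    return field_length * record_count


print(allocated_bytes(50, 50_000))  # 2500000
print(allocated_bytes(20, 50_000))  # 1000000
```

Under these assumptions, shrinking the field from 50 to 20 characters would save 1,500,000 bytes for the same 50,000 names.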


Sidebar: The Difference between a Database and a Spreadsheet

Many times, when introducing the concept of databases to students, they quickly decide that a database is pretty much the same as a spreadsheet. After all, a spreadsheet stores data in an organized fashion, using rows and columns, and looks very similar to a database table. This misunderstanding extends beyond the classroom: spreadsheets are used as a substitute for databases in all types of situations every day, all over the world.

To be fair, for simple uses, a spreadsheet can substitute for a database quite well. If a simple listing of rows and columns (a single table) is all that is needed, then creating a database is probably overkill. In our Student Clubs example, if we only needed to track a listing of clubs, the number of members, and the contact information for the president, we could get away with a single spreadsheet. However, the need to include a listing of events and the names of members would be problematic if tracked with a spreadsheet.

When several types of data must be mixed together, or when the relationships between these types of data are complex, then a spreadsheet is not the best solution. A database allows data from several entities (such as students, clubs, memberships, and events) to all be related together into one whole. While a spreadsheet does allow you to define what kinds of values can be entered into its cells, a database provides more intuitive and powerful ways to define the types of data that go into each field, reducing possible errors and allowing for easier analysis.

Though not good for replacing databases, spreadsheets can be ideal tools for analyzing the data stored in a database. A spreadsheet package can be connected to a specific table or query in a database and used to create charts or perform analysis on that data.


Structured Query Language

Once you have a database designed and loaded with data, how will you do something useful with it? The primary way to work with a relational database is to use Structured Query Language, SQL (pronounced “sequel,” or simply stated as S-Q-L). Almost all applications that work with databases (such as database management systems, discussed below) make use of SQL as a way to analyze and manipulate relational data. As its name implies, SQL is a language that can be used to work with a relational database. From a simple request for data to a complex update operation, SQL is a mainstay of programmers and database administrators. To give you a taste of what SQL might look like, here are a couple of examples using our Student Clubs database.

  • The following query will retrieve a list of the first and last names of the club presidents:
SELECT "Students"."First Name", "Students"."Last Name" FROM "Students", "Clubs" WHERE "Students"."ID" = "Clubs"."President"
  • The following query will create a list of the number of students in each club, listing the club name and then the number of members:
SELECT "Clubs"."Club Name", COUNT("Memberships"."Student ID") FROM "Clubs" LEFT JOIN "Memberships" ON "Clubs"."Club ID" = "Memberships"."Club ID" GROUP BY "Clubs"."Club Name"
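
To try queries of this kind, the sketch below builds a tiny in-memory database with Python's sqlite3 module and runs both. The field names follow those used in the queries above; the exact table layout and the sample rows are assumptions. Table and field names are quoted separately, both tables appear in the first query, and the count is grouped per club so that each club gets its own row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE "Students" ("ID" INTEGER PRIMARY KEY,
                             "First Name" TEXT, "Last Name" TEXT);
    CREATE TABLE "Clubs" ("Club ID" INTEGER PRIMARY KEY,
                          "Club Name" TEXT, "President" INTEGER);
    CREATE TABLE "Memberships" ("Student ID" INTEGER, "Club ID" INTEGER);

    INSERT INTO "Students" VALUES (1, 'Ana', 'Lee'), (2, 'Sam', 'Smith');
    INSERT INTO "Clubs" VALUES (10, 'Chess Club', 1), (20, 'Hiking Club', 2);
    INSERT INTO "Memberships" VALUES (1, 10), (2, 10), (2, 20);
    """
)

# First and last names of the club presidents.
presidents = conn.execute(
    """
    SELECT "Students"."First Name", "Students"."Last Name"
    FROM "Students"
    JOIN "Clubs" ON "Students"."ID" = "Clubs"."President"
    ORDER BY "Students"."ID"
    """
).fetchall()

# Club name and number of members; LEFT JOIN keeps clubs with no members.
member_counts = conn.execute(
    """
    SELECT "Clubs"."Club Name", COUNT("Memberships"."Student ID")
    FROM "Clubs"
    LEFT JOIN "Memberships" ON "Clubs"."Club ID" = "Memberships"."Club ID"
    GROUP BY "Clubs"."Club ID", "Clubs"."Club Name"
    ORDER BY "Clubs"."Club Name"
    """
).fetchall()

print(presidents)
print(member_counts)
```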

An in-depth description of how SQL works is beyond the scope of this introductory text, but these examples should give you an idea of the power of using SQL to manipulate relational data. Many database packages, such as Microsoft Access, allow you to visually create the query you want to construct and then generate the SQL query for you.

Other Types of Databases

The relational database model is the most used database model today. However, many other database models exist that provide different strengths than the relational model. The hierarchical database model, popular in the 1960s and 1970s, connected data together in a hierarchy, allowing for a parent/child relationship between data. The document-centric model allowed for a more unstructured data storage by placing data into “documents” that could then be manipulated.

Perhaps the most interesting new development is the concept of NoSQL (from the phrase “not only SQL”). NoSQL arose from the need to solve the problem of large-scale databases spread over several servers or even across the world. For a relational database to work properly, it is important that only one person be able to manipulate a piece of data at a time, a concept known as record-locking. But with today’s large-scale databases (think Google and Amazon), this is just not possible. A NoSQL database can work with data in a looser way, allowing for a more unstructured environment, communicating changes to the data over time to all the servers that are part of the database.

Database Management Systems

Screen shot of the Open Office database management system

To the computer, a database looks like one or more files. In order for the data in the database to be read, changed, added, or removed, a software program must access it. Many software applications have this ability: iTunes can read its database to give you a listing of its songs (and play the songs); your mobile-phone software can interact with your list of contacts. But what about applications to create or manage a database? What software can you use to create a database, change a database’s structure, or simply do analysis? That is the purpose of a category of software applications called database management systems (DBMS).

DBMS packages generally provide an interface to view and change the design of the database, create queries, and develop reports. Most of these packages are designed to work with a specific type of database, but generally are compatible with a wide range of databases.

For example, Apache OpenOffice.org Base (see screen shot) can be used to create, modify, and analyze databases in open-database (ODB) format. Microsoft’s Access DBMS is used to work with databases in its own Microsoft Access Database format. Both Access and Base have the ability to read and write to other database formats as well.

Microsoft Access and Open Office Base are examples of personal database-management systems. These systems are primarily used to develop and analyze single-user databases. These databases are not meant to be shared across a network or the Internet, but are instead installed on a particular device and work with a single user at a time.

Enterprise Databases

A database that can only be used by a single user at a time is not going to meet the needs of most organizations. As computers have become networked and are now joined worldwide via the Internet, a class of database has emerged that can be accessed by two, ten, or even a million people. These databases are sometimes installed on a single computer to be accessed by a group of people at a single location. Other times, they are installed over several servers worldwide, meant to be accessed by millions. These relational enterprise database packages are built and supported by companies such as Oracle, Microsoft, and IBM. The open-source MySQL is also an enterprise database.

As stated earlier, the relational database model does not scale well. The term scale here refers to a database getting larger and larger, being distributed on a larger number of computers connected via a network. Some companies are looking to provide large-scale database solutions by moving away from the relational model to other, more flexible models. For example, Google now offers the App Engine Datastore, which is based on NoSQL. Developers can use the App Engine Datastore to develop applications that access data from anywhere in the world. Amazon.com offers several database services for enterprise use, including Amazon RDS, which is a relational database service, and Amazon DynamoDB, a NoSQL enterprise solution.

Big Data

A new buzzword that has been capturing the attention of businesses lately is big data. The term refers to such massively large data sets that conventional database tools do not have the processing power to analyze them. For example, Walmart must process over one million customer transactions every hour. Storing and analyzing that much data is beyond the power of traditional database-management tools. Understanding the best tools and techniques to manage and analyze these large data sets is a problem that governments and businesses alike are trying to solve.

 


Sidebar: What Is Metadata?

The term metadata can be understood as “data about data.” For example, when looking at one of the values of Year of Birth in the Students table, the data itself may be “1992”. The metadata about that value would be the field name Year of Birth, the time it was last updated, and the data type (integer). Another example of metadata could be for an MP3 music file, like the one shown in the image below; information such as the length of the song, the artist, the album, the file size, and even the album cover art, are classified as metadata. When a database is being designed, a “data dictionary” is created to hold the metadata, defining the fields and structure of the database.

Metadata about a camera image (Public Domain)
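
The distinction between a value and its metadata can also be seen directly in a database. The sketch below, using Python's sqlite3 module, reads one value and then asks SQLite to describe the table's fields via `PRAGMA table_info`. The table layout is an assumption, and the resulting dictionary is only a simplified stand-in for a real data dictionary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (student_id INTEGER PRIMARY KEY, year_of_birth INTEGER)"
)
conn.execute("INSERT INTO Students VALUES (1, 1992)")

# The data itself: a single stored value.
data = conn.execute("SELECT year_of_birth FROM Students").fetchone()[0]

# The metadata: PRAGMA table_info returns one row per field, in the form
# (cid, name, type, notnull, dflt_value, pk).
data_dictionary = {
    name: {"type": col_type, "primary_key": bool(pk)}
    for _cid, name, col_type, _notnull, _default, pk in conn.execute(
        "PRAGMA table_info(Students)"
    )
}

print(data)
print(data_dictionary)
```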

 

Data Warehouse

As organizations have begun to utilize databases as the centerpiece of their operations, the need to fully understand and leverage the data they are collecting has become more and more apparent. However, directly analyzing the data that is needed for day-to-day operations is not a good idea; we do not want to tax the operations of the company more than we need to. Further, organizations also want to analyze data in a historical sense: How does the data we have today compare with the same set of data this time last month, or last year? From these needs arose the concept of the data warehouse.

The concept of the data warehouse is simple: extract data from one or more of the organization’s databases and load it into the data warehouse (which is itself another database) for storage and analysis. However, the execution of this concept is not that simple. A data warehouse should be designed so that it meets the following criteria:

  • It uses non-operational data. This means that the data warehouse is using a copy of data from the active databases that the company uses in its day-to-day operations, so the data warehouse must pull data from the existing databases on a regular, scheduled basis.
  • The data is time-variant. This means that whenever data is loaded into the data warehouse, it receives a time stamp, which allows for comparisons between different time periods.
  • The data is standardized. Because the data in a data warehouse usually comes from several different sources, it is possible that the data does not use the same definitions or units. For example, our Events table in our Student Clubs database lists the event dates using the mm/dd/yyyy format (e.g., 01/10/2013). A table in another database might use the format yy/mm/dd (e.g., 13/01/10) for dates. In order for the data warehouse to match up dates, a standard date format would have to be agreed upon and all data loaded into the data warehouse would have to be converted to use this standard format. This process is called extraction-transformation-load (ETL).
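
The date-standardization step can be sketched in a few lines of Python. The choice of ISO 8601 (yyyy-mm-dd) as the agreed standard, the source labels, and the `transform` and `load` helpers are all assumptions made for this illustration; a real ETL process would involve far more than date handling.

```python
from datetime import datetime, timezone

# The two source formats named in the text; the source labels are hypothetical.
SOURCE_FORMATS = {
    "student_clubs": "%m/%d/%Y",  # mm/dd/yyyy, e.g. 01/10/2013
    "other_system": "%y/%m/%d",   # yy/mm/dd, e.g. 13/01/10
}


def transform(source, raw_date):
    """Convert a source-specific date string to ISO 8601 (yyyy-mm-dd)."""
    parsed = datetime.strptime(raw_date, SOURCE_FORMATS[source])
    return parsed.date().isoformat()


def load(warehouse, source, raw_date, loaded_at=None):
    """Append a standardized row with a load time stamp (time-variant data)."""
    loaded_at = loaded_at or datetime.now(timezone.utc)
    warehouse.append(
        {
            "source": source,
            "event_date": transform(source, raw_date),
            "loaded_at": loaded_at.isoformat(),
        }
    )


warehouse = []
load(warehouse, "student_clubs", "01/10/2013")
load(warehouse, "other_system", "13/01/10")
print([row["event_date"] for row in warehouse])
```

Once both sources are converted, the two example dates from the text compare as equal, and each row carries the time at which it was loaded.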

There are two primary schools of thought when designing a data warehouse: bottom-up and top-down. The bottom-up approach starts by creating small data warehouses, called data marts, to solve specific business problems. As these data marts are created, they can be combined into a larger data warehouse. The top-down approach suggests that we should start by creating an enterprise-wide data warehouse and then, as specific business needs are identified, create smaller data marts from the data warehouse.

Data warehouse process (top-down)

Benefits of Data Warehouses

Organizations find data warehouses quite beneficial for a number of reasons:

  • The process of developing a data warehouse forces an organization to better understand the data that it is currently collecting and, equally important, what data is not being collected.
  • A data warehouse provides a centralized view of all data being collected across the enterprise and provides a means for determining data that is inconsistent.
  • Once all data is identified as consistent, an organization can generate one version of the truth. This is important when the company wants to report consistent statistics about itself, such as revenue or number of employees.
  • By having a data warehouse, snapshots of data can be taken over time. This creates a historical record of data, which allows for an analysis of trends.
  • A data warehouse provides tools to combine data, which can provide new information and analysis.

Data Mining

Data mining is the process of analyzing data to find previously unknown trends, patterns, and associations in order to make decisions. Generally, data mining is accomplished through automated means against extremely large data sets, such as a data warehouse. Some examples of data mining include:

  • An analysis of sales from a large grocery chain might determine that milk is purchased more frequently the day after it rains in cities with a population of less than 50,000.
  • A bank may find that loan applicants whose bank accounts show particular deposit and withdrawal patterns are not good credit risks.
  • A baseball team may find that collegiate baseball players with specific statistics in hitting, pitching, and fielding make for more successful major league players.

In some cases, a data-mining project is begun with a hypothetical result in mind. For example, a grocery chain may already have some idea that buying patterns change after it rains and want to get a deeper understanding of exactly what is happening. In other cases, there are no presuppositions and a data-mining program is run against large data sets in order to find patterns and associations.

Privacy Concerns

The increasing power of data mining has caused concerns for many, especially in the area of privacy. In today’s digital world, it is becoming easier than ever to take data from disparate sources and combine them to do new forms of analysis. In fact, a whole industry has sprung up around this technology: data brokers. These firms combine publicly accessible data with information obtained from the government and other sources to create vast warehouses of data about people and companies that they can then sell. This subject will be covered in much more detail in chapter 12 – the chapter on the ethical concerns of information systems.

Business Intelligence and Business Analytics

With tools such as data warehousing and data mining at their disposal, businesses are learning how to use information to their advantage. The term business intelligence is used to describe the process that organizations use to take data they are collecting and analyze it in the hopes of obtaining a competitive advantage. Besides using data from their internal databases, firms often purchase information from data brokers to get a big-picture understanding of their industries. Business analytics is the term used to describe the use of internal company data to improve business processes and practices.

Knowledge Management

We end the chapter with a discussion on the concept of knowledge management (KM). All companies accumulate knowledge over the course of their existence. Some of this knowledge is written down or saved, but not in an organized fashion. Much of this knowledge is not written down; instead, it is stored inside the heads of its employees. Knowledge management is the process of formalizing the capture, indexing, and storing of the company’s knowledge in order to benefit from the experiences and insights that the company has captured during its existence.

Summary

In this chapter, we learned about the role that data and databases play in the context of information systems. Data is made up of small facts and information without context. If you give data context, then you have information. Knowledge is gained when information is consumed and used for decision making. A database is an organized collection of related information. Relational databases are the most widely used type of database, where data is structured into tables and all tables must be related to each other through unique identifiers. A database management system (DBMS) is a software application that is used to create and manage databases, and can take the form of a personal DBMS, used by one person, or an enterprise DBMS that can be used by multiple users. A data warehouse is a special form of database that takes data from other databases in an enterprise and organizes it for analysis. Data mining is the process of looking for patterns and relationships in large data sets. Many businesses use databases, data warehouses, and data-mining techniques in order to produce business intelligence and gain a competitive advantage.

 


Study Questions

  1. What is the difference between data, information, and knowledge?
  2. Explain in your own words how the data component relates to the hardware and software components of information systems.
  3. What is the difference between quantitative data and qualitative data? In what situations could the number 42 be considered qualitative data?
  4. What are the characteristics of a relational database?
  5. When would using a personal DBMS make sense?
  6. What is the difference between a spreadsheet and a database? List three differences between them.
  7. Describe what the term normalization means.
  8. Why is it important to define the data type of a field when designing a relational database?
  9. Name a database you interact with frequently. What would some of the field names be?
  10. What is metadata?
  11. Name three advantages of using a data warehouse.
  12. What is data mining?

Exercises

  1. Review the design of the Student Clubs database earlier in this chapter. Reviewing the lists of data types given, what data types would you assign to each of the fields in each of the tables? What lengths would you assign to the text fields?
  2. Download Apache OpenOffice.org and use the database tool to open the “Student Clubs.odb” file available here. Take some time to learn how to modify the database structure and then see if you can add the required items to support the tracking of faculty advisors, as described at the end of the Normalization section in the chapter. Here is a link to the Getting Started documentation.
  3. Using Microsoft Access, download the database file of comprehensive baseball statistics from the website SeanLahman.com. (If you don’t have Microsoft Access, you can download an abridged version of the file here that is compatible with Apache Open Office). Review the structure of the tables included in the database. Come up with three different data-mining experiments you would like to try, and explain which fields in which tables would have to be analyzed.
  4. Do some original research and find two examples of data mining. Summarize each example and then write about what the two examples have in common.
  5. Conduct some independent research on the process of business intelligence. Using at least two scholarly or practitioner sources, write a two-page paper giving examples of how business intelligence is being used.
  6. Conduct some independent research on the latest technologies being used for knowledge management. Using at least two scholarly or practitioner sources, write a two-page paper giving examples of software applications or new technologies being used in this field.

License

Icon for the Creative Commons Attribution 4.0 International License

Information Systems for Business and Beyond Copyright © 2014 by Dave Bourgeois and David T. Bourgeois is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.