Semantic Web Architecture and Applications
First Generation - Keywords
- Applications: Keyword tools are appropriate for locating a word or for listing the documents that contain specific keywords and root stems. They cannot understand similar words, the meanings of words, their relationships, or their context.
- Problems: False negatives (no matches found because the word or stem is not exactly identical: “big” and “large”); false positives (too many unrelated matches found because a root stem matches many unrelated words: “process” and “processor”); scale factors (keyword search tools produce very long, seemingly random lists of documents if the source database is large, and the relevance rankings are highly misleading).
- Examples: The most common examples of keyword tools are web site “Search” tools and the “Find” function (Ctrl+F) in Microsoft Office applications.
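The two failure modes described above can be reproduced in a few lines. This is a minimal sketch, assuming keyword matching is a plain case-insensitive substring test; the function name and sample documents are invented for illustration.

```python
def keyword_search(query, documents):
    """Return the documents that contain the query as a literal substring."""
    needle = query.lower()
    return [doc for doc in documents if needle in doc.lower()]


# Illustrative documents, not taken from any real collection.
documents = [
    "A large warehouse is for sale",
    "The processor runs at 3 GHz",
    "The approval process takes two weeks",
]

# False negative: "big" never matches the synonym "large".
big_hits = keyword_search("big", documents)

# False positive: the stem "process" also matches "processor".
process_hits = keyword_search("process", documents)
```

Here big_hits comes back empty even though the first document is clearly relevant, while process_hits contains both the processor document and the process document.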
Second Generation - Statistical Forecasting
- Applications: Statistical forecasting tools are appropriate for simple document searches where the desired output is a list of documents containing specific words, which end users must then read, classify and summarize manually. They cannot understand the meaning, context or relationships of documents.
- Problems: Keyword limitations of false positives and false negatives; misunderstanding the meaning of words and sentences (“man bites dog” is treated the same as “dog bites man”); lack of context (“Duke” could be the Duke of Windsor, “Duke of Earl” or John Wayne); scale factors (a single statistical relevance ranking creates huge “Google” lists of many irrelevant documents: “you have 100,000 hits”).
- Examples: The most common statistical forecasting tool is “Google”; many other tools use inference theory and similar analytic and predictive algorithms.
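The “man bites dog” problem is easy to demonstrate. The sketch below assumes a bare word-count model as a stand-in for statistical methods in general; real ranking systems are more elaborate, but any method that discards word order shares the limitation.

```python
from collections import Counter


def bag_of_words(sentence):
    """Count word occurrences, discarding word order entirely."""
    return Counter(sentence.lower().split())


first = bag_of_words("man bites dog")
second = bag_of_words("dog bites man")

# Both sentences yield identical counts, so a purely frequency-based
# comparison cannot tell who bit whom.
same_statistics = first == second
```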
Third Generation - Natural Language Processing
- Applications: Natural language tools are appropriate for linguistic research and word-for-word translation applications where the desired output is a linguistic definition or a translation. These are not capable of understanding the meaning or context of sentences in documents, or integrating information within a database.
- Problems: Keyword limitations of false positives and false negatives; misunderstanding the context (does “I like java” mean an island in Indonesia, a computer programming language or coffee?). Without understanding the broader context, a linguistic tool only has a dictionary definition of “Java” and does not know which “Java” is relevant or what other data relate to a specific “Java” concept.
- Examples: The most common natural language tools are translator programs, which use dictionary lookup tables to convert words and language-specific grammar rules to convert source languages to target languages.
Fourth Generation - Semantic Web Architecture and Applications
Semantic Web architecture and applications are a dramatic departure from earlier database and application generations. Semantic processing includes the earlier statistical and natural language techniques, and enhances them with semantic processing tools. First, Semantic Web architecture is the automated conversion and storage of unstructured text sources in a Semantic Web database. Second, Semantic Web applications automatically extract and process the concepts and context in the database with a range of highly flexible tools.
a. Architecture; not only Application
First, the Semantic Web is a complete database architecture, not only an application program. Semantic Web architecture is a two-step process: a Semantic Web database is created from unstructured text documents, and Semantic Web applications then run on that database rather than on the original source documents.
The Semantic Web architecture is created by first converting text files to XML and then analyzing these with a semantic processor. This process understands the meaning of the words and grammar of the sentence, and also the semantic relationships of the context. These meanings and relationships are then stored in a Semantic web database. The Semantic Web is similar to the schematic logic of an electronic device or the DNA of a living organism. It contains all of the logical content AND context of the original source. And, it links each word and concept back to the original document.
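As a rough illustration of what such a database holds, the sketch below stores relationships between concepts together with a link back to the document each one came from. It is a toy model under stated assumptions: the class and field names are invented, the file names are hypothetical, and the statements are entered by hand in place of the output of a semantic processor.

```python
from collections import namedtuple

# One stored relationship, plus the document it was extracted from.
Statement = namedtuple("Statement", "subject relation obj source")


class SemanticStore:
    def __init__(self):
        self.statements = []

    def add(self, subject, relation, obj, source):
        self.statements.append(Statement(subject, relation, obj, source))

    def about(self, concept):
        """Return every statement in which the concept takes part."""
        return [s for s in self.statements if concept in (s.subject, s.obj)]

    def sources(self, concept):
        """Return the documents that mention the concept, sorted by name."""
        return sorted({s.source for s in self.about(concept)})


store = SemanticStore()
# Hand-entered statements; the source file names are hypothetical.
store.add("dog", "bites", "man", source="report-1.xml")
store.add("man", "bites", "dog", source="report-2.xml")
store.add("man", "owns", "dog", source="report-2.xml")
```

Unlike a word count, the store keeps “dog bites man” and “man bites dog” as two distinct statements, and store.sources("dog") leads back to both originating documents.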
b. Structured and Unstructured Data
Second, Semantic Web architecture and applications handle both structured and unstructured data. Structured data is stored in relational databases with static classification systems, and also in discrete documents. These databases and documents can be converted to Semantic Web databases, and then processed together with unstructured data.
c. Dynamic and Automatic; not Static and Manual
Third, Semantic Web database architecture is dynamic and automated. Each new document that is analyzed, extracted and stored in the Semantic Web expands the logical relationships in all earlier documents. These expanding logical relationships increase the understanding of content and context in each document and in the entire database. The Semantic Web conversion process is automated. No human action is required for maintaining a taxonomy, metadata tagging or classification. The semantic database is constantly updated and becomes more accurate as it grows.
d. From Machine Readable to Machine Understandable
Fourth, Semantic Web architecture and applications support both human and machine intelligence systems. Humans can use Semantic Web applications on a manual basis to improve the efficiency of search, summary, analysis and reporting tasks. Machines can also use Semantic Web applications to perform tasks that humans cannot do because of the cost, speed, accuracy, complexity and scale of the tasks.
e. Synthetic vs Artificial Intelligence:
Semantic Web technology is NOT “Artificial Intelligence”. AI was a mythical marketing goal to create “thinking” machines. The Semantic Web supports a much more limited and realistic goal. This is “Synthetic Intelligence”. The concepts and relationships stored in the Semantic Web database are “synthesized”, or brought together and integrated, to automatically create a new summary, analysis, report, email, alert; or launch another machine application. The goal of Synthetic Intelligence information systems is bringing together all information sources and user knowledge, and synthesizing these in global networks.
Future of Information Management: Network Spread Sheets for Ideas
The future of information management will be based on Semantic Web architecture and applications. The most important issue is which technologies and firms take the immediate leadership to drive the migration, and therefore guide the information architecture of the future.
1. Tidal Wave of Information Shifts Power
End users and corporations will drive the rapid expansion of Semantic Web architecture and applications to survive the tidal wave of data, and improve costs, speed and performance. IT management will resist or accelerate this trend. Information power will shift from the database managers back to the departments and end users; as the PC + spread sheet did in 1981-1984.
2. Migration to XML and RDF Standards
Application programs will follow Microsoft’s migration to XML standards for document authoring and exchange. XML and RDF standards will become the dominant approach for capturing, understanding, storing and exchanging external document descriptions and document content. Unstructured Text Documents become Synthetic Expert Networks.
3. Universal Internet Web Portals
Information access will migrate to web portals, both within organizations and among the general population, and web portals based on Semantic Web applications will become the central user application. Operating systems and legacy applications will become transparent under Semantic Web portals with highly flexible applications: Network spread sheets for ideas.
4. Parallel Legacy Database Integration
Legacy databases will be extracted into parallel Semantic Web architecture databases to provide access to fragmented sources. Parallel architecture dramatically reduces the costs, risks, and schedules compared with the ERP “tear down and rebuild” approach. Transparent Grid Architecture.
5. Global and Language Expansion
Information sources, users and entities will expand globally and support many languages. Because Semantic Web architectures and applications “learn and think” in the original language, the production and exchange of multi-language information between language domains will increase dramatically. Interactive Japanese language sources on China in English.
6. Network Access and Distribution
Networks will get better, faster, cheaper, wireless and distributed. Semantic Web architecture and applications will expand to link global data sources, from mainframe servers, desktop workstations and laptops to hand-held PDAs and cell phones. Voice driven expert systems.
7. Network Transactions and Capacity
Human transactions will grow slowly, and machine transactions will grow exponentially. The migration from human to machine intelligence transactions will rapidly take over the private and public networks. This rapid growth in capacity demand will force a major increase in network hardware investment and stimulate new value-added network services. Japan DoCoMo mobile network.
SEMANTIC WEB TECHNOLOGIES APPLIED TO BUILDING SPECIFICATIONS
Reinout van Rees and Frits Tolman
Abstract
There are three important aspects of building specifications:
- First the content of the specifications. What information is contained in a specification?
- Second, the goal of specifications. What does a specification intend to achieve, and what are its fields of application?
- Third, the interaction points with the environment of a specification.
The information in a specification is used by other applications. Other applications also provide information to the specification. These interactions could benefit from a more semantic link.
An ontology is a formal way of describing the set of concepts used in a certain field, from a certain viewpoint. Ontologies and the information that uses them can be accessed and exchanged in the familiar open and standardised way using the Internet. This semantic web allows you to make explicit statements and explicit links about (and in) Internet-accessible resources using ontologies as loosely-coupled, expandable vocabularies. This greatly enhances the semantic richness of Internet-based information exchange.
Introduction
The objective of the research discussed in this paper is to look into the future of the building and construction industry from the perspective of specifications. Current textual specifications will be replaced by “eSpecs”, which will be accessible to both humans and computers. This is done (1) by applying XML technology to separate the content of the specification from its markup, and (2) by expressing the content using the terms and structure made explicit in an Internet-based ontology (“set of definitions and terms”). The research focuses on the best way to create eSpecs, and on the different ways eSpecs may be used.
Nature of building specifications
A building specification is a central document in a building process. It, traditionally, sits between the design phase and the actual construction phase. A specification consists of both the specification drawings and the specification text. This paper mainly focusses on the content of a specification text.
Content of specifications
An essential first point regarding a specification is that it is a specification, not an explanation. That is, it is meant to be specific, whether descriptive or prescriptive, and not some loose indication.
Essential properties of the specification text are:
· The formal description.
· (References to) conditions and regulations.
· A classification.
· References to the specification drawings.
Formal description
The formal specification is built up from a list of specification items. Historically, one specification item often deals with something that has to be budgeted. For such an item, the required end result, the required quality and the source material are described, for instance.
The data behind the specification items would be the natural domain of product modelling. This data would greatly benefit from a good coupling with the drawings.
Conditions
Conditions (or regulations) give extra information on top of the plain technical data (like fire resistance = 30 min). Conditions can be technical or administrative, and standard or additional.
The standard (technical or administrative) conditions are typically valid for every building project. Standard administrative texts make sure that contract-wise a lot of commonly used safeguards are included. The correct terminology is used to invoke protection under certain laws. This way, what needs to be said is said simply by including it automatically in the specification. Typically, the standard conditions are available pre-printed in book form and simply included with the rest of the specification.
Specification structure: classification
One common way of subdividing the textual specification is subdivision into parts called chapters. Traditionally the chapters often correspond with branches of the industry or kinds of work. All the paintwork is in one chapter, the groundwork in another and the doors & windows in a third. This makes it easier to provide a cost estimation by allowing the different experts to estimate their part. This kind of subdivision is common in the housing and utility construction sector, which is traditionally subdivided into specific crafts.
A second common way of subdividing is by following normal execution patterns. The reason for this is that detailed cost estimations (in the ground/water/road sector) are normally made that way. A good match between the cost estimation and the specification text is desired.
References to the specification drawings
Normally, the references to the accompanying specification drawings are not extensive. The “doors on the ground floor” are described. Also you can describe a set of doors, mentioning “placement according to drawing”. These references are all textual.
A better coupling with the drawings will be a big advantage to the building industry. There are some possibilities to generate a partial specification from well-executed drawings, but even then there is no real two-way link.
Nature of the semantic web
The web allows us to access a vast hoard of information. You search in Google almost before you ask a colleague for information, so the web is already firmly in place. The semantic web is a set of technologies that allows computer programs an equivalent richness of information.
Related research in the building industry
The goal of eConstruct (http://www.econstruct.org/) was to harness the possibilities of the Internet for the building industry, concentrating on the communication in the buying and selling phase. Conceptually, three things are needed for communication: a vocabulary, a grammar and a communication medium (Van Rees et al. 2001).
- A taxonomy (sporting a specialisation hierarchy, property definitions and multi-linguality) was used as the vocabulary of terms. The ISO/DIS 12006-3 developments were used for this.
- The grammar (data format) was bcxml, a custom XML format. Basically it used the terms of the vocabulary, allowing for an intuitive and human-friendly <Window height="2.40" unit="m"/>-like language.
- The communication medium was the Internet, used to connect a few services (catalogue server, taxonomy server, etc.).
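To illustrate why a vocabulary-based XML grammar of this kind is convenient for programs as well as people, the following sketch reads one such element with Python's standard library. The fragment is modelled on the Window example above and is an assumption; the actual bcxml schema may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modelled on the <Window .../> example above.
fragment = '<Window height="2.40" unit="m"/>'

element = ET.fromstring(fragment)
term = element.tag                       # vocabulary term, here "Window"
height = float(element.get("height"))    # numeric property value
unit = element.get("unit")               # unit of measurement
```

The element name doubles as the taxonomy term, so a program can look up "Window" in the vocabulary to learn which properties and units to expect.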
E-cognos (http://www.e-cognos.org/) started the moment eConstruct finished and took the development in the direction of knowledge management: harnessing the existing and available, but not easily findable, knowledge contained in documents and in people.
- Multiple cooperating ontologies (footnote: e-cognos used the term ontology instead of taxonomy; they stressed most the specialisation hierarchy and the rich functionality for synonyms etc.) provided multiple cooperative ways to access and find and classify information.
- Data was exchanged in XML (partly re-using bcxml) and in RDF (combined with DAML+OIL), an XML format for ontologies and ontology-based data.
- The Internet was, like in eConstruct, used to access the ontologies' information. But the big innovation was to add the information richness allowed by the ontologies onto existing information contained in document management systems and employee databases. Superimposing ontological richness onto existing systems proved possible.
Both projects achieved good results, allowing us to suggest the following as best practice:
- Store definitions of terms, vocabularies, etc. in widely accessible ontologies. This way, the terminology used is made explicit. Explicit is better than implicit.
- Use XML, or the more specific RDF, for information exchange.
- Use the Internet as the basic communication medium.
Webify data
Webifying data means that every piece of useful data should have a URI. The success of the World Wide Web is entirely based on assigning a URI to every single webpage and image and enabling links between them (Prescod 2002).
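For specification data, webifying could be as simple as deriving a stable address for every item. The sketch below is only one possible scheme: the base address, the path layout and the function name are all assumptions made for illustration.

```python
from urllib.parse import quote, urljoin

# Hypothetical base address; any stable, resolvable base would do.
BASE = "http://example.org/specs/"


def uri_for(project, item):
    """Build a URI for one specification item of one project."""
    path = "{}/items/{}".format(quote(project), quote(item))
    return urljoin(BASE, path)


door_uri = uri_for("town-hall", "ground floor doors")
```

Once the “doors on the ground floor” have an address of their own, a drawing, a cost estimate or a regulation can link to exactly that item instead of referring to it in free text.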
Ontology language
Webifying data is the first necessary step to enabling the semantic web for the building and construction industry.
The second step is to use a standard data format for shareable ontologies (http://www.w3.org/2001/sw/): RDF (and its more powerful add-on, OWL). OWL provides us with a way of dealing with:
- Classes and properties and their relations.
- Subtype hierarchy (both for classes and for properties).
- Textual information (labels and descriptions, multilingual).
- Re-using classes and properties from other ontologies, allowing you to build on previous work and to use more generic high-level ontologies as a common basis for two ontologies that need to exchange information.
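The subtype hierarchy and multilingual labels listed above can be mimicked with plain dictionaries. This is a simplified sketch, not OWL itself: the class names and labels are invented, and each class has at most one superclass here, whereas OWL allows several.

```python
# Hypothetical mini-ontology: each class maps to its direct superclass.
SUBCLASS_OF = {
    "DoubleGlazedWindow": "Window",
    "Window": "BuildingElement",
    "Door": "BuildingElement",
}

# Multilingual labels, keyed by class and language code (illustrative).
LABELS = {
    "Window": {"en": "window", "nl": "raam"},
    "Door": {"en": "door", "nl": "deur"},
}


def ancestors(cls):
    """Walk up the hierarchy and list every superclass of cls."""
    found = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.append(cls)
    return found


def is_a(cls, candidate):
    """True if cls equals candidate or is a (transitive) subtype of it."""
    return cls == candidate or candidate in ancestors(cls)
```

With this in place, is_a("DoubleGlazedWindow", "BuildingElement") holds even though that relation was never stated directly, which is the kind of inference a subtype hierarchy makes possible.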
Implementation notes: Zope/Python.
When implementing a semantic web solution, two main components have to be available:
- A web application server, providing a web server and a programmatic framework to drive it. A popular choice in the research community seems to be Apache’s Tomcat Java web application server (http://jakarta.apache.org/tomcat/).
- A semantic data store, providing a means to store and query RDF files. A popular choice is Hewlett-Packard’s Jena (http://www.hpl.hp.com/semweb/jena.htm).
The main goal is to store and query RDF and to provide an internet user interface which interacts programmatically with the rdf store.
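What “store and query” means at its simplest can be sketched without any RDF library. The class below is an in-memory stand-in written for illustration, not the API of Jena or rdflib; the prefixed names such as ex:door1 are hypothetical.

```python
class TripleStore:
    """In-memory stand-in for an RDF store; None acts as a wildcard."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Return, in sorted order, all triples matching the pattern."""
        pattern = (subject, predicate, obj)
        return sorted(
            triple
            for triple in self.triples
            if all(want is None or want == have
                   for want, have in zip(pattern, triple))
        )


store = TripleStore()
store.add("ex:door1", "rdf:type", "ex:Door")
store.add("ex:door1", "ex:fireResistance", "30min")
store.add("ex:window1", "rdf:type", "ex:Window")

doors = store.query(predicate="rdf:type", obj="ex:Door")
door1_facts = store.query(subject="ex:door1")
```

A web user interface then only needs to translate form input into such patterns and render the matching triples.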
Development speed and ease-of-use
Python and Zope are attractive for web programming. Python (http://python.org/) is a high-level (scripting) language which is regarded by most as both elegant and powerful, suitable for programs both big and small. It is platform-independent (Windows, Unix, Mac; recent versions of Mac OS X even ship it as part of the operating system).
Zope (http://zope.org/) is a web application server (written in Python) with a lot of built-in extras:
· Built-in object database.
· User management and flexible password protection.
· Through-the-web management interface. No need for changing files on the filesystem.
Reusable modules
Both Python and Zope have a big community that creates a lot of add-ons and modules. Most of them are open source and can be freely reused ("free" meaning both freedom to change and re-distribute and free of charge). There are two main modules that form the basis of the implementation of this research.
- Rdflib (http://rdflib.net/). A simple RDF store that parses, stores, queries and exports RDF files. To store and query big data sets you can use Zope’s object database, which can handle big data sets efficiently.
- Plone (http://plone.org/). An attractive (but changeable) user interface on top of Zope’s own. With little effort a great result can be obtained (ideal for a time-strapped researcher). Recently, the possibility to generate web forms from UML diagrams added even more attractiveness to this solution.
We created a version of rdflib that could be used within Zope and Plone, allowing us to quickly develop an attractive web-based user interface to an RDF model.
Architecture
The basic property of the architecture is that it caters for the exchange of information between different sources of information, each with its own goal, its own methods and its own peculiarities.
Ontologies
An ontology is a formal way of describing the set of concepts used in a certain field, from a certain viewpoint.
This means that, to describe the field of specifications and the terms used therein, a specification ontology could capture the concepts used to create specifications (chapters, specification units, regulation references), but also the concepts that form the actual contents of the specification (masonry, double glazed windows). Likewise for a cost estimation ontology. Or an ontology that makes explicit the terminology for creating window frames.
There could also be a generic ontology (probably several) that describes a reasonable number of generic terms. The above example of the specification ontology and the cost estimation ontology shows that the same field (buildings, for instance) can be described from two different viewpoints. Doing partly the same work twice (and probably in not-too-compatible ways) could be prevented by using a joint, generic ontology for the parts that overlap. A generic ontology could for instance include windows, but not double glazed windows. Existing classification systems could fill part of the bill, though adaptation to the newer possibilities of ontologies might spark an effort to create new versions.
A generic ontology could be made more specific by branch-specific or application-specific ontologies. Application ontologies add the concepts needed for cost estimation, for instance, or for fire safety calculations. Branch ontologies further specify and add concepts from their branch of the construction industry.
By cooperating and re-using work already done in other ontologies, a web of ontologies (or ontology web) can come into being. OWL (the web ontology language, building on RDF) has built-in support for cooperating ontologies.
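How a shared generic ontology lets two specialised ontologies find common ground can be sketched as follows. This is a toy model under stated assumptions: ontologies are reduced to class-to-superclass maps, and the prefixes and class names are invented for illustration.

```python
# Hypothetical generic ontology shared by both specialised ontologies.
GENERIC = {"generic:Window": "generic:BuildingElement"}

# Each specialised ontology adds its own classes under generic ones.
SPECIFICATION = {"spec:DoubleGlazedWindow": "generic:Window"}
COST = {"cost:PricedWindow": "generic:Window"}


def merge(*ontologies):
    """Combine several class -> superclass maps into one web."""
    web = {}
    for ontology in ontologies:
        web.update(ontology)
    return web


def lineage(cls, web):
    """Return cls followed by all of its superclasses, nearest first."""
    chain = [cls]
    while chain[-1] in web:
        chain.append(web[chain[-1]])
    return chain


def common_ground(a, b, web):
    """Find the nearest class that both a and b specialise, if any."""
    b_line = set(lineage(b, web))
    for cls in lineage(a, web):
        if cls in b_line:
            return cls
    return None


web = merge(GENERIC, SPECIFICATION, COST)
shared = common_ground("spec:DoubleGlazedWindow", "cost:PricedWindow", web)
```

Neither specialised ontology knows about the other, yet both can exchange information at the level of generic:Window because each builds on the same generic term.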
Information sources
When looking at a building project, you can distinguish four kinds of information in two dimensions. The first dimension is whether the information is specific to the project or not. The second dimension is whether the information is specific to a certain company or not.
The proprietary information and the project information overlap partly. This is an area where problems loom. Part of the proprietary information could be very handy for the project and therefore for the project partners. But do you want to give them that valuable information for free? Do you want to share it? It might be a competitive advantage in later projects, which you give away by sharing.
Also, part of the project information will be necessary for the company: the specification drawings, the specification text, etc. These are essential for coupling with internal information like work planning. But the project information has to be available to all partners in the project. This means that the information has to be shared, so it either has to be kept in one location or different copies have to be kept synchronised.
The essential property here is that the information has to be shared between the quadrants. The information should at least be partly accessible.
Example
As an example, let us take the project description, as made by the initiator of the project. When taking the semantic web route, this should be available over the Internet and its concepts should be described in an ontology. It includes links to the textual specification, for instance.
Secondly, we take internal company data. A set of recipes on a generic level, containing data on how proficient the company is in executing certain projects. These recipes also include the information needed to automatically calculate an initial value for the cost associated with such a project.
In the third step, a company-internal web crawler visits well-known sites which list projects that are currently waiting for a contractor. It downloads the project descriptions and, by using the information in the project ontology and in the recipe ontology, tries to figure out the attractiveness of the projects and calculates a first-cut estimate of the project costs.
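The assessment step of this example might look like the sketch below. Everything in it is an assumption made for illustration: the recipe fields, the numbers, and the rule that a project is only as attractive as the company's weakest recipe. The crawling itself is left out; the project descriptions are given in memory as if already downloaded and mapped to ontology terms.

```python
# Hypothetical company recipes: cost per unit and proficiency (0 to 1)
# for each kind of work, keyed by the ontology term for that work.
RECIPES = {
    "Masonry": {"unit_cost": 80.0, "proficiency": 0.9},
    "Paintwork": {"unit_cost": 15.0, "proficiency": 0.5},
}


def assess(project):
    """Return (estimated cost, attractiveness) for one project description.

    Work the company has no recipe for cannot be priced, so such a
    project yields (None, 0.0).
    """
    total = 0.0
    proficiencies = []
    for term, quantity in project["work"].items():
        recipe = RECIPES.get(term)
        if recipe is None:
            return None, 0.0
        total += recipe["unit_cost"] * quantity
        proficiencies.append(recipe["proficiency"])
    # Assumed rule: the weakest recipe determines the attractiveness.
    attractiveness = min(proficiencies) if proficiencies else 0.0
    return total, attractiveness


# Stand-in for descriptions a crawler would have downloaded.
projects = [
    {"name": "school", "work": {"Masonry": 100, "Paintwork": 200}},
    {"name": "bridge", "work": {"Steelwork": 50}},
]
results = {p["name"]: assess(p) for p in projects}
```

The matching of "Masonry" in a project description to "Masonry" in a recipe only works because both refer to the same ontology term, which is the point of describing both sources with shared ontologies.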
Conclusion
For many years building specifications (especially the textual specification) have had a central place in the building process, traditionally situated between the design phase and the actual construction phase. Specifications have links with other documents in the building process like regulations and calculations. The links are, however, mostly only human-readable links. There is very little a computer can do automatically. Specifications would gain from a real semantic link.
To be able to create and apply eSpecs, it is of course necessary to use the Internet as the communication medium. Secondly, the definitions of terms, vocabularies, etcetera have to be made explicit in accessible ontologies. Multiple cooperating ontologies can form an ontology web. These ontologies and the data that uses them ideally should be communicated using XML and RDF/OWL over the Internet: the semantic web.
The example illustrates that the semantic web can provide the means by which the building specification can gain real semantic links to other documents and programs and vice versa. Also it shows that open source software is well-suited to this kind of task. The semantic web helps building specifications to become “eSpecs” and to re-assert their role as a central building document.





