Student projects
Applications that may be required to read some of these documents:
- PDF files: E.g. Adobe Acrobat Reader
- OpenOffice Impress files: E.g. OpenOffice
- Python scripts: Python Software
- Compressed files: E.g. 7-Zip
Student projects 2006
Browser based Web Accessibility Measurement Component
Group members: Li Wenjie, Yang Guang and Ignacio Alonso
Supervisor: Morten Goodwin Olsen
The goal of this project is to demonstrate the concept of a web-browser-based measurement component using Mozilla[Moz]. Mozilla is interesting as a container for WAM measurement systems because of its accessibility architecture[MOZ-A], which is used by third-party software such as screen readers, magnifiers and voice dictation software that need information about document content, UI controls and events such as changes of focus. Mozilla supports two accessibility APIs: MSAA on Windows[MSAA] and ATK on Linux and Unix[ATK].
Mozilla has also implemented an accessibility model in the browser, consisting of transformations from a non-accessible DOM tree to an accessible DOM tree.
The hypothesis is that a browser-based measurement component should considerably simplify some accessibility measurements, such as identifying screen flickering, redirection, JavaScript links or spawned windows, by subscribing to events via the accessibility interface or the DOM tree.
The measurement component must conform to the web-services-based plug-in interface of the EIAO Web Accessibility Observatory, so that its measurements can be used for large-scale accessibility assessments by the Observatory.
Suggested reading is Rapid Application Development with Mozilla by Nigel McFarlane (Bruce Perens' Open Source Series)[MOZ-Book].
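For contrast, a purely static check for some of the measurements named above can be sketched in Python with the standard-library `html.parser`. This is an illustration only and not part of the project: the class and function names are hypothetical, and such a scanner cannot see redirects or windows created by scripts at run time, which is exactly the gap a browser-based component subscribing to DOM or accessibility events is meant to close.

```python
from html.parser import HTMLParser


class StaticRedirectScanner(HTMLParser):
    """Flags meta-refresh redirects, javascript: links and links that open new windows."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute names are already lower-cased by HTMLParser
        if tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            self.findings.append(("redirection", attrs.get("content", "")))
        if tag == "a":
            href = (attrs.get("href") or "").strip()
            if href.lower().startswith("javascript:"):
                self.findings.append(("javascript-link", href))
            if attrs.get("target") == "_blank":
                self.findings.append(("spawned-window", href))


def scan(html):
    """Return (kind, detail) findings for one HTML document."""
    scanner = StaticRedirectScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.findings
```

A script that calls `window.open()` or sets `location` after a timer would pass this scanner unnoticed, while a browser component would observe the resulting events.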
[First Presentation as MPG|Final Report as PDF|Final Presentation as MPG|Final Presentation as PDF]
Web Content Mining
Group members: Sigbjørn Tvedt, Christian Kroken
Supervisor: Morten Goodwin Olsen
The goal of this project is to create a crawler/classifier that downloads the images in a web page and tries to classify the content of each image into different categories, e.g., mathematical formula, logo, buttons, and so on. The focus should be on automatic detection of image usage that reduces the accessibility of a web page.
This task would be an extension of previous projects.
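A first step for such a crawler is collecting the images of a page together with their text alternatives, since an image without an `alt` attribute reduces accessibility whatever its category turns out to be. A minimal sketch with the standard-library `html.parser`; the class and function names are hypothetical, and the image classifier itself is not shown.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects (absolute image URL, alt text or None) for every <img> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        src = attrs.get("src")
        if src:
            # A bare `alt` attribute has value None in HTMLParser; treat it as empty text.
            alt = (attrs["alt"] or "") if "alt" in attrs else None
            self.images.append((urljoin(self.base_url, src), alt))


def images_missing_alt(html, base_url):
    """Return absolute URLs of images that have no alt attribute at all."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    collector.close()
    return [url for url, alt in collector.images if alt is None]
```

The collected URLs are what a classifier would then download and label as formula, logo, button and so on.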
[First Presentation as MPG|Final Report as PDF|Source Code as RAR|Source code with test images as RAR|Final Presentation as MPG|Final Presentation as PDF]
Accessibility scorecard in GIS system
Group members: Santhakumar Chanrasekaram, Ni Chen, Mats Oustad
Supervisor: Morten Goodwin Olsen
The goal of the project is to create a Geographical Information System, based on an open source GIS module, that can present accessibility measures as colour-coded scorecards, according to the scorecards defined in UWEM, for the EU NUTS regions. Data to be presented may consist of HTML deviations or other data extracted from the EIAO RDF repository, or possibly the EIAO datawarehouse.
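The colour coding amounts to mapping each region's aggregated score to a band before the GIS layer paints it. A minimal sketch, assuming scores in [0, 1] where a higher value means more failed tests; the band limits, colours and region codes are hypothetical placeholders, not the UWEM scorecard definitions.

```python
# Hypothetical bands (upper limit, colour); the real UWEM categories are not reproduced here.
BANDS = [(0.25, "green"), (0.50, "yellow"), (0.75, "orange"), (1.00, "red")]


def colour_for(score):
    """Map a score in [0, 1] (share of failed tests, lower is better) to a colour."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be within [0, 1]")
    for upper, colour in BANDS:
        if score <= upper:
            return colour


def scorecard(region_scores):
    """Turn {NUTS region code: score} into {NUTS region code: colour}."""
    return {region: colour_for(score) for region, score in region_scores.items()}
```

Keeping the bands in one table makes it easy to swap in the official thresholds once they are taken from UWEM.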
[First Presentation as MPG|Final Report as PDF|Final Presentation as MPG|Final Presentation as PDF]
Temporal web structure mining
Group members: Sølve Oppheim, Bjørn Roalkvam
Supervisor: Morten Goodwin Olsen
The goal of the project is to analyse the change, growth, accessibility and interlinking of websites over time by using the Wayback Machine in combination with a web crawler. The case study should include AUC's home page.
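The Wayback Machine serves the capture nearest to a requested date at `https://web.archive.org/web/<timestamp>/<url>`, so yearly snapshots can be fetched by a crawler and their outgoing links compared. A minimal sketch of the URL handling and the comparison step; the function names are hypothetical, fetching is left out, and the prefix pattern assumes archived links are rewritten in the usual `/web/<digits>[modifier_]/` form.

```python
import re

# Archived pages rewrite their links to point back into the archive.
_WAYBACK_PREFIX = re.compile(r"^https?://web\.archive\.org/web/\d+(?:[a-z]{2}_)?/")


def snapshot_url(url, year, month=1, day=1):
    """Wayback Machine URL for the capture of `url` closest to the given date."""
    return f"https://web.archive.org/web/{year:04d}{month:02d}{day:02d}/{url}"


def original_url(archived_link):
    """Strip the Wayback prefix so links from different snapshots are comparable."""
    return _WAYBACK_PREFIX.sub("", archived_link)


def link_changes(old_links, new_links):
    """Compare the outgoing links of two snapshots of the same page."""
    old_links = {original_url(link) for link in old_links}
    new_links = {original_url(link) for link in new_links}
    return {
        "added": sorted(new_links - old_links),
        "removed": sorted(old_links - new_links),
        "growth": len(new_links) - len(old_links),
    }
```

Running `link_changes` over consecutive yearly snapshots gives a simple time series of interlinking growth.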
[First Presentation as MPG|Final Report as PDF|Final Presentation as MPG|Final Presentation as PDF]
e-Content accessibility for dyslexic people
Group member: Zhang Xiaorui
Supervisor: Annika Nietzio
The goal is to develop a WAM (Web Accessibility Metric) that can be used within EIAO (European Internet Accessibility Observatory) to produce dedicated accessibility scores for people with reading difficulties. The new metric can be derived from the literature describing these users' special requirements for content and for layout/presentation.
[First Presentation as MPG|Final Report as PDF|Source Code as ZIP|Final Presentation as MPG|Final Presentation as PDF]
Resource allocation algorithm
Group members: Zhu Lida and Yuan Jun
Supervisor: Noureddine Bouhmala
Data clustering is one of the common techniques in data mining. In this project, a multilevel scheme is used for K-clustering problems.
Multilevel techniques refer to the process of dividing a large and difficult problem into smaller ones, which are hopefully much easier to handle, and then working backward towards the solution of the original problem, using the solution from one level as the starting solution at the next level. In this project, we combine the multilevel paradigm with a popular algorithm for the clustering problem. Large random data sets will be generated in order to judge the quality of the clustering.
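The multilevel idea can be sketched in plain Python, assuming points are numeric tuples and using k-means (Lloyd's algorithm) as the "popular algorithm"; the project text does not name the algorithm, so that choice, the nearest-neighbour matching used for coarsening and all function names are assumptions. The matching step is quadratic per level, which is fine for a sketch but not for the large data sets mentioned above.

```python
import random


def _dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _coarsen(points, weights):
    """Merge every point with its nearest unmatched neighbour (weighted centroid).

    Returns the coarse points, their weights and, for each fine point, the index
    of the coarse point it was merged into.
    """
    n = len(points)
    parent = [-1] * n
    coarse_pts, coarse_w = [], []
    for i in range(n):
        if parent[i] != -1:
            continue
        c = len(coarse_pts)
        parent[i] = c
        free = [j for j in range(n) if parent[j] == -1]
        if free:
            j = min(free, key=lambda j: _dist2(points[i], points[j]))
            parent[j] = c
            w = weights[i] + weights[j]
            centre = tuple((weights[i] * a + weights[j] * b) / w
                           for a, b in zip(points[i], points[j]))
        else:
            w, centre = weights[i], points[i]
        coarse_pts.append(centre)
        coarse_w.append(w)
    return coarse_pts, coarse_w, parent


def _kmeans_refine(points, weights, labels, k, iterations=10):
    """Weighted Lloyd iterations starting from the given labels."""
    dims = len(points[0])
    for _ in range(iterations):
        centres = []
        for c in range(k):
            members = [(p, w) for p, w, l in zip(points, weights, labels) if l == c]
            total = sum(w for _, w in members)
            centres.append(None if total == 0 else
                           tuple(sum(w * p[d] for p, w in members) / total for d in range(dims)))
        live = [c for c in range(k) if centres[c] is not None]
        new = [min(live, key=lambda c: _dist2(p, centres[c])) for p in points]
        if new == labels:
            break
        labels = new
    return labels


def multilevel_kmeans(points, k, coarsest=None, seed=0):
    """Coarsen, cluster the smallest problem, then project back and refine each level."""
    coarsest = coarsest or 4 * k
    levels = []                                  # (points, weights, parent map) per level
    pts, ws = list(points), [1.0] * len(points)
    while len(pts) > coarsest:
        cpts, cws, parent = _coarsen(pts, ws)
        levels.append((pts, ws, parent))
        pts, ws = cpts, cws
    seeds = random.Random(seed).sample(range(len(pts)), k)
    labels = [min(range(k), key=lambda c: _dist2(p, pts[seeds[c]])) for p in pts]
    labels = _kmeans_refine(pts, ws, labels, k)
    for fine_pts, fine_ws, parent in reversed(levels):
        labels = [labels[parent[i]] for i in range(len(fine_pts))]   # projection
        labels = _kmeans_refine(fine_pts, fine_ws, labels, k)         # refinement
    return labels
```

Each fine point starts from its coarse parent's cluster, so the k-means runs at the finer levels only need to correct local mistakes made by the coarsening.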
[First Presentation as MPG|Final Report as PDF|Final Presentation as MPG|Final Presentation as PDF]
A Multilevel Local Search Method for K-Clustering
Group members: Xiong Wen, Wang Wenjuan, Huang Jiaquan
Supervisor: Noureddine Bouhmala
Data clustering is one of the common techniques in data mining. In this project, a new approach combining a new local search method with the multilevel paradigm is introduced for solving the K-clustering problem. Multilevel techniques refer to the process of dividing a large and difficult problem into smaller ones, which are hopefully much easier to handle, and then working backward towards the solution of the original problem, using the solution from one level as the starting solution at the next level. The proposed approach starts by coarsening the original problem into a sequence of smaller problems using a coarsening scheme. Thereafter, a solution to the K-clustering problem is determined for the smallest problem and projected back to the original problem through a refinement phase that applies local search at each intermediate level. Large random data sets will be generated in order to judge the quality of the clustering.
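The refinement phase can be illustrated with a small local search that moves one point at a time to another cluster whenever that lowers the total within-cluster sum of squared errors (SSE). This is a generic first-improvement scheme assumed for illustration, not necessarily the project's new local search method, and the function names are hypothetical; at each intermediate level it would start from the labels projected from the coarser level. For clarity it recomputes the SSE from scratch after every tentative move, whereas a practical version would update it incrementally.

```python
def sse(points, labels, k):
    """Total within-cluster sum of squared distances to the cluster means."""
    total = 0.0
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if not members:
            continue
        dims = len(members[0])
        mean = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        total += sum(sum((p[d] - mean[d]) ** 2 for d in range(dims)) for p in members)
    return total


def local_search(points, labels, k, max_passes=20):
    """First-improvement local search over single-point moves between clusters."""
    labels = list(labels)
    best = sse(points, labels, k)
    for _ in range(max_passes):
        improved = False
        for i in range(len(points)):
            current = labels[i]
            for c in range(k):
                if c == current:
                    continue
                labels[i] = c                      # tentative move
                cost = sse(points, labels, k)
                if cost < best - 1e-12:
                    best, current, improved = cost, c, True   # keep the move
                else:
                    labels[i] = current            # undo the move
        if not improved:
            break
    return labels, best
```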
[First Presentation as MPG|Final Report as PDF|Source Code as ZIP|Final Presentation as MPG|Final Presentation as PDF]
Classification of web-based discussions using Naive Bayes
Group member: Ekaterina Soukhikh
Supervisors: Morten Goodwin Olsen and Leiming Chen
External contact: Aleksander M. Stensby
Given a set of web-based discussions on various topics written in various languages, the classification problem consists of determining, for each discussion (and its sub-posts), which topic it is about and which language it is written in. In this project the students are to investigate whether the Naive Bayes algorithm is applicable to classifying web-based discussions. The students will be given a training set of articles and a large corpus of articles to investigate. The project will be performed in cooperation with Integrasco A/S.
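The core of such a classifier is small. A minimal multinomial naive Bayes sketch in Python, assuming whitespace tokenisation and add-one (Laplace) smoothing; the class name and the toy examples are hypothetical. For this project one model would be trained on topic labels and a second one on language labels.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over word counts, with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_docs = Counter(labels)                 # documents per class (priors)
        self.word_counts = defaultdict(Counter)           # class -> word -> count
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(counts.values()) for c, counts in self.word_counts.items()}
        return self

    def predict(self, document):
        """Return the class with the highest log posterior for the document."""
        n_docs = sum(self.class_docs.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.class_docs:
            score = math.log(self.class_docs[c] / n_docs)
            for w in document.lower().split():
                score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best
```

Usage: `NaiveBayes().fit(posts, topic_labels).predict(new_post)`; sub-posts can either be classified on their own or concatenated with their parent discussion.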
[First Presentation as MPG|Final Report as PDF|First Testdata as PDF|Second Testdata as PDF|Third Testdata as PDF|Final Presentation as MPG|Final Presentation as PDF]
Distributed resource allocation algorithm
Group Member: Trabelsi Walid
Supervisors: Leiming Chen and Morten Goodwin Olsen
Resources can be allocated in many places, and a resource may have a different value depending on its position. The problem is to find optimal positions for all resources so that they can be used as efficiently as possible. The criterion for judging whether a resource object is optimally placed can vary, e.g. time or distance. The students will attempt to solve this problem with the Object Migration Automaton (OMA), partitioning the resource objects to obtain the most viable solution, which seems a promising approach.
This task is related to the Time Weighted Object Migration Automaton.
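The reward and penalty mechanics of an OMA can be sketched compactly. This is a simplified, hypothetical Python version for equal-sized partitions: objects observed together are rewarded (moved deeper into their class), objects observed apart are penalised (moved towards the boundary), and an object already at the boundary migrates to its partner's class while the least settled object there is sent back, so class sizes stay equal. The exact migration rules of Oommen and Ma's original OMA may differ, and how the time or distance criterion turns into "these two resources belong together" observations is left open.

```python
import random


class OMA:
    """Simplified Object Migration Automaton for equi-partitioning.

    Each of `num_classes` classes has `depth` states: 1 is the innermost
    (most settled) state and `depth` the boundary state. Classes need at
    least two objects each.
    """

    def __init__(self, objects, num_classes, depth, seed=None):
        if len(objects) % num_classes or len(objects) // num_classes < 2:
            raise ValueError("objects must split evenly, at least two per class")
        shuffled = list(objects)
        random.Random(seed).shuffle(shuffled)
        size = len(shuffled) // num_classes
        self.cls = {obj: i // size for i, obj in enumerate(shuffled)}
        self.state = {obj: depth for obj in shuffled}   # start at the boundary
        self.depth = depth

    def _closest_to_boundary(self, k, exclude):
        members = [o for o, c in self.cls.items() if c == k and o != exclude]
        return max(members, key=lambda o: self.state[o])

    def query(self, a, b):
        """Process one observation that objects a and b belong together."""
        if self.cls[a] == self.cls[b]:
            for o in (a, b):                            # reward: move inwards
                self.state[o] = max(1, self.state[o] - 1)
            return
        if self.state[a] < self.depth and self.state[b] < self.depth:
            for o in (a, b):                            # penalty: move outwards
                self.state[o] += 1
            return
        # One object is at the boundary: migrate it and swap another back.
        mover, target = (a, b) if self.state[a] == self.depth else (b, a)
        old_cls, new_cls = self.cls[mover], self.cls[target]
        swapped = self._closest_to_boundary(new_cls, exclude=target)
        self.cls[mover], self.state[mover] = new_cls, self.depth
        self.cls[swapped], self.state[swapped] = old_cls, self.depth

    def partition(self):
        groups = {}
        for o, c in self.cls.items():
            groups.setdefault(c, set()).add(o)
        return sorted(groups.values(), key=sorted)
```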
[First Presentation as MPG|Final Report as PDF|Source Code as RAR|Final Presentation as MPG|Final Presentation as PDF]
Solving the Bin packing problem
Group Member: Anis Yazidi
Supervisor: Morten Goodwin Olsen
The project evaluates a data analysis solution for distributing values in the bin packing problem, where the values of the objects cannot be known before they are instantiated. The bin packing problem is defined in [BINPACK] as a problem where "objects of different volumes must be packed into a finite number of bins of capacity V in a way that minimizes the number of bins used."
This analysis can be seen as a formal examination of any resource allocation problem, such as a distributed crawler. The main focus of the project should be a solution based on a competitive game of learning automata for distributing the objects.
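Because object values are only known when the objects arrive, this is an online variant of bin packing. A useful baseline for judging a learning-automata solution is the classic first-fit heuristic, sketched below. It is a swapped-in standard heuristic, not the learning-automata game the project focuses on, and the function name is hypothetical.

```python
def first_fit(sizes, capacity):
    """Online first-fit: put each item, as it arrives, in the first bin with room.

    Returns the bin index chosen for every item and the number of bins used.
    """
    free = []          # remaining capacity of each open bin
    assignment = []
    for size in sizes:
        if size > capacity:
            raise ValueError(f"item of size {size} exceeds bin capacity {capacity}")
        for b, room in enumerate(free):
            if size <= room:
                free[b] -= size
                assignment.append(b)
                break
        else:                                   # no open bin fits: open a new one
            free.append(capacity - size)
            assignment.append(len(free) - 1)
    return assignment, len(free)
```

For example, `first_fit([4, 8, 1, 4, 2, 1], 10)` packs the six items into two bins; comparing bin counts with such a baseline on the same random item streams is one way to evaluate the automata-based distribution.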
[First Presentation as MPG|Final Report as PDF|Source Code as ZIP|Final Presentation as MPG|Final Presentation as PDF]
Detection of Denial of Service attacks using a naive Bayesian classifier
Supervisor: Morten Goodwin Olsen
Group members: Richard Imenes, Åsmund Myklevoll and Kristen Gravelseter
This project is a proof of concept of self-learning intrusion detection systems. The project goal is to show that such a system can be built using a Naïve Bayesian classifier. In the project we tested a dataset using Orange and examined what kind of classification results we could get. This project does not contribute any new knowledge to the field of self-learning intrusion detection systems, but it has been part of our education and has given us more knowledge about the subject.
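The project used Orange; the underlying idea can also be sketched in plain Python as a Gaussian naive Bayes classifier over numeric traffic features. The features (packets per second, distinct source addresses), the class names and the numbers in the example are hypothetical and not taken from the project's dataset.

```python
import math
from statistics import mean, pvariance


class GaussianNB:
    """Naive Bayes with one normal distribution per (class, feature)."""

    def fit(self, rows, labels, eps=1e-9):
        self.stats, self.priors = {}, {}
        for c in set(labels):
            cls_rows = [r for r, l in zip(rows, labels) if l == c]
            self.priors[c] = len(cls_rows) / len(rows)
            # eps keeps a constant feature from producing a zero variance
            self.stats[c] = [(mean(col), pvariance(col) + eps) for col in zip(*cls_rows)]
        return self

    def predict(self, row):
        def log_posterior(c):
            lp = math.log(self.priors[c])
            for x, (mu, var) in zip(row, self.stats[c]):
                lp += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
            return lp
        return max(self.stats, key=log_posterior)
```

Usage: fit on labelled traffic windows such as `(packets_per_second, distinct_sources)`, then call `predict` on each new window; "self-learning" would mean refitting periodically on newly labelled windows.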
[First Presentation as MPG|Final Report as PDF|Source Code as ZIP|Final Presentation as MPG|Final Presentation as PDF]
Accessible and Usable Web Content
Group member: Iker H. Garcia
Supervisor: Morten Goodwin Olsen
We search for the connection between usability and accessibility. This project tries to find out how adopting the W3C recommendations on making web content accessible to people with disabilities affects people without disabilities. This information will help determine whether it is possible to create a web page that is easy to use for both groups at the same time. If we can establish this relationship, we will have a powerful tool for improving the quality of web content in the future.
[First Presentation as MPG|Final Report as PDF|Final Annexe1 as PDF|Final Annexe2 as PDF|Final Presentation as MPG|Final Presentation as PDF]
Student projects 2005
Group 1: Sampling Frequency Tuning Tool
Group Members: Wu Yang, Jin Qi and Sun Wei
Introduction: The goal of this project is to find ways to optimise the crawling frequency for individual web sites. The idea is to avoid crawling a site if its accessibility has not changed. Focusing crawler resources on actual changes is a challenge for all search engines. Another relevant aspect of sampling is selecting a significant and representative set of sites.
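One simple tuning policy is to back off when a site's measured state is unchanged and to return sooner when it has changed. A minimal sketch, assuming the crawler can produce a byte fingerprint of what it measured (for example the serialised accessibility test results) and using a doubling/halving rule; the policy, the limits and the class name are hypothetical, not the project's tool.

```python
import hashlib


class RecrawlScheduler:
    """Adapts the crawl interval of each site to how often its results change."""

    def __init__(self, initial_days=7, min_days=1, max_days=90):
        self.initial, self.min, self.max = initial_days, min_days, max_days
        self.interval = {}      # site -> current interval in days
        self.fingerprint = {}   # site -> hash of the last observation

    def record_crawl(self, site, observation: bytes):
        """Store a crawl result and return the number of days until the next crawl."""
        digest = hashlib.sha256(observation).hexdigest()
        current = self.interval.get(site, self.initial)
        if site in self.fingerprint:
            if digest == self.fingerprint[site]:
                current = min(self.max, current * 2)    # unchanged: back off
            else:
                current = max(self.min, current // 2)   # changed: come back sooner
        self.fingerprint[site] = digest
        self.interval[site] = current
        return current
```

Hashing measured results rather than raw HTML avoids rescheduling a site just because a date or advertisement on the page changed.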
Presentation: First presentations
Project report: Sampling frequency tuning tool
Project web site: Sampling frequency tuning tool web site
Group 2: Rule-based Adaptive Query-by-example - a Learning Automata Approach
Group Members: Fang Chen and Lei Liang
Introduction: Query by Example (QBE) is a method of query creation that allows the user to search for documents based on an example in the form of a string, a single article, or a list of articles. In this project the students are to evaluate a novel learning automata scheme, recently proposed by Granmo et al., for creating queries from terms found in the example text. In contrast to traditional QBE systems, however, the query should not only be based on the example text but should also be adapted to the queried articles. The aim is to produce more effective queries. At the heart of the scheme is a game between cooperative learning automata. The purpose of the game is to adaptively form Boolean expressions over terms found in the example text until the Boolean expression encompasses a predefined fraction of the queried articles. The investigation will be based on a large corpus of news articles from the company InterMedium.
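The quantity the automata game steers is the fraction of queried articles matched by the Boolean expression. The sketch below computes that coverage for a conjunctive (AND) query and, as a simple non-learning baseline for comparison, greedily grows the conjunction from the example's terms. The greedy rule is a swapped-in illustration, not the learning automata scheme of Granmo et al., and representing articles as sets of terms and the function names are assumptions.

```python
def coverage(query_terms, articles):
    """Fraction of articles (sets of terms) that contain every term of the AND-query."""
    if not articles:
        return 0.0
    hits = sum(1 for terms in articles if query_terms <= terms)
    return hits / len(articles)


def greedy_conjunction(example_terms, articles, target):
    """Add example terms, most frequent first, while coverage stays >= target."""
    ranked = sorted(example_terms, key=lambda t: (-sum(t in a for a in articles), t))
    query = set()
    for term in ranked:
        if coverage(query | {term}, articles) >= target:
            query.add(term)
    return query
```

With articles `{oil, price, opec}`, `{oil, price}`, `{oil, spill}`, `{football}`, the example terms `{oil, price, opec}` and a target of 0.5, the greedy baseline returns `{oil, price}`, which matches exactly half of the articles.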
Presentation: First presentations
Project report: Adaptive Query-by-example
Project web site: Rule-based Adaptive Query-by-example web site
Group 3: Object Migration Automaton (OMA) for Topic Detection and Tracking
Group Members: Ole-Alexander Moy, Trond Abusdal, Karl Syvert Løland, and Alexander Mølsæther Stensby
Introduction: In this project the students are to investigate whether a variant of the Object Migration Automaton (OMA) [Oommen and Ma, 1988] can be used for Topic Detection and Tracking (TDT). Each automaton object will be associated with an article, and a probabilistic article similarity function will be used to compare articles. The OMA seems particularly promising for TDT because it (1) learns incrementally/on-line, (2) handles noise, and (3) has low computational complexity. The investigation will be based on a large corpus of news articles from the company InterMedium.
Presentation: First presentations
Project report: Object Migration Automaton (OMA) for Topic Detection and Tracking
Project web site: Object Migration Automaton (OMA) for Topic Detection and Tracking web site
Group 4: Rule Based Network Anomaly Detection using Learning Automata
Group Members: Ali Chelli, Farouk Dhahbi and Chen Leiming
Introduction: In this project the students are to evaluate a novel learning automata scheme for anomaly detection in computer networks, recently proposed by Granmo et al. At the heart of the scheme is a game between cooperative learning automata. The purpose of the game is to form a Boolean expression over packet bits that can be used to decide whether a packet is normal or not. Our general aim is to combine the benefits of stochastic traffic models (e.g., handling noise and supporting on-line learning/adaptation) with the benefits of rule-based models (e.g., facilitating human interpretation/verification).
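Once learned, such an expression is cheap to evaluate and easy to read, which is what makes it open to human verification. A minimal sketch, assuming the expression is a disjunction of conjunctions over individual packet bits, each literal being a (bit index, required value) pair; this representation and the function names are illustrative assumptions, and the learning of the literals by the automata is not shown.

```python
def bit(packet: bytes, index: int) -> int:
    """Value of bit `index`, counting from the most significant bit of byte 0."""
    return (packet[index // 8] >> (7 - index % 8)) & 1


def matches(rule, packet: bytes) -> bool:
    """True if the packet satisfies every (bit index, required value) literal."""
    return all(bit(packet, i) == v for i, v in rule)


def is_normal(rules, packet: bytes) -> bool:
    """A packet is judged normal if any rule (one conjunction) matches it."""
    return any(matches(rule, packet) for rule in rules)
```

A rule such as `[(0, 1), (2, 1)]` reads directly as "bit 0 is set and bit 2 is set", so an operator can check each learned rule by hand.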
Presentation: First presentations
Project report: Rule Based Network Anomaly Detection using Learning Automata
Project web site: Rule Based Network Anomaly Detection using Learning Automata web site
Group 5: Web Content Mining
Group Members: Zheng Xianghan, Hu Wen and Zhang Li
Introduction: The goal of this project is to create a crawler/classifier that downloads the images in a web page and tries to classify the content of each image into different categories, e.g., mathematical formula, logo, buttons, and so on. The focus should be on automatic detection of image usage that reduces the accessibility of a web page.
Presentation: First presentations
Project report: Web Content Mining
Project web site: Web Content Mining web site
Group 6: Document Classification
Group Members: Yao Fei and Yang Kun
Introduction: The goal of the project was to make a program that decides the language and topic of texts based on text examples. The program was developed using Python and MySQL, and the algorithm used was naive Bayes.
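Language can be decided with the same naive Bayes idea applied to character trigrams instead of words. A minimal sketch with equal priors and add-one smoothing; the function names and the tiny training texts are hypothetical, and the MySQL storage used in the project is left out.

```python
import math
from collections import Counter


def trigrams(text):
    """Character trigrams of the lower-cased text, padded with spaces at the edges."""
    padded = f"  {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(samples):
    """samples: {language: example text}. Returns per-language trigram counts."""
    return {lang: Counter(trigrams(text)) for lang, text in samples.items()}


def identify(models, text):
    """Pick the language whose smoothed trigram model gives the highest log-likelihood."""
    vocab = set().union(*models.values())

    def score(lang):
        counts = models[lang]
        # +1 in the denominator reserves probability mass for unseen trigrams
        total = sum(counts.values()) + len(vocab) + 1
        return sum(math.log((counts[t] + 1) / total) for t in trigrams(text))

    return max(models, key=score)
```

Trigrams work well for language detection because they capture spelling patterns even in short texts, whereas topic detection usually works better on whole words.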
Presentation: First presentations
Project report: Document Classification
Project web site: Document classification web site
Group 7: Web Structure Mining
Group Member: Ingelin Fivelstad Isfeldt
Introduction: The goal of this project is to construct a crawler that identifies the interpage structure of a web site. The interpage structure can be modeled e.g. as a graph. From such a graph, different quantitative measurements are to be made: e.g., link density, number of cycles in the graph, average length of paths in the graph (after cycles have been removed), and so on. Finally, it should be determined whether the navigability of a web site can be ranked meaningfully based on such measurements.
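The measurements above can be computed from an adjacency list of the site that maps each page to the pages it links to. A minimal sketch; the function names are hypothetical, every page is assumed to appear as a key, and "number of cycles" is approximated by counting DFS back edges (links that close a cycle), since enumerating all cycles grows exponentially. Shortest paths never contain cycles, so the breadth-first average path length needs no separate cycle removal. The recursive DFS suits small sites; large sites would need an iterative version.

```python
from collections import deque


def link_density(graph):
    """Share of possible directed links between distinct pages that actually exist."""
    n = len(graph)
    links = sum(len(set(targets) - {page}) for page, targets in graph.items())
    return links / (n * (n - 1)) if n > 1 else 0.0


def back_edges(graph, start):
    """Links pointing back to a page on the current DFS path, i.e. links closing a cycle."""
    found, on_path, visited = [], set(), set()

    def dfs(page):
        visited.add(page)
        on_path.add(page)
        for target in graph.get(page, ()):
            if target in on_path:
                found.append((page, target))
            elif target not in visited:
                dfs(target)
        on_path.discard(page)

    dfs(start)
    return found


def average_path_length(graph):
    """Mean shortest-path length (in clicks) over all ordered pairs of reachable pages."""
    total = pairs = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            page = queue.popleft()
            for target in graph.get(page, ()):
                if target not in dist:
                    dist[target] = dist[page] + 1
                    queue.append(target)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0
```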
Presentation: First presentations
Project report: Web Structure Mining
Project web site: Web Structure Mining web site
Group 8: Browser based Web accessibility measurement component
Group Members: Dag Tommy Sten and Katja Suhih
Introduction: The goal of this project is to demonstrate the concept of a web-browser-based measurement component using Mozilla[Moz]. Mozilla is interesting as a container for WAM measurement systems because of its accessibility architecture[MOZ-A], which is used by third-party software such as screen readers, magnifiers and voice dictation software that need information about document content, UI controls and events such as changes of focus. Mozilla supports two accessibility APIs: MSAA on Windows[MSAA] and ATK on Linux and Unix[ATK]. Mozilla has also implemented an accessibility model in the browser, consisting of transformations from a non-accessible DOM tree to an accessible DOM tree. The hypothesis is that a browser-based measurement component should considerably simplify some accessibility measurements, such as identifying screen flickering, redirection, JavaScript links or spawned windows, by subscribing to events via the accessibility interface or the DOM tree. The measurement component must conform to the web-services-based plug-in interface of the EIAO Web Accessibility Observatory, so that its measurements can be used for large-scale accessibility assessments by the Observatory. Suggested reading is Rapid Application Development with Mozilla by Nigel McFarlane (Bruce Perens' Open Source Series)[MOZ-Book].
Presentation: First presentations
Project report: Browser based Web accessibility measurement component
Project web site: Browser based Web accessibility measurement component web site
Group 9: Distributed HarvestMan
Group Members: Anand B. Pillai, Hadzic Dinko and Arild Andås
The main objective of the project is to develop a distributed web crawler based on an existing open source web crawler named HarvestMan. HarvestMan is a console application written in Python. It is built in a modular and configurable way, which makes it easy to extend. The new distributed web crawler will contain several improvements over the current version: increased crawler efficiency and distributed operation of crawler instances.
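One common way to distribute a crawl, not necessarily the one chosen for Distributed HarvestMan, is to assign each host to exactly one crawler instance by hashing its name. A minimal sketch under that assumption; the function names are hypothetical and do not correspond to HarvestMan's own API. A stable hash such as MD5 is used instead of Python's built-in `hash()`, which is randomised per process for strings and would make separate instances disagree about ownership.

```python
import hashlib
from urllib.parse import urlsplit


def instance_for(url: str, num_instances: int) -> int:
    """Assign a URL to a crawler instance by hashing its (lower-cased) host name.

    All URLs of one host go to the same instance, which keeps per-host politeness
    delays and duplicate detection local to a single crawler.
    """
    host = urlsplit(url).hostname or ""
    digest = hashlib.md5(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_instances


def partition(urls, num_instances):
    """Split a list of URLs into one work queue per crawler instance."""
    queues = [[] for _ in range(num_instances)]
    for url in urls:
        queues[instance_for(url, num_instances)].append(url)
    return queues
```

Links discovered by one instance but owned by another would be forwarded to the owner's queue rather than crawled locally.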
Student projects 2004
Group 1 - Web Structure Mining
Group Members: Thomas Andersen, Quang Van Nguyen and Trond Undrum
Short project description: The goal of this project is to construct a crawler that identifies the interpage structure of a web site. The interpage structure can be modeled e.g. as a graph. From such a graph, different quantitative measurements are to be made: e.g., link density, number of cycles in the graph, average length of paths in the graph (after cycles have been removed), and so on. Finally, it should be determined whether the navigability of a web site can be ranked meaningfully based on such measurements.
Project Description
Project Report, Appendix 1, Appendix 2, Appendix 3, Appendix 4, Appendix 5, Appendix 6
Project page
Group 2 - Web Usage Mining
Group Members: Jørgen Andersen, Trude Buøy, Asle Morten Schrøder Ollestad
Short project description: The goal of this project is to construct a parser that segments the web log of a web site into sessions and identifies the sequences in which web pages have been accessed within each session. Based on such sequences, traversal patterns of the web site are to be identified. Finally, it should be determined whether the navigability of a web site can be ranked meaningfully based on traversal patterns.
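Sessionisation is commonly done with an inactivity timeout per visitor. A minimal sketch, assuming the log has already been parsed into (visitor, timestamp, page) tuples and using the widely used 30-minute timeout; both the input format and the threshold are assumptions, not the project's choices, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import timedelta


def sessionize(entries, timeout=timedelta(minutes=30)):
    """Split (visitor, timestamp, page) entries into per-visitor page sequences.

    A new session starts when a visitor has been inactive for longer than `timeout`.
    """
    by_visitor = defaultdict(list)
    for visitor, timestamp, page in entries:
        by_visitor[visitor].append((timestamp, page))
    sessions = []
    for visits in by_visitor.values():
        visits.sort()
        current, last = [], None
        for timestamp, page in visits:
            if last is not None and timestamp - last > timeout:
                sessions.append(current)
                current = []
            current.append(page)
            last = timestamp
        sessions.append(current)
    return sessions


def transitions(sessions):
    """Count page-to-page transitions, the raw material for traversal patterns."""
    return Counter((a, b) for s in sessions for a, b in zip(s, s[1:]))
```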
Project Description, Appendix Timetable
Project Report
Implementation
Group 3 - Web Content Mining
Group Members: Arild Finne and Erik Træal
Short project description: The goal of this project is to create a crawler/classifier that downloads the images in a web page and tries to classify the content of each image into different categories, e.g., mathematical formula, logo, buttons, and so on. The focus should be on automatic detection of image usage that reduces the accessibility of a web page.
Project Description
Project Report
Group 4 - Sampling frequency tuning tool
Group Members: Carl T. Vatne, Lars R. Haugen and Per Ø. Hodøl
Short project description: The goal of this project is to find ways to optimise the crawling frequency for individual web sites. The idea is to avoid crawling a site if its accessibility has not changed. Focusing crawler resources on actual changes is a challenge for all search engines. Another relevant aspect of sampling is selecting a significant and representative set of sites.
Project Description
Project Report
Implementation
Group 5 - User Behaviour Logging for Datawarehouse Tuning
Group Members: Eirik Aanonsen, Morten Kråkvik and Lars Slåtsveen Breistrand
Short project description: The goal is to design a set of tools that log user behaviour for tuning the datawarehouse. The approach can be based on storing queries with timestamps and storing clickstreams of the site navigation. Analysis of both sources may yield valuable information for improving the datawarehouse performance and the web interface.
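A minimal sketch of such logging with Python's built-in `sqlite3`, assuming queries are passed as strings to a caller-supplied function that executes them; the table layout and function names are hypothetical, and the real datawarehouse would log into its own database.

```python
import sqlite3
import time


def open_log(path=":memory:"):
    """Open (or create) the log database with one table for queries and one for clicks."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS query_log (
                      session_id TEXT, issued_at REAL, query TEXT, duration_s REAL)""")
    db.execute("""CREATE TABLE IF NOT EXISTS click_log (
                      session_id TEXT, clicked_at REAL, page TEXT)""")
    return db


def log_query(db, session_id, query, run):
    """Run `run(query)`, store the query with its timestamp and duration, return the result."""
    issued_at = time.time()
    start = time.perf_counter()
    result = run(query)
    duration = time.perf_counter() - start
    db.execute("INSERT INTO query_log VALUES (?, ?, ?, ?)",
               (session_id, issued_at, query, duration))
    db.commit()
    return result


def log_click(db, session_id, page):
    """Store one step of the visitor's clickstream."""
    db.execute("INSERT INTO click_log VALUES (?, ?, ?)", (session_id, time.time(), page))
    db.commit()


def slowest_queries(db, limit=5):
    """Queries with the highest average duration: candidates for indexes or aggregates."""
    return db.execute("""SELECT query, COUNT(*), AVG(duration_s) FROM query_log
                         GROUP BY query ORDER BY AVG(duration_s) DESC LIMIT ?""",
                      (limit,)).fetchall()
```

Joining `query_log` and `click_log` on `session_id` relates slow queries to the navigation paths that triggered them.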
Project Description
Project Report
Implementation
Student projects 2003
Group 1 - Authoring tool identification
Group Members: Svein Arild Myrer, Morten Goodwin Olsen and Tor Oskar Wilhelmsen
Short project description: This project was part of an ongoing research project at Agder University College named ROBACC, whose goal is to develop an automated Internet spider that assesses the accessibility of web pages.
Our involvement was to develop a prototype of a classifier module that could identify the authoring tool used to create any given web page based on the structure of its HTML code.
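The simplest evidence is an explicit generator declaration; failing that, structural fingerprints in the markup can be checked. A minimal rule-based sketch; the fingerprint table and the function names are hypothetical illustrations, whereas the project's classifier worked from the structure of the HTML code and would typically learn such cues rather than hard-code them.

```python
import re
from html.parser import HTMLParser

# Hypothetical, hand-picked markup fingerprints; a real classifier would learn these.
FINGERPRINTS = {
    "Microsoft FrontPage": [re.compile(r"<!--webbot", re.I)],
    "Microsoft Word": [re.compile(r"class=\"?Mso", re.I),
                       re.compile(r"urn:schemas-microsoft-com:office", re.I)],
}


class GeneratorFinder(HTMLParser):
    """Reads <meta name="generator" content="..."> if the page declares its tool."""

    def __init__(self):
        super().__init__()
        self.generator = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "generator":
            self.generator = attrs.get("content")


def identify_tool(html):
    """Return the declared generator, else the first matching fingerprint, else None."""
    finder = GeneratorFinder()
    finder.feed(html)
    finder.close()
    if finder.generator:
        return finder.generator
    for tool, patterns in FINGERPRINTS.items():
        if any(p.search(html) for p in patterns):
            return tool
    return None
```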
Project Report
Project Page
Group 2 - Document classification
Group Members: Erik Kristoffersen, Erling Kristiansen and Marius Andre Sæthre
The goal of the project was to make a program that decides the language and topic of texts based on text examples. The program was developed using Python and MySQL, and the algorithm used was naive Bayes.
Project Report
Project Page

