Difference between revisions of "Files Dashboard"

From ecology

Revision as of 10:16, 8 May 2012

The files dashboard is a web page that gives an overview of the trackers' raw data files. For a chosen project and date interval, it displays a table with one row for every day in the interval and one column for every tracker that belongs to the project.

Each cell of the table contains information about that tracker's files for that day, in principle one box per file. Usually there is no more than one file per day. The purpose is to give information at a glance, so if a tracker has no files for a given day, an empty box is shown.

Three pieces of information are to be shown for each file:

  • Whether a file exists for a given day and tracker
  • How big the file is
  • Whether errors occurred while parsing its contents
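The following minimal Java sketch makes the table layout concrete. It requires Java 16 or later for records. It builds the data structure behind the dashboard: one row per day of the chosen interval, one column per tracker of the project, and a list of file boxes in each cell. The class and record names are hypothetical and are not taken from the UvA-BiTS code. The per-file error indicator is left out because parsing errors are not tracked yet.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DashboardGrid {

    // Hypothetical view of one detected file: enough to draw its box (existence and size).
    public record FileEntry(String fileName, int tracker, LocalDate date, long size) {}

    // Builds rows (each day of the inclusive interval) x columns (trackers of the project).
    // Every cell is created up front, so a day without files for a tracker yields an
    // empty list, which the page renders as an empty box.
    public static Map<LocalDate, Map<Integer, List<FileEntry>>> build(
            LocalDate from, LocalDate to, List<Integer> trackers, List<FileEntry> files) {
        Map<LocalDate, Map<Integer, List<FileEntry>>> grid = new LinkedHashMap<>();
        for (LocalDate day = from; !day.isAfter(to); day = day.plusDays(1)) {
            Map<Integer, List<FileEntry>> row = new LinkedHashMap<>();
            for (int tracker : trackers) {
                row.put(tracker, new ArrayList<>());
            }
            grid.put(day, row);
        }
        // Place each file in its cell; files outside the interval or for other trackers are ignored.
        for (FileEntry f : files) {
            Map<Integer, List<FileEntry>> row = grid.get(f.date());
            if (row != null && row.containsKey(f.tracker())) {
                row.get(f.tracker()).add(f);
            }
        }
        return grid;
    }

    public static void main(String[] args) {
        List<FileEntry> files = List.of(
                new FileEntry("Log_0533_13042012_xx.txt", 533, LocalDate.of(2012, 4, 13), 2048));
        var grid = build(LocalDate.of(2012, 4, 12), LocalDate.of(2012, 4, 14), List.of(533, 534), files);
        // One line per day: the day, then the number of boxes for trackers 533 and 534.
        grid.forEach((day, row) ->
                System.out.println(day + " " + row.get(533).size() + " " + row.get(534).size()));
    }
}
```

Creating every cell up front keeps the "empty box" case explicit instead of relying on missing map keys.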

File processing

Raw tracking data enters the system as plain text files sent through a Dropbox account. Dropbox makes the files available on the server's file system.

File properties

Once the files are accessible through the local file system, they can be queried, read and parsed to extract the tracking information they contain.

The file name is expected to have the form Log_0533_13042012_xx.txt. This provides:

The tracker number
(in the example, 533)
The reported date
(in the example, April 13th 2012)
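Extracting these two values from the name can be sketched with a regular expression. The sketch makes three assumptions, all based only on the example:

  • The tracker number is the digit group after "Log_".
  • The date is written as ddMMyyyy.
  • The trailing "xx" part, whose meaning is not described, may be any run of word characters.

The class and record names are hypothetical.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogFileName {

    // Tracker number and reported date extracted from a raw data file name.
    public record Parsed(int tracker, LocalDate reportedDate) {}

    // Assumed layout: Log_<tracker>_<dd><MM><yyyy>_<suffix>.txt
    // The suffix ("xx" in the example) is accepted as any word characters (assumption).
    private static final Pattern NAME =
            Pattern.compile("Log_(\\d+)_(\\d{2})(\\d{2})(\\d{4})_(\\w+)\\.txt");

    public static Optional<Parsed> parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return Optional.empty();                      // not a raw data file name
        }
        try {
            int tracker = Integer.parseInt(m.group(1));   // "0533" -> 533
            int day = Integer.parseInt(m.group(2));
            int month = Integer.parseInt(m.group(3));
            int year = Integer.parseInt(m.group(4));
            return Optional.of(new Parsed(tracker, LocalDate.of(year, month, day)));
        } catch (DateTimeException | NumberFormatException e) {
            return Optional.empty();                      // e.g. 31022012 is not a real date
        }
    }

    public static void main(String[] args) {
        // prints Optional[Parsed[tracker=533, reportedDate=2012-04-13]]
        System.out.println(parse("Log_0533_13042012_xx.txt"));
    }
}
```

Returning an empty Optional for names that do not match, or that encode an impossible date, lets a caller skip unexpected files instead of failing.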


Apart from the name, the file system provides other file attributes, namely:

Last modification date
when the file was last modified
Size
how big the file's contents are
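Both attributes can be read through the standard java.nio.file API without opening the file. The minimal sketch below uses a single Files.readAttributes call, which fetches the basic attributes as one bulk operation. The Props record is a hypothetical name, and the file name is kept without its directory path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.time.Instant;

public class FileProperties {

    // Hypothetical holder for the bare file name plus the two file system attributes.
    public record Props(String fileName, Instant lastModified, long size) {}

    // Reads last modification time and size in one bulk call; the contents are never opened.
    public static Props read(Path file) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
        return new Props(
                file.getFileName().toString(),        // name without the directory path
                attrs.lastModifiedTime().toInstant(),
                attrs.size());                        // size in bytes
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("Log_0533_13042012_", ".txt");
        Files.writeString(tmp, "sample");
        System.out.println(read(tmp));
        Files.delete(tmp);
    }
}
```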

File processing lifecycle

Since the goal is to show the three pieces of information described at the beginning, and given what is available in the file name and properties, the system needs to keep track of these. Here is what is looked at, and how.

The file name alone (without the path) uniquely identifies a file in the system. When a file is discovered in the file system, its properties are first analyzed, as described above. If the file is new, a new entry is added to the system, storing those properties next to the file name.

If an entry already exists for that file name, its last modification date must be checked. If the date reported by the file system differs from the one stored in the entry, the newly found file's information takes precedence: its last modification date and size overwrite those in the entry, and the old values are discarded.

Throughout this process, the file's contents are never looked at.
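The lifecycle rules above amount to an insert-or-update keyed by file name. This minimal sketch makes two assumptions:

  • An in-memory map stands in for the gps.uva_trackingfile_parsing table.
  • The entry and outcome types have hypothetical names.

As described, it compares only last modification dates and never reads file contents.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

public class FileRegistry {

    public enum Outcome { NEW, CHANGED, UNCHANGED }

    // Stored properties of one file; the map key is the bare file name.
    public record Entry(Instant lastModified, long size) {}

    // In-memory stand-in for the database table (assumption for this sketch).
    private final Map<String, Entry> entries = new HashMap<>();

    // Applies the lifecycle rules to one discovered file, using only its name and properties.
    public Outcome register(String fileName, Instant lastModified, long size) {
        Entry existing = entries.get(fileName);
        if (existing == null) {
            entries.put(fileName, new Entry(lastModified, size));   // new file: new entry
            return Outcome.NEW;
        }
        if (!existing.lastModified().equals(lastModified)) {
            // Date differs: the new date and size replace the old values, which are discarded.
            entries.put(fileName, new Entry(lastModified, size));
            return Outcome.CHANGED;
        }
        return Outcome.UNCHANGED;                                    // same date: nothing to do
    }

    public Entry get(String fileName) {
        return entries.get(fileName);
    }

    public static void main(String[] args) {
        FileRegistry registry = new FileRegistry();
        Instant t0 = Instant.parse("2012-04-13T10:00:00Z");
        Instant t1 = Instant.parse("2012-04-13T12:00:00Z");
        System.out.println(registry.register("Log_0533_13042012_xx.txt", t0, 1000)); // NEW
        System.out.println(registry.register("Log_0533_13042012_xx.txt", t0, 1000)); // UNCHANGED
        System.out.println(registry.register("Log_0533_13042012_xx.txt", t1, 1500)); // CHANGED
    }
}
```

Note that a size change with an unchanged modification date is reported as UNCHANGED, which follows the rule as stated: only the date is compared.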

Note: At this stage of development, errors that occur while parsing the files' contents are not tracked. Therefore, no errors are reported in the dashboard yet.

Functional architecture

The architectural components involved in making the web page available are the following:

A relational database
one specific table stores the file information: gps.uva_trackingfile_parsing
the relations between projects and trackers exist independently of this file parsing system, but they are also used to determine which trackers to consider when a project is selected
A web front-end
the system presents the information to the user as a web page; this information is read from the database
A set of daemons
they gather information independently and feed it into the database

Therefore, the whole system relies on the database to exchange information. Only what is present there will be shown.

Daemons

Two sets of file-related daemons exist in the system:

  • One file detection daemon
  • Several file parsing daemons

File detection

The files first need to be found, so something must scan the incoming file directories, detect changes and inform the system. A daemon, in the computing sense of a background process, suits this task well, since no user interaction is required. However, to keep the system simpler, it has been developed as a web application that provides no web interface rather than as a typical daemon; this is only a minor implementation detail.

The sole purpose of this program (the daemon) is to:

  1. Traverse the incoming files directory and look for:
    • New files
    • Changes in existing files
  2. Report the detected changes to the system
  3. Sleep for a while and then start again
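The three steps above can be sketched in Java as follows. This is a minimal sketch with several assumptions:

  • Class and method names are hypothetical.
  • Reporting is reduced to printing, whereas the real daemon informs the system through the database.
  • The pause between passes is a parameter, since its length is not specified.

Files are keyed by their bare name, following the identification rule from the file processing lifecycle.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

public class FileDetectionDaemon {

    // Last modification time seen for each bare file name during previous passes.
    private final Map<String, FileTime> seen = new HashMap<>();

    // Step 1: traverse the directory and return the names of new or changed files.
    public List<String> scanOnce(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(dir)) {
            files = walk.filter(Files::isRegularFile).toList();
        }
        List<String> changed = new ArrayList<>();
        for (Path p : files) {
            String name = p.getFileName().toString();      // bare name identifies the file
            FileTime modified = Files.getLastModifiedTime(p);
            FileTime previous = seen.put(name, modified);  // remember the latest date
            if (previous == null || !previous.equals(modified)) {
                changed.add(name);                         // new file or changed date
            }
        }
        return changed;
    }

    // Step 3: repeat passes with a fixed pause between the end of one pass and the next.
    public ScheduledExecutorService start(Path dir, long delaySeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                // Step 2 (placeholder): report the changes; the real system writes to the database.
                scanOnce(dir).forEach(name -> System.out.println("new or changed: " + name));
            } catch (IOException | UncheckedIOException e) {
                // Catching keeps the schedule alive: an uncaught exception cancels later runs.
                e.printStackTrace();
            }
        }, 0, delaySeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Because the daemon is deployed as a web application, one plausible place to call start(...) is the contextInitialized method of a servlet ServletContextListener, with the scheduler shut down in contextDestroyed. This wiring is an assumption, not a description of the actual UvA-BiTS code.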

The advantage of deploying it as a web application is that the web application container provides a number of facilities. The main one is a ready-made, centralized management interface that makes it easy to list, launch, stop, replace and otherwise manage the deployed applications, so everything related to the system can be seen in one place. Other benefits include centralized logging with automatic log rotation, a confined runtime environment, security features (if needed), thread management, connectors, etc.

Software architecture

The part of UvA-BiTS related to the files dashboard is developed in Java and divided into several Maven modules:

  • One module is in charge of the persistence model. It is built on standard and de facto standard frameworks such as JPA, Spring and Hibernate.
  • The daemon
  • The web layer