Difference between revisions of "Files Dashboard"

From ecology
(Daemons)
(User manual)
 
(8 intermediate revisions by the same user not shown)
Line 1: Line 1:
The files dashboard is a web page to get an overview of the raw data files of trackers. For a chosen project and a date interval, it displays a table with a row for every day in the date interval and a column for every tracker that belongs to the project.
+
The files dashboard is a web page to get an overview of the raw data files of trackers. For a chosen project and a date interval, it displays a table with a row for every day within the date interval and a column for every tracker that belongs to the project.
  
 
Each cell from the table will then contain information for the files of that tracker for that day; in principle, one box for each file. And usually, no more than one file per day. The purpose is to have information at a glance, so if there are no files for a tracker for one day, an empty box is to be shown.
 
Line 9: Line 9:
  
 
== User manual ==
 
If you've got access to the 'services' machine, you should be able to use your login/password and access the url:
+
If you can access the 'services' machine, you should be able to use your login/password and access the url:
https://services.flysafe.sara.nl/uvabits/projectadmin/trackersdashboard
+
https://services.e-ecology.sara.nl/dashboard/
  
The url takes 3 parameters, that you can fill in by hand, like:
+
=== Interaction ===
<code>...?projectId=2&startDay=20120401&endDay=20120430</code>
+
To customise the reported dashboard, the web page shows a bar at the top, with a couple of shortcuts:
 +
* for months (namely, the current month as the last in the list, and then 11 months in advance), which set the first and the last days to be included in the report
 +
* for projects, there is a dropdown list of all the project names; the request to change the project is sent by clicking on the submit button to the right of the list
 +
 
 +
=== URL Parameters ===
 +
You have more options if you look at the url of the page directly. The url takes 3 parameters: projectId, startDay and endDay. In principle, you can use the web interface to select the values that you want (click on a month, for instance, to set the dates), or select from a drop-down list the project you want to view the report for, as explained above.
 +
 
 +
However, if the interface does not provide the month you want to see (e.g.: you want to see something else than a specific month), then you can feed the parameters by hand in your browser: <code>...?projectId=2&startDay=20120401&endDay=20120430</code>.
 +
 
 +
If a projectId fails to appear on the url or the id provided does not make sense, then no project will be selected and therefore a message stating that no data has been found will be displayed.
  
=== Parameters ===
+
If the dates fail to make sense, then the current date will be selected by default.
 +
 
 +
=== A bit more on the URL parameters ===
 
; projectId  
 
: is the Identifier of the Project that you want to query (currently we have only from 1 to 17)
+
: is the Identifier of the Project that you want to query (a number that is given to the project when it is created)
 
; startDay
 
: is the first day you want to request in format yyyyMMdd, where
+
: is the first day you want to include in the report, in format yyyyMMdd, where
 
:* yyyy is the 4-digit year (e.g. 2012)
 
 
:* MM is the 2-digit month within the year (e.g. 05 for May)
 
Line 29: Line 40:
  
 
This has a number of advantages. Among others, you can make a bookmark of a specific report.
 
 
If a projectId fails to appear on the url, no project will be selected and therefore a message stating that no data has been found will be shown.
 
 
If the dates fail to make sense, then the current date will be selected by default.
 
 
=== Interaction ===
 
To easily select the parameters, the web page shows a bar at the top, with a couple of shortcuts
 
* for months (namely, the current month as the last in the list, and then 11 months in advance), which set the startDay and endDay parameters appropriately
 
* for projects (there's a dropdown list of all the project names; the request is sent to change the project by clicking on the submit button to the right of the list).
 
  
 
== Files processing ==
 
Line 57: Line 59:
 
; Size
 
 
: how big the file contents are
 
 
=== File processing lifecycle ===
 
The goal is to show the three pieces of information described at the beginning. Given what is available in the file name and properties, the system needs to keep track of these. Here is what is looked at and how.
 
 
The file name (alone, without the path) uniquely identifies a file in the system. When a file is discovered in the file system, its properties are first analyzed, as described above. If the file is new, an entry is added to the system, with the properties stored next to the file name.
 
 
If an entry already exists for the file name, the ''last modification date'' must be checked. If the date of the file differs from the one in the system entry, the newly found file takes precedence: its ''last modification date'' and ''file size'' replace the values in the entry, which are discarded.
 
 
Throughout this process, the file's contents are never read.
 
 
'''Note:''' At this stage of development, nothing is tracked while the files' contents are parsed. Therefore, no errors are reported in the dashboard yet.
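The lifecycle above can be sketched as a small update rule. This is only an illustration in Python, not the Java code of the system: the function name <code>record_file</code>, the dictionary used as storage and the returned status strings are all hypothetical.

```python
def record_file(entries, name, last_modified, size):
    """Apply the lifecycle rule to one discovered file.

    `entries` maps a file name to its stored properties (an in-memory
    stand-in for the database table; the real storage is not shown here).
    Returns "added", "updated" or "unchanged" (hypothetical labels).
    """
    known = entries.get(name)
    if known is None:
        # New file: store its properties next to the file name.
        entries[name] = {"last_modified": last_modified, "size": size}
        return "added"
    if known["last_modified"] != last_modified:
        # The file changed: the new date and size replace the stored ones.
        entries[name] = {"last_modified": last_modified, "size": size}
        return "updated"
    # Same modification date: nothing to do. The contents are never read.
    return "unchanged"
```

Only the modification date decides whether an entry is refreshed, which matches the description above; a size change without a date change would go unnoticed in this sketch.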
 
 
== Functional architecture ==
 
The architectural components involved in making the webpage available are the following:
 
; A relational database
 
: one specific table is used to store the files information: gps.uva_trackingfile_parsing
 
: the relations between projects and trackers exist regardless of this file parsing system, but they are also taken into account to find out which trackers to pay attention to when a project is selected
 
; A web front-end
 
: the system offers the information to the user as a web page. This information is taken from the database
 
; A set of daemons
 
: they gather information on their own and feed the database
 
 
Therefore, the whole system relies on the database to exchange information. Only what is present there will be shown.
 
 
=== Daemons ===
 
Two sets of [http://en.wikipedia.org/wiki/Daemon_(computing) daemon]s exist in the system related to the files:
 
* One file detection daemon
 
* Several file parsing daemons
 
 
For the moment they are kept separate, as parsing the files is not a particularly easy chore (lots of details need to be taken into account). The agreement is that the file information must be checked once every two hours. Therefore, each daemon is launched independently every 2 hours to fulfill its task. A crontab expression makes sure that the daemon launches at 10 minutes past every even hour. The end user can then expect the dashboard to display the up-to-date list of files by 15 minutes past every even hour.
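To make the schedule concrete, the following Python sketch computes the next launch time under the rule "10 minutes past every even hour". The function name <code>next_run</code> is hypothetical. The actual crontab expression is not shown on this page; a conventional schedule field that yields this timing would be <code>10 */2 * * *</code>, but that is an assumption.

```python
from datetime import datetime, timedelta


def next_run(now):
    """Return the first launch time strictly after `now`.

    Launches happen at minute 10 of every even hour (00:10, 02:10, ...).
    """
    candidate = now.replace(minute=10, second=0, microsecond=0)
    if candidate.hour % 2 == 1:
        # Odd hour: the next even hour is exactly one hour later.
        candidate += timedelta(hours=1)
    elif candidate <= now:
        # Even hour, but minute 10 has already passed: skip two hours.
        candidate += timedelta(hours=2)
    return candidate
```

For example, a call at 13:30 yields 14:10 on the same day, and a call at 22:30 yields 00:10 on the following day.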
 
 
==== File detection ====
 
The files need to be found. Therefore, there needs to be something that scouts the incoming files directories, detects changes and informs the system. In computing, the concept of ''daemon'' comes in handy for this task, as no user interaction is required here.
 
 
The sole purpose of this program (the daemon) is to run the following steps every once in a while:
 
# Traverse the incoming files directory and look for:
 
#* New files
 
#* Changes in existing files
 
# Report the detected changes to the system
 
# Sleep for a while and then start again
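The steps above can be sketched as follows. This is a minimal Python illustration, not the daemon's Java code: the function names are hypothetical, the scan is not recursive (whether the real daemon descends into subdirectories is not stated here), and a change is detected by comparing modification times only.

```python
import os


def scan_directory(directory):
    """Map each regular file name in `directory` to (mtime_ns, size)."""
    found = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                info = entry.stat()
                found[entry.name] = (info.st_mtime_ns, info.st_size)
    return found


def detect_changes(previous, current):
    """Compare two scans; return (new_files, changed_files), both sorted."""
    new_files = sorted(name for name in current if name not in previous)
    changed_files = sorted(
        name
        for name in current
        if name in previous and current[name][0] != previous[name][0]
    )
    return new_files, changed_files
```

A daemon loop would call <code>scan_directory</code> on each run, pass the result of the previous run and the current one to <code>detect_changes</code>, report both lists to the system, and then sleep until the next run.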
 
 
To keep the system simple, it has been developed as a [http://en.wikipedia.org/wiki/Web_application web application] without any web interface rather than as a typical operating system daemon, but this is just a minor implementation detail.
 
 
If errors are encountered when trying to find out the files' properties, they are output to log files.
 
 
==== File parsing ====
 
Parsing deals with extracting the information from the raw data inside the files and transforming it into useful information in the system. This is no easy task: the raw data format is complex and a lot of information needs to be extracted. Because of the way the system gathers the raw data, consistency checks are also needed. In addition, different tracker versions provide data in different formats.
 
 
All in all, many things can go wrong during parsing, and many small details need to fall into place to guarantee that the right information is extracted. Therefore, the current parsers have been left untouched for the moment.
 
 
They are implemented as Perl scripts that the operating system cron launches.
 
 
Some conversations are being held already about improving parsing. When the modifications actually take place, it will be a good moment to include reporting in them. Then, this reporting can be fed into the database and, in turn, be displayed on the dashboard.
 
 
== Software architecture ==
 
The part of UvA-BiTS related to the file dashboard is developed in Java, divided into different Maven modules.
 
* The Model module is in charge of the persistence model. It is built upon formal and de facto standards such as JPA, Spring and Hibernate.
 
* The Web module provides a user-friendly web interface. It relies on the Spring and Tapestry frameworks.
 
* The Daemon module populates the database with basic information about files. It is built with Spring and Tapestry.
 
 
=== Model ===
 
The Model module is the basis to access the database. Both the Web and the Daemon modules rely on the services provided by this module to interact with the database.
 
 
It provides Java abstractions of the persistent entities, along with operations to manipulate them. It follows the DAO design pattern to abstract low-level persistence of entities, and the Service (or Façade) design pattern to group common operations and provide a more natural interface to the clients of this module.
 
 
Persistence is handled by Hibernate, hidden behind the JPA standard where possible. The Spring framework wires the objects together and is also used to handle ACID transactions declaratively.
 
 
=== Web layer ===
 
The Web module is a [http://en.wikipedia.org/wiki/Web_application web application] that acts as View and Controller for user interaction with the system.
 
 
The user presentation is structured mainly in 2 sections (System administration and Project administration), although for the moment only Project administration has any content: the trackers dashboard.
 
 
It is developed with Tapestry, a component-oriented web framework that eases development and avoids having to deal with the low-level Java Servlet API. It uses Spring to establish the links among object instances.
 
 
The trackers dashboard web page delegates persistence activities to the Model module to read the information to populate the projects list and to search for the required file information.
 
 
=== Daemon ===
 
The Daemon exists to populate the file information into the database. Its sole responsibility is finding new files or changes in existing files and making this information explicit. It delegates all the persistence tasks to the Model module.
 
 
The Java language provides direct access to the underlying local file system and file properties. The module's objects are wired together by Spring at run time. The daemon has been devised as a web application that uses Tapestry solely to load the Spring context. This context includes the core loop as a <code>Task</code>, along with a <code>TaskScheduler</code> in which the <code>Task</code> is scheduled to launch every 2×60×60×1000 = 7200000 milliseconds (2 hours). The scheduling is handled by Spring.
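As a rough analogue of the scheduling described above, the following Python sketch runs a task repeatedly with a fixed pause between runs. It does not use Spring and is not the daemon's code: <code>run_periodically</code> and <code>INTERVAL_MS</code> are hypothetical names, and waiting the full interval after each run finishes (rather than launching at a fixed rate) is an assumption.

```python
import threading

# 2 hours expressed in milliseconds, as in the text: 2 x 60 x 60 x 1000.
INTERVAL_MS = 2 * 60 * 60 * 1000


def run_periodically(task, interval_seconds, stop_event):
    """Call `task`, wait `interval_seconds`, and repeat until stopped.

    `stop_event` is a threading.Event; setting it ends the loop, even in
    the middle of a wait.
    """
    while not stop_event.is_set():
        task()
        # wait() returns early if the event is set during the pause.
        stop_event.wait(interval_seconds)
```

With the interval from the text, the call would be <code>run_periodically(task, INTERVAL_MS / 1000, stop_event)</code>.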
 
 
The advantage of deploying the Daemon as a web application is that the [http://en.wikipedia.org/wiki/Web_container web application container] provides a number of facilities. The main one is a ready-made, centralized management interface that can easily be used to list, launch, stop, replace and otherwise operate the deployed applications, so everything related to the system can be seen in one place. Other advantages include centralized logging with automatic rollover, a confined running space, security features (in case they are needed), thread management, connectors, etc.
 
 
== Deployment ==
 
The system is running on the Services machine <code>services.flysafe.sara.nl</code>.
 
 
The different supporting services are the following:
 
; PostgreSQL
 
: A relational database management server where the system database is stored
 
; Apache
 
: A web server to provide access to visualizations
 
: It provides HTTP authentication and authorization
 
; Tomcat
 
: A web application container that hosts the web applications (the trackers dashboard and the daemon)
 
: It is to be reached through the Apache server (so no direct access)
 
:: It relies on authentication and authorization from Apache
 

Latest revision as of 13:56, 24 July 2014

The files dashboard is a web page to get an overview of the raw data files of trackers. For a chosen project and a date interval, it displays a table with a row for every day within the date interval and a column for every tracker that belongs to the project.

Each cell of the table contains information about the files of that tracker for that day: in principle one box per file, and usually no more than one file per day. The purpose is to have information at a glance, so if there are no files for a tracker on a given day, an empty box is shown.

Three pieces of information are to be shown for each file:

  • Whether a file exists for a given day and tracker
  • How big the file is
  • Whether errors happened during parsing of its contents
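
The table layout described above can be sketched in Python as follows. This is only an illustration of the row-per-day, column-per-tracker structure: the function name build_grid and the (tracker, day, size) tuple shape are hypothetical, and parsing errors are left out because they are not tracked yet.

```python
from datetime import date, timedelta


def build_grid(start_day, end_day, tracker_ids, files):
    """Return {day: {tracker_id: [file, ...]}} for every day and tracker.

    `files` is an iterable of (tracker_id, day, size_bytes) tuples, a
    hypothetical shape chosen for this sketch. A cell with an empty list
    corresponds to an empty box on the dashboard.
    """
    grid = {}
    day = start_day
    while day <= end_day:
        grid[day] = {tracker: [] for tracker in tracker_ids}
        day += timedelta(days=1)
    for tracker, file_day, size in files:
        # Ignore files outside the interval or from other projects' trackers.
        if file_day in grid and tracker in grid[file_day]:
            grid[file_day][tracker].append((tracker, file_day, size))
    return grid
```

Every cell exists even when no file was received, which is what lets the page show an empty box rather than leaving a gap.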

User manual

If you can access the 'services' machine, you should be able to use your login/password and access the url: https://services.e-ecology.sara.nl/dashboard/

Interaction

To customise the reported dashboard, the web page shows a bar at the top, with a couple of shortcuts:

  • for months (namely, the current month as the last in the list, and then 11 months in advance), which set the first and the last days to be included in the report
  • for projects, there is a dropdown list of all the project names; the request to change the project is sent by clicking on the submit button to the right of the list
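
What a month shortcut does to the date interval can be sketched as below. It is a Python illustration with a hypothetical function name (month_bounds), not the code behind the shortcut bar.

```python
import calendar
from datetime import date


def month_bounds(year, month):
    """Return the first and last day of the given month."""
    # monthrange() returns (weekday of day 1, number of days in the month).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)
```

For April 2012 this gives 1 April 2012 and 30 April 2012, the interval used in the URL example on this page.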

URL Parameters

You have more options if you look at the url of the page directly. The url takes 3 parameters: projectId, startDay and endDay. In principle, you can use the web interface to select the values that you want (click on a month, for instance, to set the dates), or select from a drop-down list the project you want to view the report for, as explained above.

However, if the interface does not offer the interval you want to see (e.g. you want something other than a single calendar month), you can enter the parameters by hand in your browser: ...?projectId=2&startDay=20120401&endDay=20120430.

If a projectId fails to appear on the url or the id provided does not make sense, then no project will be selected and therefore a message stating that no data has been found will be displayed.

If the dates fail to make sense, then the current date will be selected by default.
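
The fallback for dates can be sketched as follows. This is a Python illustration under assumptions: the function name parse_day is hypothetical, and "fails to make sense" is interpreted here as "is not exactly 8 digits forming a real calendar date". How the dashboard itself judges a date, or whether it also rejects an end day before the start day, is not stated on this page.

```python
from datetime import date, datetime


def parse_day(value, today=None):
    """Parse a yyyyMMdd string; fall back to `today` when it is invalid."""
    if today is None:
        today = date.today()
    # Require exactly eight ASCII digits before attempting to parse.
    if not isinstance(value, str) or len(value) != 8:
        return today
    if not (value.isascii() and value.isdigit()):
        return today
    try:
        return datetime.strptime(value, "%Y%m%d").date()
    except ValueError:
        # E.g. 20120231: well-formed digits but not a real date.
        return today
```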

A bit more on the URL parameters

projectId
is the Identifier of the Project that you want to query (a number that is given to the project when it is created)
startDay
is the first day you want to include in the report, in format yyyyMMdd, where
  • yyyy is the 4-digit year (e.g. 2012)
  • MM is the 2-digit month within the year (e.g. 05 for May)
  • dd is the 2-digit day of the month (e.g. 01 for the first day of the month)
endDay
is the last day you want included in the report, with the same format as the startDay

Note: The parameters are separated from the resource URL by a question mark ('?'), and from each other by an ampersand ('&'), following the standard URL query syntax.

This has a number of advantages. Among others, you can make a bookmark of a specific report.
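
A report URL with the three parameters can be assembled as in the Python sketch below, for instance to generate bookmarks. The function name dashboard_url is hypothetical, and the base URL is passed in by the caller because the example on this page elides the part before the question mark.

```python
from datetime import date
from urllib.parse import urlencode


def dashboard_url(base, project_id, start_day, end_day):
    """Append projectId, startDay and endDay to `base` as a query string."""
    query = urlencode(
        {
            "projectId": project_id,
            # The dashboard expects dates in yyyyMMdd format.
            "startDay": start_day.strftime("%Y%m%d"),
            "endDay": end_day.strftime("%Y%m%d"),
        }
    )
    return f"{base}?{query}"
```

Called with project 2 and the interval 1 to 30 April 2012, the query part comes out as projectId=2&startDay=20120401&endDay=20120430, matching the hand-written example above.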

Files processing

Raw tracking data comes into the system sent as plain text files through a Dropbox account. Dropbox makes the files available in the server file system.

File properties

Once accessible through the local file system, the files can be queried, read and parsed to extract the tracking information they contain.

The file name is expected to have the form Log_0533_13042012_xx.txt. This provides:

The tracker number
(in the example, 533)
The reported date
(in the example, April 13th 2012)
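
Extracting the two values from a file name can be sketched with a regular expression. This is a Python illustration with hypothetical names (LOG_NAME, parse_log_name). It assumes the date is always written as ddMMyyyy, as in the example, and that the final part (xx) can be ignored; the meaning of that part is not explained on this page.

```python
import re
from datetime import date

# Log_<tracker>_<ddMMyyyy>_<suffix>.txt, e.g. Log_0533_13042012_xx.txt
LOG_NAME = re.compile(r"Log_(\d+)_(\d{2})(\d{2})(\d{4})_[^.]*\.txt")


def parse_log_name(file_name):
    """Return (tracker_number, reported_date), or None if the name is unexpected."""
    match = LOG_NAME.fullmatch(file_name)
    if match is None:
        return None
    tracker, day, month, year = (int(group) for group in match.groups())
    try:
        return tracker, date(year, month, day)
    except ValueError:
        # The digits do not form a real calendar date.
        return None
```

For Log_0533_13042012_xx.txt this yields tracker 533 and 13 April 2012, as in the example above.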


Apart from the name, the file has other attributes that the file system provides. Namely:

last modification date
tells when something was modified in the file for the last time.
Size
how big the file contents are