Files Dashboard
=== File processing lifecycle ===
 
Since the goal is to show the three pieces of information described at the beginning, and given the information available from the file name and properties, the system needs to keep track of these properties. Here is what is looked at, and how.
 
 
The file name alone (without the path) uniquely identifies a file in the system. When a file is discovered in the file system, its properties are first analyzed, as described above. If it is a new file, a new entry is added to the system, with the properties stored next to the file name.
 
 
If an entry already exists for the file name, the ''last modification date'' is checked. If the date of the file differs from the one stored in the entry, the newly found file takes precedence: its ''last modification date'' and ''file size'' replace those stored in the entry, and the old values are discarded.
 
 
At no point in this process are the file's contents read.
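The lifecycle rule can be sketched in Java, the language of the dashboard modules. This is a minimal in-memory sketch, not the system's actual code: the names `FileRegistry`, `FileEntry` and `Change` are hypothetical, and a `HashMap` stands in for the entries the real system keeps in its database.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical outcome of registering a discovered file. */
enum Change { NEW, UPDATED, UNCHANGED }

/** Hypothetical entry: the properties stored next to a file name. */
record FileEntry(Instant lastModified, long size) { }

/** Hypothetical in-memory stand-in for the system's file entries. */
class FileRegistry {
    // Keyed by the bare file name (no path), which identifies the file.
    private final Map<String, FileEntry> entries = new HashMap<>();

    /** Applies the lifecycle rule using only file-system properties, never the contents. */
    Change register(String fileName, Instant lastModified, long size) {
        FileEntry existing = entries.get(fileName);
        if (existing == null) {
            // New file name: add a new entry.
            entries.put(fileName, new FileEntry(lastModified, size));
            return Change.NEW;
        }
        if (!existing.lastModified().equals(lastModified)) {
            // Different modification date: the new date and size replace the stored ones.
            entries.put(fileName, new FileEntry(lastModified, size));
            return Change.UPDATED;
        }
        // Same modification date: the stored entry is kept as it is.
        return Change.UNCHANGED;
    }

    FileEntry get(String fileName) {
        return entries.get(fileName);
    }
}
```

Following the rule as described, only the modification date is compared: a different size with an unchanged date leaves the entry as it is.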
 
 
'''Note:''' At this stage of development, nothing is tracked while the file contents are parsed. Therefore, no parsing errors are reported in the dashboard yet.
 
 
== Functional architecture ==
 
The architectural components involved in making the webpage available are the following:
 
; A relational database
 
: one specific table is used to store the files information: gps.uva_trackingfile_parsing
 
: the relations between projects and trackers exist regardless of this file parsing system, but they are also taken into account to find out which trackers to pay attention to when a project is selected
 
; A web front-end
 
: the system offers the information to the user as a web page. This information is taken from the database
 
; A set of daemons
 
: they gather information on their own and feed the database
 
 
Therefore, the whole system relies on the database to exchange information. Only what is present there will be shown.
 
 
=== Daemons ===
 
Two sets of [http://en.wikipedia.org/wiki/Daemon_(computing) daemon]s exist in the system related to the files:
 
* One file detection daemon
 
* Several file parsing daemons
 
 
For the moment they are kept separate, as parsing the files is not a particularly easy chore (many details need to be taken into account). It has been agreed that the file information is checked once every two hours, so each of the daemons is launched independently every two hours to carry out its task.
 
 
==== File detection ====
 
The files need to be found. Therefore, there needs to be something that scouts the incoming files directories, detects changes and informs the system. In computing, the concept of ''daemon'' comes in handy for this task, as no user interaction is required here.
 
 
The sole purpose of this program (the daemon) is to run periodically and:
 
# Traverse the incoming files directory and look for:
 
#* New files
 
#* Changes in existing files
 
# Report the detected changes to the system
 
# Sleep for a while and then start again
 
 
However, to keep the system simple, it has been developed not as a typical operating system daemon but as a [http://en.wikipedia.org/wiki/Web_application web application] without any web interface. This is just a minor implementation detail.
 
 
If errors are encountered while reading the files' properties, they are written to log files.
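The detection steps could look roughly like the following sketch, which uses `java.nio.file` to list the directory and a `ScheduledExecutorService` for the two-hour cycle. The class name `FileDetectionDaemon`, the assumption of a flat incoming directory, and "reporting" by logging instead of writing to the database are illustrative choices, not the system's actual code. In a web-application deployment, something would have to call `start()` when the application is deployed; that wiring is not shown.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical detection daemon for a flat incoming directory. */
class FileDetectionDaemon {
    private static final Logger LOG = Logger.getLogger(FileDetectionDaemon.class.getName());

    private final Path incomingDir;
    // Last modification time (ms) seen for each bare file name in earlier passes.
    private final Map<String, Long> seen = new HashMap<>();

    FileDetectionDaemon(Path incomingDir) {
        this.incomingDir = incomingDir;
    }

    /** Step 1: one traversal; returns the names of new or changed files to report. */
    List<String> scanOnce() {
        List<String> changed = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(incomingDir)) {
            for (Path file : files) {
                try {
                    BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
                    if (!attrs.isRegularFile()) {
                        continue;
                    }
                    String name = file.getFileName().toString();
                    long modified = attrs.lastModifiedTime().toMillis();
                    Long previous = seen.put(name, modified);
                    if (previous == null || previous != modified) {
                        changed.add(name); // new file, or existing file with a new date
                    }
                } catch (IOException e) {
                    // Errors reading a file's properties go to the log; the pass continues.
                    LOG.log(Level.WARNING, "Cannot read properties of " + file, e);
                }
            }
        } catch (IOException e) {
            LOG.log(Level.SEVERE, "Cannot list " + incomingDir, e);
        }
        return changed;
    }

    /** Steps 2 and 3: report (here: log) the changes, then wait two hours and repeat. */
    ScheduledExecutorService start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(
                () -> scanOnce().forEach(name -> LOG.info("New or changed file: " + name)),
                0, 2, TimeUnit.HOURS);
        return scheduler;
    }
}
```

`scheduleWithFixedDelay` waits two hours after each pass finishes, which matches "sleep for a while and then start again" more closely than a fixed rate would.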
 
 
==== File parsing ====
 
Parsing deals with extracting the raw data inside the files and transforming it into useful information in the system. This is no easy task: the raw data format is complex and a lot of information needs to be extracted. Because of the way the system gathers the raw data, consistency checks must also be performed. Moreover, different tracker versions provide data in different formats.
 
 
All in all, many things can go wrong during parsing, and many small details need to fall into place to guarantee that the right information is extracted. Therefore, the current parsers have been left untouched for the moment.
 
 
They are implemented as Perl scripts launched by the operating system's cron.
 
 
Discussions about improving the parsing are already under way. When the modifications actually take place, it will be a good moment to add reporting to the parsers. This reporting can then be fed into the database and, in turn, displayed on the dashboard.
 
 
== Software architecture ==
 
The part of UvA-BiTS related to the files dashboard is developed in Java and divided into several Maven modules:
 
* One module is in charge of the persistence model. It is built upon standard and de facto standard frameworks such as JPA, Spring and Hibernate.
 
* The daemon
 
* The web layer
 
 
=== Model ===
 
 
=== Daemon ===
 
 
The advantage of deploying it as a web application is that the [http://en.wikipedia.org/wiki/Web_container web application container] provides a number of facilities. The main one is a ready-made, centralized management interface that makes it easy to list, launch, stop, replace and otherwise operate the deployed applications, so everything related to the system can be seen in one place. Other advantages include centralized logging with automatic log rotation, a confined runtime environment, security features (in case they are needed), thread management, connectors, etc.
 
 
=== Web layer ===
 

Latest revision as of 13:56, 24 July 2014

The files dashboard is a web page to get an overview of the raw data files of trackers. For a chosen project and a date interval, it displays a table with a row for every day within the date interval and a column for every tracker that belongs to the project.

Each cell of the table contains information about the files of that tracker for that day: in principle, one box per file, and usually there is no more than one file per day. The purpose is to provide information at a glance, so if there are no files for a tracker on a given day, an empty box is shown.

Three pieces of information are to be shown for each file:

  • Whether a file exists for a given day and tracker
  • How big the file is
  • Whether errors happened during parsing of its contents
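As an illustration of how such a table might be assembled, the following Java sketch builds one row per day of the interval and one column per tracker, leaving an empty list (an empty box) where no file exists. The types `FileBox`, `TrackerFile` and `DashboardGrid` are hypothetical; in the real system the data comes from the database.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical box content: existence is implied; size and parse errors are shown. */
record FileBox(String name, long size, boolean parseErrors) { }

/** Hypothetical record of a file for a tracker and a reported day. */
record TrackerFile(int tracker, LocalDate day, FileBox box) { }

class DashboardGrid {
    /** Rows: every day in [start, end]. Columns: every tracker of the project. */
    static Map<LocalDate, Map<Integer, List<FileBox>>> build(
            LocalDate start, LocalDate end, List<Integer> trackers, List<TrackerFile> files) {
        Map<LocalDate, Map<Integer, List<FileBox>>> grid = new LinkedHashMap<>();
        for (LocalDate day = start; !day.isAfter(end); day = day.plusDays(1)) {
            Map<Integer, List<FileBox>> row = new LinkedHashMap<>();
            for (int tracker : trackers) {
                row.put(tracker, new ArrayList<>()); // empty list = empty box
            }
            grid.put(day, row);
        }
        for (TrackerFile f : files) {
            Map<Integer, List<FileBox>> row = grid.get(f.day());
            if (row == null) {
                continue; // outside the date interval
            }
            List<FileBox> cell = row.get(f.tracker());
            if (cell != null) {
                cell.add(f.box()); // one box per file
            }
        }
        return grid;
    }
}
```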

User manual

If you can access the 'services' machine, you should be able to log in with your username and password at the URL: https://services.e-ecology.sara.nl/dashboard/

Interaction

To customise the report, the web page shows a bar at the top with a couple of shortcuts:

  • for months (namely, the current month, last in the list, and the 11 months before it), which set the first and last days to be included in the report
  • for projects, there is a dropdown list of all the project names; the request to change the project is sent by clicking on the submit button to the right of the list
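The month shortcuts can be computed with `java.time.YearMonth`. This sketch assumes the list is ordered oldest first, ending with the current month; `MonthShortcuts` and `Range` are hypothetical names.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

class MonthShortcuts {
    /** Hypothetical first/last day pair set by one shortcut. */
    record Range(LocalDate firstDay, LocalDate lastDay) { }

    /** The 11 months before the current one, then the current month last. */
    static List<Range> lastTwelveMonths(LocalDate today) {
        YearMonth current = YearMonth.from(today);
        List<Range> ranges = new ArrayList<>();
        for (int back = 11; back >= 0; back--) {
            YearMonth month = current.minusMonths(back);
            // atEndOfMonth() handles month lengths and leap years.
            ranges.add(new Range(month.atDay(1), month.atEndOfMonth()));
        }
        return ranges;
    }
}
```

Clicking a shortcut would then set startDay and endDay to that range's first and last days.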

URL Parameters

You have more options if you work with the URL of the page directly. The URL takes three parameters: projectId, startDay and endDay. Normally, you can use the web interface to select the values you want (click on a month, for instance, to set the dates, or select the project from the drop-down list), as explained above.

However, if the interface does not offer the period you want (e.g. you want to see something other than a single month), you can type the parameters by hand in your browser: ...?projectId=2&startDay=20120401&endDay=20120430.

If projectId is missing from the URL or the value provided does not make sense, no project is selected, and a message stating that no data has been found is displayed.

If the dates do not make sense, the current date is used by default.
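The fallback behaviour can be sketched as a small parser. Assumptions, all hypothetical: the query string has already been decoded into a map, "does not make sense" means "not an integer" for projectId (checking that the project exists would need the database), and each unparseable date falls back to today independently. `DashboardParams` is an illustrative name.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Map;
import java.util.OptionalInt;

/** Hypothetical holder for the three dashboard URL parameters. */
record DashboardParams(OptionalInt projectId, LocalDate startDay, LocalDate endDay) {

    static DashboardParams parse(Map<String, String> query, LocalDate today) {
        return new DashboardParams(
                parseProjectId(query.get("projectId")),
                parseDay(query.get("startDay"), today),
                parseDay(query.get("endDay"), today));
    }

    // Missing or non-numeric id: no project selected (the page then reports no data).
    private static OptionalInt parseProjectId(String raw) {
        if (raw == null) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    // yyyyMMdd, e.g. 20120401. BASIC_ISO_DATE resolves strictly, so 20120230 is rejected.
    private static LocalDate parseDay(String raw, LocalDate today) {
        if (raw == null) {
            return today;
        }
        try {
            return LocalDate.parse(raw.trim(), DateTimeFormatter.BASIC_ISO_DATE);
        } catch (DateTimeParseException e) {
            return today;
        }
    }
}
```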

A bit more on the URL parameters

projectId
is the identifier of the project you want to query (a number assigned to the project when it is created)
startDay
is the first day you want to include in the report, in format yyyyMMdd, where
  • yyyy is the 4-digit year (e.g. 2012)
  • MM is the 2-digit month within the year (e.g. 05 for May)
  • dd is the 2-digit day of the month (e.g. 01 for the first day of the month)
endDay
is the last day you want to include in the report, in the same format as startDay

Note: The parameters are separated from the resource URL by the standard query delimiter, a question mark ('?'), and from each other by the standard parameter separator, the ampersand ('&').

Because a report is fully described by its URL, you can, among other things, bookmark a specific report.
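A bookmarkable report URL can be assembled from the three parameters. In this sketch the page URL is passed in by the caller, since the example above elides the resource path; `DashboardLinks.reportUrl` is a hypothetical helper. No URL encoding is needed because all values are digits.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class DashboardLinks {
    /** Appends '?' and the parameters joined by '&' to the given page URL. */
    static String reportUrl(String pageUrl, int projectId, LocalDate startDay, LocalDate endDay) {
        DateTimeFormatter yyyyMMdd = DateTimeFormatter.BASIC_ISO_DATE; // e.g. 20120401
        return pageUrl
                + "?projectId=" + projectId
                + "&startDay=" + startDay.format(yyyyMMdd)
                + "&endDay=" + endDay.format(yyyyMMdd);
    }
}
```

For instance, `reportUrl(pageUrl, 2, LocalDate.of(2012, 1, 1), LocalDate.of(2012, 12, 31))` describes a whole-year report, which none of the month shortcuts offer.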

Files processing

Raw tracking data comes into the system sent as plain text files through a Dropbox account. Dropbox makes the files available in the server file system.

File properties

Once accessible through the local file system, they can be queried, read and parsed to extract the tracking information they contain.

The file name is expected to have the form Log_0533_13042012_xx.txt. This provides:

The tracker number
(in the example, 533)
The reported date
(in the example, April 13th 2012)
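Extracting both values from the name can be sketched with a regular expression. The sketch assumes, from the example alone, that the date part is ddMMyyyy and that the trailing "xx" part carries nothing the dashboard needs; `LogFileName` is a hypothetical name.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical result of parsing a name such as Log_0533_13042012_xx.txt. */
record LogFileName(int tracker, LocalDate reportedDate) {
    // Log_<tracker>_<ddMMyyyy>_<anything>.txt; the part after the date is not interpreted.
    private static final Pattern NAME = Pattern.compile("Log_(\\d+)_(\\d{8})_.*\\.txt");
    // STRICT resolution rejects impossible dates such as 31022012.
    private static final DateTimeFormatter DAY_MONTH_YEAR =
            DateTimeFormatter.ofPattern("ddMMuuuu").withResolverStyle(ResolverStyle.STRICT);

    static Optional<LogFileName> parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return Optional.empty();
        }
        try {
            int tracker = Integer.parseInt(m.group(1)); // "0533" -> 533
            return Optional.of(new LogFileName(tracker, LocalDate.parse(m.group(2), DAY_MONTH_YEAR)));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }
}
```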


Apart from the name, the file system provides other attributes of the file, namely:

last modification date
tells when the file was last modified.
Size
how big the file contents are
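In Java, both attributes can be read in one call with `Files.readAttributes`, without opening the file contents. `FileProperties` is a hypothetical name for this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.time.Instant;

/** Hypothetical pair of the file-system properties the dashboard relies on. */
record FileProperties(Instant lastModified, long size) {
    static FileProperties of(Path file) throws IOException {
        // Reads metadata only; the contents are never opened.
        BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
        return new FileProperties(attrs.lastModifiedTime().toInstant(), attrs.size());
    }
}
```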