---++ Mass file handling on the Grid (the GSTREAM library)

_Note_: The GSTREAM library is installed on the [[http://www.grid.kfki.hu/twiki/bin/view/KFKIAFS/AfsSoftwares#GSTREAM_The_GSTREAM_library][KFKI AFS]]. See that page for activation instructions.

---+++ The GSTREAM library for read/write C++ streams to Storage Elements

One generally does not want to stage out data files from SEs onto a local disk by hand before processing them, so read/write streams are recommended. The C++ library [[%ATTACHURL%/gstream.tar.gz][GSTREAM]] implements such streams: the =gstream=, =igstream= and =ogstream= classes correspond to the usual C++ standard library =fstream=, =ifstream= and =ofstream= file input/output stream classes, the letter 'g' standing for 'grid'.

A file is treated as a normal file unless its name begins with the string =/grid/=. In that case, the library stages out the data file onto a local (or AFS) area and then treats the local copy as a normal file. Sooner or later this solution has to be replaced with a GFAL based C++ library.

One commonly faces the problem that a file not only has to be processed, but also has to be passed through a filter program. Therefore I also wrote pipe streams for grid storage ( =igpstream=, =ogpstream=), which are based on the =ipstream= and =opstream= classes of the library at [[http://pstreams.sourceforge.net][http://pstreams.sourceforge.net]] (note the LGPL license!).

Practical examples:

<verbatim>
#include "gstream.h"

int main(int argc, char *argv[])
{
    // Open the datafile for reading.
    igstream igfile("/grid/cms/alaszlo/some_datafile.dat");
    // Extract data from your datafile with 'igstream::operator>>' or with 'igstream::read(char*, int)'.
    // Close the datafile.
    igfile.close();
    igfile.clear();

    // Open the datafile for writing.
    ogstream ogfile("/grid/cms/alaszlo/some_datafile.dat");
    // Write data to your datafile with 'ogstream::operator<<' or with 'ogstream::write(char*, int)'.
    // Close the datafile.
    ogfile.close();
    ogfile.clear();

    // Open the datafile for reading, through a filter program.
    igpstream igpfile("/grid/cms/alaszlo/some_datafile.dat.gz", "gunzip --stdout %f");
    // Extract data from your datafile with 'igpstream::operator>>' or with 'igpstream::read(char*, int)'.
    // Close the datafile.
    igpfile.close();
    igpfile.clear();

    // Open the datafile for writing, through a filter program.
    ogpstream ogpfile("/grid/cms/alaszlo/some_datafile.dat.gz", "gzip - > %f");
    // Write data to your datafile with 'ogpstream::operator<<' or with 'ogpstream::write(char*, int)'.
    // Close the datafile.
    ogpfile.close();
    ogpfile.clear();

    return 0;
}
</verbatim>

When compiling, pass the option =`gstream-config --cflags`= to the compiler, and when linking, pass the option =`gstream-config --libs`= to the linker.

Before usage, the following environment variables have to be exported:

<verbatim>export LCG_GFAL_VO=your_vo</verbatim>
(for _bash_), or
<verbatim>setenv LCG_GFAL_VO your_vo</verbatim>
(for _tcsh_), and
<verbatim>export DEST=your_favourite_storage_element1,your_storage_element2,...</verbatim>
(for _bash_), or
<verbatim>setenv DEST your_favourite_storage_element1,your_storage_element2,...</verbatim>
(for _tcsh_).

Setting the environment variable =TMPDIR= is optional. It specifies the local (or AFS) directory into which the data files are staged out (therefore, it has to have a lot of free disk space!). E.g.:

<verbatim>export TMPDIR=/tmp</verbatim>
(for _bash_), or
<verbatim>setenv TMPDIR /tmp</verbatim>
(for _tcsh_).

If =TMPDIR= is not specified, the current working directory ( =$PWD=) is used. This is the recommended setting for grid jobs, as the worker nodes have large disk space.

---+++ GSTREAM toolkit extensions for command line mass file manipulations

Some helper scripts are also shipped together with the GSTREAM library.
The scripts =lcg-cr.sh=, =lcg-rep.sh=, =lcg-cp.sh=, =lcg-del.sh= are wrappers around the commands =lcg-cr=, =lcg-rep=, =lcg-cp=, =lcg-del=, such that whole directory trees can also be handled. Wrappers around other =lcg-= commands may also be written based on them (in that case, we would be grateful if you shared them with us).

These scripts print the planned series of =lfc-= and =lcg-= commands as text. This text output can be passed to the script =runplan.sh=, which executes the command list. Every command is wrapped in a checker loop: execution is given up after 5 unsuccessful tries. If an interruption occurs (e.g. because of a failure), the =runplan.sh= script can be started again, and it will continue from the command where it was interrupted.

Practical examples:

<verbatim>
> lcg-cr.sh /home/user_name/source /grid/your_vo/user_name/destination your_favourite_se your_vo > runplan.dat
> runplan.sh runplan.dat
</verbatim>

Here =source= may be a large directory tree.

<verbatim>
> lcg-rep.sh /grid/your_vo/user_name/source destination_se your_vo > runplan.dat
> runplan.sh runplan.dat
</verbatim>

Here =source= may be a large directory tree. =destination_se= is the SE where you want the replica.

<verbatim>
> lcg-cp.sh /grid/your_vo/user_name/source /home/user_name/dest your_vo > runplan.dat
> runplan.sh runplan.dat
</verbatim>

Here =source= may be a large directory tree.

<verbatim>
> lcg-del.sh /grid/your_vo/user_name/target target_se your_vo > runplan.dat
> runplan.sh runplan.dat
</verbatim>

Here =target= may be a large directory tree, and =target_se= is the SE from which you want to remove your copy. Setting it to =all= removes all replicas.

The wrapper scripts may be improved in the future. We thank Kálmán Kővári for this toolkit.

There are also other useful scripts in the package. Please have a look at them.
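
The checker loop and restart behaviour of =runplan.sh= described above can be pictured with the following C++ sketch. It is not the code of =runplan.sh= (which is a shell script whose internals are not shown on this page). The names =Command=, =kMaxTries= and =run_plan=, and the idea of returning the index of the first unfinished command as the resume point, are assumptions made for illustration. Only the limit of 5 tries is taken from the text.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for one line of the plan: returns true on success.
using Command = std::function<bool()>;

// Maximum number of tries per command, as stated for runplan.sh.
constexpr int kMaxTries = 5;

// Runs commands[start], commands[start + 1], ... in order.
// Each command is tried up to kMaxTries times.
// Returns the index of the first command that could not be completed,
// or commands.size() if the whole plan finished.
// Passing the returned index back as 'start' resumes the plan where it stopped.
std::size_t run_plan(const std::vector<Command>& commands, std::size_t start = 0)
{
    for (std::size_t i = start; i < commands.size(); ++i) {
        bool ok = false;
        for (int attempt = 0; attempt < kMaxTries && !ok; ++attempt) {
            ok = commands[i]();
        }
        if (!ok) {
            return i;  // give up here; a later call can resume from this index
        }
    }
    return commands.size();
}
```

In this sketch, a caller that gets back an index smaller than the plan length would store it and pass it as =start= on the next run, which mirrors starting =runplan.sh= again on the same plan file.
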
The scripts =gexists.sh=, =gput.sh=, =grep.sh=, =gget.sh=, =gdel.sh= may also be helpful for manipulating single files.

-- Main.AndrasLaszlo - 10 Jul 2009
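
As a footnote to the library section: the rule by which the stream classes decide how to handle a file name can be summarised in a few lines of C++. This is only a sketch of the documented behaviour (names beginning with =/grid/= are staged out; =TMPDIR= is used if set, otherwise the current working directory). The function names =is_grid_path= and =staging_dir= are hypothetical and are not part of =gstream.h=. Treating an empty =TMPDIR= as unset is also an assumption.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: true if the name refers to a grid file,
// i.e. it begins with the string "/grid/".
bool is_grid_path(const std::string& name)
{
    const std::string prefix = "/grid/";
    return name.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical helper: directory into which grid files would be staged out.
// Uses TMPDIR if it is set (assumption: and non-empty),
// otherwise "." to denote the current working directory.
std::string staging_dir()
{
    const char* tmpdir = std::getenv("TMPDIR");
    if (tmpdir != nullptr && *tmpdir != '\0') {
        return std::string(tmpdir);
    }
    return std::string(".");
}
```

With these helpers, a name such as =/grid/cms/alaszlo/some_datafile.dat= would be classified as a grid file and staged into =staging_dir()=, while a name such as =/home/user_name/source= would be opened directly.
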
Topic attachments:

| *Attachment* | *History* | *Size* | *Date* | *Who* | *Comment* |
| gstream.tar.gz | r5, r4, r3, r2, r1 | 18.7 K | 2008-11-23 - 17:38 | AndrasLaszlo | The GSTREAM library source. |
Topic revision: r10 - 2009-07-10 - AndrasLaszlo
Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.