
Perl Tutorial by Code Examples

2009-11-20

Modern Perl Writing Style

This article describes modern Perl writing style, targeting Perl 5.8 and later. If you search for Perl topics on the internet, you will find a lot of old-style source code, and old-style code remains in books as well. I strongly recommend the modern Perl 5 writing style if you are starting to learn Perl now.

strict pragma and warnings pragma (Must)

You must enable the strict pragma and the warnings pragma.

use strict;
use warnings;

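Write the two lines "use strict;" and "use warnings;" at the start of your script. The strict pragma turns unsafe constructs, such as the use of undeclared variables, into errors, and the warnings pragma reports suspicious code. If you don't write these pragmas, you will have a lot of trouble during development.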
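
As a minimal sketch of what strict catches, the following script compiles a small snippet that misspells a variable name. The variable names ($total, $totl) are made up for this illustration, and the string eval is used only so that the script can report the error instead of stopping.

```perl
use strict;
use warnings;

# Compile a small snippet that misspells $total as $totl.
# Because strict is enabled, the typo is a compile-time error,
# and the string eval reports it in $@ instead of stopping this script.
my $result = eval q{
  use strict;
  my $total = 1;
  $totl + 1;
};
my $strict_error = $@;

print "strict caught the typo: $strict_error" unless defined $result;
```

Without strict, the misspelled $totl would silently be treated as a new, empty package variable, and the bug would be much harder to find.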

Use a lexical variable as a file handle (Should)

You should use a lexical variable as a file handle.

# Lexical variable 
my $fh;
my $file = 'file1';

open $fh, '<', $file
  or die "Cannot open '$file': $!";

while (my $line = <$fh>) {
  ...
}

If you declare a lexical variable and pass it to the open function, the file handle is assigned to that variable.

Lexical variables have a big advantage: they have scope. When the variable goes out of scope (and no other reference to the handle remains), the file handle is closed automatically.

You can also pass the file handle to other functions.

Don't use a symbol such as FH or *FH. That is the old way.

If you write "my" inside the call to open, the code becomes shorter and more modern.

open my $fh, '<', $file
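
The two advantages above, scope and passing the handle to a function, can be sketched as follows. The count_lines subroutine is a hypothetical helper, and File::Temp is used only to create a small file so that the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp 'tempfile';

# Hypothetical helper: counts the lines read from any file handle it is given.
sub count_lines {
  my $fh = shift;
  my $count = 0;
  while (my $line = <$fh>) {
    $count++;
  }
  return $count;
}

# Create a temporary file with three lines for the demonstration.
my ($tmp_fh, $file) = tempfile(UNLINK => 1);
print {$tmp_fh} "a\nb\nc\n";
close $tmp_fh or die "Cannot close '$file': $!";

my $line_count;
{
  open my $fh, '<', $file
    or die "Cannot open '$file': $!";

  # A lexical file handle can be passed to a subroutine like any other value.
  $line_count = count_lines($fh);
}   # $fh goes out of scope here, so the file is closed automatically

print "$line_count\n";
```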

Three-argument open function (Should)

Use the three-argument form of the open function.

open my $fh, '<', $file

Old documentation often describes the two-argument form of open, but you should not use that style.

The two-argument form often causes security issues, because the mode is parsed out of the same string as the file name. A file name that contains characters such as ">" or "|" can change what open does.

Don't use the two-argument form of open.

open my $fh, "< $file"; # Don't use the two-argument form of open

The same applies to pipe open.

# ○ (three arguments)
open my $pipe, '-|', 'dir';

# × (two arguments)
open my $pipe, 'dir |';
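
To illustrate the security point, here is a small sketch with a hypothetical untrusted file name. It assumes that no file with this odd name exists in the current directory. The dangerous two-argument call is only described in a comment and is never executed.

```perl
use strict;
use warnings;

# Hypothetical untrusted input. With the two-argument form,
#   open my $fh, $user_input
# would run "echo injected" as a command, because the trailing "|"
# is parsed as a request for a pipe.
my $user_input = 'echo injected |';

# With three arguments the mode is fixed to '<', so Perl only looks for
# a file that is literally named 'echo injected |'.
my $opened = open my $fh, '<', $user_input;

if ($opened) {
  print "Opened a file named '$user_input'\n";
  close $fh;
}
else {
  print "No command was run; open failed with: $!\n";
}
```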

Handle file open errors (Should)

You should handle file open errors.

open my $fh, '<', $file
  or die "Cannot open $file: $!";

If the open function fails, it returns undef. You should handle the error with the or operator. The error message from the OS is stored in $!. You should include it in the message shown to the user.

If you want to exit the program with an error message, you can use the die function. The program then exits with a non-zero status code: the errno value in $! if it is set, as it is after a failed open, and otherwise usually 255.

This is the file open case. In addition, you should handle errors whenever the process communicates with something external, such as a file or the network.
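
The following sketch shows what the resulting message looks like. The file name is hypothetical and is assumed not to exist, and the eval block is used here only so that the script can print the message instead of exiting.

```perl
use strict;
use warnings;

# Hypothetical file name that is assumed not to exist.
my $file = 'no_such_file_for_this_example.txt';

# eval catches the exception thrown by die, so the script keeps running.
my $succeeded = eval {
  open my $fh, '<', $file
    or die "Cannot open '$file': $!";
  1;
};
my $open_error = $@;

# The message contains the file name and the OS error text from $!.
print "Caught: $open_error" unless $succeeded;
```

The exact text that $! contributes depends on the OS and the locale, so the message should be treated as information for humans, not as something to parse.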

Use lower case characters and underscores in lexical variable names and subroutine names (Strongly recommended)

Use lower case characters and underscores in lexical variable names and subroutine names.

# Lexical variable name
my $user_name;
my $search_word;
my $max_database_connection;

# Subroutine name
sub parse_data {
  ...
}

sub create_table {
  ...
}

Most modules uploaded to CPAN recently follow this convention. Following it has a big advantage, because you provide a uniform interface to the users of your module.

See also: Give good names to variables.

A good practice for subroutine names is "verb + noun" (for example, parse_file). If the meaning is clear, you can use only the verb (for example, parse). But if you think the name may cause confusion, "verb + noun" is the better choice.

Use upper case and underscores in package variable names (Strongly recommended)

Use upper case characters and underscores in package variable names.

our $OBJECT_COUNT;
our $CLASS_INFO;

Use lexical variables, not package variables (Recommended)

If you aren't a module author, you don't need package variables. Using a package variable in a single script is a mistake. Declare lexical variables with my instead.
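
As one possible illustration, a counter that might otherwise become a package variable can be kept in a lexical variable that only one subroutine can see. The names $connection_count and add_connection are invented for this sketch.

```perl
use strict;
use warnings;

{
  # Visible only inside this block, so no other code can change it by accident.
  my $connection_count = 0;

  sub add_connection {
    $connection_count++;
    return $connection_count;
  }
}

add_connection();
my $total = add_connection();

print "$total\n";
```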

Write code in a recommended coding style

Everyone has a favorite coding style, but it is good to know the recommended coding styles, for example the style of the book "Perl Best Practices" or the style that the code cleanup tool Perltidy produces.

Here is a link to the source code of Mojo::URL. Mojolicious has a good coding style.

You can learn modern writing style from it.

Mojo::URL source code

Here are some points to look at.

1. Look at the spacing around "if" and "foreach" statements and subroutines

# "if" is followed by a space. There are no spaces inside (), etc.
if ($flg) {
  ...
}

2. Look at the comments

Many modules on CPAN don't have comments, but I like code that has good comments.

3. Don't use tabs. Use spaces. The indent width is 4 (or 2)

This is recommended in the book "Perl Best Practices".

Even if you use tabs, it is not so bad.

Use the Encode module to manipulate multibyte strings such as Japanese (Strongly recommended)

Use the Encode module to manipulate multibyte strings such as Japanese.

See also: Encode module - Manipulate multibyte strings such as Japanese properly

Jcode.pm and Jcode.pl are not recommended. They are the old way. If you use Perl 5.8 or later, you should use the Encode module. This is the recommended and standard way.
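
A minimal sketch of the usual pattern is to decode bytes on input, work with characters, and encode again on output. The byte string below is the UTF-8 encoding of the Japanese word 日本語, written with escapes so that the example does not depend on the encoding of the script file.

```perl
use strict;
use warnings;
use Encode qw/encode decode/;

# UTF-8 bytes for the three characters 日本語
my $bytes = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";

# decode: bytes from outside the program -> Perl character string
my $text = decode('UTF-8', $bytes);

my $byte_count = length $bytes;   # counts bytes
my $char_count = length $text;    # counts characters

# encode: Perl character string -> bytes for output
my $output = encode('UTF-8', $text);

print "characters: $char_count, bytes: $byte_count\n";
```

Functions such as length, substr, and regular expressions work per character only on the decoded string.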

Don't use the default variable $_ (Strongly recommended)

Perl has a default variable, $_. It is used implicitly when you don't pass an argument to certain functions. Avoid the default variable, because it decreases the readability of source code.

In the following cases, you can use the default variable.

1. One-liners

In a one-liner you can use the default variable. $_ is used implicitly as the argument of the print function and as the target of the regular expression.

# One-liner
perl -ne "print if /AAA/"

# This is the same as the following code
perl -ne "print $_ if $_ =~ /AAA/"

2. The map function, the grep function, and the postfix for statement

You must use $_ in the map function, the grep function, and the postfix for statement.

my @greped_array = grep { $_ =~ /AAA/ } @array;
my @mapped_array = map  { $_ * 2 } @array;
print $_ for @array;

Famous CPAN modules sometimes use the default variable, but I personally don't recommend it. I recommend naming your variables for the sake of readability.

Declare a lexical variable in the for/foreach statement (Strongly recommended)

In Perl 5 you can declare a lexical variable in the for/foreach statement.

my @students = ('taro', 'kenji', 'naoya');
for my $student (@students) {
  ...
}

Each element of @students is assigned to $student in order. $student is a lexical variable whose scope is the for block.

You can omit the lexical variable declaration, but I don't recommend it.

# Not recommended
for (@students) {
  ...
}

Receive command line arguments (An example)

Here is how to receive command line arguments.

# One command line argument
my $file = shift;
# More than one command line argument
my ($file, $option) = @ARGV;

Receive subroutine arguments (An example)

Here is how to receive subroutine arguments.

# One argument
sub func {
  my $file = shift;
}
# More than one argument
sub func {
  my ($file, $option) = @_;
}

Use a standard module for date processing (An example)

If you use Perl 5.10 or later, you can use Time::Piece, which is a standard module.

Time::Piece module - Process date and time
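
As a short sketch of typical Time::Piece usage, the following parses a date string, formats it, and computes the next day. The date used is just sample data.

```perl
use strict;
use warnings;
use Time::Piece;

# Parse a string into a Time::Piece object.
my $time = Time::Piece->strptime('2009-11-20 12:34:56', '%Y-%m-%d %H:%M:%S');

my $date    = $time->ymd;                    # 2009-11-20
my $slashed = $time->strftime('%Y/%m/%d');   # 2009/11/20

# Adding a number of seconds returns a new Time::Piece object.
my $next_day = ($time + 24 * 60 * 60)->ymd;  # 2009-11-21

print "$date $slashed $next_day\n";
```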

If you can install modules from CPAN, you can use the DateTime module to process dates and times. DateTime has many features, although it is a little heavy.

DateTime - General date processing

If you can't use these modules, you can use the localtime function or the Time::Local module.

Date processing in Perl

Don't load unneeded modules (Recommended)

If you copy and paste from the source code of another program, the copied code may load modules that you don't need. You should delete the unneeded modules for the sake of the people who read your program.

# Take care when you reuse source code from another program
# It may load unneeded modules
use File::Spec;
use File::Basename 'basename';

Write Perl documentation (An example)

If you create a script in a project, it is good to write documentation for it. You can embed the documentation in the Perl script itself. In many CPAN modules, the documentation is written at the end of the file, but in a small script you can write it at the top so that it is easy to see.

Perl documentation is written in POD (Plain Old Documentation), the syntax for writing Perl documents. I describe only the simplest syntax here. "=head1" is a heading, where you write a title. Put one empty line after it, and then write the body. "=cut" marks the end of the documentation.

=head1 SCRIPT NAME

SomeScript.pl

=head1 DESCRIPTION

This script is used to do ....

=head1 USAGE

perl SomeScript.pl file1 file2 ...

=cut

# Source code
use strict;
use warnings;

Avoid many # characters in comments (Recommended)

I often see source code that contains many # characters. I don't recommend this. A big reason is that once you write comments like these, the people who come later have to follow the same style, which is very tedious. Rather than improving code quality, such comments tend to go stale when the implementation of the function changes.

#################################################################
#  function  name      :  foo                                   #
#  Arguments       :  Arguments1  Arguments2                    #
#  return value      :  foo                                     #
#  create  datatime    : bar                                    #
#  create  person      :  baz                                   #
#  function description  : iiiii                                #
#  update  history    :  that 1                                 #
#            :  that 2                                          #
#            :  that 3                                          #
#################################################################
sub func {
    
}

I recommend the following style.

# Simple description
sub func {
    
}

See also

Mojo::URL source code

String list operator (An example)

The string list operator qw is often used to create an array of strings.

my @strings = qw/aa bb cc/;

This is the same as the following code.

my @strings = ('aa', 'bb', 'cc');

Import functions explicitly (Strongly recommended)

It is better to import functions explicitly. People can understand the source code more easily, because they can see which module each function comes from.

use File::Basename 'basename';
use File::Copy qw/copy move/;
use File::Path 'mkpath';
use Encode qw/encode decode/;

# Use mkpath function.
mkpath $dir;

What happens if you don't import functions explicitly?

use File::Basename;
use File::Copy;
use File::Path;
use Encode;

# Readers can't tell which module this function is imported from.
mkpath $dir;

In this case, people may have to read the documentation of every loaded module to find out where the function comes from. Even if it is obvious to you, other people may not know it. You should import functions explicitly.

Don't use the goto statement (Strongly recommended)

Perl has a goto statement, but don't use it. Programming with goto is considered very bad style these days.

In most cases, an alternative is available.

If you want to leave a loop or skip an iteration, use the "last" and "next" statements. If you want to throw an exception, use the die function.
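
A small sketch of the loop controls mentioned above, using made-up data: "next" skips the rest of the current iteration and "last" leaves the loop entirely.

```perl
use strict;
use warnings;

my @numbers = (1 .. 10);
my @picked;

for my $number (@numbers) {
  next if $number % 2;   # skip odd numbers
  last if $number > 6;   # stop the loop at the first even number above 6
  push @picked, $number;
}

print "@picked\n";   # 2 4 6
```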

Perhaps the only case where you should use the goto statement is when you call a function recursively and very deeply.

Don't use the do - while statement (Recommended)

Perl has a do - while statement, but I don't recommend it.

You can express the same logic with the while statement alone.

In many cases, logic that uses do - while is difficult to understand.
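
One way to express "run the body at least once, then check the condition" with a plain while statement is an endless loop that ends with "last". This is only a sketch with made-up data, in which 0 marks the end of the input. Note that "last" and "next" do not work inside do - while, because a do block is not a real loop block.

```perl
use strict;
use warnings;

my @inputs = (4, 7, 0, 9);   # hypothetical data; 0 marks the end
my @processed;

while (1) {
  my $value = shift @inputs;
  push @processed, $value;   # the body always runs at least once

  # The condition is checked at the end, as in do - while.
  last if $value == 0;
}

print "@processed\n";   # 4 7 0
```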

Don't use redo (Recommended)

Perl has a redo statement, but I don't recommend it. You can express any logic without redo. I have used redo a few times, but it makes the logic hard to follow.

Don't use prototypes (Recommended)

Perl has a feature called prototypes in subroutine definitions, but you should not use it.

# Don't use prototypes
sub func ($@) {
  ...
}

In Perl a subroutine can receive arguments of any type and any number. You don't need to specify the types or the number of arguments. You should define subroutines without prototypes.

sub func {
  ...    
}
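
One reason prototypes surprise people is that they change how the arguments are evaluated; they are not type checks. In this sketch (the subroutine names are invented), the "$" in the prototype forces the array into scalar context, so the subroutine receives the number of elements instead of the first element.

```perl
use strict;
use warnings;

# With a prototype: the first argument is evaluated in scalar context.
sub first_with_prototype ($@) {
  my ($first, @rest) = @_;
  return $first;
}

# Without a prototype: the array is flattened into the argument list.
sub first_plain {
  my ($first, @rest) = @_;
  return $first;
}

my @students = ('taro', 'kenji', 'naoya');

my $with_prototype = first_with_prototype(@students);   # 3
my $plain          = first_plain(@students);            # taro

print "$with_prototype $plain\n";
```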

Use the die function to report errors. Don't return undef (Recommended)

Many people think Perl has no exception handling, but Perl has a simple exception handling system.

First, let's look at the old style of error handling, which returns undef.

This subroutine returns undef (in scalar context) if an error occurs.

# Return undef if an error occurs
sub func {
  my $arg = shift;
  
  ...
  
  #  Error handling
  if ($error) {
    return; 
  }
  # No error occurred
  return $val;
}

The caller then checks for the error.

my $val = func();

if (defined $val) {
  # Success
}
else {
  # Error
}

In modern Perl, you should throw an exception with the die function if an error occurs.

# Throw an exception with the die function if an error occurs
sub func {
  my $arg = shift;
  
  # ...
  
  # Throw exception by die function
  if ($error) {
    die "Error message";
  }

  # Return the value when no error occurred
  return $val;
}

If an error occurs in the function, the program ends with the error message.

# The program ends with the error message if an error occurs
func(); 

If you don't want the program to end, you can catch the exception with an eval block. This is similar to a try/catch block in Java. If an error occurs, the error message is assigned to the special variable $@, so you can detect the error by checking $@.

# Catch exception
eval { func() };

if ($@) {
  # Error
}
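
Here is a complete, runnable sketch of the same pattern. The parse_age subroutine is a hypothetical example. Ending the message with a newline stops die from appending the file name and line number.

```perl
use strict;
use warnings;

# Hypothetical validator: throws an exception for bad input.
sub parse_age {
  my $text = shift;

  die "Invalid age: '$text'\n" unless $text =~ /^\d+$/;

  return $text + 0;
}

# eval returns undef when the block dies, and the message is stored in $@.
my $age   = eval { parse_age('abc') };
my $error = $@;

if ($error) {
  print "Error: $error";
}
```

Copying $@ into a lexical variable right after the eval block, as done here, protects the message from being overwritten by a later eval.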

Use the arrow operator when you call a constructor

It is good to use the arrow operator when you call a constructor.

In Perl, a constructor call is the same as a normal method call.

my $obj = SomeClass->new;

I don't recommend the indirect method call syntax.

# Indirect method call. I don't recommend it
my $obj = new SomeClass();



David 2013/09/10 05:43 Hi, thanks for a nice overview.
You may want to know, that "under bar" is actually called underscore.

Regards,
David

kimoto 2013/09/10 19:07 Thanks, I replaced "under bar" with "underscore".
