Friday, March 31, 2006

Fun with processes (Updated)

I really need to go through all of my Erlang documentation so people like petekaz can stop making me feel small.
I had an issue with some poor Erlang code I was running where it performed a file:open but not the appropriate file:close. Not wanting to restart my application I wanted to know how I could fix this problem without shutting it down. Erlang, of course, provides a solution.
One thing that made this workable in the first place was the fact that I had no other open files besides those that needed to be closed.

So imagine for a second you have opened a number of files and lost the reference to them. For example:

(ort_bot@blong)21> [file:open("/tmp/c2.log", read) || X <- lists:seq(1, 10)].
[{ok,<0.8974.7>},
{ok,<0.8975.7>},
{ok,<0.8976.7>},
{ok,<0.8977.7>},
{ok,<0.8978.7>},
{ok,<0.8979.7>},
{ok,<0.8980.7>},
{ok,<0.8981.7>},
{ok,<0.8982.7>},
{ok,<0.8983.7>}]


file:open causes a process to be created (this is Erlang afterall).
In the shell you can simply type i(). and get output like:


(ort_bot@blong)22> i().
Pid Initial Call Heap Reds Msgs
Registered Current Function Stack
<0.0.0> otp_ring0:start/2 377 6569 0
init init:loop/1 2
<0.2.0> erlang:apply/2 4181 179350 0
erl_prim_loader erl_prim_loader:loop/3 5
<0.4.0> gen_event:init_it/6 1597 1114 0
error_logger gen_event:loop/4 10
<0.5.0> erlang:apply/2 6765 8325 0
application_controlle gen_server:loop/6 7
<0.7.0> application_master:init/4 377 45 0
application_master:main_loop/2 8
<0.8.0> application_master:start_it/4 233 90 0
application_master:loop_it/4 5
<0.9.0> supervisor:kernel/1 610 1373 0
kernel_sup gen_server:loop/6 12
<0.10.0> rpc:init/1 610 17456475 0
rex gen_server:loop/6 12
<0.11.0> global:init/1 233 1246 0
global_name_server gen_server:loop/6 12
<0.12.0> erlang:apply/2 233 275 0
global:loop_the_locker/1 4
<0.13.0> erlang:apply/2 233 4 0
global:collect_deletions/2 6
<0.14.0> inet_db:init/1 233 137 0
inet_db gen_server:loop/6 12
<0.16.0> supervisor:erl_distribution/1 377 307 0
net_sup gen_server:loop/6 12
<0.17.0> erl_epmd:init/1 233 137 0
erl_epmd gen_server:loop/6 12
<0.18.0> auth:init/1 233 77 0
auth gen_server:loop/6 12
<0.19.0> net_kernel:init/1 233 3272 0
net_kernel gen_server:loop/6 12
<0.20.0> inet_tcp_dist:accept_loop/2 233 178 0
prim_inet:accept0/2 10


This list goes on for awhile longer, depending on what you have done. As you scan through this output you should see something like:

<0.8974.7> erlang:apply/2 233 135 0
file_io_server:server_loop/1 3
<0.8975.7> erlang:apply/2 233 135 0
file_io_server:server_loop/1 3
<0.8976.7> erlang:apply/2 233 135 0
file_io_server:server_loop/1 3
<0.8977.7> erlang:apply/2 233 135 0
file_io_server:server_loop/1 3
<0.8978.7> erlang:apply/2 233 135 0
file_io_server:server_loop/1 3
<0.8979.7> erlang:apply/2 233 135 0
file_io_server:server_loop/1 3
<0.8980.7> erlang:apply/2 233 135 0
file_io_server:server_loop/1 3
<0.8981.7> erlang:apply/2 233 135 0
file_io_server:server_loop/1 3
<0.8982.7> erlang:apply/2 233 135 0
file_io_server:server_loop/1 3
<0.8983.7> erlang:apply/2 233 135 0
file_io_server:server_loop/1 3


All of those file_io_server's are the open files. And the Pid associated with it is the file handle. So to close all these files all you have to do is call file:close on those pids. You can create a pid give 3 numbers in the shell by using the pid function:


(ort_bot@blong)23> file:close(pid(0, 8974, 7)).
ok
(ort_bot@blong)24> file:close(pid(0, 8975, 7)).
ok
(ort_bot@blong)25> file:close(pid(0, 8976, 7)).
ok
(ort_bot@blong)26> file:close(pid(0, 8977, 7)).
ok
(ort_bot@blong)27> file:close(pid(0, 8978, 7)).
ok
(ort_bot@blong)28> file:close(pid(0, 8979, 7)).
ok
(ort_bot@blong)29> file:close(pid(0, 8980, 7)).
ok
(ort_bot@blong)30> file:close(pid(0, 8981, 7)).
ok
(ort_bot@blong)31> file:close(pid(0, 8982, 7)).
ok
(ort_bot@blong)32> file:close(pid(0, 8983, 7)).
ok


Now, call i() again and you'll notice all of those pesky file's have been closed. You can also do all of this from a remote machine if you have run your erl properly. For example my remote application is run on a machine called 'blong' and the node name is 'ort_bot'. I am on 'ooter' on a machine called 'osx'. So from 'osx' I can do this:


osx:~/projects/Erlang orbitz$ erl -sname ooter -setcookie cookiemonster
Erlang (BEAM) emulator version 5.4.13 [source] [threads:0]

Eshell V5.4.13 (abort with ^G)
(ooter@osx)1>
User switch command
--> r ort_bot@blong
--> j
1 {shell,start,[init]}
2* {ort_bot@blong,shell,start,[]}
--> c 2
Eshell V5.4.12 (abort with ^G)
(ort_bot@blong)1> i().
Pid Initial Call Heap Reds Msgs
Registered Current Function Stack
<0.0.0> otp_ring0:start/2 377 6579 0
init init:loop/1 2
<0.2.0> erlang:apply/2 610 161416 0
erl_prim_loader erl_prim_loader:loop/3 5
<0.4.0> gen_event:init_it/6 610 557 0
error_logger gen_event:loop/4 10
<0.5.0> erlang:apply/2 6765 8289 0
application_controlle gen_server:loop/6 7


Connecting to a remote shell is easy and we now have a shell on that node and can do anything from here we want (infact I've done halt(). quite a few times by accident not thinking oi!).

Alternativly, you can also use pman, which provides an interface much like i() but does more interesting stuff too. For instance you can see linked processes and kill various processe through pman. In order to use pman, on your local node simply run: pman:start().
This provide you with a window that looks like:



This looks just like i(). And from the 'node' menu you can switch nodes to view the processes on. As you can see from this screenshot I am viewing the processes on ort_bot@blong, which is a remote node.

As you can see, Erlang provides a very powerful set of tools to keep your application stable and perform various operations. It is things like this that make me glad to run code in a shell rather than a stand alone appliction. For situations like this, the power of the erlang shell is unparalleled.

Update:
Well in his infinite wisdom, after a bit of playing around petekaz came up with two functions that makes solving this particular problem all of the easier. My goal was mainy to show how to interact with the shell to get useful information about processes, but petekaz has shown me how to actually close a specific file. It seems file:open drops some useful information into ets.


[ file:close(P) || P <- processes(), {ok, "/tmp/test.erl"} == file:pid2name(P) ].


Where "/tmp/test.erl" is the file you are trying to close.
This makes use of two useful functinos. processes for starters evaluates to a list of all current proceses runninging in the node, and file:pid2name takes a pid and tells you what file it is associated with. If the pid is not a file it returns undefined.
So it seems Erlang has a complete solution to this problem.

Thanks petekaz for figuring this one out completely.

3 comments:

  1. A good post, but it makes me think about how best to ensure that a file that I open in Erlang will be closed, something like a try/catch/finally block in Java.

    ReplyDelete
  2. It's called
    try/catch/after in erlang

    RTFM

    ReplyDelete
  3. Thanks for the post, you taught me a lot about the shell.

    Another option for cleaning up the files is to just terminate your process, or from the shell type "1=2." All the linked io processes go away.

    That might be a good idiom to prevent file leaks; do all your file handling work from a temporary subprocess.

    ReplyDelete