Monday, November 12, 2012

loop variable is leaked?

Python has scoping rules, but a for loop does not create a new scope of its own. Example: the loop variable leaks out of the loop.

I was learning about MongoDB tonight and the instructor went over some basic Python code. I would normally skip those lectures, but as a learner I was skimming through each Python lecture to make sure nothing slipped past me (I don't want to miss any COOL tricks). Well, the "Python: for loops with lists" lecture caught my attention.

The loop variable item was still accessible after the loop, and the instructor used it there. HOW? I thought it was local to the loop.
My first thought was "damn, could this be a static thing? It doesn't look right!". I fired up a shell, typed in the same code, and the instructor wasn't lying!

I disassembled the code in the shell, and this is what I got:

>>> def f():
...     l = ['1', '2', '3']
...     l2 = []
...     for item in l:
...             pass
...     l2.append(item)
...     print l2
... 
>>> f()
['3']
>>> import dis
>>> dis.dis(f)
  2           0 LOAD_CONST               1 ('1')
              3 LOAD_CONST               2 ('2')
              6 LOAD_CONST               3 ('3')
              9 BUILD_LIST               3
             12 STORE_FAST               0 (l)

  3          15 BUILD_LIST               0
             18 STORE_FAST               1 (l2)

  4          21 SETUP_LOOP              14 (to 38)
             24 LOAD_FAST                0 (l)
             27 GET_ITER            
        >>   28 FOR_ITER                 6 (to 37)
             31 STORE_FAST               2 (item)

  5          34 JUMP_ABSOLUTE           28
        >>   37 POP_BLOCK           

  6     >>   38 LOAD_FAST                1 (l2)
             41 LOAD_ATTR                0 (append)
             44 LOAD_FAST                2 (item)
             47 CALL_FUNCTION            1
             50 POP_TOP             

  7          51 LOAD_FAST                1 (l2)
             54 PRINT_ITEM          
             55 PRINT_NEWLINE       
             56 LOAD_CONST               0 (None)
             59 RETURN_VALUE        

This didn't really help me, since I don't remember much about dis from the Python conference last summer. So instead, I googled it first.
http://stackoverflow.com/questions/3611760/scoping-in-python-for-loops

I wasn't the first one to ask. The accepted answer referenced a mailing-list discussion on mail.python.org.
A for loop does not open a new scope, so after the loop the loop variable (in this case, item) stays bound to its last value in the surrounding scope.
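
Looking back at the disassembly, it does hint at this: item is written with STORE_FAST inside the loop and read with LOAD_FAST afterwards, the same instructions used for l and l2, so it is just an ordinary local of f. Here is a small sketch in Python 3 syntax that checks this through the function's code object. Passing the list in as the parameter l is my change, not the instructor's code. It also shows one caveat: if the loop body never runs, the name is never bound.

```python
def f(l):
    l2 = []
    for item in l:
        pass
    # `item` is still bound here: the for loop did not open a new scope.
    l2.append(item)
    return l2

print(f(['1', '2', '3']))  # ['3']

# The compiler lists `item` among f's ordinary locals, next to l and l2,
# which is why the disassembly uses STORE_FAST / LOAD_FAST for it.
print('item' in f.__code__.co_varnames)  # True

# Caveat: with an empty list the loop body never runs, so `item` is unbound.
try:
    f([])
except NameError as exc:  # UnboundLocalError is a subclass of NameError
    print(type(exc).__name__)  # UnboundLocalError
```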

# wtf.py
l = ['1', '2', '3']
l2 = []

for item in l:
    print(id(item))  # one id per iteration
l2.append(item)      # item still refers to '3' here
print(id(item))      # same id as the last iteration

The result is as follows:

yeukhon@yeukhon-P5E-VM-DO:~/tmp$ python wtf.py
3077889320
3077837144
3077839616
3077839616

The first three lines are the ids of the objects bound to item on each iteration over l, and the last line, printed after the loop, matches the last iteration. So the value persists: item still refers to the last element after the loop ends.

I didn't stop there. I wanted to see what happens underneath, closer to the machine, so I used Cython to translate my Python code into C code.

I removed the print statements from the previous code and then generated this C code. The translator did an amazing job. Sadly, I couldn't compile it because the compiler could not find Python.h. I already have python-dev (and python2.7-dev) installed, so I probably just need to tell the compiler where the headers are, e.g. with -I/usr/include/python2.7 (python2.7-config --includes prints the right flags). Probably....

In any case, a few things:
I know that Python keeps namespaces as key/value pairs. You can see the names as static char arrays starting at line 480 of the generated file:

/* Implementation of 'wtf' */
static char __pyx_k__1[] = "1";
static char __pyx_k__2[] = "2";
static char __pyx_k__3[] = "3";
static char __pyx_k__l[] = "l";
static char __pyx_k__l2[] = "l2";
static char __pyx_k__item[] = "item";
static char __pyx_k____main__[] = "__main__";
static char __pyx_k____test__[] = "__test__";
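
On the Python side, the same namespace is easy to look at. When the code runs at module level (the way wtf.py does), globals() returns the module's namespace, an ordinary dict mapping names to objects, and after the loop item is simply one of its keys. A small sketch, assuming it runs as a top-level script:

```python
# Module-level version of wtf.py: every name here lands in the module's
# namespace dict, including the loop variable.
l = ['1', '2', '3']
l2 = []
for item in l:
    pass
l2.append(item)

namespace = globals()
print('item' in namespace)  # True
print(namespace['item'])    # 3
```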

Now I want to dive deeper and see how the generated code keeps the last value in the surrounding scope.

Starting from line 655 of the generated file, we enter the for loop. The code is a bit messy to read at first. Many of the Py* calls are CPython's C API, and I had to google some of them to understand what they do and what they return. My quest was to find where the last reference to the object bound to item is kept, so that it is not garbage collected.

I explained the relevant portion of the code here. It's very hard to read, so be prepared...

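But to summarize, on every iteration of the loop (including the last one) the generated code runs:

if (PyObject_SetAttr(__pyx_m, __pyx_n_s__item, __pyx_t_1) < 0) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 4; __pyx_clineno = __LINE__; goto __pyx_L1_error;}

This line stores the current value as the attribute item of the module object __pyx_m, i.e. in the module's namespace. Since nothing removes it when the loop ends, item keeps the value from the last iteration! After this, the append statement looks like this.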

  __pyx_t_2 = __Pyx_GetName(__pyx_m, __pyx_n_s__l2); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_1 = __Pyx_GetName(__pyx_m, __pyx_n_s__item); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 6; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
  __Pyx_GOTREF(__pyx_t_1);

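As you can see, the value is fetched by name: __pyx_n_s__item is a static pointer to the interned name string "item", and __Pyx_GetName looks that name up in the module __pyx_m. So the name is static, but the value lives in the module's namespace, which is why it survives the loop.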
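
The same "last reference" idea can be checked from Python itself with weak references, which observe an object without keeping it alive. A sketch, assuming CPython (which frees an object as soon as its reference count drops to zero); the Thing class is a hypothetical stand-in for the strings in the post, because str objects cannot be weakly referenced:

```python
import weakref


class Thing(object):
    """Hypothetical stand-in for the list elements."""

    def __init__(self, name):
        self.name = name


things = [Thing('1'), Thing('2'), Thing('3')]
refs = list(map(weakref.ref, things))  # watch each object without owning it

for item in things:
    pass

del things  # drop the list, and with it the list's references

# Only the object still bound to the leaked name `item` survives.
print([r() is not None for r in refs])  # [False, False, True]

del item  # remove the last reference
print(refs[2]() is None)  # True
```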

Lesson learned:
1. The first thing I noticed was all the ugly branch statements in the code: there is an error check after almost every call. It might just be the translation (btw, Cython did a great job... I don't even dare to think how I would go about writing such a translator.... geesh). But still, under the hood, there are lots of checks. Branches are expensive when the CPU mispredicts them, so if this was not optimized, we would waste lots of cycles! (The unlikely() macro around each check is there to tell the C compiler that the error path is rare.)

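2. Static is everywhere in the translated C code! Even the name item is held in a static object (the value itself lives in the module namespace; this static holds the string "item" used to look it up):
static PyObject *__pyx_n_s__item;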

3. It's a great challenge to read all this code. I remember all the SET and INCREF stuff from my programming language paradigms class, which I took last year. It's cool to see these guys again. Honestly, if I weren't doing this, I wouldn't be digging through the C API docs on python.org.

Some references:
http://docs.python.org/2/c-api/refcounting.html
http://stackoverflow.com/questions/4657764/py-incref-decref-when
http://docs.python.org/2/c-api/structures.html
http://docs.python.org/2/c-api/list.html
http://wiki.cython.org/enhancements/refnanny
http://wiki.cython.org/enhancements/pypy
http://mail.python.org/pipermail/python-ideas/2008-October/002109.html
http://stackoverflow.com/questions/2525518/writing-code-translator-from-python-to-c
http://docs.python.org/2/library/dis.html


Sunday, November 11, 2012

static in Python - 1

My friend Dave (also my ex-TA) always points me to interesting posts on StackOverflow after I introduced it to him. He likes helping others to succeed whenever possible. I admire him a lot. He's a great mentor.

He was my instructor for my Computer Organization lab a year ago. In our class, Dave went over the scope and lifetime of static and automatic variables, and I still remember most of the details. Last night I said that Python programmers don't really need to worry about static allocation, because in such a high-level language we just pass things around. He got very excited, and his response was a link to "What and where are the stack and heap?" on StackOverflow.

He wrote an example to illustrate how static variables are used in Python. Later tonight he went over how he used this in his Database homework. He was writing classes for functional dependencies (FD) and multivalued dependencies (MVD), and he subclassed FD to make MVD. He wanted to show A -> B and A ->-> B. But instead of hardcoding the strings -> and ->-> in the methods, he used a static (class) variable.

Idea:

class FD(object):
    _ARROW_STRING = "->"

class MVD(FD):
    _ARROW_STRING = "->->"  # FD._ARROW_STRING * 2 gives the same string

I think that's the idea. With a static (class) variable, all instances of FD share the single _ARROW_STRING stored on FD, and all instances of MVD share the one stored on MVD, instead of every instance storing its own attribute. Imagine we have 10000 instances of FD. That's a lot of saving.
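
Here is a slightly fuller sketch of how the arrow could be used. The lhs/rhs attributes and the __str__ method are hypothetical additions of mine, not Dave's homework code. The point is that self._ARROW_STRING is looked up on the class, so the subclass only has to override one class attribute.

```python
class FD(object):
    """Functional dependency A -> B (hypothetical minimal version)."""
    _ARROW_STRING = "->"

    def __init__(self, lhs, rhs):
        self.lhs = lhs
        self.rhs = rhs

    def __str__(self):
        # Attribute lookup falls back to the class, so no per-instance copy exists.
        return "%s %s %s" % (self.lhs, self._ARROW_STRING, self.rhs)


class MVD(FD):
    """Multivalued dependency A ->-> B: only the class attribute changes."""
    _ARROW_STRING = FD._ARROW_STRING * 2


fds = [FD("A", "B") for _ in range(3)]
print(fds[0])                                        # A -> B
print(MVD("A", "B"))                                 # A ->-> B
print("_ARROW_STRING" in vars(fds[0]))               # False: not stored per instance
print(fds[0]._ARROW_STRING is fds[1]._ARROW_STRING)  # True: one shared object
```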

Sometimes I, as a Python programmer, overlook the old-school C++ stuff because it looks boring and confusing. But sometimes a small trick like that helps. Global variables are okay too, but if we want to encapsulate the value in the class, we might as well make it a class attribute, Python's counterpart of a static member variable. I mean, global variables are also statically allocated, so in general it doesn't make much difference.

In the next post I will write more about statics in Python.... something is bugging me right now.

Sunday, October 14, 2012

Procedural or Class-based OOP for a simple python library?

Python is an OOP language. Everything in Python is an object. Functions are first-class objects, which means you can pass them around like any other object. Try that in languages like C or C++. You can get close (function pointers in C, function objects in C++; just search StackOverflow), but it is much clumsier than in Python. I actually asked a similar question at one point...
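
A tiny illustration of what "first-class" buys you: functions can be stored in a dict, passed as arguments, and inspected like any other object. The function names are made up for the example.

```python
def shout(text):
    return text.upper() + "!"


def whisper(text):
    return text.lower() + "..."


def greet(style, name):
    # `style` is just another object: here, a function passed in as an argument.
    return style("hello " + name)


styles = {"loud": shout, "quiet": whisper}  # functions stored as dict values
print(greet(styles["loud"], "dave"))  # HELLO DAVE!
print(greet(whisper, "Dave"))         # hello dave...
print(shout.__name__)                 # shout
```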

As a library designer, I am always looking for the best design patterns. I want my code to be clean, small and general. I don't want to make too many assumptions. I want it to be maintainable and easy to use.

One of the libraries we plan to release in the future as part of the Graphyte Project is called RepoMan. This library uses Mercurial commands to perform init, clone, push, pull, add, addremove, and commit. My friend Jeremy did the core implementation, and I did the external functionality, such as writing to a file, reading a file, or renaming a directory. 90% of what we did in this library was plain I/O using the os and shutil modules. We didn't have a single class in the original design. Not at all.

Tonight I started implementing some hooks for SCM-Manager to talk to its REST APIs. I used requests to communicate over HTTP. As usual, the first thing I did was write some initial code and a few lines of tests.

My initial code was class-based. I thought about writing the entire thing as pure functions, but then I remembered that our web client would probably benefit from having an object instantiated and passed around, instead of having to reconstruct or save those kwargs (the stuff we need to talk to the repo, such as username, password, repo URL, etc.) all the time.

Clearly, in that case, class objects help a lot. With plain functions you usually don't have a way to store state between calls. In fact, with functions, state should be one-time, not persistent.... To me a closure is not a real state-storage counterpart either; even though it keeps the captured values alive after the outer function returns, it feels more like a form of lazy evaluation than an object you can inspect and update.
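
To make the trade-off concrete, here is a sketch of both options holding the same connection settings. The names and the example URL are hypothetical, not from RepoMan or SCM-Manager. The closure really does keep the values after make_settings_getter returns; the class simply keeps them as named attributes you can read and change later.

```python
def make_settings_getter(username, password, repo_url):
    # The inner function captures these three values; they outlive this call.
    def get_settings():
        return {"username": username, "password": password, "repo_url": repo_url}
    return get_settings


class RepoSettings(object):
    # The class version keeps the same values as inspectable attributes.
    def __init__(self, username, password, repo_url):
        self.username = username
        self.password = password
        self.repo_url = repo_url

    def get_settings(self):
        return {"username": self.username, "password": self.password,
                "repo_url": self.repo_url}


url = "http://scm.example.com/repo/demo"  # hypothetical URL
getter = make_settings_getter("alice", "secret", url)
settings = RepoSettings("alice", "secret", url)
print(getter() == settings.get_settings())  # True
print(settings.repo_url)  # http://scm.example.com/repo/demo
```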

I thought about functions vs. class objects, and in the end I rewrote my initial implementation using functions. I realized that for my own libraries, the lowest level (the core) should be simple and straightforward. I can write independent, individual functions and then implement the interface however I wish. Users can easily pull out one of these functions and use it if they don't need a full-featured interface.

If I have all the REST API calls written as functions, I can make a very simple class-based interface that calls them individually. That interface is what people use, so it should be simple to use. If I put all the core logic into the methods, I expose the implementation instead of abstracting it away from the end user, and I end up writing ugly code in my methods.
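
A minimal sketch of that layering, assuming a made-up URL layout and function names (this is not SCM-Manager's documented API, and it does no network I/O). The core functions work on their own; the class only stores state and delegates.

```python
# --- core: small, independent functions (hypothetical names) ---
def join_url(base_url, *parts):
    """Join a base URL and path segments with exactly one slash between them."""
    return "/".join([base_url.rstrip("/")] + [str(p).strip("/") for p in parts])


def repository_url(base_url, repo_id):
    """URL of one repository resource; usable without any class."""
    return join_url(base_url, "repositories", repo_id)


# --- interface: a thin class that stores state and delegates to the core ---
class SCMClient(object):
    def __init__(self, base_url, username, password):
        self.base_url = base_url
        self.auth = (username, password)

    def repository_url(self, repo_id):
        # No logic of its own: replacing the core function leaves this method intact.
        return repository_url(self.base_url, repo_id)


client = SCMClient("http://scm.example.com/api/rest/", "alice", "secret")
print(client.repository_url("demo"))
# http://scm.example.com/api/rest/repositories/demo
```

A user who only needs the URL logic can import repository_url directly and skip the class entirely.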

I may still handle some logic in my methods after calling the core functions, but that logic is specific to the method. If I don't like what I did in the core, I can easily throw it away and replace the old function with a new implementation. My interface will usually survive.

The conclusion is that when writing a library, stay focused on what the library is meant to do. If the library is some complex data structure, the lowest level could be a data structure, which in C or C++ could be a class or struct. There is nothing wrong with that. But even in C or C++, best practice is to keep the declarations (in headers) separate from the implementation. Thus, the lowest level of the implementation should be independent of the interface. Anyone who wishes to use my library can easily import a particular function (e.g. a user may only be interested in reading the content of a file using the scm-hook). That import gives the user full control over that piece of functionality.

Stop writing everything as classes in Python. Most of the stuff in Python should remain as functions. Classes usually make code harder to debug and maintain.

Saturday, October 13, 2012

presentation is not so easy to make

I started leading a beginner Python workshop this past Tuesday and Thursday at school. It didn't go too well from what I can tell. I lost about half of the students on Thursday. I hope I can see them again next Thursday.

I learned my lesson.

1. don't pay too much attention to pretty slides
They become useless if they don't help. I have been attracted to Google HTML5 Slides since I attended the 2011 I/O recap here in New York. It was really cool to see. I thought, why not make one for my workshop?
I spent so many hours on it. I thought I could write the slides in either reStructuredText or Markdown, but it turned out most of these libraries out there are sort of "broken": their CSS doesn't work very well. I ended up using Shower and doing the entire thing by hand. Yeah. I hard-coded all the HTML myself.
Next time, I will stick to traditional PowerPoint.

2. limit the goals
At first, I wanted this talk to be as comprehensive as possible. Although a couple of CS juniors and seniors came to learn Python, most of my students had no programming experience at all.
I modified my slides a few hours before the talk. I realized the goal had to be lowered to an introduction to programming, just enough to get by. I still ended up covering too much.

3. test your equipment
Don't rely on previous experience. Equipment breaks and changes over time. For example, at the first workshop (which was actually my talk on Python and introduction to programming), I spent nearly 30 minutes trying to hook up the projector and fixing my slides. Freakin' mad....
On Thursday I had to deal with this thing called PyLab, and I didn't know how to run a script in it (I can drag the .py file into PyLab to execute it, but that sucks..). I had planned to write a little script in IDLE, and I couldn't.

4. get a better room
Seriously... 7/107 sucks. That projector requires me to turn off the lights, which is terrible. I hate that. Students have to keep turning their heads to look at me.... that's bad for teaching.
None of the CS labs are set up so that the computers face the projector. That's why I asked for the engineering room at first, because it has a good setup...

Anyhow, I posted my stuff here
http://yeukhon.bitbucket.org/

R.I.P Amanda Todd



This is supposed to be a programming blog. But I am going to take a little detour. I was supposed to be refactoring Aurum right now. 

I just spoke to my girlfriend and she told me about the video Amanda posted before she killed herself. I just want to be very honest with the audience out there: my first thought was to check her out. I wanted to know why she killed herself.

It was morally wrong for me to check her out in the first place... my bad.

But I want to say that her connections were totally fucked up. First of all, that motherfucker player cheated on his girlfriend, and that dumbass bitch girlfriend thought it was worth the trouble to teach Amanda a little lesson. Both of them are stupid. A good girl knows a cheater is worthless. 

Secondly, all the observers in her life were totally fucked up too. Just because her nude photo was posted online, none of you should have moved away from her. She just needed a few genuine kind words to feel stronger. None of you offered them.

If I were there, I am just being serious, I would stop the fucking fight and pull her aside. I would defend the shit out of her just because I think it was wrong to beat anyone up. She already paid the price.. it was a public humiliation. 

I believe her parents were divorced. She probably needed a complete family. I don't want to make any more conjectures, but family support was very important. People in her life were generally bad and cruel.

Now you bullies have to live with this death mark forever. You guys will constantly remember Amanda from time to time. When you are at your wedding, you might remember Amanda. When you are making love with a girl, you fucking player, you are going to remember that your penis was once inside this beautiful, yet, dead, girl. She was only looking for a little love from people. Just a little bit, and you ruined it.

God. Please have mercy on Amanda. I know she killed herself, but please make her feel comfortable.

Friday, October 5, 2012

Learning SCRUM with LEGO

So Michael bought some legos to teach Scrum in this MIS software engineering course. We thought it would be fun to do it in the lab, so we scheduled the event for today.

Actually I was late today, so I missed the first 10 minutes, but that part was just a video showing what Scrum is about.

The whole game was divided into several sessions. Michael followed the rules very strictly by timing us. Since there were only five members, we made one big team instead of dividing into sub-teams.

We wrote down the user stories (the stuff we needed to build) on Post-its. Then we put up a big poster paper on the wall. We first grouped them by level of difficulty. I actually disagreed with the arrangement we had, but the majority won.

Anyhow, then we put up another poster and drew three lines; each line represented a sprint (a fixed time period), so there were three sessions for us to complete. I think each was 15 minutes.

We tried to construct all the vehicles in the first sprint, but we failed. The SUV was the only one that actually looked usable, whereas the bus and the tractor were totally useless.


We spread the legos on the floor.

In any case, after we finished the first sprint, we had a short recap session. We were asked to think about what went wrong. The first problem was that the resources weren't prepared ahead of time, so we had to search for pieces, which proved to be very difficult. Just look at the picture above. The only thing that was completed was the bridge. The bridge, like I said, would be the easiest....


That's Yuriy.

We continued to perfect our cars and build other stuff, which I can't recall now. But it didn't go very well either. The bus was totally broken when we showed it to the product owner (Michael, in this case).

Our final product.

When it came to the last phase, we still had to finish the bus. I begged to work on the tower crane because it was worth five points and I knew it was a simple task. 
We did most of it, but as you can see in the photo, we didn't manage to put covers over the house or the garage.  

The requirement said the garage must be able to protect the cars from bad weather. Jeremy said "well it can protect horizontal rains." LOL
Recap!
The final phase was reflection. We pointed out that the resources weren't prepared ahead of time, and that we should have collected the pieces and sorted them into groups in the first sprint. We also needed to re-weigh each user story. For example, houses are very easy to make and should be done as soon as possible. But the dilemma was that we needed to know how big the cars were ahead of time so the garage and bus stop could be built.

This nice exercise shows us several things:
1. we can't construct a perfect plan
2. we can't overestimate or underestimate any task
3. we need to be able to work as a team or things will fall apart

It's a nice exercise. We had a lot of fun.
This feels like one of those Google internal training events :P

Wednesday, September 26, 2012

Use Fabric more!

I have been really busy these days. I have many things I want to write about. Oh please, good nature, let me have some free time to write about things!

Anyway. I just happened to be on StackOverflow. There were two questions about sending files remotely: one about copying files over to another machine, and another about uploading them through FTP.

I am against using raw connections. Fabric is designed for deployment, so I am a strong advocate of using it.

How to automate the sending of files over the network using python?
Python Script Uploading files via FTP

I wrote the scripts very quickly.

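    #!/usr/bin/python
    # fabfile.py for Fabric 1.x: run with `fab copy` or `fab copy copy2`
    from fabric.api import env, put

    env.user = 'username'
    env.hosts = ['hostname']  # every task runs against each host in this list
    #env.password = 'my_lovely_password'
    #env.key_filename = '/path/to/private_key'  # better for unattended runs

    def copy():
        # Upload the zip into the home directory on the remote host.
        put('wong_8066.zip', '/home1/yeukhon/wong_8066.zip')

    def copy2():
        # Upload the same zip into the public web directory.
        put('wong_8066.zip', '/www/public/wong_8066.zip')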


Save it as fabfile.py and run it by calling fab task_name, or pass several task names at once (for example, fab copy copy2).

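If you need to automate it, you should specify the path to the SSH key (env.key_filename) instead of relying on a password prompt. Hardcoding the password (for FTP, for example) is OK if this is just a personal thing. Keep in mind that Fabric's put uploads over SSH (SFTP), so the remote machine needs SSH access; it does not speak plain FTP.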

Seriously... use Fabric more. Sadly, GLASSLAB needs a better automation system so we have to go with Chef.