I'm currently implementing PageRank on Disco. As an iterative algorithm, the results of one iteration are used as input to the next.
I have a large file which represents all the links, with each row representing a page and the values in the row representing the pages to which it links.
For Disco, I break this file into N chunks, then run MapReduce for one round. As a result, I get a set of (page, rank) tuples.
I'd like to feed this rank to the next iteration. However, now my mapper needs two inputs: the graph file, and the pageranks.
I would like to "zip" together the graph file and the page ranks, such that each line represents a page, its rank, and its out-links.
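To make the "zip" concrete, here is a minimal sketch in plain Python (no Disco-specific API) of joining each graph line with its rank by page id. The file format shown (a page id followed by the ids it links to, whitespace-separated) is my assumption for illustration:

```python
def zip_graph_with_ranks(graph_lines, ranks):
    """Yield (page, rank, outlinks) tuples, given graph lines
    and a {page: rank} dict from the previous iteration."""
    for line in graph_lines:
        page, *outlinks = line.split()
        # Pages with no rank yet (e.g. first iteration) default to 0.0
        yield page, ranks.get(page, 0.0), outlinks

graph = ["A B C", "B C", "C A"]
ranks = {"A": 0.4, "B": 0.3, "C": 0.3}
zipped = list(zip_graph_with_ranks(graph, ranks))
# e.g. ("A", 0.4, ["B", "C"]) for the first line
```

The sticking point is that this requires the full `ranks` dict (or the right slice of it) to be available alongside each graph chunk.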
Since the graph file is separated into N chunks, I need to split the pagerank vector into N parallel chunks and zip each region of the pagerank vector to its corresponding graph chunk.
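The splitting step I have in mind looks roughly like this sketch, where I assume each graph chunk `i` covers a known set of page ids (`page_sets[i]`); these names are hypothetical, and hashing page ids to chunks would work the same way:

```python
def partition_ranks(rank_tuples, page_sets):
    """Split (page, rank) tuples into one dict per graph chunk,
    so each rank lands in the chunk holding that page's row."""
    chunks = [dict() for _ in page_sets]
    # Invert page_sets into a page -> chunk-index lookup
    lookup = {page: i for i, pages in enumerate(page_sets) for page in pages}
    for page, rank in rank_tuples:
        chunks[lookup[page]][page] = rank
    return chunks

page_sets = [{"A", "B"}, {"C", "D"}]
rank_tuples = [("A", 0.4), ("C", 0.35), ("B", 0.25)]
rank_chunks = partition_ranks(rank_tuples, page_sets)
# chunk 0 holds ranks for A and B; chunk 1 holds the rank for C
```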
This all seems more complicated than necessary. Since PageRank is the quintessential MapReduce algorithm, I suspect I'm missing something about Disco that could really simplify the approach.
Any thoughts?