Future solutions

I am working with a large data set, which I use to make certain calculations. Since the data set is huge, the machine I am working on takes an excessively long time to do the job, so I decided to use the future package to distribute the work across several machines and speed up the calculations.
My problem is that through future (using PuTTY and SSH) I can connect to those machines (in parallel), but the main machine still does all the work itself, without any distribution. Maybe you can advise a solution:
My code:
library(future)
workers <- c("000.000.0.000", "111.111.1.111")
plan(remote, envir = parent.frame(), workers= workers, myip = "222.222.2.22")
start <- proc.time()
cl <- makeClusterPSOCK(
c("000.000.0.000", "111.111.1.111"), user = "...",
rshcmd = c("plink", "-ssh", "-pw", "..."),
rshopts = c("-i", "V:/vbulavina/privatekey.ppk"),
homogeneous = FALSE)
setwd("V:/vbulavina/r/inversion")
a <- source("fun.r")
f <- future({source("pasos.r")})
l <- future({source("pasos2.R")})
time_elapsed_parallel <- proc.time() - start
time_elapsed_parallel
The futures f and l are supposed to be evaluated in parallel, but the master machine is doing all the work, so I'm a bit confused about whether I can do anything about it.
PS: I tried plan() with remote, multiprocess, multisession and cluster, and nothing worked.
PS2: My local machine runs Windows, and I am trying to connect to Kubuntu and Debian machines (the firewall is off on all of them).
Thanks in advance.
@Axeman so yes, I tried
plan(remote, envir = parent.frame(), workers = workers, myip = "192.168.2.48")
and got the error
Error in socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, : reached elapsed time limit
– zoidberg724
Jul 25 at 7:47
@Axeman oh sorry, you're right!
– zoidberg724
Jul 25 at 7:50
@Axeman so workers is just the IP addresses of the machines I'm supposed to connect to?– zoidberg724
Jul 25 at 7:56
1 Answer
Author of future here. First, make sure you can set up the PSOCK cluster, i.e. connect to the two workers over SSH and run Rscript on them. You can do this as follows:
library(future)
workers <- c("000.000.0.000", "111.111.1.111")
cl <- makeClusterPSOCK(workers, user = "...",
rshcmd = c("plink", "-ssh", "-pw", "..."),
rshopts = c("-i", "V:/vbulavina/privatekey.ppk"),
homogeneous = FALSE)
print(cl)
### socket cluster with 2 nodes on hosts '000.000.0.000', '111.111.1.111'
(If the above makeClusterPSOCK() stalls or doesn't work, add argument verbose = TRUE to get more info; feel free to report back here.)
Next, with the PSOCK cluster set up, tell the future system to parallelize over those two workers:
plan(cluster, workers = cl)
Test that futures are actually resolved remotely, e.g.
f <- future(Sys.info()[["nodename"]])
print(value(f))
### [1] "000.000.0.000"
I leave the remaining part, which also needs adjustments, for now - let's make sure to get the workers up and running first.
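Before trying the real machines, it can help to check on a single computer that futures really are evaluated outside the master R session. The sketch below is only a stand-in under stated assumptions: it uses two local PSOCK workers ("localhost") instead of the two SSH hosts, and it compares process IDs (Sys.getpid()) rather than node names, because local workers all report the same nodename.

```r
library(future)

## Assumption: two local R processes stand in for the two remote machines,
## so this sketch runs without SSH or PuTTY
cl <- makeClusterPSOCK(c("localhost", "localhost"))
plan(cluster, workers = cl)

## Each future reports the process ID of the R session that evaluated it
fs <- lapply(1:2, function(i) future(Sys.getpid()))
pids <- vapply(fs, value, integer(1))

## If the plan is in effect, none of these should equal the master's PID
print(pids != Sys.getpid())

## Reset the plan, then stop the workers; try() is used in case resetting
## the plan already shut them down
plan(sequential)
try(parallel::stopCluster(cl), silent = TRUE)
```

With the real setup, the same check is the Sys.info()[["nodename"]] test above, which should return the remote machines' host names.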
Continuing: using source() in parallel processing complicates things, especially when the parallelization is done on different machines. For instance, calling source("my_file.R") on another machine requires that the file my_file.R is available on that machine too. Even if it is, source() also complicates the automatic identification of the variables that need to be exported to the external machine. A safer approach is to incorporate all the code in the main script. Having said all this, you can try to replace:
f <- future({source("pasos.r")})
l <- future({source("pasos2.R")})
with
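futureSource <- function(file, envir = parent.frame(), ...) {
  ## Parse the script on the master, so the file only needs to exist there
  expr <- parse(file)
  ## Hand the parsed code as-is (substitute = FALSE) to a future
  future(expr, substitute = FALSE, envir = envir, ...)
}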
f <- futureSource("pasos.r")
l <- futureSource("pasos2.R")
As long as pasos.r and pasos2.R don't call source() internally, this should work.
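To illustrate the idea behind futureSource() without needing pasos.r, here is a minimal sketch that writes a small hypothetical script to a temporary file and evaluates it in a future on two local multisession workers. As a defensive variant (an assumption, not something the answer says is required), the hypothetical helper futureSource2() wraps the parsed expressions in a single { ... } call before handing them to future(), so future() receives one ordinary R expression. Because parse() runs on the master, the script file only has to exist on the machine that creates the future.

```r
library(future)

## Two local background R sessions stand in for the remote machines
plan(multisession, workers = 2)

## Hypothetical script contents, written to a temporary file so the
## example does not depend on pasos.r existing
script <- tempfile(fileext = ".R")
writeLines(c("x <- 1:10", "sum(x^2)"), script)

## Hypothetical variant of futureSource(): parse on the master, then wrap all
## top-level expressions in one { ... } call before creating the future
futureSource2 <- function(file, envir = parent.frame(), ...) {
  exprs <- as.list(parse(file))
  expr <- as.call(c(list(as.name("{")), exprs))
  future(expr, substitute = FALSE, envir = envir, ...)
}

f <- futureSource2(script)
res <- value(f)
print(res)  # expected: 385, the sum of the squares of 1..10

## Clean up the temporary file and shut down the background sessions
unlink(script)
plan(sequential)
```

As with futureSource(), this only helps if the script does not itself call source() on files that are missing on the worker.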
BTW, what version of Windows are you on? An up-to-date Windows 10 has built-in support for SSH, so you no longer need to use PuTTY.
UPDATE 2018-07-31: Continued the answer regarding using source() in futures.
Thanks a lot. As you said, yesterday I got the connection between the machines working, using verbose = TRUE and by obtaining private keys from the Ubuntu machines and sharing them with the Windows one (btw, it's Windows 7). The error that occurred afterwards was about changing plan(multiprocess): I tried to change the plan, but it gave me an error saying I can't use it with an integer, if I'm not mistaken; I can't say for sure, as I don't have my computer in front of me. So that's the thing: the connection is there 100%, but the code inside the future is not getting executed.– zoidberg724
Jul 27 at 6:50
(1) You should not need to share private SSH keys across machines, only public ones. (2) The plan(cluster, workers = cl) example I give above is how you set up the two workers; that's the only plan() you should need. If you want to run, say, four workers on each of those two machines, use workers <- rep(c("000.000.0.000", "111.111.1.111"), each = 4L). (3) Yes, I don't expect your future(source(...)) calls to work, but let's talk about that once you've confirmed that it works for you with the plan() I suggest.– HenrikB
Jul 27 at 6:58
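The rep(..., each = 4L) suggestion gives several workers per host. A related detail in the question's code is that, with a parallel plan, future() returns without waiting for the result, so taking proc.time() right after the future() calls does not measure how long the scripts take to finish. The sketch below assumes two local workers in place of the two machines and Sys.sleep() in place of the real scripts; it creates both futures first and only then collects the values with value(), so the elapsed time should be close to one sleep rather than two.

```r
library(future)

## Assumption: local workers stand in for the remote hosts; with the real
## machines this would be rep(c("000.000.0.000", "111.111.1.111"), each = 4L)
## together with the SSH settings passed to makeClusterPSOCK()
workers <- rep("localhost", each = 2L)
plan(cluster, workers = workers)

## Warm-up future so that worker start-up is not included in the timing
invisible(value(future(NULL)))

start <- proc.time()
## Create both futures before asking for any value, so they can run concurrently
f <- future({ Sys.sleep(2); "first" })
l <- future({ Sys.sleep(2); "second" })
## value() blocks until a result is back, so the timing must include these calls
results <- c(value(f), value(l))
elapsed <- (proc.time() - start)[["elapsed"]]
print(elapsed)  # roughly 2 seconds in parallel, about 4 if run sequentially

## Reset the plan, which should also shut down the workers it created
plan(sequential)
```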
Okay, thanks. With the public key it didn't work; I tried several times and got the connection with the private one.
– zoidberg724
Jul 27 at 7:36
Hmm, okay. Please confirm that f <- future(Sys.info()[["nodename"]]) and value(f) work as in my example.– HenrikB
Jul 27 at 9:09
Sorry for the delay, some problems came up. So, f <- future(Sys.info() and value(f) worked; the result I got was "cluster01".– zoidberg724
4 hours ago
@Axeman the thing is that the plan code does nothing for me: without it there is a connection, but no distribution of the work between the machines.
– zoidberg724
Jul 25 at 7:30