    I would like you address the comments of Reviewer 2, particularly with
    regard to the description of the adaptation Java harness to deal with
    warmup. I would expect to see a convincing argument that the computation
    has reached a steady state.

We understand Referee 2's and your concern about the JIT experiments, which is why
we verified our experiments with two experts in JIT development for both Java
and Node.js before submitting the paper. We also read the supplied papers, but
most of the information is not applicable to our work for the following
reasons.

1. SPEC benchmarks are medium to large. In contrast, our benchmarks are 5-15
   lines in length for each programming language (see code for the Cforall
   tests in the paper). Hence, there are no significant computations, complex
   control flow, or use of memory.  Each tests one specific language feature
   (context switch, mutex call, etc.) in isolation over and over again. These
   language features have a fixed cost (e.g., acquiring and releasing a lock).
   Therefore, unless the feature can be removed, there is nothing to
   optimize at runtime. But these features cannot be removed without changing
   the meaning of the benchmark. If the feature were removed, the timing result
   would be 0. In fact, it was difficult to prevent the JIT from completely
   eliding some benchmarks because there are no side-effects.
   
2. All of our benchmark results correlate across programming languages with and
   without JIT, indicating the JIT has completed any runtime optimizations
   (added this sentence to Section 8.1). Any large differences are explained by
   how a language implements a feature, not by how the compiler/JIT processes
   that feature.  Section 8.1 discusses these points in detail.

3. We also added a sentence about running all JIT-based programming-language
   experiments for 30 minutes. There was no statistical difference: the
   med/avg/std correlated with the short-run experiments, which seems a
   convincing argument that the benchmark has reached a steady state. If the
   JIT takes longer than 30 minutes to achieve its optimization goals, it is
   unlikely to be useful.

4. The purpose of the performance section is not to draw conclusions about
   improvements. It is to contrast programming-language implementation approaches.
   Section 8.1 talks about ramifications of certain design and implementation
   decisions with respect to overall performance. The only conclusion we draw
   about performance is:

     Performance comparisons with other concurrent systems and languages show
     the Cforall approach is competitive across all basic operations, which
     translates directly into good performance in well-written applications
     with advanced control-flow.

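The comparison in point 3 can be sketched as follows. This is an illustration
only, assuming per-run timings are already collected in arrays: the class
SteadyStateCheck, the 5% tolerance, and the sample values are placeholders,
not measurements from the paper, and the relative-tolerance check on the
median is a simple stand-in for a formal statistical test.

```java
import java.util.Arrays;

// Hypothetical sketch; names, tolerance, and data are placeholders.
class SteadyStateCheck {
    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    static double median(double[] xs) {
        double[] s = xs.clone();         // sort a copy, leave the input untouched
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    // Sample standard deviation (divides by n - 1).
    static double stddev(double[] xs) {
        double m = mean(xs);
        double ss = 0.0;
        for (double x : xs) ss += (x - m) * (x - m);
        return Math.sqrt(ss / (xs.length - 1));
    }

    // True when the two medians differ by at most `tolerance`, relative to the larger.
    static boolean agree(double[] shortRun, double[] longRun, double tolerance) {
        double a = median(shortRun), b = median(longRun);
        return Math.abs(a - b) <= tolerance * Math.max(Math.abs(a), Math.abs(b));
    }

    public static void main(String[] args) {
        // Placeholder timings in nanoseconds per operation, not real results.
        double[] shortRun = {101.0, 99.5, 100.2, 100.8, 99.9};
        double[] longRun  = {100.4, 100.1, 99.8, 100.6, 100.0};
        System.out.printf("short: med=%.2f avg=%.2f std=%.2f%n",
                median(shortRun), mean(shortRun), stddev(shortRun));
        System.out.printf("long:  med=%.2f avg=%.2f std=%.2f%n",
                median(longRun), mean(longRun), stddev(longRun));
        System.out.println("medians agree within 5%: " + agree(shortRun, longRun, 0.05));
    }
}
```

If the short-run and 30-minute summaries agree under such a check, further
warmup has not changed the measured cost.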

   I would also like you to provide the values for N for each benchmark run.

Done.


Referee 2 suggested

   * don't start sentences with "However"

However, there are numerous grammar sites on the web indicating "however" (a
conjunctive adverb) at the start of a sentence is acceptable, e.g.:

 https://www.merriam-webster.com/words-at-play/can-you-start-a-sentence-with-however
 This is a stylistic choice, more than anything else, as we have a
 considerable body of evidence of writers using however to begin sentences,
 frequently with the meaning of "nevertheless."
