alexmarques committed on
Commit 6fccd38 · verified · 1 parent: 365a575

Update README.md

Files changed (1)
  1. README.md +43 -0
README.md CHANGED
@@ -277,5 +277,48 @@ The model was evaluated on the OpenLLMv1 leaderboard task, using [lm-evaluation-
  <td>63.67</td>
  <td>100.0</td>
  </tr>
  </tbody>
  </table>
 
  <td>63.67</td>
  <td>100.0</td>
  </tr>
+ <tr>
+ <td rowspan="7"><b>OpenLLM V2</b></td>
+ <td>IFEval (Inst Level Strict Acc, 0-shot)</td>
+ <td>91.01</td>
+ <td>90.29</td>
+ <td>99.21</td>
+ </tr>
+ <tr>
+ <td>BBH (Acc-Norm, 3-shot)</td>
+ <td>73.72</td>
+ <td>73.95</td>
+ <td>100.31</td>
+ </tr>
+ <tr>
+ <td>Math-Hard (Exact-Match, 4-shot)</td>
+ <td>61.71</td>
+ <td>20.69</td>
+ <td>33.54</td>
+ </tr>
+ <tr>
+ <td>GPQA (Acc-Norm, 0-shot)</td>
+ <td>32.13</td>
+ <td>32.89</td>
+ <td>102.35</td>
+ </tr>
+ <tr>
+ <td>MUSR (Acc-Norm, 0-shot)</td>
+ <td>42.06</td>
+ <td>41.80</td>
+ <td>99.37</td>
+ </tr>
+ <tr>
+ <td>MMLU-Pro (Acc, 5-shot)</td>
+ <td>65.82</td>
+ <td>65.65</td>
+ <td>99.73</td>
+ </tr>
+ <tr>
+ <td><b>Average Score</b></td>
+ <td><b>61.07</b></td>
+ <td><b>54.21</b></td>
+ <td><b>88.77</b></td>
+ </tr>
  </tbody>
  </table>
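The third score column and the Average Score row in the added OpenLLM V2 rows are plain arithmetic over the per-task scores. The sketch below recomputes them. It rests on an assumption, because this diff does not show the column headers: the first score column is the baseline model, the second is the evaluated model, and recovery = evaluated / baseline × 100, with averages taken as unweighted means over the six tasks. The names `ROWS`, `recovery` and `mean` are illustrative only. The published scores are already rounded to two decimals, so recomputed recoveries can differ from the reported ones in the last digit. For example, GPQA recomputes to roughly 102.37 against 102.35 reported.

```python
# Sketch: recompute the recovery column and the "Average Score" row of the
# OpenLLM V2 table from the rounded per-task scores.
# ASSUMPTION: column order is (baseline score, evaluated score, reported recovery %);
# the diff itself does not show the column headers.

ROWS = {
    # task: (baseline, evaluated, reported recovery %)
    "IFEval": (91.01, 90.29, 99.21),
    "BBH": (73.72, 73.95, 100.31),
    "Math-Hard": (61.71, 20.69, 33.54),
    "GPQA": (32.13, 32.89, 102.35),
    "MUSR": (42.06, 41.80, 99.37),
    "MMLU-Pro": (65.82, 65.65, 99.73),
}


def recovery(baseline: float, evaluated: float) -> float:
    """Percentage of the baseline score retained by the evaluated model."""
    return evaluated / baseline * 100.0


def mean(values: list[float]) -> float:
    """Unweighted arithmetic mean (assumed to be how the Average row is formed)."""
    return sum(values) / len(values)


for task, (base, evaluated, reported) in ROWS.items():
    computed = recovery(base, evaluated)
    # Last-digit differences come from rounding in the published scores.
    print(f"{task:<10} computed={computed:7.2f} reported={reported:7.2f}")

baseline_avg = mean([row[0] for row in ROWS.values()])
evaluated_avg = mean([row[1] for row in ROWS.values()])
print(f"Average    baseline={baseline_avg:.2f} evaluated={evaluated_avg:.2f} "
      f"recovery={recovery(baseline_avg, evaluated_avg):.2f}")
```

The reported average recovery (88.77) matches the ratio of the two averages (54.21 / 61.07). It does not match the mean of the six per-task recoveries, which is about 89.09. That gap is why the sketch divides the averages rather than averaging the recoveries.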