EDM4hep.jl EDM I/O Tutorial

This tutorial shows how to read EDM4hep data files and perform some simple analysis

Note that

You can also download this tutorial as a Jupyter notebook and a plain Julia source file.

Table of contents

Loading the necessary modules

  • We will use the EDM4hep module that defines all the EDM4hep types and its relations and links.
  • We will use the FHist module to create histograms
  • We will use the Plots module to plot the histograms

If these modules are not installed, you can install them by running the following commands:

using Pkg
Pkg.add("EDM4hep")
Pkg.add("FHist")
Pkg.add("Plots")
using EDM4hep
using EDM4hep.RootIO
using Base.Iterators: partition, take
using FHist
using Plots: plot, scatter, plot!, theme

Reading an EDM4hep file

We will read an EDM4hep file with the RootIO.Reader function. This function returns a reader object that can be used to access the events in the file. The input file is a ROOT file with the EDM4hep data model and is located to the path ttbar_edm4hep_digi.root.

finput = joinpath(@__DIR__,"../../examples" ,"ttbar_edm4hep_digi.root")
"/home/runner/work/EDM4hep.jl/EDM4hep.jl/docs/build/../../examples/ttbar_edm4hep_digi.root"

We create a reader object to access the events in the file. The object displays a summary table of the content of the file.

reader = RootIO.Reader(finput)
┌────────────────┬──────────────────────────────────────────────────────────────
│ Attribute      │ Value                                                       ⋯
├────────────────┼──────────────────────────────────────────────────────────────
│ File Name(s)   │ /home/runner/work/EDM4hep.jl/EDM4hep.jl/docs/build/../../ex ⋯
│ # of events    │ 25                                                          ⋯
│ IO Format      │ TTree                                                       ⋯
│ EDM4hep schema │ 1.0.0                                                       ⋯
│ PODIO version  │ 0.17.1                                                      ⋯
│ ROOT version   │ 6.28.4                                                      ⋯
└────────────────┴──────────────────────────────────────────────────────────────
                                                                1 column omitted

Accessing the events

The TTree called events contains the events in the file. We can access the events by using the RootIO.get function with the reader object and the name of the TTree.

events = RootIO.get(reader, "events");
┌ Warning: Type MCRecoTrackerHitPlaneAssociation not found in EDM4hep module
└ @ EDM4hep.RootIO ~/work/EDM4hep.jl/EDM4hep.jl/src/RootIO.jl:332
Note that

If you get warnings is because there is a mismatch between the current schema version of EDM4hep and version when the file was written. The default behavior is ignore old types and set to zero the attributes that do not exists in the file.

We can access the first event in the file by using the index 1 in the events array.

evt = events[1];

the evt object is a UnROOT.LazyEvent. Each leaf (column) can be accessed directlyt if you know the name of the leaf. Typically the name is of the form <collection_name>_<field_name>. The full list of names can be obtained by calling the names function on the events object. For example:

for n in names(events) |> sort!
    startswith(n, "ECalBarrelCollection") && println(n)
end
ECalBarrelCollection_cellID
ECalBarrelCollection_contributions_begin
ECalBarrelCollection_contributions_end
ECalBarrelCollection_energy
ECalBarrelCollection_position_x
ECalBarrelCollection_position_y
ECalBarrelCollection_position_z
Note that

There is not need to access individual columns like this. The RootIO.get function can be used to access the collections directly.

Accessing the collections

The available collections in the event can be obtained by displaying the reader object.

show(reader)
┌────────────────┬───────────────────────────────────────────────────────────────────────────────────────────┐
│ Attribute      │ Value                                                                                     │
├────────────────┼───────────────────────────────────────────────────────────────────────────────────────────┤
│ File Name(s)   │ /home/runner/work/EDM4hep.jl/EDM4hep.jl/docs/build/../../examples/ttbar_edm4hep_digi.root │
│ # of events    │ 25                                                                                        │
│ IO Format      │ TTree                                                                                     │
│ EDM4hep schema │ 1.0.0                                                                                     │
│ PODIO version  │ 0.17.1                                                                                    │
│ ROOT version   │ 6.28.4                                                                                    │
└────────────────┴───────────────────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────┬─────────────────────┬──────────────┐
│ BranchName                      │ Type                │ CollectionID │
├─────────────────────────────────┼─────────────────────┼──────────────┤
│ AllCaloHitContributionsCombined │ CaloHitContribution │ 0x3a25675d   │
│ BeamCalCollection               │ SimCalorimeterHit   │ 0xc298a348   │
│ ECalBarrelCollection            │ SimCalorimeterHit   │ 0x407eb17b   │
│ ECalEndcapCollection            │ SimCalorimeterHit   │ 0x0c09a156   │
│ ECalPlugCollection              │ SimCalorimeterHit   │ 0xd36e0cf8   │
│ EventHeader                     │ EventHeader         │ 0xd793ab91   │
│ HCalBarrelCollection            │ SimCalorimeterHit   │ 0x6daf3b41   │
│ HCalEndcapCollection            │ SimCalorimeterHit   │ 0x50d90aed   │
│ HCalRingCollection              │ SimCalorimeterHit   │ 0xad7abfca   │
│ InnerTrackerBarrelCollection    │ SimTrackerHit       │ 0xf57448f3   │
│ InnerTrackerEndcapCollection    │ SimTrackerHit       │ 0x6d724693   │
│ LumiCalCollection               │ SimCalorimeterHit   │ 0x45759015   │
│ MCParticle                      │ MCParticle          │ 0x0620d570   │
│ OuterTrackerBarrelCollection    │ SimTrackerHit       │ 0x083cc03d   │
│ OuterTrackerEndcapCollection    │ SimTrackerHit       │ 0x1450dff0   │
│ VXDTrackerHits                  │ TrackerHitPlane     │ 0xdb43b881   │
│ VertexBarrelCollection          │ SimTrackerHit       │ 0x85e68310   │
│ VertexEndcapCollection          │ SimTrackerHit       │ 0x5add7f42   │
│ YokeBarrelCollection            │ SimCalorimeterHit   │ 0x97bfada6   │
│ YokeEndcapCollection            │ SimCalorimeterHit   │ 0x60209550   │
└─────────────────────────────────┴─────────────────────┴──────────────┘

The RootIO.get function can be used to access the collections in the event. The function takes the reader object, the event object and the name of the collection as arguments.

calo = RootIO.get(reader, evt, "AllCaloHitContributionsCombined");
hits = RootIO.get(reader, evt, "ECalBarrelCollection");
mcps = RootIO.get(reader, evt, "MCParticle");
Note that

Relationships will automatically be setup if the related collections will be also be accessed. For example, in this case the relation of SimCalorimeterHits from the collection ECalBarrelCollection to CaloHitContributions from AllCaloHitContributionsCombined will be filled.

The calo object is a EDM4hep.EDCollection object that contains the calorimeter hit contributions. The hits object is a EDM4hep.EDCollection object that contains the hits in the ECal barrel. The mcps object is a EDM4hep.EDCollection object that contains the MCParticles.

We can now print some information about the collections.

for hit in take(hits, 20)
    println("ECAL Hit $(hit.index) has energy $(hit.energy) at position $(hit.position) with $(length(hit.contributions)) contributions")
end
ECAL Hit #1 has energy 0.0002169429 at position (-1487.2666, 428.77875, -1224.0) with 1 contributions
ECAL Hit #2 has energy 0.00031143325 at position (-1489.09, 435.72046, -1229.1) with 1 contributions
ECAL Hit #3 has energy 0.00032750145 at position (-1490.9135, 442.6622, -1234.2) with 1 contributions
ECAL Hit #4 has energy 0.0010127309 at position (-1490.9135, 442.6622, -1239.3) with 1 contributions
ECAL Hit #5 has energy 0.0010511694 at position (-1486.54, 440.1372, -1244.4) with 1 contributions
ECAL Hit #6 has energy 0.0012481669 at position (-1482.1666, 437.6122, -1249.5) with 1 contributions
ECAL Hit #7 has energy 0.0011303278 at position (-1495.2869, 445.1872, -1234.2) with 1 contributions
ECAL Hit #8 has energy 0.0015850706 at position (-1499.6603, 447.7122, -1239.3) with 1 contributions
ECAL Hit #9 has energy 0.0027348632 at position (-1501.4838, 454.65393, -1244.4) with 1 contributions
ECAL Hit #10 has energy 2.1748384e-7 at position (-1538.6509, 440.77856, -1208.7) with 1 contributions
ECAL Hit #11 has energy 0.00033324969 at position (-1607.1865, 574.5712, -1290.3) with 11 contributions
ECAL Hit #12 has energy 0.00017833424 at position (-1604.6366, 578.988, -1290.3) with 6 contributions
ECAL Hit #13 has energy 0.00012392712 at position (-1624.6803, 584.6712, -1315.8) with 1 contributions
ECAL Hit #14 has energy 2.5159894e-5 at position (-1516.0571, 439.51202, -1234.2) with 1 contributions
ECAL Hit #15 has energy 8.835346e-6 at position (-1676.4069, 484.97818, -1259.7) with 1 contributions
ECAL Hit #16 has energy 1.932569e-5 at position (-1481.44, 448.97067, -1264.8) with 1 contributions
ECAL Hit #17 has energy 3.2642229e-6 at position (1517.5103, -416.7951, 928.2) with 1 contributions
ECAL Hit #18 has energy 4.4099096e-8 at position (-1471.254, 537.31335, -1295.4) with 1 contributions
ECAL Hit #19 has energy 0.00010116091 at position (-1536.8274, 433.83682, -1300.5) with 1 contributions
ECAL Hit #20 has energy 0.00034957606 at position (1552.9, 321.3, -372.3) with 1 contributions

Lets check whether the total energy in the hits is the same as the sum of all the calorimeter contributions. This an example to show the expressivity of the Julia language. We can construct a StructArray with all the related hit contributions and use the sum function to sum the energy of all the contributions as a column of the constructed SoA.

for hit in hits
    StructArray(hit.contributions).energy |> sum |> c -> c ≈ hit.energy || println("Hit $(hit.index) has energy $(hit.energy) and contributions $(c)")
end

Drawing the EDMM4hep data

The following shows how easy is to draw EDM4hep event using the Plots module. In this example we want to plot the calorimeter hits in the space with dots with the size proportional to the energy of the hit.

maxE = hits.energy |> maximum
scatter(hits.position.x, hits.position.y, hits.position.z,
        markersize = (hits.energy/maxE)*10, color = :blue)
Example block output

The projection in the X-Y plane is shown below.

scatter(hits.position.x, hits.position.y, markersize = (hits.energy/maxE)*10, color = :blue)
Example block output

Accessing the hit attributes as columns of the hits object is very efficient. Lets verify it. Using the @btime macro from the BenchmarkTools module we can measure the time to access the energy column.

using BenchmarkTools
@benchmark hits.energy |> maximum
BenchmarkTools.Trial: 10000 samples with 42 evaluations per sample.
 Range (minmax):  901.929 ns308.282 μs   GC (min … max): 0.00% … 99.36%
 Time  (median):     922.905 ns                GC (median):    0.00%
 Time  (mean ± σ):     1.043 μs ±   3.081 μs   GC (mean ± σ):  2.94% ±  0.99%

  █▇      ▃▆▃▆▄▂▃▂▂▃▁                                         ▂
  ██▇▇▅▄▁▃███████████▇▆▃▅▅██▆██▆▆▆▅▇▆▅▃▃▇▆▄▃▅▅▆▆▅▆▆▇▇▇▆▇▇▇▆▇▇ █
  902 ns        Histogram: log(frequency) by time       1.66 μs <

 Memory estimate: 400 bytes, allocs estimate: 3.

The time to access the energy column is very small.

@benchmark begin
    _max = 0.0f0
    for h in hits
        _max = max(_max, h.energy)
    end
    _max
end
BenchmarkTools.Trial: 7652 samples with 1 evaluation per sample.
 Range (minmax):  498.813 μs 28.802 ms   GC (min … max):  0.00% … 97.47%
 Time  (median):     533.768 μs                GC (median):     0.00%
 Time  (mean ± σ):   651.230 μs ± 819.503 μs   GC (mean ± σ):  11.80% ± 10.26%

  █▁ ▄▂                                                       ▁
  ██▇██▇▅▄▃▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▆▆ █
  499 μs        Histogram: log(frequency) by time       4.74 ms <

 Memory estimate: 955.98 KiB, allocs estimate: 21594.

The time to access the energy column using a loop is much larger.


This page was generated using Literate.jl.