EDM4hep.jl EDM I/O Tutorial
This tutorial shows how to read EDM4hep data files and perform some simple analysis
You can also download this tutorial as a Jupyter notebook and a plain Julia source file.
Table of contents
Loading the necessary modules
- We will use the
EDM4hep
module that defines all the EDM4hep types and its relations and links. - We will use the
FHist
module to create histograms - We will use the
Plots
module to plot the histograms
If these modules are not installed, you can install them by running the following commands:
using Pkg
Pkg.add("EDM4hep")
Pkg.add("FHist")
Pkg.add("Plots")
using EDM4hep
using EDM4hep.RootIO
using Base.Iterators: partition, take
using FHist
using Plots: plot, scatter, plot!, theme
Reading an EDM4hep file
We will read an EDM4hep file with the RootIO.Reader
function. This function returns a reader object that can be used to access the events in the file. The input file is a ROOT file with the EDM4hep data model and is located to the path ttbar_edm4hep_digi.root
.
finput = joinpath(@__DIR__,"../../examples" ,"ttbar_edm4hep_digi.root")
"/home/runner/work/EDM4hep.jl/EDM4hep.jl/docs/build/../../examples/ttbar_edm4hep_digi.root"
We create a reader object to access the events in the file. The object displays a summary table of the content of the file.
reader = RootIO.Reader(finput)
┌────────────────┬──────────────────────────────────────────────────────────────
│ Attribute │ Value ⋯
├────────────────┼──────────────────────────────────────────────────────────────
│ File Name(s) │ /home/runner/work/EDM4hep.jl/EDM4hep.jl/docs/build/../../ex ⋯
│ # of events │ 25 ⋯
│ IO Format │ TTree ⋯
│ EDM4hep schema │ 1.0.0 ⋯
│ PODIO version │ 0.17.1 ⋯
│ ROOT version │ 6.28.4 ⋯
└────────────────┴──────────────────────────────────────────────────────────────
1 column omitted
Accessing the events
The TTree
called events
contains the events in the file. We can access the events by using the RootIO.get
function with the reader object and the name of the TTree
.
events = RootIO.get(reader, "events");
┌ Warning: Type MCRecoTrackerHitPlaneAssociation not found in EDM4hep module
└ @ EDM4hep.RootIO ~/work/EDM4hep.jl/EDM4hep.jl/src/RootIO.jl:332
If you get warnings is because there is a mismatch between the current schema version of EDM4hep and version when the file was written. The default behavior is ignore old types and set to zero the attributes that do not exists in the file.
We can access the first event in the file by using the index 1
in the events
array.
evt = events[1];
the evt
object is a UnROOT.LazyEvent
. Each leaf (column) can be accessed directlyt if you know the name of the leaf. Typically the name is of the form <collection_name>_<field_name>
. The full list of names can be obtained by calling the names
function on the events
object. For example:
for n in names(events) |> sort!
startswith(n, "ECalBarrelCollection") && println(n)
end
ECalBarrelCollection_cellID
ECalBarrelCollection_contributions_begin
ECalBarrelCollection_contributions_end
ECalBarrelCollection_energy
ECalBarrelCollection_position_x
ECalBarrelCollection_position_y
ECalBarrelCollection_position_z
There is not need to access individual columns like this. The RootIO.get
function can be used to access the collections directly.
Accessing the collections
The available collections in the event can be obtained by displaying the reader object.
show(reader)
┌────────────────┬───────────────────────────────────────────────────────────────────────────────────────────┐
│ Attribute │ Value │
├────────────────┼───────────────────────────────────────────────────────────────────────────────────────────┤
│ File Name(s) │ /home/runner/work/EDM4hep.jl/EDM4hep.jl/docs/build/../../examples/ttbar_edm4hep_digi.root │
│ # of events │ 25 │
│ IO Format │ TTree │
│ EDM4hep schema │ 1.0.0 │
│ PODIO version │ 0.17.1 │
│ ROOT version │ 6.28.4 │
└────────────────┴───────────────────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────┬─────────────────────┬──────────────┐
│ BranchName │ Type │ CollectionID │
├─────────────────────────────────┼─────────────────────┼──────────────┤
│ AllCaloHitContributionsCombined │ CaloHitContribution │ 0x3a25675d │
│ BeamCalCollection │ SimCalorimeterHit │ 0xc298a348 │
│ ECalBarrelCollection │ SimCalorimeterHit │ 0x407eb17b │
│ ECalEndcapCollection │ SimCalorimeterHit │ 0x0c09a156 │
│ ECalPlugCollection │ SimCalorimeterHit │ 0xd36e0cf8 │
│ EventHeader │ EventHeader │ 0xd793ab91 │
│ HCalBarrelCollection │ SimCalorimeterHit │ 0x6daf3b41 │
│ HCalEndcapCollection │ SimCalorimeterHit │ 0x50d90aed │
│ HCalRingCollection │ SimCalorimeterHit │ 0xad7abfca │
│ InnerTrackerBarrelCollection │ SimTrackerHit │ 0xf57448f3 │
│ InnerTrackerEndcapCollection │ SimTrackerHit │ 0x6d724693 │
│ LumiCalCollection │ SimCalorimeterHit │ 0x45759015 │
│ MCParticle │ MCParticle │ 0x0620d570 │
│ OuterTrackerBarrelCollection │ SimTrackerHit │ 0x083cc03d │
│ OuterTrackerEndcapCollection │ SimTrackerHit │ 0x1450dff0 │
│ VXDTrackerHits │ TrackerHitPlane │ 0xdb43b881 │
│ VertexBarrelCollection │ SimTrackerHit │ 0x85e68310 │
│ VertexEndcapCollection │ SimTrackerHit │ 0x5add7f42 │
│ YokeBarrelCollection │ SimCalorimeterHit │ 0x97bfada6 │
│ YokeEndcapCollection │ SimCalorimeterHit │ 0x60209550 │
└─────────────────────────────────┴─────────────────────┴──────────────┘
The RootIO.get
function can be used to access the collections in the event. The function takes the reader object, the event object and the name of the collection as arguments.
calo = RootIO.get(reader, evt, "AllCaloHitContributionsCombined");
hits = RootIO.get(reader, evt, "ECalBarrelCollection");
mcps = RootIO.get(reader, evt, "MCParticle");
Relationships will automatically be setup if the related collections will be also be accessed. For example, in this case the relation of SimCalorimeterHit
s from the collection ECalBarrelCollection
to CaloHitContribution
s from AllCaloHitContributionsCombined
will be filled.
The calo
object is a EDM4hep.EDCollection
object that contains the calorimeter hit contributions. The hits
object is a EDM4hep.EDCollection
object that contains the hits in the ECal barrel. The mcps
object is a EDM4hep.EDCollection
object that contains the MCParticles.
We can now print some information about the collections.
for hit in take(hits, 20)
println("ECAL Hit $(hit.index) has energy $(hit.energy) at position $(hit.position) with $(length(hit.contributions)) contributions")
end
ECAL Hit #1 has energy 0.0002169429 at position (-1487.2666, 428.77875, -1224.0) with 1 contributions
ECAL Hit #2 has energy 0.00031143325 at position (-1489.09, 435.72046, -1229.1) with 1 contributions
ECAL Hit #3 has energy 0.00032750145 at position (-1490.9135, 442.6622, -1234.2) with 1 contributions
ECAL Hit #4 has energy 0.0010127309 at position (-1490.9135, 442.6622, -1239.3) with 1 contributions
ECAL Hit #5 has energy 0.0010511694 at position (-1486.54, 440.1372, -1244.4) with 1 contributions
ECAL Hit #6 has energy 0.0012481669 at position (-1482.1666, 437.6122, -1249.5) with 1 contributions
ECAL Hit #7 has energy 0.0011303278 at position (-1495.2869, 445.1872, -1234.2) with 1 contributions
ECAL Hit #8 has energy 0.0015850706 at position (-1499.6603, 447.7122, -1239.3) with 1 contributions
ECAL Hit #9 has energy 0.0027348632 at position (-1501.4838, 454.65393, -1244.4) with 1 contributions
ECAL Hit #10 has energy 2.1748384e-7 at position (-1538.6509, 440.77856, -1208.7) with 1 contributions
ECAL Hit #11 has energy 0.00033324969 at position (-1607.1865, 574.5712, -1290.3) with 11 contributions
ECAL Hit #12 has energy 0.00017833424 at position (-1604.6366, 578.988, -1290.3) with 6 contributions
ECAL Hit #13 has energy 0.00012392712 at position (-1624.6803, 584.6712, -1315.8) with 1 contributions
ECAL Hit #14 has energy 2.5159894e-5 at position (-1516.0571, 439.51202, -1234.2) with 1 contributions
ECAL Hit #15 has energy 8.835346e-6 at position (-1676.4069, 484.97818, -1259.7) with 1 contributions
ECAL Hit #16 has energy 1.932569e-5 at position (-1481.44, 448.97067, -1264.8) with 1 contributions
ECAL Hit #17 has energy 3.2642229e-6 at position (1517.5103, -416.7951, 928.2) with 1 contributions
ECAL Hit #18 has energy 4.4099096e-8 at position (-1471.254, 537.31335, -1295.4) with 1 contributions
ECAL Hit #19 has energy 0.00010116091 at position (-1536.8274, 433.83682, -1300.5) with 1 contributions
ECAL Hit #20 has energy 0.00034957606 at position (1552.9, 321.3, -372.3) with 1 contributions
Lets check whether the total energy in the hits is the same as the sum of all the calorimeter contributions. This an example to show the expressivity of the Julia language. We can construct a StructArray
with all the related hit contributions and use the sum
function to sum the energy of all the contributions as a column of the constructed SoA.
for hit in hits
StructArray(hit.contributions).energy |> sum |> c -> c ≈ hit.energy || println("Hit $(hit.index) has energy $(hit.energy) and contributions $(c)")
end
Drawing the EDMM4hep data
The following shows how easy is to draw EDM4hep event using the Plots
module. In this example we want to plot the calorimeter hits in the space with dots with the size proportional to the energy of the hit.
maxE = hits.energy |> maximum
scatter(hits.position.x, hits.position.y, hits.position.z,
markersize = (hits.energy/maxE)*10, color = :blue)
The projection in the X-Y plane is shown below.
scatter(hits.position.x, hits.position.y, markersize = (hits.energy/maxE)*10, color = :blue)
Accessing the hit
attributes as columns of the hits
object is very efficient. Lets verify it. Using the @btime
macro from the BenchmarkTools
module we can measure the time to access the energy
column.
using BenchmarkTools
@benchmark hits.energy |> maximum
BenchmarkTools.Trial: 10000 samples with 42 evaluations per sample.
Range (min … max): 901.929 ns … 308.282 μs ┊ GC (min … max): 0.00% … 99.36%
Time (median): 922.905 ns ┊ GC (median): 0.00%
Time (mean ± σ): 1.043 μs ± 3.081 μs ┊ GC (mean ± σ): 2.94% ± 0.99%
█▇▃ ▃▆▄▃▆▄▂▃▂▂▃▁ ▂
███▇▇▅▄▁▃████████████▇▆▃▅▅██▆██▆▆▆▅▇▆▅▃▃▇▆▄▃▅▅▆▆▅▆▆▇▇▇▆▇▇▇▆▇▇ █
902 ns Histogram: log(frequency) by time 1.66 μs <
Memory estimate: 400 bytes, allocs estimate: 3.
The time to access the energy
column is very small.
@benchmark begin
_max = 0.0f0
for h in hits
_max = max(_max, h.energy)
end
_max
end
BenchmarkTools.Trial: 7652 samples with 1 evaluation per sample.
Range (min … max): 498.813 μs … 28.802 ms ┊ GC (min … max): 0.00% … 97.47%
Time (median): 533.768 μs ┊ GC (median): 0.00%
Time (mean ± σ): 651.230 μs ± 819.503 μs ┊ GC (mean ± σ): 11.80% ± 10.26%
█▃▂▁ ▄▂ ▁
████▇██▇▅▄▃▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▆▆ █
499 μs Histogram: log(frequency) by time 4.74 ms <
Memory estimate: 955.98 KiB, allocs estimate: 21594.
The time to access the energy
column using a loop is much larger.
This page was generated using Literate.jl.